Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

append from tab delimited - intersting effect (or bug)

Status
Not open for further replies.

bs6600

Programmer
Sep 23, 2005
57
GB
Using append from myfile delimited with tab

I have found that if a cell contains "X"Y I get two fields filled with "X" then Y

this moves all subsequent fields right by one with sad side effects.

I don't suppose there's a fix for this and I've written my own unpack so this is just a 'be aware'

Before you ask, I think csv has similar problems with embedded quote marks but I can't use csv so didn't explore further.
 
Well, there's a possible misunderstanding that's ver command about "delimited with tab" meaning delimiting of fields with tabs. It's onlöy half true. The help says specifically about DELIMITED WITH TAB that it appends fields separated by tabs rather than commas.

The delimited is all about string delimiting, though, and the tab is a field separator, not a delimiter. The overall help documentation is confusing these t things also in the description of the clause you should instead use to specify the separator: WITJH CHARACTER.

There is a problem though with the functioning of specifing the delimiter and separator character not ass strings but as direct unquoted characters. So for example you could let VFP append texts delimited by underscore and separated by tab with
Code:
APPEND FROM some.txt DELIMITED WITH _ WITH CHARACTER TAB

So as a demo:
Code:
Strtofile(["x"y]+Chr(9)+[z],"tabseparated.txt")
Create Cursor Import (xy c(10), z c(10))
Append From tabseparated.txt Delimited With _ With Character Tab
Browse

A problem I have no solution for is that you can't turn off the delimited feature, looking for a string delimiter that indirectly also becomes a field separator because append does not store more than one string into a field and so changes to the next field even without a separator tab after encountering the end delimiter. To be clear that means while "x"y -tab-> z now only separates by tab, an input line _x_y-tab->z would still be separated into 3 values and the more destructive effect of a single underscore being interpreted as the rest of the line being one string missing the second _ as its end delimiter.

The best you can do is specifying any unexpected character as the delimiter (string delimiter) character used and avoid the indirect value seaparations that way, I don't see a way of turning off the delimited string part of the parsing. Or you reformat the input texts to the more commonly used CSV - comma separated values. Well, as afterthough: CSV also has the delimited feature included.

Part of problem of the documentation therefore also is the mangling of meanings of delimiter and separator, tab delimited file is also a commonly used term actually meaning tab separated. There is TSV, rarley in use, too - tab separated values.
 
Last edited:
Hi,

Delimited has more options: delimited with ... and delimited with character ... Please have a look at the Help file or see below (From Hacker's Guide to VFP 7)

DELIMITED is for files that contain data with delimiters and separators. There's a lot of confusion about those two, particularly since the command itself mixes them up. Delimiters are characters that surround the data in a field. For example, the word "snazzlefritz" is delimited by quote marks in this sentence. Separators come between two data items. For example, the following list of colors uses commas as separators: "chartreuse, lavender, fuchsia, taupe, teal." A delimited file normally contains both delimiters and separators. Here's an example (generated from the Labels.DBF that comes with VFP):

"DATAW","LABELLYT","Avery 4143",F,4869,/ /
"DATAW","LABELLYT","Avery 4144",F,39266,/ /
"DATAW","LABELLYT","Avery 4145",F,24620,/ /
"DATAW","LABELLYT","Avery 4146",F,32961,/ /

Each character field is surrounded by quotes (the delimiters), and fields are separated by commas. This is the default format for a delimited file.

TYPE DELIMITED can handle several other options. DELIMITED WITH BLANK and DELIMITED WITH TAB use quotes for delimiters, and fields are separated by spaces or tabs. DELIMITED WITH DELIMITER lets you specify the delimiter—fields are separated by commas.

VFP 5 introduced the DELIMITED WITH CHARACTER clause that lets you specify the separator. So DELIMITED WITH CHARACTER ! means that there's an exclamation point between each pair of fields. We can't see why they couldn't have improved the situation here by giving this option a useful name like SEPARATED BY. Nonetheless, we're very grateful to have this option because it increases the number of files we can handle without having to break out the low-level file functions.

You can combine DELIMITED WITH and DELIMITED WITH CHARACTER to specify both the delimiter and the separator.

TYPE CSV ("comma-separated values") is like DELIMITED, except VFP assumes the first line in the file contains field names, so it skips that line.

On the whole, the choices are quite a mess. If you're still confused by this, we suggest you do what we did to figure all this out. Try COPY TO with each option on a small data sample.

hth

MarK
 
Writing your own cleanup before appending is definitely a good idea because just about every file type (TXT, SDF, TAB, CSV, XLS) has some challenges.

So, I'll add to your list of things to look out for by saying you should also be careful of formatting of individual columns, especially if the original file came from a spreadsheet.

For example, numeric columns formatted as currency may come through as zero unless you remove formatting in advance. You'd think that would be assumed, but things like dollar signs terminate the value to zero.

Also, when importing from certain versions of Excel, you also need to be careful of columns with dates and times. In some versions of Excel, dates are an offset of the number of days since 1/1/1900 and times are stored in decimals, so a datetime will look something like 39278.456777, where the 29278 is the number of days since 1/1/1900 and the decimal part gives you the time.

Making matters worse, they have a bug because 1900 was not a leap year, so if you write a function to convert the number back to a date, it's only accurate converting dates after 3/1/1900 unless you implicitly add code to handle the bug.

It's always bothered me that they never quite refined the import and append from features, and that they also didn't include native support for XLSX imports or exports, even though that format was introduced in 2007 two years before the last update to VFP.
 
While you mention Excel as a source of problems, it can also be a solution. Excel is better at importing text files into a sheet and while an import from XLSX is missing, you can use tools from Vilhelm-Ion Praisach, see http://praisachion.blogspot.com/2017/08/importfromxlsx-40.html

Using text files for data exchange/export/import always has its pitfalls.

VFP lacks good delimited/separated values parsing in several aspects, especially in the string delimiter feature that was tripping you up. Another example is line feeds within a string causing VFP to take that as record separator:

Code:
StrTofile(""+Chr(13)+Chr(10),"commaseparated.txt",.f.)
StrTofile(["a,b,c",1,2,3,"x,y,z"]+Chr(13)+Chr(10),"commaseparated.txt",.t.)
StrTofile(["d,]+Chr(13)+Chr(10)+[e,]+Chr(13)+Chr(10)+[f",4,5,6,"g,h,i"],"commaseparated.txt",.t.)
Create Cursor import (c1 C(20),i1 i,i2 i,i3 i,c2 C(10))
Append From commaseparated.txt Type csv
Browse
The text file looks like this:
"a,b,c",1,2,3,"x,y,z"
"d,
e,
f",4,5,6,"g,h,i"

The first empty line is intended and caters for the fact that type csv append assumes line 1 is headings, not data, it's ignored so can also be empty.
From the import cursor you can see the simple case of commas within string delimiters not separating fields is processed correctly, but if it comes to multiline values as
which, delimited by quotes, should be read in as a single value, VFP reads "d," and "e," in separate records with all further fields zero/empty and when reading in f" includes the delimiter in the imported value, too.

Import that with Excel as csv and you get two rows, unless you mess with Excel settings.

As you explicitly want quotes to be ignored as string delimiters, that's also tricky for Excel, but at the same time not a rare case, like items from a computer hardware store using double quotes for specifing inches as in 21" monitor. 3.5'" floppy disk, etc. So I assume it's possible to configure Exel to import csv or also tab separated values correctly. And then you can go from a spreadsheet (range) to an array or cursor by Excel automation or importfromxlsx.
 
Last edited:
I've used quite a few XLSX import and export hacks over the years, but they all still have some quirks, such as some approaches returning columns as Memos by default making it tough to browse the values without casting each column implicitly, but at least the data is there.

Another big reason I wish Microsoft updated the Import / Export features is the fact that older XLS files have a limit of only 65,000 rows.
 
This is the result of the import of Chris' sample by the CSVProcessor class (the contents of Col_1 is messageboxed because the browse window represents it as a single line):

1731479873431.png
 
Nice, atlopes.

What bs6600 needs, though, is a pasrsing that takes quotes as part of a value, not as a delimiter. As his text files are tab separated the issue with tabs within values comes up, just like delimiters (quoted text) solve enabling commas within values AND linefeeds within values, tabs within values would need any other delimiter or his case is not using any delimiters and therefore not allwing multiline values or values with tabs in them.
 
Last edited:
Nice, atlopes.

What bs6600 needs, though, is a pasrsing that takes quotes as part of a value, not as a delimiter. As his text files are tab separated the issue with tabs within values comes up, just like delimiters (quoted text) solve enabling commas within values AND linefeeds within values, tabs within values would need any other delimiter or his case is not using any delimiters and therefore not allwing multiline values or values with tabs in them.

Thank you for pointing that out, Chris.

Setting the CSVProcessor ValueDelimiter property to an empty string will treat "X"Y as a single value. The "generally accepted" practice (something that will be open to discussion, of course) to insert double quotes into "-delimited values is to duplicate them, such as in """X""Y", or "21""" monitor".
 
I did know doubling the " is a thing within T-SQL, but I didn't know it's a CSV convention, too, thanks for that. The problem bs6600 then has is a file not following that convention, take aside it's not CSV but TSV. As in so many cases, you can't always expect exports to work by all conventions, so you have to work out importing from files with such problems in them, as you rarely have a good way to encourage a better input file, if only for diplomatic reasons.

Therefore the ValueDelimer is a fine feature including the possibilty to have an empty or no delimiter. I still haven't looked in detail into your project, I assume you can also set the field separator, likely in a FieldSeparator property, can you?
 
Last edited:
Wow!

Thanks guys. I'll enjoy reading all above slowly.

I wrote my own importer with fgets and unpacking anything up to at(chr(09),...)

The data comes from a non technical person who maintains a calendar in Google Sheets (because it's available and shared and our club can't afford to buy a shared Excel) so the only download is to TSV. (not CSV because he might well use commas). so I have to take what I get. I turn the tsv into a calling notice, a web page and a Google calendar to suit our members' varying tastes.

You seem to have a little dialogue going so I won't close this thread.

PS I'm 85, programming since 1964 and problems like this keep me alive - thanks for that!
 
Good, bs6600.

So one solution could be using atlopes library. Also you'll see when you read through all this changing to csv is not impossible with commas in the values.
 
Therefore the ValueDelimer is a fine feature including the possibilty to have an empty or no delimiter. I still haven't looked in detail into your project, I assume you can also set the field separator, likely in a FieldSeparator property, can you?

Yes, that would be a ValueSeparator, defaulting to ",".
 
Thank you all.

I have made my own importer (fgets and at(chr9),..) but for the record:-



Code:
APPEND FROM ? type DELIMITED WITH tab              &&.tsv X"Y <tab> "A"B 

APPEND FROM ? type delim                                        && .csv "X"Y <comma> "A"B 

APPEND FROM ? DELIMITED WITH _ WITH CHARACTER TAB         &&.tsv X"Y <tab> "A"B

Produces:-
with tab X *** Y *** A *** B ***
with csv X *** Y *** A *** B ***
with char "X"Y *** "A"B *** *** ***.

I don't understand the third option but it works.

The tab/comma is a red herring. I originally said "X"Y which has no tab or comma but produces two fields with either append.
 
you can't turn off the delimited feature, looking for a string delimiter that indirectly also becomes a field separator because append does not store more than one string into a field and so changes to the next field even without a separator tab after encountering the end delimiter.
(from my first reply)

Indeed it doesn't matter whether the field separator is comma or tab in this case, the problem is that the double quote is taken as delimiter of the actual value X. So "X"Y is interpreted as "X"<separator>Y, as if there is a separator (tab or comma, doesn't matter) between the end delimiter of "X" and the y.

Any character alternative delimiter than _ that's not in the file would also work, but in the end always could become problematic when that character actually appears in future data.

So your best solution is using atlopes CSVProcessor and turn off the delimiter by setting its ValueDelimiter empty instead of " or _ or anything else. The separator then can be whatever is used in the files you get.

The rest of the discussion between atlopes and me was about having the improvement from VFPs APPEND when using a delimiter to parse it correctly, including cases where not only the separator is within a value but even a linefeed, the "record separator".

REad the Hackers Guide quote in MarKs post, which welcomes the WITH CHARACTER clause and says it would be easier to understand if it would be called SEPARATED BY, you specify the separotor here. whereas purely using DELIMITED WITH TAB is stating both DELIMITED (in the usual way with double quotes) and SEPERATED WITH TABS, and the delimited feature is what's the showstopper for vlaues containing the delimiter.
 
Last edited:
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top