Parsing .xls and .csv files

Status
Not open for further replies.

0x4B47

Programmer
Jun 22, 2003
60
US
Hi All,

I was wondering if someone could help me with this problem I have. Here is the scenario:

I have developed a Java application which allows users to upload two file types, .xls and .csv. The files contain raw data. In a .csv file, a comma delimiter marks the end of one field and the beginning of the next. For example:

12.78, sometxt,,, "123 street, city, state, zip", 13.45, "lastname, firstname",,,"company name, inc.",,

This csv file has to be converted to a tilde delimited file for another application so the result should be:

12.78~ sometxt~~~ "123 street, city, state, zip"~ 13.45~ "lastname, firstname"~~~"company name, inc."~~

The application has to verify that all the fields exist in the file, and it does this by counting the comma separators. If the criterion is not met, the file is rejected. For example, if 10 fields are expected then 10 commas should be present.

Now here's the bit I'm having trouble with. When I'm counting the commas, how do I ignore the commas between the quotes? And how do I ignore those same commas when I'm replacing the commas outside the quotes with a tilde (~)?
Is there a nice efficient way to do this? Is there a common algorithm already out there that helps with this process?

Any feedback, questions, suggestions, constructive criticisms, alternatives are extremely welcome.

Please help.

Thanks in advance.

KG

01001011 01000111
 
You'll probably need to do it in 3 steps:
1. Replace (\".*)(,)(.*\") (in regular expression notation) with something bizarre, say, "|".
2. Replace the commas with tildes
3. Replace the bizarre characters (|) with commas
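
A minimal sketch of those three steps in Java, under a few assumptions: the placeholder character "|" never appears in the data, quotes are balanced, and each quoted field is matched with the pattern "[^"]*" rather than the greedy expression above (which would span from the first quote on the line to the last). The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSwap {
    // Hypothetical placeholder; assumed never to occur in the input data.
    private static final char PLACEHOLDER = '|';
    // Matches one quoted field: a quote, any non-quote characters, a quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String commasToTildes(String line) {
        // Step 1: hide the commas inside each quoted field behind the placeholder.
        Matcher m = QUOTED.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String hidden = m.group().replace(',', PLACEHOLDER);
            m.appendReplacement(sb, Matcher.quoteReplacement(hidden));
        }
        m.appendTail(sb);

        // Step 2: every comma still visible is a field separator.
        String tilded = sb.toString().replace(',', '~');

        // Step 3: put the hidden commas back.
        return tilded.replace(PLACEHOLDER, ',');
    }

    public static void main(String[] args) {
        System.out.println(commasToTildes("12.78, sometxt,,, \"123 street, city\", 13.45"));
    }
}
```

If the data could contain "|", a character that cannot occur in the file would have to be chosen instead.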

_________________
Bob Rashkin
 
A shot in the dark, but could I do something like this? And is it efficient?

Code:
// Hold the file's chars in an array and loop through it.
// GetFileChars() is a placeholder for however the file gets read.
char[] FileArray = GetFileChars();

for (int i = 0; i < FileArray.length; i++)
{
    if (FileArray[i] == '"')   // skip everything between quotes
    {
        do
        {
            i++;
        } while (i < FileArray.length && FileArray[i] != '"');

        continue;   // i is on the closing quote; the loop's i++ steps past it
    }
    if (FileArray[i] == ',')
        FileArray[i] = '~';
}

Thanks


01001011 01000111
 
I suppose that would work but it seems convoluted to me.
You can find all the "s with indexOf:
public int indexOf(String str, int fromIndex)

Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:

k >= Math.min(fromIndex, str.length()) && this.startsWith(str, k)


If no such value of k exists, then -1 is returned.

Parameters:
str - the substring for which to search.
fromIndex - the index from which to start the search.
Returns:
the index within this string of the first occurrence of the specified substring, starting at the specified index.
Then you can substring the parts around those indices, then replace the commas on the appropriate substrings, then put them back together (ss1 + ss2 + ss3...).
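
A rough sketch of that indexOf approach, assuming balanced double quotes and one record per line. The class and method names are invented for illustration, and an unbalanced quote is simply copied through untouched, which is a guess at the desired behaviour. Because the conversion keeps the line the same length, the separators can be counted afterwards by comparing the two strings position by position.

```java
final class QuoteAwareCsv {
    /** Replaces commas outside double quotes with tildes, using indexOf to locate the quotes. */
    static String commasToTildes(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int pos = 0;
        while (pos < line.length()) {
            int open = line.indexOf("\"", pos);
            if (open == -1) {
                // No more quotes: the rest of the line is unquoted.
                out.append(line.substring(pos).replace(',', '~'));
                break;
            }
            // Unquoted stretch before the opening quote.
            out.append(line.substring(pos, open).replace(',', '~'));
            int close = line.indexOf("\"", open + 1);
            if (close == -1) {
                // Unbalanced quote: copy the remainder as-is (assumed behaviour).
                out.append(line.substring(open));
                break;
            }
            // Quoted stretch, quotes included, copied unchanged.
            out.append(line, open, close + 1);
            pos = close + 1;
        }
        return out.toString();
    }

    /** Counts the commas that lie outside quotes, i.e. the real field separators. */
    static int countSeparators(String line) {
        String converted = commasToTildes(line);
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            // A separator is a position where a comma was turned into a tilde.
            if (line.charAt(i) == ',' && converted.charAt(i) == '~') {
                count++;
            }
        }
        return count;
    }
}
```

The file check then becomes a comparison of countSeparators(line) against the expected number of separators before the converted line is written out.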

_________________
Bob Rashkin
 