Hi,
I have a problem that surpasses my knowledge of regular expressions. Can someone help me figure out the logic to solve it?
I am downloading a CSV file from a supplier which is "dirty": it has quotes and commas within fields, and it has line breaks where there should be HTML tags. I've asked them to fix it, but they haven't, so I'm stuck trying to fix it myself. Let me give an example record or two:
"SKU","Name","Description","Price","Quantity","Category"
"1234","Clock Radio","This is a clock radio, it measures 4" x 3" x 2"
Features:
Glows in the dark
Tells the time
Screws up my CSV parsing","$12.99","56","Clocks"
For this to work properly, each record should be on one line with comma-separated, double-quoted fields (which contain no quotes or commas). I need to do three things:
1. Delete any commas in the descriptions
2. Change all " in the fields to &quot;
3. Change all the \n in fields to <BR>
For 2, I think that I want to get rid of any quotes that are not of the format: ","
I don't know if that's the best way to go, but I think that every record is formatted that way. Any help or advice would be greatly appreciated. If it were up to me, we'd just find a new supplier!
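To make the logic concrete, here is a rough Python sketch of one way the three steps could be done without a single large regular expression. It rests on assumptions that are not confirmed for the real file: the sequence "," never occurs inside a field, the header row gives the correct column count, and embedded quotes should become the HTML entity &quot;. It also strips commas from every field, not only the description. The function names (split_records, clean_field, clean_csv) are made up for illustration.

```python
SEP = '","'  # assumed to occur only between fields, never inside one


def split_records(text, ncols):
    """Group physical lines into logical records.

    A record is considered complete once it contains at least
    ncols - 1 separators and its last line ends with a double quote.
    """
    records, buf = [], []
    for line in text.splitlines():
        if not buf and not line.strip():
            continue  # skip blank lines between records
        buf.append(line)
        joined = "\n".join(buf)
        if joined.count(SEP) >= ncols - 1 and joined.endswith('"'):
            records.append(joined)
            buf = []
    if buf:
        raise ValueError("incomplete record at end of input: %r" % buf[0])
    return records


def clean_field(field, quote_repl="&quot;"):
    """Apply the three fixes to the content of one field."""
    field = field.replace(",", "")         # 1. delete commas
    field = field.replace('"', quote_repl)  # 2. replace embedded quotes
    field = field.replace("\n", "<BR>")    # 3. replace line breaks
    return field


def clean_csv(text):
    """Return the cleaned file content, one record per line."""
    first_line = text.lstrip().split("\n", 1)[0]
    ncols = first_line.count(SEP) + 1  # column count taken from the header
    out = []
    for rec in split_records(text, ncols):
        # Drop the opening and closing quote, then split on the separator.
        fields = rec[1:-1].split(SEP)
        out.append('"' + SEP.join(clean_field(f) for f in fields) + '"')
    return "\n".join(out)
```

Calling clean_csv() on the text of the downloaded file should give one record per line, which a normal CSV parser can then read. If a description ever does contain "," the record will come out with too many fields, so checking that every cleaned record has the same number of fields as the header would be a sensible extra safeguard.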