Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing a memo field

Status
Not open for further replies.

OldtTimerDon

Programmer
Oct 6, 2012
34
US
I have been away from VFP too long (11 years). I have a comma separated memo field that contains data like (Baltimore, Boston, Denver). Some records contain only three items, other can contain up to 12.
I want to create a table with one record for each location.
It should be simple but I'm baffled. I have cancer and the brain doesn't function like it used to.
Many thanks for your help.
 
You can parse a string in many ways:
- ALINES()
- GETWORDCOUNT() + GETWORDNUM()
- SUBSTR()
- STREXTRACT()
etc.

Code:
CREATE CURSOR cities (city M)
INSERT INTO cities VALUES ('Baltimore, Boston, Denver')
INSERT INTO cities VALUES ('New York, Berlin, Paris, Roma, Madrid')
INSERT INTO cities VALUES ('Boston, Berlin, London')
INSERT INTO cities VALUES ('Buoenos Aires')

* GETWORD
CREATE CURSOR city (city C(100))
SELECT MAX(GETWORDCOUNT(city,',')) as maxno FROM cities INTO CURSOR cno

FOR lni=1 TO cno.maxno
	INSERT INTO city SELECT CAST(GETWORDNUM(ALLTRIM(city),m.lni,',') as C(100)) FROM cities
NEXT

SELECT distinct ALLTRIM(city) FROM city WHERE !EMPTY(city) INTO CURSOR city READWRITE
SELECT city
BROWSE

* ALINES
CREATE CURSOR city (city C(100))
SELECT cities
SCAN
	lnCity=ALINES(laCity,city,1+4,',')
	FOR lni=1 TO lnCity
		INSERT INTO city VALUES (m.laCity[m.lni])
	NEXT
ENDSCAN

SELECT distinct city FROM city WHERE !EMPTY(city) INTO CURSOR city READWRITE
SELECT city
BROWSE

* STREXTRACT
CREATE CURSOR city (city C(100))
SELECT MAX(OCCURS(',',city)) as maxno FROM cities INTO CURSOR cno

INSERT INTO city SELECT CAST(STREXTRACT(city,'',',') as C(100)) FROM cities 
INSERT INTO city SELECT city FROM cities WHERE AT(',',city)=0
FOR lni=1 TO cno.maxno-1
	INSERT INTO city ;
		SELECT CAST(STREXTRACT(city,',',',',m.lni) as C(100)) FROM cities union ;
		SELECT CAST(STREXTRACT(city,',','',m.lni) as C(100)) FROM cities WHERE AT(',',STREXTRACT(city,',','',m.lni))=0
NEXT
INSERT INTO city SELECT CAST(STREXTRACT(city,',','',cno.maxno) as C(100)) FROM cities

SELECT distinct ALLTRIM(city) FROM city WHERE !EMPTY(city) INTO CURSOR city READWRITE
SELECT city
BROWSE

Respectfully,
Vilhelm-Ion Praisach
Resita, Romania
 
I Don,

I hope you're doing as well as you can in your condition. You have introduced yourself in June 2013, come here seldom, but you're remembered and always welcome.

I would use the ALINES variant in a simpler way to just append the array redimensioned two dimensional:
Code:
* ALINES
CREATE CURSOR city (city C(100))
SELECT cities
SCAN
   lnCity=ALINES(laCity,city,1+4,',')
[COLOR=red]   Dimension laCity[lnCity,1]
   Select city
   Append From Array laCity[/color]
ENDSCAN

This is just the first step of normalisation, building up a cities table. Next you'd create a table for n:m relation of the old data to the single cities. This means you won't split up a single record to many, you'd only split up the different cities. So if your old table is named OLDTABLE you'd split that into
OLDTABLE (minus the memo field): (someid,other fields,cities memo)
CITIES (all distinct cities found in all memos) (cityid, city)
OLDTABLELOCATIONS (someid,cityid) relating OLDTABLE DATA with cities, here both someid and cityid are foreign keys relating to OLDTABLE and CITIES.

Later you'd join this data by
Code:
SELECT OLDTABLE.*,CITIES.CITY FROM OLDTABLE ot 
  LEFT JOIN OLDTABLELOCATIONS otl on ot.someid=otl.someid 
  LEFT JOIN CITIES ct on otl.cityid = ct.cityid

And just another tip, if you'd not have ALINES to process a comma separated list, functions like STREXTRACT needing delimiters can easier process a list with added delimiters ','+list+','. So you modify a list with a prepended and an appended comma, to avoid the special case with only a trailing comma (first item) or only a leading comma (last item).

Bye, Olaf.
 
Olaf Doschke said:
use the ALINES variant in a simpler way
Good choice. This is even more simpler.
Code:
* ALINES
CREATE CURSOR city (city C(100))
SELECT cities
SCAN
	lnCity=ALINES(laCity,city,1+4,',')
	DIMENSION laCity[m.lnCity,1]
	INSERT INTO city FROM ARRAY laCity
ENDSCAN

Respectfully,
Vilhelm-Ion Praisach
Resita, Romania
 
Don,

I can't add anything to the excellent answers you've already received, but I would like to add my welcome to you. We remember you well from when you were active here a while ago, and it's good to see you back.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
It is known that many DBMS allow to insert a set of values with a single INSERT, like
insert into aa values (val1,val2),(val3,val4),(val5,val6)
The most important thing is that is quicker than three INSERT
insert into aa values (val1,val2)
insert into aa values (val3,val4)
insert into aa values (val5,val6)

I made some tests because I wonder if isn't quicker in VFP to fill an array with the desired values and populate the cursor in a single INSERT from ARRAY command.
I guess Insert / Append from ARRAY are slow commands.
I made only a few tests, so I cannot be sure.
I do not pretend I have know this behavior until now. It was pure curiosity. Also I can't explain.
On the other hand, for a single INSERT with a reasonable amount of data (up to several hundreds of thousand of record) it is preferable INSERT / APPEND from ARRAY.

This is my first test
Code:
LOCAL cc,tt,lni,bb[1000000,1]
CREATE CURSOR aa (aa C(60))
cc=REPLICATE('Shazam',10)

tt=SECONDS()
FOR lni=1 TO ALEN(bb,1)
	bb[m.lni,1]=m.cc	
NEXT

CREATE CURSOR aa (aa C(60))
tt=SECONDS()
INSERT INTO aa FROM ARRAY bb
?'INSERT FROM ARRAY',SECONDS()-tt
BROWSE

CREATE CURSOR aa (aa C(60))
tt=SECONDS()
FOR lni=1 TO ALEN(bb,1)
	INSERT INTO aa VALUES (bb[m.lni,1])
NEXT
?TRANSFORM(ALEN(bb,1))+' INSERT',SECONDS()-tt
BROWSE

This is the second test.
The differences are very big.
I suspected redimensioning, but they took only a tiny fraction.

Code:
CLEAR
CREATE CURSOR cities (city M)
INSERT INTO cities VALUES ('Baltimore, Boston, Denver')
INSERT INTO cities VALUES ('New York, Berlin, Paris, Roma, Madrid')
INSERT INTO cities VALUES ('Boston, Berlin, London')
INSERT INTO cities VALUES ('Buoenos Aires')

* ALINES
tt=SECONDS()
CREATE CURSOR city (city C(100))
FOR lnt=1 TO 10000
	SELECT cities
	SCAN
		lnCity=ALINES(laCity,city,1+4,',')
		DIMENSION laCity[m.lnCity,1]
		INSERT INTO city FROM ARRAY laCity
	ENDSCAN
NEXT
?SECONDS()-tt
SELECT city
BROWSE

tt=SECONDS()
CREATE CURSOR city (city C(100))
FOR lnt=1 TO 10000
	SELECT cities
	SCAN
		lnCity=ALINES(laCity,city,1+4,',')
		DIMENSION laCity[m.lnCity,1]
		SELECT city
		APPEND FROM ARRAY laCity
	ENDSCAN
NEXT
?SECONDS()-tt
SELECT city
BROWSE

tt=SECONDS()
CREATE CURSOR city (city C(100))
FOR lnt=1 TO 10000
	SELECT cities
	SCAN
		lnCity=ALINES(laCity,city,1+4,',')
		FOR lni=1 TO lnCity
			INSERT INTO city VALUES (m.laCity[m.lni])
		NEXT
	ENDSCAN
NEXT
?SECONDS()-tt
SELECT city
BROWSE




Respectfully,
Vilhelm-Ion Praisach
Resita, Romania
 
Don, this is just a side talk about performance of the different solutions. It will only matter, if you have much data.

I did coverage profiling and in short the DIMENSION takes short. What takes long is APPEND FROM ARRAY and INSERT FROM ARRAY:

Code:
0 Hits    1st 0.000000   Avg 0.000000   Set coverage to d:\temp\insert.log
1 Hit     1st 0.002061   Avg 0.002061   CLEAR
1 Hit     1st 0.003150   Avg 0.003150   CREATE CURSOR cities (city M)
1 Hit     1st 0.002554   Avg 0.002554   INSERT INTO cities VALUES ('Baltimore, Boston, Denver')
1 Hit     1st 0.000154   Avg 0.000154   INSERT INTO cities VALUES ('New York, Berlin, Paris, Roma, Madrid')
1 Hit     1st 0.000081   Avg 0.000081   INSERT INTO cities VALUES ('Boston, Berlin, London')
1 Hit     1st 0.000076   Avg 0.000076   INSERT INTO cities VALUES ('Buoenos Aires')
                                        
                                        * ALINES
1 Hit     1st 0.000079   Avg 0.000079   tt=SECONDS()
1 Hit     1st 0.000381   Avg 0.000381   CREATE CURSOR city (city C(100))
1 Hit     1st 0.000227   Avg 0.000227   FOR lnt=1 TO 1000
1000 Hits 1st 0.000047   Avg 0.000041   	SELECT cities
1000 Hits 1st 0.000044   Avg 0.000058   	SCAN
4000 Hits 1st 0.000056   Avg 0.000071   		lnCity=ALINES(laCity,city,1+4,',')
4000 Hits 1st 0.000041   [COLOR=#CC0000]Avg 0.000046   		DIMENSION laCity[m.lnCity,1][/color]
4000 Hits 1st 0.000593   [COLOR=#CC0000]Avg 0.000642   		INSERT INTO city FROM ARRAY laCity[/color]
4000 Hits 1st 0.000077   Avg 0.000097   	ENDSCAN
1000 Hits 1st 0.000033   Avg 0.000044   NEXT
1 Hit     1st 0.024423   Avg 0.024423   ?SECONDS()-tt
1 Hit     1st 0.000083   Avg 0.000083   SELECT city
                                        
1 Hit     1st 0.000036   Avg 0.000036   tt=SECONDS()
1 Hit     1st 0.000347   Avg 0.000347   CREATE CURSOR city (city C(100))
1 Hit     1st 0.000055   Avg 0.000055   FOR lnt=1 TO 1000
1000 Hits 1st 0.000035   Avg 0.000040   	SELECT cities
1000 Hits 1st 0.000043   Avg 0.000049   	SCAN
4000 Hits 1st 0.000053   Avg 0.000067   		lnCity=ALINES(laCity,city,1+4,',')
4000 Hits 1st 0.000035   [COLOR=#CC0000]Avg 0.000043   		DIMENSION laCity[m.lnCity,1][/color]
4000 Hits 1st 0.000031   Avg 0.000037   		SELECT city
4000 Hits 1st 0.000565   [COLOR=#CC0000]Avg 0.000617   		APPEND FROM ARRAY laCity[/color]
4000 Hits 1st 0.000086   Avg 0.000096   	ENDSCAN
1000 Hits 1st 0.000035   Avg 0.000043   NEXT
1 Hit     1st 0.000244   Avg 0.000244   ?SECONDS()-tt
1 Hit     1st 0.000207   Avg 0.000207   SELECT city
                                        
1 Hit     1st 0.000073   Avg 0.000073   tt=SECONDS()
1 Hit     1st 0.000766   Avg 0.000766   CREATE CURSOR city (city C(100))
1 Hit     1st 0.000061   Avg 0.000061   FOR lnt=1 TO 30000
30000 Hits1st 0.000044   Avg 0.000035   	SELECT cities
30000 Hits1st 0.000045   Avg 0.000041   	SCAN
120000 Hit1st 0.000118   Avg 0.000046   		lnCity=ALINES(laCity,city,1+4,',')
120000 Hit1st 0.000172   Avg 0.000037   		FOR lni=1 TO lnCity
360000 Hit1st 0.000083   [COLOR=red]Avg 0.000041   			INSERT INTO city VALUES (m.laCity[m.lni])[/color]
360000 Hit1st 0.000042   Avg 0.000035   		NEXT
120000 Hit1st 0.000084   Avg 0.000036   	ENDSCAN
30000 Hits1st 0.000031   Avg 0.000035   NEXT
1 Hit     1st 0.000283   Avg 0.000283   ?SECONDS()-tt
1 Hit     1st 0.000041   Avg 0.000041   SELECT city
1 Hit     1st 0.000031   Avg 0.000031   Set Coverage To

I did do more iterations of the last part, so the three solutions take equally long without coverage. The last part with single inserts takes much longer in coverage due to writing much more log entries. Nevertheless the coverage logging measures the time of the code itself correctly and you can make coparisons of the average times needed. In average the APPEND/INSERT FROM ARRAY each only insert about 3 records (11/4), but even 0.000642 seconds/3= 0.000214 seconds is takeing about 5 times as long as the average single insert time of 0.000041 seconds.

It's not the redimensioning, though. Seems the insert/append FROM ARRAY operations take some preparation time.

Bye, Olaf.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top