
Break up sentences into words

Status
Not open for further replies.

german12

Programmer
Nov 12, 2001
563
DE
I have a VFP table with one column called "Shares".
The field type is character.

Example:

Shares
IBM Call 12345
Call IBM 12345
Call Apple 4567
Apple Call 4567
Tesla Put 7891
Put 7891 Tesla
7891 Tesla Put
Tesla 1111

and so on...

As you can see, the first two rows containing "IBM", for example, are identical in content; only the word order differs (and sometimes the number of blanks between the words varies).
The same goes for the lines containing "Apple" or "Tesla".

In general: the column was filled in manually, so the word order and/or the blanks are arbitrary for items that are identical.

This makes it too complicated to search and look up values in such a table. How can I get a better-structured table where the same kinds of words are in
separate columns, e.g.:


col1     col2    col3
------------------------
IBM      call    12345
Apple    call    4567
Tesla    put     7891
Tesla            1111

Has anyone had a problem like that?





Peace worldwide - it starts here...
 
Two functions will get you where you need to go.

GETWORDCOUNT() will return the number of words in each line.
GETWORDNUM() will return each word.
You could then sort the words and match the records.
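
As a minimal sketch of how the two functions fit together (the sample string is made up for illustration; real values would come from the Shares column), something along these lines should list the words of one row regardless of how many blanks separate them:

```foxpro
* Hypothetical sample value with irregular spacing
LOCAL lcLine, lnWords, i
lcLine = "Call   IBM  12345"

* Runs of blanks count as a single delimiter
lnWords = GETWORDCOUNT(lcLine)    && 3

FOR i = 1 TO lnWords
	? i, GETWORDNUM(lcLine, i)
ENDFOR
```

Because both functions treat consecutive blanks as one delimiter, the differing blank lengths in the table should not need any extra clean-up before splitting.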

Regards

Griff
Keep [Smile]ing

There are 10 kinds of people in the world, those who understand binary and those who don't.

I'm trying to cut down on the use of shrieks (exclamation marks), I'm told they are !good for you.
 
I agree completely with Griff's suggestion. But that said, you can avoid the problem by using three separate fields on your data-entry form: one for the company name, one for call/put, and one for the amount.

Going further, if the second of those columns can only ever contain "call", "put" or blank, then use an option group for the data entry of that field rather than have the user type the word in full. Or let them just type "C", "P" or nothing, and validate the field so that it only accepts those values. And if the third field is always numeric, validate that as well, perhaps by using ISDIGIT().

Having done all that, you can easily combine the three fields into a single string, if that's what you want.
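
A rough sketch of that idea, assuming a cursor named csrShares with the fields cCompany, cType and nAmount, and a helper called IsValidType (all of these names are invented for illustration, not taken from the original table):

```foxpro
* Hypothetical layout: one field per piece of information.
* A numeric field for the amount rejects non-numeric input by itself.
CREATE CURSOR csrShares (cCompany C(20), cType C(1), nAmount N(10, 0))

IF IsValidType("c")
	INSERT INTO csrShares VALUES ("IBM", "C", 12345)
ENDIF
? IsValidType("X")    && .F.

* Rebuild the combined string on demand
? ALLTRIM(cCompany) + " " + ;
	IIF(cType == "C", "Call", IIF(cType == "P", "Put", "")) + " " + ;
	TRANSFORM(nAmount)

FUNCTION IsValidType
	LPARAMETERS tcType
	LOCAL lcType
	lcType = UPPER(ALLTRIM(tcType))
	* == is used on purpose: with SET EXACT OFF, "X" = "" would be .T.
	RETURN EMPTY(lcType) OR lcType == "C" OR lcType == "P"
ENDFUNC
```

In a real form the check would more likely sit in the control's Valid event or in a field rule; the function is only meant to show the comparison logic.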

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
Thank you very much Griff and Mike - your suggestions are exactly the solution I've been looking for.
Greetings from Germany.
Klaus


 
Just for fun:
Code:
CLEAR
RELEASE ALL
? render("IBM Call 12345")
? render("Call IBM 12345")
? render("Call Apple 4567")
? render("Apple Call 4567")
? render("Tesla Put 7891")
? render("Put 7891 Tesla")
? render("7891 Tesla Put")
? render("Tesla 1111")

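FUNCTION RENDER
	* Returns the (up to three) words of m.STRING in sorted order,
	* separated by single blanks
	PARAMETERS m.STRING
	PRIVATE i,m.STRING
	DIMENSION aryWords(3) 
	* GETWORDNUM() returns an empty string past the last word
	FOR i = 1 TO 3
		aryWords(i) = GETWORDNUM(m.string,i)
	NEXT
	M.STRING = ""
	ASORT(aryWords)
	* Rebuild the string; empty elements add neither text nor a blank
	FOR i = 1 TO 3
		m.string = m.string +TRIM(aryWords(i))+IIF(!EMPTY(aryWords(I))," ","")
	NEXT
		
	RETURN(TRIM(m.STRING))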

Regards

Griff
 
Thanks Mike

That whole bit was just to get rid of a leading space...

Had to look up EVL(), never used it!

I 'spect it's like my use of right("00000000"+alltrim(str(m.MyVal,8,0)),8) instead of the simpler padl()


Regards

Griff
 
Hi,

Nice piece of code, but that's not what german12 wanted. Here is my suggestion based on your code, also just for fun:

Code:
CLEAR
RELEASE ALL
? render("IBM Call 12345")
? render("Call IBM 12345")
? render("Call Apple 4567")
? render("Apple Call 4567")
? render("Tesla Put 7891")
? render("Put 7891 Tesla")
? render("7891 Tesla Put")
? render("Tesla 1111")

FUNCTION RENDER
	PARAMETERS pcSTRING
	LOCAL i
	PRIVATE pcSTRING
	DIMENSION laWords[3]
	 
	FOR i = 1 TO 3
		laWords[1] = ICASE(INLIST(GETWORDNUM(pcString, i), "Call", "Put") or ISDIGIT(GETWORDNUM(pcString, i)), "", ;
						GETWORDNUM(pcString, i))

		IF !EMPTY(laWords[1])
			EXIT 
		Endif
	NEXT
	
	FOR i = 1 TO 3
		laWords[2] = ICASE(INLIST(GETWORDNUM(pcString, i), "Call", "Put"), GETWORDNUM(pcString, i),"")

		IF !EMPTY(laWords[2])
			EXIT 
		Endif
	NEXT
	
	FOR i = 1 TO 3
		laWords[3] = ICASE(ISDIGIT(GETWORDNUM(pcString, i)), GETWORDNUM(pcString, i),"")
						
		IF !EMPTY(laWords[3])
			EXIT 
		Endif
	NEXT
	
	pcSTRING = ""
	
	FOR i = 1 TO 3

		pcString = pcString + ALLTRIM(laWords(i)) + " " + IIF(EMPTY(laWords(i))," - ","")
	NEXT
		
	RETURN(ALLTRIM(pcSTRING))

Enjoy

marK
 
For breaking strings into words, ALINES() is much faster than GetWordCount()/GetWordNum(). Probably not an issue for short strings like this, but people should always think of it as their first choice for parsing.
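
A small sketch of what that could look like, assuming VFP 9 (where the third argument of ALINES() is a flags value) and a made-up sample string:

```foxpro
LOCAL lcLine, lnWords, i
LOCAL ARRAY laWords[1]

* Hypothetical sample value with irregular and trailing blanks
lcLine = "Call   IBM  12345  "

* Flags: 1 = trim each element, 4 = leave out empty elements.
* Without flag 4, every extra blank would yield an empty array element.
lnWords = ALINES(laWords, lcLine, 1 + 4, " ")

FOR i = 1 TO lnWords
	? laWords[i]
ENDFOR
```

The return value of ALINES() is the number of elements it created, so there is no need for a separate counting call.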

Tamar
 
Hi,

Below is a pimped-up version of my code, using ALINES() as suggested by Tamar:

Code:
CLEAR
RELEASE ALL
CREATE CURSOR csrWords(cCompany C(50))
INSERT INTO csrWords Values("Mother IBM Call 12345")
INSERT INTO csrWords Values("Call IBM Father 12345")
INSERT INTO csrWords Values("Call Apple Brother 4567")
INSERT INTO csrWords Values("Apple Sister Call 4567")
INSERT INTO csrWords Values("Tesla Uncle Put 7891")
INSERT INTO csrWords Values("Put 7891 Tesla Aunt")
INSERT INTO csrWords Values("7891 Tesla Put")
INSERT INTO csrWords Values("Tesla 1111")

BROWSE 

replace ALL cCompany WITH render(cCompany)

BROWSE

CLOSE ALL
CLEAR ALL 

*!*

FUNCTION RENDER
	LPARAMETERS pcSTRING
	LOCAL i, lcRecomposed
	LOCAL ARRAY laWords[1]
	
	ALINES(laWords, pcString, " ")
	
	pcString = ""
	lcRecomposed = ""
	 
	FOR i = 1 TO ALEN(laWords)
		lcRecomposed = ICASE(INLIST(laWords[i], "Call", "Put", "Mother", "Father", "Brother", "Sister", "Uncle", "Aunt") ;
			or ISDIGIT(laWords[i]), "", laWords[i])

		IF !EMPTY(lcRecomposed)
			pcString = lcRecomposed + " - "
			lcRecomposed = ""
			EXIT 
		ENDIF
	NEXT
	
	FOR i = 1 TO ALEN(laWords)
		lcRecomposed = ICASE(INLIST(laWords[i], "Call", "Put"), laWords[i], "")

		IF !EMPTY(lcRecomposed)
			pcString = pcString + lcRecomposed + " - "
			lcRecomposed = ""
			EXIT 
		ENDIF
	NEXT	

	FOR i = 1 TO ALEN(laWords)
		lcRecomposed = ICASE(INLIST(laWords[i], "Mother", "Father", "Brother", "Sister", "Uncle", "Aunt"), laWords[i], "")

		IF !EMPTY(lcRecomposed)
			pcString = pcString + lcRecomposed + " - "
			lcRecomposed = ""
			EXIT 
		ENDIF
	NEXT	

	FOR i = 1 TO ALEN(laWords)
		lcRecomposed = ICASE(ISDIGIT(laWords[i]), laWords[i], "")

		IF !EMPTY(lcRecomposed)
			pcString = pcString + lcRecomposed
			lcRecomposed = ""
			EXIT 
		ENDIF
	NEXT
		
RETURN pcSTRING

Enjoy

marK
 
Thanks to all of you for the additional hints.
However, I have just realized that there is one question left.
Thanks to your recommendations, all sentences can now be split into their words very quickly,
so it would be possible to have additional columns, each containing one of the words.
Now, let's say the following had been entered (example):

Record 1: IBM 123 Mother
Record 2: Mother 123 IBM

Of course I can now split each of the sentences above into columns (three columns in this case).
But that is not the end of it, as far as I can see.

How can the table now be sorted so that the information from record 1 and record 2 ends up together?


Records 1 and 2 should sort together (it doesn't matter whether both contain "IBM 123 Mother" or both contain "Mother 123 IBM", because the meaning would be the same).

I am thinking of some sort of "checksum" of both expressions, but I do not know how that could be achieved.

Regards
Klaus






 
If you use the render I gave you, and store that in a new field (cRender perhaps) you can index on that
to achieve what you want (that is why I sorted the entries so no matter what order the elements are in
two similar records will be adjacent).
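
A sketch of that approach, using a throwaway cursor and a stand-alone helper called SortWords so that the example is self-contained (the cursor name, the field sizes and the helper name are assumptions; the helper simply follows the same sort-the-words idea as render):

```foxpro
CREATE CURSOR csrShares (Shares C(50), cRender C(50))
INSERT INTO csrShares (Shares) VALUES ("IBM 123 Mother")
INSERT INTO csrShares (Shares) VALUES ("Tesla Put 7891")
INSERT INTO csrShares (Shares) VALUES ("Mother 123 IBM")

* Fill the key field first, then index on it
REPLACE ALL cRender WITH SortWords(Shares)
INDEX ON cRender TAG cRender

* Records with the same set of words now come out next to each other
SCAN
	? Shares, cRender
ENDSCAN

FUNCTION SortWords
	LPARAMETERS tcLine
	LOCAL lcResult, lnWords, i
	LOCAL ARRAY laWords[1]
	lnWords = GETWORDCOUNT(tcLine)
	IF lnWords = 0
		RETURN ""
	ENDIF
	DIMENSION laWords[lnWords]
	FOR i = 1 TO lnWords
		laWords[i] = GETWORDNUM(tcLine, i)
	ENDFOR
	ASORT(laWords)
	lcResult = ""
	FOR i = 1 TO lnWords
		lcResult = lcResult + laWords[i] + " "
	ENDFOR
	RETURN TRIM(lcResult)
ENDFUNC
```

The sorted string plays the role of the "checksum": two rows with the same words in any order get the same key. Note that the sort is case-sensitive under the default collation, so "Call" and "call" would produce different keys unless the words are normalised first, for example with UPPER().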

Regards

Griff
 
Uh, yes Griff,
your code is super. Stupid as I am, I added one more record

("IBM Mother Call 12345") to your INSERT code and looked only at your browse window, where of course nothing had changed yet.
But now I have tried your render function, and it works great!

I am very happy that this forum still exists. The last time I was here was about 10 years ago.
And I also see many names still here that gave such excellent advice at that time.
I even have the impression that the ideas keep getting better the older you all get.
I'm particularly pleased.

Best regards
Klaus



 
For that you need this version, which handles more than three words:

Code:
CLEAR
RELEASE ALL
? RENDER("IBM Call 12345")
? RENDER("Call IBM 12345")
? RENDER("Call Apple 4567")
? RENDER("Apple Call 4567 mother")
? RENDER("Tesla Put 7891")
? RENDER("Put 7891 Tesla")
? RENDER("7891 Tesla Put")
? RENDER("Tesla 1111")

FUNCTION RENDER
	PARAMETERS m.STRING
	PRIVATE I,m.STRING,m.WORDCOUNT
	m.WORDCOUNT =GETWORDCOUNT(m.STRING)
	DIMENSION ARYWORDS(m.WORDCOUNT)
	FOR I = 1 TO m.WORDCOUNT
		ARYWORDS(I) = GETWORDNUM(m.STRING,I)
	NEXT
	m.STRING = ""
	ASORT(ARYWORDS)
	FOR I = 1 TO m.WORDCOUNT
		m.STRING = m.STRING +TRIM(ARYWORDS(I))+IIF(!EMPTY(ARYWORDS(I))," ","")
	NEXT

	RETURN(TRIM(m.STRING))

Regards

Griff
 