Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

which one the best way in searching database? memo field or character string with fix lenth? 1

Status
Not open for further replies.

farzad321

Programmer
Sep 14, 2019
53
FR
Hi all FOX MANS
I have DBF for costumer data. one of that is address. some of address can very long about 250 character. and [highlight #FCE94F]everyday append about 50 new records to that.[/highlight]
In my application sometimes need to search address. I want to know which method is the best? MEMO or character?
If I used a fixed lenth character, maybe have a big DBF and larger everyday, and if I used MEMO field , HOW can search within memo that also enough speed. Also which way is the best?

1- sele * from dbf where ...
2- index on address for ...
3- set filter to at(address,xxx) > 0
and so on...

[highlight #FCE94F]Did this functions work in MEMO?[/highlight]
Can you refer me some data about this problem?
thanks alot for your good idea

FARZAD
 
Hi Farzad,

First point is that you cannot create an index on a memo field.

But, in any case, you cannot use an index to search inside a field. So if the fields contains [tt]ABC 123 XYZ[/tt], you could use the index to search for [tt] ABC 123 XYZ [/tt] or just [tt] ABC [/tt], but not for [tt] 123 [/tt] or [tt] XYZ [/tt].

You can still search inside a field (character or memo), but not using an index. So to search for [tt] 123 [/tt], for example, you could do this: [tt] LOCATE FOR AT(["123", AddressField) > 0[/tt]. The problem is that it will be much slower, because you cannot use an index.

(In theory, you could set up a system that stores each word in the address separately, in a separate table, with pointers back to the original table; you could then index that new table. But that would be very complicated especially for someone who is less experienced in FoxPro.)

I suggest you start by running some sample searches on your live table (or a table the same size as your live table). Depending on the volumes of data and on the speed of your network, it might be faster than you think.

You could also consider breaking up the address field into three or four separate fields, one for each line of the address. I don't know what the address format is like in your part of the world, but it is common to store the house number and street name in one line, the district within the town in another line, the town or city in a third, and so on. If you did that, you could index each of those fields, and then do an indexed search for each of them in turn.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
That's a reason why addresses are not stored as text but in their separate details like name, street, house number, zip code, city...

Indexes are fine for performance optimization, but only when they already exist, not when you create them for querying, especially when you create one for every single query.

In detail: Doing INDEX ON will create an index by processing every record, the only chance to already profit from the creation of an index for searching is a filtered index with a FOR condition, which can be optimized to only access the records that fulfill the filter condition. But that only works if there are other preexisting indexes able to optimize this FOR clause. And if there are such indexes, you could also create the resultset right away instead of such a filtered index, so that's why even such ideas don't work.

The essence is you always depend on the indexes you create before searching, not for searching. You plan ahead what indexes you need and obviously indexes are not for a specific name or city or partial search term, those indexes will never be useful in general.

As Mike also already told you there is no index for in-string searches. The idea of a filter based on AT() expression doesn't work, as you can't index on all possible search terms at once, which would be necessary, as AT() will have different results depending on what you search, an index node must have one value per record though, so you can only create indexes for a specific search, and then you can also just put the found records into a resultset instead, such indexes are then not useful for anything, neither SET FILTER nor SQÖ-Selects.

This is the case for which you'd need a feature like Full-Text search, so one possibility would be going for MSSQL Server instead of trying to figure out a half good replacement solution in VFP.
Because it won't matter that addresses are still short texts, it's enough to ask for partial matches, that's a thing no usual index type can help you with.

You find a question similar to yours here at thread184-1553613 and there I wrote some code to full-text index the book "War and Peace", which I cut into paragraphs first. The whole War and Peace text is your address database and paragraphs are what a single address is, then.

Indeed a search with the $ operator (contains) with [tt]'search term' $ field[/tt] search isn't much slower than using the split into syllables and index on them. The structure I create there isn't really what MSSQL full-text search data structure behind the scenes is like. You could fine-tune to limit the number of syllables stored, start with a minimum length of 5 characters, or remove any syllable which contains spaces or other non-alphanumeric characters as those syllables are even off the extended meaning I used for syllables there as any substring you can possibly take out from a text. You can also define a maximum length for search terms, as no one will type in 100 characters for a search.

Last, not least the detailing of addresses is worth considering even if you still want to store the complete address for covering that international addresses are not composed in the same way for every country. If you have this single fields you can also still write a search searching in any of them simply by ORing the single conditions: [tt]WHERE name=?m.search OR street=?m.search OR city==?m.search[/tt] etc. would at least cover the "begins with" for every single detail, which is already much better than only being capable to search for "the whole address begins with", which usually would always only be the customer or company name, so splitting an address into details helps. Indexing those you can also help finding by sorting by these details. You're never able to sort by country, state or city, if you only store full addresses.

Bye, Olaf.

Olaf Doschke Software Engineering
 
I have to ask, are you putting the entire address in a memo field? If so, I highly suggest that you split it into the 5 common fields: addr1, addr2, city, state, zip (more if needed). Then you can index and search on whatever you want.

Larry (TPG)
 
Hi again
Thank you very much for your response.

Dear Friends, I want to say an example about my problem.

I have 15000 record now and bigger everyday for 50 records.

I was make a one memo field for address.

Now user want to for example find all costumer records that a street name in my memo field contain "somethings", in other words user want to know all costumer that exist in special street, therefore if i can search in address memo in less secounds, i will be happy, But if i have some limit in memo search OR search time bigger than 30 secounds, I most change address field type and convert that to one or two string, also we dont use standard address system like 'city, state, zip and so on'. Here we often say addresses like a short story!

Also I use arabic windows and font character set 178 and use SET COOLATE TO ARABIC. Plus that we have a huge server,(Advance server 2016 os) and with CAT-6 network and 6 end user. I hope this line don't make me slower than normal.

[highlight #FCE94F]So I have one last question, If I continue memo field method, "$" and "at()" functions work in memo field, like a strings or i must use some another functions?
[/highlight]
thank you again dear MIKE , Olaf and TheProgrammerGuy.
 
I think you didn't realize everything we said and were facing a language barrier here.

But the one thing you obviously can do is simply try it. That's the best thing to see what you get.

15000 is not really much to go through, so this will likely not take 30 seconds, not even 1, perhaps.
But you can't profit from indexing and Rushmore optimization, your searches will always remain full table scans.

So just try [tt]SELECT * FROM yourtable where 'something' $ address[/tt].

If you never try you never know.

In the long run, when this grows to 100K, 1 million, you're far best off with MSSQL full-text index, so it might be a good idea to move there in the first place.
And you might apply my syllable indexing routine and think of how this makes sense for your case best and how you update this data with each new address instead of recreating the syllable index from scratch.
You'll likely need to modify this to use string functions, which are available for multi-byte character collations like Arabic maybe is.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Farzad, you mention a search time of 30 seconds. Even with a very large table and a slow network, you are unlikely to need 30 seconds for the sort of search you have in mind. I agree with Olaf: 50,000 records is not that much. I still say you should do some test runs. You might be surprised at how fast they run.

By the way, the fact that this is a memo field (rather than a character field) is not relevant in this discussion. They will take almost exactly the same time to search.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
And also be aware that memo fields (which are actually a separate file from the table itself) are more prone to corruption. It might be a really good time to consider adding the fields to your table. You can write some code to extract the data from your memo field and populate the new fields, and with some tweaking and careful inspection this should work pretty well.

Larry (TPG)
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top