Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Parsing Odd Data

Status
Not open for further replies.

PCHomepage

Programmer
Feb 24, 2009
609
1
16
US
This may be a MySQL query question but I suspect that PHP would be needed instead. In a table with a number of columns, there are two as shown below that I'm having trouble with. The data is from 1930s, 1940s and 1950s automotive parts books so I cannot change it but I need to be able to search on it more efficiently and its formatting is preventing that so I am thinking that either a query that can work through the issues or something else (perhaps the one-time creation of a lookup table) is the way to go.

The problem now is that it is impossible to search on many models because only part of the model number is there. For example, if someone wants to find 2213 nothing here will show up and searching on only 13 or even -13 will bring up thousands of unrelated entries from the table.

The data was formatted only to make it printable (this was OCRed from hard-copy) and it looks straightforward but actually the second example is what it really means:
(These are simplified without extraneous columns but there is a primary key field, among others)

[pre]PartNo Models
_________________________________________________________________
391719 2206-13-26-33; 2302-06-13-32-33

G141111 2631 (BODY 2678); 5431 (BODY 5478)

L422445 2401 (BODIES 2462-65-92-15-98): 2501; 2601-11-33[/pre]

Meaning:

[pre]PartNo Models
_________________________________________________________________
391719 2206 2613 2626 2633; 2302 2306 2313 2332 2333

G141111 2631 (BODY 2678); 5431 (BODY 5478)

L422445 2401 (BODIES 2462 2465 2492 2415 2498): 2501; 2601 2611 2633[/pre]

I am open to suggestions but what I think I need is to extract the data into something like this but I have no idea how to do so:

[pre]PartNo ModelNo BodyNo
391719 2206
391719 2213
391719 2226
391719 2233
391719 2302
391719 2306
391719 2313
391719 2332
391719 2333
G141111 2631 2678
G141111 5431 5478
L422445 2401 2462
L422445 2401 2465
L422445 2401 2492
L422445 2401 2415
L422445 2401 2498
L422445 2501 2601
L422445 2501 2611
L422445 2501 2633[/pre]

Otherwise some way to make the search engine see 2462-65 (for example) as 2462 2465 would be okay and, in that event, it probably is not necessary to differentiate between model number and body number. There will ultimately be hundreds of thousands of rows of data and there may be other similar oddities that I've not yet come across.

I just spotted another and without going through adding it to the above, the Model column looks like this so clearly I'll need to add a generic model table with the needed numbers:

ALL 22ND; 23RD; 24TH; 25TH; 26TH; 5402-06-13-26-31
 
No problem as I'm leaving town this afternoon for a couple days.
 
As mentioned, I added echos so that I could see what the script had done but I was surprised to see that it writes it all upon completion. Is there a way to make it echo line by line as it does the work?

On the model entries, this may be a bit tricky but I came across an instance where ALL does not apply to every one. It applies to those with an ordinal entry but not the others

ALL 22ND; 23RD; 2413; 2513; 2613-33

This means ALL 22ND and ALL 23RD but only the specific models for 24TH, 25TH and 26TH.
 
I don't think that the parsing is difficult _technically_, it is more the searching that you will have trouble with now. Having said that, however, imposing structure on essentially free form text is seldom straightforward from a business rules context. Particularly so when your last post indicates that whomever wrote the original dataset did not even adhere to their own convention.

on searching, for example if someone says that they have a model 2401 your search will have to be
Code:
select * from parts
where modelNo REGEXP '24(01|[^\d])'
 
Thank you. Right now I'm more interested in getting the data cleanly parsed out, then I can use it on the search and it has other uses too. It's close but not quite there.
 
I was partially prevaricating as the nature of how you were going to search must lead the nature of how you parse out the data.

Given the lack of structure it is difficult to see a neat way of constructing a parser.

And I am concerned that your statement re the meaning of RHD as a qualifier to all models before and after semicolons completely breaks the rules of the rest of the data structure.

Will RHD always appear ONLY ever as the first three characters of a model string? Ditto ALL? If so perhaps some logical rigour can be applied.
 
My search engine has been online for some time so it not relevant to this discussion due to the way it works. Once this data has been parsed out I can use the new table to get the matches needed.

RHD it would apply to everything in the rest of the cell so each model number would be preceded with RHD. Due to the very nature of RHD vehicles, there are no parts I know of specifically for some RHD models that could possibly fit other non-RHD models. As far as I can tell so far, RHD is always at the beginning of the field.

ALL is also always at the beginning of the field. However, ALL does not necessarily apply to everything in the cell. It applies only to the ordinal numbers, like 22ND, 23RD etc. This seems to be working like the rest except that ALL is there only on the first one. However, these ALL entries need to be broken apart for individual models from the parts_modelno lookup table.
 
search
believe me - it is entirely relevant. how you want to search must completely lead how you structure parsed data.

RHD
so it is impossible to have, say
Code:
2401;RHD 2402-03-04-05
i.e. RHD must always be first in the field and apply to all models in the field.

ALL[/b}
i think I'd need to see a comprehensive set of examples of each use of the term ALL to derive a plan to parse sensibly for that.
 
I didn't really want to get into it but the search engine does not directly search the parts data since it is set up to search all entries and tables on the site. It searches a table into which all the data has been dumped en masse. It might be relevant but it's unrelated to my question and to what I am trying to accomplish by cleanly parsing out the data. Until the data has been parsed out, the search engine is not what I'm working on.

No, it is not possible to have an RHD as you described. It is always at the beginning, is always separated from the rest by a comma and will always apply to all entries:

This will not happen:

Code:
[COLOR=#CC0000][b]2401;RHD 2402-03-04-05[/b][/color]

This is the way it always will be:

Code:
RHD, 2401-02-03-04-05

There cannot possibly RHD parts that also fit LHD cars so it would apply to all the models listed.

Here are some examples of ALL:

Code:
ALL 54TH
ALL 22ND; ALL 23RD
ALL 24TH; 25th; 26th; 54th
ALL 22ND; 23RD; 24TH; 25TH; 26TH; 5402-06-13-26-31

There are several like ALL 22ND; ALL 23RD but if it helps I can change them to ALL 22ND; 23RD

There may also be a combination:

Code:
RHD, ALL 22ND; 23RD
 
What is the meaning of RHD ALL 23RD?

Does it mean that all models starting with a 23 are RHD?
 
What is the meaning of RHD ALL 23RD?

I didn't give an example like that but if you mean RHD, ALL 22ND; 23RD, it means ALL 22ND and ALL 23RD series RHD cars. To be honest, I don't know if it means that every 23RD series car had an RHD equivalent or if it means that RHD applies to ALL 23RD where they were RHD but for the sake of programming, we'll have to presume the former.

Does it mean that all models starting with a 23 are RHD?

Well, not exactly but, yes, when related to the particular partNo it does. In other words, not all 23RD cars are RHD (or course) but for the particular part number, that is what it means. The part is for ALL 23RD series right-hand drive cars.

(I would be curious if anyone else watching this forum can guess what made of car we're talking about!)
 
Are there any more ideas to finalize this? It's almost there but not quite.
 
jpadie requests you contact him privately to continue this, due to some unfortunate events.



----------------------------------
Phil AKA Vacunita
----------------------------------
Ignorance is not necessarily Bliss, case in point:
Unknown has caused an Unknown Error on Unknown and must be shutdown to prevent damage to Unknown.

Web & Tech
 
Yes, I know and saw his message here before someone deleted it.
 
o.k, perfect.

----------------------------------
Phil AKA Vacunita
----------------------------------
Ignorance is not necessarily Bliss, case in point:
Unknown has caused an Unknown Error on Unknown and must be shutdown to prevent damage to Unknown.

Web & Tech
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top