Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Importing a tagged format text file. 2

Status
Not open for further replies.

SimoFin

Technical User
Nov 20, 2002
6
FI
I need some advice on how to import the following kind of
text data into mysql: ( it's a long list, but bear with me)

**
IB 1859060196
BI PAPERBACK
AU SHINE, NORMAN
BC VXFN
CO UK
PD 19990923
NP 128
RP 9.99
RI 9.99
RE 9.99
PU CONNECTIONS BOOK PUBLISHING
YP 1999
TI NUMEROLOGY
TI YOUR CHARACTER AND FUTURE REVEALED IN NUMBERS
EA 9781859060193
RF R
SG 2
GC M01
DE A unique step-by-step visual approach to numerology, how to assess
DE characters and compatibility from names and birth dates.
**
IB 1898295395
BI SPIRAL BOUND
AU PRICE, ROBIN
BC ELX
CO UK
PD 19961231
RP 15.75
RI 15.75
RE 15.75
PU ENGLISH EXPERIENCE
YP 1996
SR BRAIN FRIENDLY RESOURCES
TI CONVERSATION 2: STUDENT TALK
EA 9781898295396
RF R
GC C01
**
IB 071483503X
BI PAPERBACK
AU IVRY, BENJAMIN
BC CVF
CO UK
PD 19961030
NP 240
RP 14.95
RI 14.95
RE 14.95
PU PHAIDON PRESS LTD
YP 1996
SR 20TH-CENTURY COMPOSERS
TI FRANCIS POULENC
EA 9780714835037
RF R
SG 1
GC Q00


As you can see, every row begins with ** and all the fields
are preceded with a special tag. Furthermore, not every book has the same number of fields, only a number books have DE descriptions, for example.

I know how to import a standard CSV text file using LOAD DATA INFILE, but with this tagged format I'm lost. Please help. My supplier wont offer the data in any other format!

Simo Savonen
sipesa@utu.fi
 
You need a scripting language to handle this. What languages do you have available? ______________________________________________________________________
TANSTAAFL!
 
The server that I'm using has support for PHP, Perl and Python... but I'm a total newbie on these, except on PHP, which I've just recently started to get familiar with.

Simo Savonen
sipesa@utu.fi
 
I'm going to take a wild guess and say that the data we see on the screen isn't the same data as you pasted into your post.

You said that "every row begins with **", but I only see that 3 times in the data you posted.

Also, you said that "all the fields
are preceded with a special tag". I don't see any of these special tags in the data you posted.

You'll probably want to write a PHP script that reads in your text input file and processes it line-by-line, splitting apart the line into an array every time it encounters that special tag, and inserting the individual elements of that array into the database as each separate row.

If you can give me a better example of that text input file, I can probably throw something together for you.
 
Actually, I just realize that you only listed 3 rows. :) I was assuming you were listing each row on a line by itself.

To understand the input better, what is the significance of the line breaks in your input text? Does that signify separate columns?
 
You quessed it right... let me explain a bit more:

yes, the ** begins a unique book,
IB is ISBN, of course
BI means Binding
AU stands for Authors...
etc.

The line breaks do count for each field/column ending.

Normally, I'm used to importing this type of data
as such:
"ISBN","BINDING","AUTHORS",etc...

So... did this help?

Simo Savonen
sipesa@utu.fi
 
Here, I fetched the tags and their meanings for you,
tell me if this helps. It just goes to showing that there
are quite a few of these tags, and only a number of them are actually used in each book. It very rare for a book to
include all tags.

** START OF RECORD
IB ISBN
AV UK AVAILABILITY
BI BINDING
AU AUTHOR
BC BIC CLASSIIFCATION
CO COUNTRY OF ORIGIN
ED EDITOR
IL ILLUSTRATOR
EI EDITION
IU ILLUSTRATIONS
CP CO-PUBLISHER
LA LANGUAGE OF BOOK
MP MIXED PACK COMPONENTS
NC NATIONAL CURRICULUM KEY STAGE
PD PUBLICATION DATE
PA PAGINATION
NP NUMBER OF PAGES
RP UK RRP SELLING PRICE
RI UK PRICE INCL VAT
RE UK PRICE EXCLUDING VAT
DI PAGE SIZE/DIMENSIONS
PU PUBLISHER NAME
YP YEAR OF PUBLICATION
RC BOOKDATA READERSHIP CODE
RS BOOKDATA READERSHIP TEXT
SR SERIES
SE SERIES EDITOR
TI TITLE
ST SUB-TITLE
PT PART-TITLE
TR TRANSLATOR
PN PART NUMBER
DE SHORT DESCRIPTION
EA EAN NUMBER
RF RETURNS FLAG
RD RETURNS DATE
SI NEW ISBN
WE WEIGHT (Grams)
SG STAR RATING
PI GARDNERS PUBLICITY DESCRIPTION
GC GARDNERS CLASSIFICATION CODE
 
For each line of that input, let's call the first two characters the "2-code" and the rest of the line after the space the "data".

Create a table which can contain every attribute you might be sent, even if a lot of them will be empty.

In PHP, create an array which matches 2-codes to column names of your table. Something like: $translation = array ("IB" => "ISBN", "BI" => "binding", "AU" => "author");

Read each line of the file, and split it at the first space.

If the 2-code that you get this time is the same as the one you got on the last line, add data top the data you got last line.

If the 2-code is different, add the array value of $translation to a $columns string, and the value to a $values string.

For the first three lines of the first record you posted, $columns will read "ISBN, binding, author", and $values will read "'1859060196', 'PAPERBACK', 'SHINE, NORMAN'".

When you get to "**", combine everything into an SQL query, execute it, then clear out $columns and $values.

$query = "INSERT INTO books(" . $columns . ") VALUES(" . $values . ")";
______________________________________________________________________
TANSTAAFL!
 
Thanks alot for these tips!

All I have to do now is pick up my PHP bible, and read up
on arrays and such.

Simo Savonen
sipesa@utu.fi
 
Dear Sleipnir... I'm lost.

If you could find in in you time to help me a bit more,
I would be more than thankfull.

I've been trying to find help on this from my books,
but they haven't been of any use.

Can you please give me more code?

Simo Savonen
sipesa@utu.fi

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top