Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Importing Data from a Text File 1

Status
Not open for further replies.

David Higgs

Programmer
May 6, 2012
392
GB
In Amateur Radio circles there are numerous Software Based Logbooks. There is a specification called ADIF (Amateur Data Interchange Format) which Software Authors are encouraged to follow.

In example, I have a Text File with the following content.

Text File Contents
<Callsign:5>A1ABC <name:3>Joe <frequency:8>7.163000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>
<Callsign:5>A2DEF <name:5>David <frequency:8>7.160000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<WAB_Area:4> SD55
<EOR>
<Callsign:5>A1GHI <name:4>Fred <frequency:8>7.126000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>

The format of the above is as follows:-
<Callsign:5>A1ABC
<Field Name:Length of Data>Data which gives a Field Name of Callsign and Data A1ABC (In this instance the Data = 5 Characters, it can vary in length)
Each Record in the Text File can have some or all of the Fields and the Fields could possibly be in random order.

The full specification of the ADIF file is available here.

How would I go about Importing and Converting the original Text file to achieve a Table with the following format?
Table Contents after 'Import"

Callsign Name Frequency Band Mode Date
A1ABC Joe 7.163000 40m SSB 20170220
A2DEF David 7.160000 40m SSB 20170220
A1GHI Fred 7.126000 40m SSB 20170220



Regards,
David
 
Something built on this...

Code:
CREATE CURSOR ADILog ;
	(Callsign Varchar(20), Name Varchar(50), Frequency B(2), Band Varchar(20), Mode Varchar(20), QSO_Date D)

LOCAL ADISource AS String

TEXT TO m.ADISource NOSHOW
<Callsign:5>A1ABC <name:3>Joe <frequency:8>7.163000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>
<Callsign:5>A2DEF <name:5>David <frequency:8>7.160000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<WAB_Area:4> SD55
<EOR>
<Callsign:5>A1GHI <name:4>Fred <frequency:8>7.126000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>
ENDTEXT

* Skip header, if it exists
IF LEFT(m.ADISource, 1) != "<"
	m.ADISource = SUBSTR(m.ADISource, AT_C("<EOH>", m.ADISource) + LEN("<EOH>"))
ENDIF

LOCAL ARRAY ADIRecords[1]
LOCAL ADIRecord AS String
LOCAL ADIField AS String
LOCAL ADIValue AS String
LOCAL RecordsIndex AS Integer
LOCAL ADIBuffer AS Object

* Go through each log entry
FOR m.RecordsIndex = 1 TO ALINES(m.ADIRecords, m.ADISource, 1+4+8, "<EOR>")

	m.ADIRecord = m.ADIRecords[m.RecordsIndex]
	
	* prepare an empty buffer to hold the data
	SCATTER BLANK NAME ADIBuffer

	DO WHILE !EMPTY(m.ADIRecord)
		* fetch each field of the log entry
		m.ADIField = STREXTRACT(m.ADIRecord, "<", ":")
		* and its value
		m.ADIValue = STREXTRACT(m.ADIRecord, ">", "<", 1, 2)
		
		* adjust the value to the data type of the cursor
		DO CASE
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "C"
			m.ADIBuffer.&ADIField. = m.ADIValue
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "N"
			m.ADIBuffer.&ADIField. = VAL(CHRTRAN(m.ADIValue, ".", SET("Point")))
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "D"
			m.ADIBuffer.&ADIField. = EVALUATE("{^" + TRANSFORM(m.ADIValue, "@R 9999-99-99") + "}")
		ENDCASE
		
		m.ADIRecord = IIF(AT("<", m.ADIRecord, 2) = 0, "", SUBSTR(m.ADIRecord, AT("<", m.ADIRecord, 2)))
		
	ENDDO

	* whatever it was found in the entry goes to the cursor
	INSERT INTO ADILog FROM NAME m.ADIBuffer

ENDFOR

BROWSE
 
Hi atlopes,

Thank you very much for writing the code, it was very much appreciated. As is, it runs fine and the end result is what I was looking for. I now have a starting point and need to expand the code to read an actual Text File which may contain additional Fields. I also need to carry out some further tests, as the order of the Fields in the Text File can be in any order depending on the author of the original software. Also not all fields are used in all Records. Basically I am using a third party Logbook from which I can Export the data in ADIF Format in readiness to Import into my own VFP Based Logbook.

Regards,
David
 
Great to know, David.

You don't have to worry about field order and occurrence in the log, these won't affect the code: only the columns that you decide to store in your table/cursor will be fetched, in whatever order they come in, all others will be ignored.

You may consider a small improvement regarding getting character columns. Instead of

[tt]m.ADIBuffer.&ADIField. = m.ADIValue[/tt]

you may prefer

[tt]m.ADIBuffer.&ADIField. = ALLTRIM(m.ADIValue, 0, " ", CHR(9), CHR(13), CHR(10))[/tt]

to avoid newlines and tabs getting into your data.


António
 
Hi António,

Thank you for your reply, much appreciated. I've been pondering over this issue for a while now and with your help I have some direction in which to head. No doubt I'll have a few more questions!!

Regards,
David
 
The file structure is very unusual in defining lengths within the xml tags. This is not the usual way of using attributes in tags or defining a schema of the data as a separate first section. So this is not strictly XML and you can't parse it with the usual XML helper functions and objects.

Atlopes doesn't even make use of the encoded length, as you can simply extract between > and <, whatever length that is. What stríkes me is, you have an extra space after the defined length, but indeed the length is not helpful at all, as it varies per row and doesn't give you a hint on dimensioning the cursor you want to populate with the data. For example look at Joe, David and Fred. I assume names can even get longer. Strange format. As you say the composition and order of fields can vary on top of the length, so there also is no way to turn this into good xml. I see the ADIF format has a history going back to before XML became a norm.

Very strange. I don't need to add my turn on this, atlopes has made a good solution, automatically adapting to whatever input data. All you need to predefine is the ADILog import cursor, so it has the fields you want to import in the length needed.

If you would like to use more common XML tools, you should look out for ADX files instead, ADIF seems a still valid and active format, but it didn't go the route of modern XML.

Bye, Olaf.

 
Hi Olaf,

Thank you for your input.

As you say the ADIF is a bit of a strange format I guess not helped by the policy of always being Upward Compatible. I did think about using the ADX Format but as most of the available Software is based on the ADI Format I would still have to cater for importing it. Until such time that the main Software Authors change over to ADX then I'm afraid we are stuck with it.

I've done a bit more with Atlopes code by adding Import form a TXT File which works very well.

Code:
*
*	Import Data
*

LOCAL ADISource AS String

	m.ADISource = FILETOSTR("C:\temp\adif.txt")


However the normal file extension is .ADI and not .TXT and when I change the extension to .ADI I get the error of "File not found". So I am looking at how I can achieve Importing of .ADI File Format.


Regards,
David
 
David said:
However the normal file extension is .ADI and not .TXT and when I change the extension to .ADI I get the error of "File not found". So I am looking at how I can achieve Importing of .ADI File Format.

Humm, there is nothing in VFP preventing access to a file that depends on its extension.

How exactly are you changing the filename extension?

I'm asking this because when we save a file within Notepad with the name [tt]adif.adi[/tt], it will be saved as [tt]adif.adi.txt[/tt] unless we enclose the name in quotes, as in [tt]"adif.adi"[/tt]. Or when the option to Explorer hide known extensions in file listings is set to on (which is the default), we don't actually see the full filename... I know this is pretty basic (sorry for that) but, from experience, it's also something quite easy to forget or disregard.

 
Hi António,

I think you were spot on with file name, I've managed to Import several different files with no problems. So I guess I had the wrong filename format!

I've been looking through your code and have come across a few new (new to me) commands, so I'm learning as we go along. Very pleased with your help which has moved me along quite nicely.

On Importing a couple of ADIF files I noticed that there is a Blank Entry in the Cursor. When I edited the file and removed the blank spaces at the end of the ADIF File, all was ok.

Regards,
David
 
I'm really glad to hear about how you're advancing on this.

David said:
On Importing a couple of ADIF files I noticed that there is a Blank Entry in the Cursor. When I edited the file and removed the blank spaces at the end of the ADIF File, all was ok.

If it is safe to assume that newlines and tabs are irrelevant in the log entries (and I think they are), then you could get rid of them before processing the log file:

[tt]m.ADISource = ALLTRIM(CHRTRAN(FILETOSTR("C:\temp\adif.txt"), CHR(13) + CHR(10) + CHR(9), ""))[/tt]

With this, pre-processing the log files would be unnecessary, probably, because there wouldn't be invisible characters after the last <EOR>.
 
António said:
I'm really glad to hear about how you're advancing on this.

I am really grateful for you help, I didn't expect you to provide a working solution.

António said:
If it is safe to assume that newlines and tabs are irrelevant

Yes they are irrelevant.

I tried your code as above and it did indeed strip all of the irrelevant Tabs etc, however it was noticeably a lot slower in processing 3000 Log Entries.

Regards,
David
 
Another way around would be testing for empty rows. This could be done by comparing the entry buffer with an empty one, and only insert if they do not match.

The original code revised (changes highlighted):

Code:
CREATE CURSOR ADILog ;
	(Callsign Varchar(20), Name Varchar(50), Frequency B(2), Band Varchar(20), Mode Varchar(20), QSO_Date D)

LOCAL ADISource AS String

TEXT TO m.ADISource NOSHOW
<Callsign:5>A1ABC <name:3>Joe <frequency:8>7.163000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>
<Callsign:5>A2DEF <name:5>David <frequency:8>7.160000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<WAB_Area:4> SD55
<EOR>
<Callsign:5>A1GHI <name:4>Fred <frequency:8>7.126000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
[highlight #FCE94F]<EOR>


ENDTEXT[/highlight]

* Skip header, if it exists
IF LEFT(m.ADISource, 1) != "<"
	m.ADISource = SUBSTR(m.ADISource, AT_C("<EOH>", m.ADISource) + LEN("<EOH>"))
ENDIF

LOCAL ARRAY ADIRecords[1]
LOCAL ADIRecord AS String
LOCAL ADIField AS String
LOCAL ADIValue AS String
LOCAL RecordsIndex AS Integer
LOCAL ADIBuffer AS Object
[highlight #FCE94F]LOCAL ADIEmpty AS Object

SCATTER BLANK NAME m.ADIEmpty[/highlight]

* Go through each log entry
FOR m.RecordsIndex = 1 TO ALINES(m.ADIRecords, m.ADISource, 1+4+8, "<EOR>")

	m.ADIRecord = m.ADIRecords[m.RecordsIndex]
	
	* prepare an empty buffer to hold the data
	SCATTER BLANK NAME m.ADIBuffer

	DO WHILE !EMPTY(m.ADIRecord)
		* fetch each field of the log entry
		m.ADIField = STREXTRACT(m.ADIRecord, "<", ":")
		* and its value
		[highlight #FCE94F]m.ADIValue = ALLTRIM(STREXTRACT(m.ADIRecord, ">", "<", 1, 2), 0, " ", CHR(13), CHR(10), CHR(9))[/highlight]
		
		* adjust the value to the data type of the cursor
		DO CASE
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "C"
			m.ADIBuffer.&ADIField. = m.ADIValue
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "N"
			m.ADIBuffer.&ADIField. = VAL(CHRTRAN(m.ADIValue, ".", SET("Point")))
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "D"
			m.ADIBuffer.&ADIField. = EVALUATE("{^" + TRANSFORM(m.ADIValue, "@R 9999-99-99") + "}")
		ENDCASE
		
		m.ADIRecord = IIF(AT("<", m.ADIRecord, 2) = 0, "", SUBSTR(m.ADIRecord, AT("<", m.ADIRecord, 2)))
		
	ENDDO

	* whatever it was found in the entry goes to the cursor
[highlight #FCE94F]	IF !COMPOBJ(m.ADIBuffer, m.ADIEmpty)
		INSERT INTO ADILog FROM NAME m.ADIBuffer
	ENDIF
[/highlight]
ENDFOR

BROWSE
 
Hi António,

Importing 3200 Log Entries:-
Version 1 takes approximately 5 Secs and Version 2 & 3 take about 55 secs to display the cursor contents, so a considerable difference. As the routine will be infrequently used this should not be too much of an issue.

Regards,
David
 
The last version should not take as long. Are you sure you have used it INSTEAD of changing the files and not additionally to changing the files? The operation taking long is removing all whitespace from the adi files, as all the files are read in once and rewritten once, then read in again into the cursor. The latest code of atlopes instead can work on the unmodified files and will simply not insert empty records into the import cursor.

Bye, Olaf.
 
Olaf said:
The last version should not take as long.

I've just rechecked everything. Using the same Data Source for each test, I inserted a Process Timer into the routine and achieved the following results.

Version 1 = 10 Secs
Version 2 = 76 Secs
Version 3 = 60 Secs

Regards,
David
 
Looked for a big log sample file to test before proposing another version, but found none, so this goes as guesswork.

Probably COMPOBJ() and the ALLTRIMming in every field, regardless, are slowing down the log processing.

Using a simpler emptiness verification, and moving the trimming to where it is (or appears to be) really necessary:

Code:
CREATE CURSOR ADILog ;
	(Callsign Varchar(20), Name Varchar(50), Frequency B(2), Band Varchar(20), Mode Varchar(20), QSO_Date D)

LOCAL ADISource AS String

TEXT TO m.ADISource NOSHOW
<Callsign:5>A1ABC <name:3>Joe <frequency:8>7.163000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>
<Callsign:5>A2DEF <name:5>David <frequency:8>7.160000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<WAB_Area:4> SD55
<EOR>
<Callsign:5>A1GHI <name:4>Fred <frequency:8>7.126000
<band:3>40m <mode:3>SSB <qso_date:8>20170220
<EOR>


ENDTEXT

* Skip header, if it exists
IF LEFT(m.ADISource, 1) != "<"
	m.ADISource = SUBSTR(m.ADISource, AT_C("<EOH>", m.ADISource) + LEN("<EOH>"))
ENDIF

LOCAL ARRAY ADIRecords[1]
LOCAL ADIRecord AS String
LOCAL ADIField AS String
LOCAL ADIValue AS String
LOCAL RecordsIndex AS Integer
LOCAL ADIBuffer AS Object
[highlight #FCE94F]LOCAL ADIIsEmpty AS Boolean
[/highlight]
* Go through each log entry
FOR m.RecordsIndex = 1 TO ALINES(m.ADIRecords, m.ADISource, 1+4+8, "<EOR>")

	m.ADIRecord = m.ADIRecords[m.RecordsIndex]
	
	* prepare an empty buffer to hold the data
	SCATTER BLANK NAME m.ADIBuffer
	
[highlight #FCE94F]	* signal emptiness
	m.ADIIsEmpty = .T.
[/highlight]
	DO WHILE !EMPTY(m.ADIRecord)
		* fetch each field of the log entry
		m.ADIField = STREXTRACT(m.ADIRecord, "<", ":")
		* and its value
[highlight #FCE94F]		m.ADIValue = STREXTRACT(m.ADIRecord, ">", "<", 1, 2)
[/highlight]		
		* adjust the value to the data type of the cursor
		DO CASE
		CASE TYPE("m.ADIBuffer." + m.ADIField) == "C"
[highlight #FCE94F]			m.ADIBuffer.&ADIField. = ALLTRIM(m.ADIValue, 0, " ", CHR(13), CHR(9), CHR(10))
			m.ADIIsEmpty = .F.
[/highlight]		CASE TYPE("m.ADIBuffer." + m.ADIField) == "N"
			m.ADIBuffer.&ADIField. = VAL(CHRTRAN(m.ADIValue, ".", SET("Point")))
[highlight #FCE94F]			m.ADIIsEmpty = .F.
[/highlight]		CASE TYPE("m.ADIBuffer." + m.ADIField) == "D"
			m.ADIBuffer.&ADIField. = EVALUATE("{^" + TRANSFORM(m.ADIValue, "@R 9999-99-99") + "}")
[highlight #FCE94F]			m.ADIIsEmpty = .F.
[/highlight]		ENDCASE
		
		m.ADIRecord = IIF(AT("<", m.ADIRecord, 2) = 0, "", SUBSTR(m.ADIRecord, AT("<", m.ADIRecord, 2)))
		
	ENDDO

	* whatever it was found in the entry goes to the cursor
[highlight #FCE94F]	IF !m.ADIIsEmpty
[/highlight]		INSERT INTO ADILog FROM NAME m.ADIBuffer
	ENDIF

ENDFOR

BROWSE
 
Hi António,

I am really grateful for your assistance with this Procedure, much appreciated.

The latest routine took just 51 Secs to process 3124 Log Entries.



Regards,
David
 
David,

I took your 3-entries sample log and created a 6000-entries log with it. I know it won't be representative of real data, but on my machine, the last version (I didn't try the others) systematically process this log in 0 seconds (that is, under 1 second). Do you have or know of some real-life sample data that can be used to test the import?
 
David,

do you get same results (aside of empty rows) from all versions at least?
How bisg is the impact of needing 1 minute instead of 10 seconds? Do you need to do this every minute or once a day?

Bye, Olaf.
 

Hi António,

DPGH said:
The latest routine took just 51 Secs to process 3124 Log Entries.

I suspect the differences in speed between our examples is down to the Data that I am using. The 3124 Log Entries contain 50+ Fields, although, currently for Test Purposes, only 6 are being saved to the Cursor. So it's going to be quite an overhead compared to the original Data that we started with.

Olaf said:
do you get same results (aside of empty rows) from all versions at least?

Yes I do.

Olaf said:
How big is the impact of needing 1 minute instead of 10 seconds? Do you need to do this every minute or once a day?

To be fair the above figures are perfectly acceptable. The Procedure would be infrequently used, typically when transferring files from another Authors Software Package or after some Database Management / install Data Back Up.

Regards,
David
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top