Is Basic a good way to parse text files?

Status
Not open for further replies.

ghiebert (Technical User, CA), Apr 7, 2005
Hi guys

I used to love working with text in early BASIC and then in DOS-based Pascal. It's fun.

Many years later, Windows is the only programming environment available to me, and I have never programmed on that platform.

I have an 8 MB comma-delimited text file with 130 fields that I would like to parse, eliminating most of the fields and extracting the pertinent data to a much smaller text file. The result will be a list of 6000 inventory items.
Then I need to build a nice fuzzy search engine so that I can quickly find what I need from keywords.

Is Basic a good way to do this, or would you recommend something like Delphi? I don't mind using anything that is freeware, as long as it is fairly easy to learn.
 
There are a lot of varieties of BASIC out there, and most of them will work for what you want. The older DOS versions will balk at reading large chunks of data because of memory constraints, but if you're willing to take things in smaller chunks, you can even get by there. The newer versions have more functionality built in, which makes them easier to use.

Here's the URL for a list of various BASIC compilers that are alternatives to Visual Basic:


Some are free, some cost money. It'll give you a start in finding what you want, if you wish to use some form of BASIC for your project.

Lee
 
Memory is not an issue if the BASIC dialect can read the text between the commas, essentially one line at a time. What I did in the past in DOS Pascal was read one line, process it, and append the result to a file, repeating until the EOF marker was reached. That way only a few bytes are in memory at any time, and the file size becomes largely irrelevant.
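The read-a-line, process-it, append-it loop described above can be sketched as follows. This is a minimal illustration in Python rather than BASIC, chosen only because it is free and easy to try; the function name `extract_fields` and the 0-based `keep` list are assumptions made for this example, not anything from the original file. Python's standard `csv` module does the splitting, so commas inside quoted fields are not treated as separators.

```python
import csv

def extract_fields(src_path, dst_path, keep):
    """Copy only the columns listed in `keep` (0-based) from src to dst.

    Rows are read and written one at a time, so memory use does not
    grow with the size of the input file.
    """
    rows = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:  # only the current row is held in memory
            # Skip positions that a short row does not have.
            writer.writerow([row[i] for i in keep if i < len(row)])
            rows += 1
    return rows
```

A call such as `extract_fields("full.txt", "small.txt", [0, 3, 17])` (hypothetical file names and column positions) would write the three chosen columns of every row to the smaller file and return the number of rows copied.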
 
Take a look at the PowerBasic Console Compiler. It has a PARSE statement in addition to the usual BASIC statements and commands.

It is a true 32-bit compiler, but it natively has the look and feel of MS-DOS QuickBasic or PowerBasic for DOS.

Add-ons include Console Tools and Graphics Tools, which can create a graphics window or simply add Windows features to the console window.

 
If this is still on the burner, I have used GWBasic for the same type of deconstruction. It came free with earlier versions of DOS, and it gives you absolute control over what you want to do.

I also used it to build a database for record storage and searching, although a dedicated database product would be better.

Ed Fair
Give the wrong symptoms, get the wrong solutions.
 
You might also consider Liberty Basic, a low-cost ($29 or so) Windows compiler which can also create console applications. I've been using it for a couple of years now and have written a fairly sophisticated HL7 parser (HL7 is a data protocol 'language' for hospital systems) and am currently working on a flat-file database system.

A free trial of LB4.x is available for download. It's a pretty good product (albeit a bit quirky at times) which is reasonably close to the 'classic' versions of BASIC, and there is a large group of dedicated users who are happy to answer questions and help out with problems.

Here's a quick sample of the routine I wrote to parse out a line:

Code:
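SUB breakup l$,d$
' l$ is the line, d$ is the divider (blank space or vertical bar)
' Tokens are stored in ary$(), which must be DIMmed before the call

    ndx = 1
    s$ = ""

    for i=1 to len(l$)     ' walk the line one character at a time
        if mid$(l$,i,1) <> d$ then
            s$=s$ + mid$(l$,i,1)     ' still inside a token
        else
            ary$(ndx) = s$           ' divider found: store the finished token
            ndx = ndx + 1
            s$ = ""
        end if
    next i

    ary$(ndx) = s$     ' store the last token (it has no trailing divider)

END SUB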

Basically it takes the line (here l$) and the delimiter (here d$) and puts each token into a previously declared array. This approach would make it easy for you to pull out the values you wanted to keep and exclude everything else.
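For anyone who wants to try the same idea outside Liberty Basic, here is a rough Python rendering of the routine, plus a small helper that keeps only the wanted positions. The name `keep_fields` and the 0-based positions are assumptions made for this sketch. Like the BASIC version, it splits on every delimiter it sees, so a comma inside a quoted field would be treated as a separator.

```python
def breakup(line, delim=","):
    """Split `line` on `delim`, one character at a time."""
    tokens = []
    current = ""
    for ch in line:
        if ch != delim:
            current += ch           # still inside a token
        else:
            tokens.append(current)  # delimiter found: store the finished token
            current = ""
    tokens.append(current)          # the last token has no trailing delimiter
    return tokens

def keep_fields(tokens, wanted):
    """Return only the tokens at the 0-based positions listed in `wanted`."""
    return [tokens[i] for i in wanted if i < len(tokens)]
```

For example, `keep_fields(breakup("a,b,c,d"), [0, 2])` gives `["a", "c"]`, and passing `"|"` as the delimiter handles vertical-bar data the same way.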

Also, for the 'fuzzy search' consider using a Soundex type of search - it's pretty quick to do and reasonably accurate.

Hope this helps - if you have any questions, I'll see if I can answer them.

Tom

"My mind is like a steel whatchamacallit ...
 
Using bits of code found on the web, I cobbled together a quick demonstration of a Liberty Basic version of the Soundex algorithm:

Code:
print "The Soundex Algorithm"

print "This is a quick program to test the Soundex algorithm ..."
print

input "Enter a word to convert:  ";wrd$

print "The code for ";wrd$;" is ";Soundex$(wrd$)

stop
end

function Soundex$(w$)

rslt$ = ""

w$ = upper$(w$)   '  Soundex is case insensitive so change it all to upper case

rslt$ = left$(w$,1)   '  Capture the very first letter for the output

oldcode = Asc(Mid$("01230120022455012623010202", Asc(w$) - 64))

for i=2 to len(w$)

    acode = Asc(Mid$(w$,i,1)) - 64    '  discard all non-alphabetic characters

    if acode >= 1 and acode <= 26 then

        dcode = Asc(Mid$("01230120022455012623010202", acode, 1))

        if dcode <> 48 and dcode <> oldcode then
           rslt$ = rslt$ + chr$(dcode)
           if len(rslt$) = 4 then i = len(w$)
        end if

        oldcode = dcode

    end if
next i

Soundex$ = rslt$

end function

Hope this helps ...

Tom

 
ERROR! ERROR! (Danger, Will Robinson ... !) <-- probably dating myself, huh?

Anyway, I found a slight glitch in my Soundex routine, so I am posting a corrected version:

Code:
print "The Soundex Algorithm"

print "This is a quick program to test the Soundex algorithm ..."
print

input "Enter a word to convert:  ";wrd$

print "The code for ";wrd$;" is ";Soundex$(wrd$)

stop
end

function Soundex$(w$)

rslt$ = ""

w$ = upper$(w$)   '  Soundex is case insensitive so change it all to upper case

rslt$ = left$(w$,1)   '  Capture the very first letter for the output

oldcode = Asc(Mid$("01230120022455012623010202", Asc(w$) - 64))

for i=2 to len(w$)

    acode = Asc(Mid$(w$,i,1)) - 64    '  discard all non-alphabetic characters

    if acode >= 1 and acode <= 26 then

        dcode = Asc(Mid$("01230120022455012623010202", acode, 1))

        if dcode <> 48 and dcode <> oldcode then
           rslt$ = rslt$ + chr$(dcode)
           if len(rslt$) = 4 then i = len(w$)
        end if

        oldcode = dcode

    end if
next i

if len(rslt$) < 4 then rslt$ = rslt$ + left$("0000",4-len(rslt$))   '  right-pad with zeros to 4 characters

Soundex$ = rslt$

end function

The change is an added 'if' statement right before the return, which right-pads the string with zeros to 4 characters if rslt$ is shorter than 4.

Some quick tests:

knuth = K530
Euler = E460
gauss = G200
lukasiewicz = L222
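To show how these codes could drive the keyword search asked about at the top of the thread, here is a small Python sketch. It ports the routine above as written (so, unlike some published Soundex variants, H and W reset the previous code just as vowels do) and then groups item descriptions by the code of each word. The names `build_index` and `search`, and the choice to return an empty code for input that does not start with a letter, are assumptions made for this example.

```python
CODES = "01230120022455012623010202"  # Soundex digit for each letter A..Z

def soundex(word):
    """Port of the corrected BASIC routine: first letter plus three digits."""
    word = word.upper()
    if not word or not ("A" <= word[0] <= "Z"):
        return ""  # assumption: no code for input that does not start with a letter
    result = word[0]
    old = CODES[ord(word[0]) - 65]
    for ch in word[1:]:
        if "A" <= ch <= "Z":  # ignore anything that is not a letter
            code = CODES[ord(ch) - 65]
            if code != "0" and code != old:
                result += code
                if len(result) == 4:
                    break
            old = code
    return result.ljust(4, "0")  # right-pad with zeros to 4 characters

def build_index(items):
    """Map each Soundex code to the item descriptions containing such a word."""
    index = {}
    for item in items:
        for word in item.split():
            code = soundex(word)
            if code and item not in index.setdefault(code, []):
                index[code].append(item)
    return index

def search(index, keyword):
    """Return the items that contain a word sounding like `keyword`."""
    code = soundex(keyword)
    return index.get(code, []) if code else []
```

With an index built from a few made-up descriptions such as "Smith wrench" and "Smyth hammer", searching for "smithe" finds both, because all three spellings share the code S530.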

Sorry 'bout that ... was having too much fun to thoroughly test out my 'solution'.

 
That Soundex routine sounds very interesting. I think I'll give it a try!

Thanks!

 