Antique DOS Database Overload 2

jfelis · Sep 17, 2003

I have about 44,000 hours of hand-typed legacy data in multiple semi-relational DOS databases, which I have been accumulating for the last 16 years. Yes, that is about 50 hours a week for 16 years! I use SPCO's "Professional File" program, which they no longer support. I am of two minds about what to do: should I try to port it all elsewhere, or would I be better off trying to reverse engineer my current software and build in the extra functionality I need? At the moment I am primarily interested in data verification & cleanup, with an eventual intention of creating better on-line & CRM facilities for my business and personal use. As my business is only part-time (I am semi-retired, due to ill health) I do not have funds to spend on expensive consultants etc. Could anyone suggest useful freeware/shareware resources for me? Thanking everybody in advance for any help available.

BTW I love this forum! I've just spent the better part of a working day (TSK! TSK!) reading it all, and it feels great to be sharing ideas with such a like-minded group.
If anyone wants to contact me directly I can be emailed at my website

http://www.ksb.citysearch.com.au

Yours Sincerely,
Jacq.

bkj123 · Sep 25, 2003

Hi Jacq.

What exactly are you looking to do: get the existing data into another format where you can easily clean it, manipulate it, etc., OR find a way to add NEW data to the existing structure?

Pls clarify and I'm sure we can nail this. Thanks - BKJ

jfelis · Sep 25, 2003

Thanks bkj123,
After 16 years of hand typed data accumulation, I've really got 2 problems:

1/ Cleaning the data: e.g. "C J Cherryh" is the same person as "Carolyn Janice Cherryh", but "Green S" could be either "Simon Green" or "Sharon Green". Bear in mind that examples of the different versions of the data are spread across several thousand little individual data files, many of which have been hived off to archive storage.

2/ I've really run up against size limits: PF won't allow files bigger than 8 meg or 65,000 records. You can't have more than 50 LOOKUPs to another file, or 8 linking indexes, etc. etc.
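The name clean-up in point 1 could be sketched roughly as below, in Python (chosen here for illustration; nothing in PF works this way as far as this thread says). The idea is to treat two names as compatible when the surnames match and every given name or initial agrees on its first letter, and to treat a short form as ambiguous when more than one full name is compatible with it. The function names and the `KNOWN` list are made up for the example.

```python
def parse_name(name, surname_first=False):
    """Split a name into (surname, [given names or initials]), lower-cased."""
    parts = name.replace(".", " ").lower().split()
    if surname_first:
        return parts[0], parts[1:]
    return parts[-1], parts[:-1]


def compatible(a, b):
    """True if two parsed names could refer to the same person."""
    (surname_a, given_a), (surname_b, given_b) = a, b
    if surname_a != surname_b:
        return False
    # Compare given names position by position; a single letter is
    # treated as an initial and matches any name starting with it.
    for x, y in zip(given_a, given_b):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def candidates(short, full_names, surname_first=False):
    """Return every full name the short form could stand for."""
    key = parse_name(short, surname_first)
    return [full for full in full_names if compatible(key, parse_name(full))]


# Hypothetical list of already-verified full names.
KNOWN = ["Carolyn Janice Cherryh", "Simon Green", "Sharon Green"]

print(candidates("C J Cherryh", KNOWN))                  # one candidate: safe to merge
print(candidates("Green S", KNOWN, surname_first=True))  # two candidates: needs a human
```

A short form with exactly one candidate can be merged automatically; anything with two or more candidates still has to go on a list for manual review.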

My current "Stock in Hand" file, for example, contains 33 fields, about 45,000 records, 13 automatic formulas (including 7 LOOKUP formulas which fetch values from 4 other data files), 8 sets of REPLACE operations, 5 LIST reports, 8 CROSSTAB reports, which totals about 23meg in file size.

This means that it is so big that it actually exists as three separate data files, so if I want to search for specific data I have to do the same search in three different files. The stock is continually expanding, so it is now threatening to become 4 files, with no real end in sight! And that is just one file, and I have hundreds!
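If the split files could be exported as delimited text (an assumption; the thread has not confirmed what PF can export), the "same search in three files" chore could be reduced to one call. A minimal Python sketch, with made-up file names and sample rows standing in for real exports:

```python
import csv
import tempfile
from pathlib import Path


def search_files(paths, field, needle):
    """Yield (file name, record) for every record whose `field` contains `needle`."""
    needle = needle.lower()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                if needle in (record.get(field) or "").lower():
                    yield Path(path).name, record


# Demo with two tiny stand-in files (invented data, not real PF exports).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "stock1.csv").write_text(
    "Title,Price\nFootfall,4.99\nRingworld,3.99\n", encoding="utf-8")
(demo_dir / "stock2.csv").write_text(
    "Title,Price\nOut of the Hive,4.99\nFootfall,5.99\n", encoding="utf-8")

hits = list(search_files(sorted(demo_dir.glob("stock*.csv")), "Title", "footfall"))
for file_name, record in hits:
    print(file_name, record["Title"], record["Price"])
```

Because the function takes any list of paths, a fourth or fifth file only means a longer list, not another manual search.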

BTW, I have some earlier emails covering aspects of this problem, which were sent to various people before I found this forum. Is there a way I could send copies of these direct to the thread from Eudora? I've been typing my responses from memory and I may be forgetting to include something important.

I have tried twice to port my stuff elsewhere, once by myself (using Access - hopeless!) and again with a friend (now EX-FRIEND) who breezed in and said, "Sure, I can write you a POS for the business in DBASE IV", and then proceeded to try and change every single thing I'd ever done on the grounds that "Dymocks (a large general book chain here in Australia) doesn't do it that way!" Never mind that I have spent 16 years honing my processes BECAUSE THEY WORK FOR ME!

Don't get me wrong, I really love PF, I'm just outgrowing it. I've looked at quite a few other DBS programs, and I have found them hard to work with, which is why I'm wondering whether I might be better off trying to reverse engineer PF, tweak the parameters, and write a new version with greater capacities, which would still be able to work with all my current data files.
Bye for now,
Jacq.

jfelis · Sep 25, 2003

Just re-read my previous answer - several things I forgot to mention

1: Data typing is ODD in PF - the default type (untyped) is basically text, but you can define text as numeric at the report-producing stage without affecting the type of the underlying data. I've used this "feature" a lot in my reports by adding text tags to numbers (or numbers to text), so that the data in the field has a dual use - once as text and once as numeric. This is where I ran into trouble with Access. I spent three days one Easter long weekend a couple of years ago trying to import ONE of my data files, and failing miserably - I could import my dates as text, but as soon as I specified them as a DATE, Access would blank the field, even though they were in the form yyyy/mm/dd and I had specified that format in Access. I gave up in frustration!
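Whatever caused the Access failure, one way to sidestep this kind of problem is to import everything as text and derive the typed values in a separate pass, keeping the original text untouched. A rough Python sketch of that idea follows; the function names are invented, and the tagged-number pattern only covers the simple "number followed by a tag" case such as 4.99p.

```python
import re
from datetime import datetime


def parse_pf_date(text):
    """Return a date for 'yyyy/mm/dd' text, or None if it does not parse."""
    try:
        return datetime.strptime(text.strip(), "%Y/%m/%d").date()
    except ValueError:
        return None


def split_tagged_number(text):
    """Split a value such as '4.99p' into (4.99, 'p').

    Returns (None, text) when the value does not start with a number,
    so the original text is never lost.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*(.*)", text)
    if match is None:
        return None, text
    return float(match.group(1)), match.group(2)


print(parse_pf_date("2003/09/17"))    # a real date object
print(parse_pf_date("1996/??"))       # None: left as text for manual review
print(split_tagged_number("4.99p"))   # numeric part and text tag, separately
print(split_tagged_number("Footfall"))
```

Values that fail to parse come back as None rather than being blanked, so they can be listed and fixed by hand instead of silently disappearing.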

2: If I do have to port it I might consider FileMaker Pro - I already have the software, thanks to a friend who gave me a box set which he'd got for review purposes. I've not had time to look at it much yet, but I believe it has the same kind of multiple file structure that I'm used to. However, it worries me that I might end up back in the same situation all over again in the future.

3: I do like the multiple file feature though - my entire database would never fit on my computer all at once, but today's working subset is 242 files occupying 197 meg. To explain this further, a "file" for PF is pretty much the equivalent of a "table" in a fully relational DBMS, but with the advantage that individual files can be added or removed at the DOS/Windows level. This is particularly good for accounting type data - I can fit all of the files pertaining to a given financial year on one or two 120 meg Superdisks, and then take them off the system but still have almost immediate access to the data (PF will read them off the floppies) if I need to know something.

BruceReed · Sep 25, 2003

Hi Jacq

Sorry about the previous email. I thought you wanted it that way.

A friend of mine uses FileMaker Pro and HE LOVES IT. There is no practical upper limit on the number of records or the size (over 1 billion for sure). I can get more details from him if you want. He liked it much better than Access, which he can't stand, whereas I like Access myself.

Data cleaning: there is no easy way to do it except to grunt through it. I found a place that cleans company addresses, but it is for North America, so it won't be much help to you.

Pardon the rude question, but do you NEED the old data? Perhaps not cleaning, importing & fixing up the old stuff would save you a lot of headaches: only bring over the last 3-6 years, keep stuff older than this only on PF & look it up if needed. If this is not possible, you are going to have a lot of fun normalizing the database, given the different customer spellings you pointed out.

Cheers
Bruce

bkj123 · Sep 30, 2003

Jacq -

Any more info you can send would be helpful. I'm really trying to get a handle on the issues. Not so much the technical issues, but the business or process issues. What is the data? What is your desired end result (data model, easy access, clean data, etc.)? How will you use the data? How will we add new data to it?

Are you able to export or get the data into any type of text file format (e.g. fixed width? delimited?)? If so, I believe we have an excellent chance to meet your needs (once defined/understood).
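If PF can produce a fixed-width text export (not confirmed anywhere in this thread), turning each line back into named fields is a small job. A minimal Python sketch, where the column names and positions in `LAYOUT` are purely hypothetical and would have to be replaced with the real ones from the export:

```python
def parse_fixed_width(line, layout):
    """Cut one fixed-width line into a dict using (name, start, end) column specs."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Hypothetical layout: the real column positions depend on the export.
LAYOUT = [("author", 0, 20), ("title", 20, 50), ("price", 50, 58)]

# Build a sample line with the same widths so the example runs on its own.
line = "Niven L".ljust(20) + "Footfall".ljust(30) + "4.99p".ljust(8)
record = parse_fixed_width(line, LAYOUT)
print(record)
```

For a delimited export, Python's standard `csv` module does the equivalent job without needing column positions.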

Regarding Bruce's comment, data cleansing with names and addresses can be a very laborious process, not to mention not very accurate in the end.

Please respond with answers and more info (i.e. your Eudora posts).

Thank you. - Brian

jfelis · Oct 10, 2003

Further Thoughts,
I've been wandering about the net a bit since my last visit to Tek-Tips, and I came across the Open Directory Project. I'm impressed by the way you can find things in it - I think what I'd like to do (at least in part) is formalise some sort of hierarchical structure within fields, where I could see the existing items at each level of the hierarchy and either choose an existing item, edit an existing item, or add a new item.

For example: The field I call "Alpha Force Order", which I rely upon almost exclusively for sorting in several of my databases, is really a kind of meta-field (hand constructed), and I have a constant fight to keep errors out of it as its rules are quite complex, and carried around in my head!

Here are five examples, each followed by its hierarchical breakdown, to illustrate:

For a book with two authors :-
Niven L Å& Pournelle J Footfall 5:1 OrbitSF 4.99p

Niven > L > Å& > Pournelle > J > Footfall > 5: > 1 > Orbit > SF > 4.99p

For a book in a series with many authors :-
Bugs 01 Leonard P Out of the Hive 1:1 VirginTVTieIn 4.99p

Bugs > 01 > Leonard > P > Out of the Hive > 1: > 1 > Virgin > TVTieIn > 4.99p

For a book in a series within a series all by the same author :-
Cherryh C J Alliance Union Series 03 Chanur 04 Chanurs Homecoming 2:07 DawSF#0695 6.99u

Cherryh > C J > Alliance Union Series > 03 > Chanur > 04 > Chanurs Homecoming > 2: > 07 > Daw > SF > #0695 > 6.99u

For a magazine where the same title has been reused several times :-
Judge Dredd the Megazine III Megaspecial nn 1996/??

Judge Dredd > the Megazine III > Megaspecial > nn > 1996/??

For a collected version reprinting previously periodical issues :-
Outlanders (trade) 04 1995/03 (Reprints Chapters #13-#16)

Outlanders > (trade) > 04 > 1995/06 > (Reprints Chapters #13-#16)

There are many other variants - it all depends on just what is being classified, but I need to keep the sorting order working properly, so that, for example, the Batman comic book (being the original source of the character) files before the Batman novels, the Batman toys, books about Batman, etc.

Mainly I want to see the field on the screen in the way I first typed it, because I find this layout easier to read than many individual columns, many of which could be blank for any given example, but I want it to sort as if each ">" in the second version leads down to a lower hierarchical level.
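One way to get "display as typed, sort by hierarchy" is to keep both versions of the field and build the sort key from the ">" version only. The Python sketch below is an illustration under that assumption, not a description of how PF sorts; `sort_key` is an invented name, and two of the sample entries are trimmed versions of the examples above.

```python
def sort_key(breakdown):
    """Turn 'A > B > C' into a tuple of levels; tuples compare level by level."""
    return tuple(level.strip().casefold() for level in breakdown.split(">"))


# Each entry keeps the text as typed plus its hand-made breakdown.
entries = [
    ("Judge Dredd the Megazine III Megaspecial nn 1996/??",
     "Judge Dredd > the Megazine III > Megaspecial > nn > 1996/??"),
    ("Cherryh C J Alliance Union Series 03 Chanur 04 Chanurs Homecoming",
     "Cherryh > C J > Alliance Union Series > 03 > Chanur > 04 > Chanurs Homecoming"),
    ("Bugs 01 Leonard P Out of the Hive",
     "Bugs > 01 > Leonard > P > Out of the Hive"),
]

# Sort on the breakdown, but show only the text as typed.
ordered = [typed for typed, breakdown in sorted(entries, key=lambda e: sort_key(e[1]))]
for typed in ordered:
    print(typed)

# A shorter path sorts before any longer path that starts the same way.
print(sort_key("Batman") < sort_key("Batman > Novels"))
```

Levels are compared as text here, so numbered levels only sort correctly when they are zero-padded (01, 03, 04), as they already are in the examples. Putting one category ahead of another at the same level (comics before novels, say) would still need an explicit ordering rule for that level.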

I also need a better way of resolving conflicting duplicate data. There are two authors named P Anthony - Piers and Patricia - and I am currently using Anthony Pi & Anthony Pa to separate them. Well, that's OK, but when Patricia first appeared on the scene I had to dig out and revise every occurrence of Anthony P I already had, in multiple files. That turned out to be well over a thousand records, and I'm still finding odd ones from time to time.
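The usual relational answer to this kind of mass revision is to store each author once and have the book records point at the author by an id, so a rename is a single edit. This is standard normalisation rather than anything PF is said to support in this thread. A tiny Python sketch with invented ids and placeholder titles:

```python
# Authors live in one place, keyed by an arbitrary id (ids here are made up).
authors = {1: "Anthony P"}

# Book records refer to the author by id instead of repeating the name text.
books = [
    {"title": "Placeholder title A", "author_id": 1},
    {"title": "Placeholder title B", "author_id": 1},
]


def display(book):
    """Build the line shown on screen by looking the author up at display time."""
    return f'{authors[book["author_id"]]}: {book["title"]}'


# A second P Anthony arrives: rename the existing author once, add the new one.
authors[1] = "Anthony Pi"
authors[2] = "Anthony Pa"
books.append({"title": "Placeholder title C", "author_id": 2})

lines = [display(book) for book in books]
for line in lines:
    print(line)
```

Every existing record picks up the new spelling automatically, because none of them ever stored the spelling in the first place.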

Bruce asked me if I needed the old data - well yes, because I use it to create the new data. For example, when I buy a book by Piers Anthony I look in my data file(s) to see if I have had it before, and if so I copy that record to the new purchases file, or if not I copy whichever record is the closest match. I then swap files to the new purchases file and make any changes that are required, add the new pricing, etc. The whole process was intended to cut down typing time - an average stock accession record contains about 400 to 500 characters spread across 53 fields - and if I use a copy of a previous record I might only have to change as few as 20 characters in 5 or 6 fields manually to create a record appropriate to the new arrival.
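The "copy whichever record is the closest match" step can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib` module; the record layout (a dict with `title` and `price`) and the function name are assumptions for illustration, and the 0.6 cutoff is simply the library default.

```python
import difflib


def closest_record(new_title, records, cutoff=0.6):
    """Return a copy of the record whose title best matches, or None."""
    titles = [record["title"] for record in records]
    matches = difflib.get_close_matches(new_title, titles, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Copy the record so editing the template leaves the old data alone.
    return dict(records[titles.index(matches[0])])


# Stand-in for existing stock records (fields trimmed for the example).
records = [
    {"title": "Chanurs Homecoming", "price": "6.99u"},
    {"title": "Footfall", "price": "4.99p"},
]

template = closest_record("Chanur's Homecoming", records)
print(template)
print(closest_record("Zzzz", records))  # nothing similar enough
```

The returned copy plays the role of the record pasted into the new purchases file: only the fields that differ for the new arrival need retyping.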

I should stop now - that's probably way more than enough to chew on.
Jacq.

BruceReed · Oct 11, 2003

Jacq

I was thinking that your old data was sales data, not book data. Ignore my prior comment about the old data.

An open thought: how do libraries catalog their book data? It might be interesting for you to visit a large public library & see what type of system they have in place. You might be able to use some of their ideas, as they have been doing this for decades, since long before computer indexes.

Bruce
