
Access2000 - SQL - 6 terabytes of data challenge 1


tballky (Programmer) - Jun 10, 1999 - 11 posts - US
Will Access2000 be able to run simple queries against an SQL file that is currently measured only as "about 6 terabytes"? The file records are to be in ASCII fixed-width format, but I am a week away from even a data dictionary or file layout. There will be several such files on the server. I want to reduce the file size again and again to support further analysis. The reduced files would be moved to PC drive(s) for convenience and possible performance gains. I am concerned about Access2000 handling an outrageous number of initial records and don't know what to expect. I do not have the luxury of a mainframe to do the initial file management, but do have AccessXP and WinXP Pro available if that would be an advantage. At this time the records are on tapes, and a location is being sought for this initial dump from tape and sorting. Any thoughts will be appreciated, as well as ideas toward finding initial support.
 
Remember that MDBs have a limit of 2 GB.

But if you use an ADP (Access Data Project) with SQL Server, you don't have that problem.

"What a wonderful world" - Louis Armstrong
 
Also, the ADP process uses the 'mini' SQL Server (MSDE), which will handle a bit more.

Personally, I'd have some doubts about doing much on a standard workstation with that much 'information'. It will take a while to wade through it, no matter whether it is a db engine or just plain code, and most networks are not really tuned for db performance, so you get more problems in attempting to move huge amounts of data across the net.

As a starting point, I would think about a simple parsing process to read the terabytes on a much smaller basis (all the way down to a single record), do some low-level verification and validation (V&V) stuff, and create some subset databases.

It 'sounds like' the commencement of a data-mining process, without the forethought as to the necessary resources for the task. One concern in using legacy archives for these purposes is that many (most? all?) of the older data 'stores' had some issues with record integrity, so there may (will?) be some records with corrupted fields, if not total record-structure corruption. These will need to be identified and dealt with (V&V); a pass through the file as simple text can provide that opportunity. Placing ill-formed records in one separate "db" and ones with proper form but illegal values in a separate one (or more) simply starts to winnow the chaff out of the wheat. Further subdivision according to categories of interest may make it practical for some "PC" to process the subsets.
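A minimal sketch of that single-pass winnowing idea, in Python rather than anything Access-specific. The record width and field positions are hypothetical stand-ins for the still-pending data dictionary:

```python
# Minimal sketch of the single-pass V&V split described above.
# RECORD_WIDTH and the field slices are hypothetical placeholders;
# the real values must come from the (still pending) data dictionary.

RECORD_WIDTH = 120                 # assumed fixed record size, no newlines
ACCOUNT = slice(0, 10)             # hypothetical field positions
AMOUNT = slice(10, 22)

def classify(record: bytes) -> str:
    """Sort one raw record: well-formed, bad value, or structurally broken."""
    if len(record) != RECORD_WIDTH or not record.isascii():
        return "ill_formed"        # structural corruption
    try:
        float(record[AMOUNT])      # proper form, but is the value legal?
    except ValueError:
        return "bad_value"
    return "good"

def winnow(src_path: str) -> None:
    """Stream the archive one record at a time; never load it whole."""
    sinks = {name: open(f"{src_path}.{name}", "wb")
             for name in ("good", "bad_value", "ill_formed")}
    try:
        with open(src_path, "rb") as src:
            while record := src.read(RECORD_WIDTH):
                sinks[classify(record)].write(record)
    finally:
        for f in sinks.values():
            f.close()
```

Reading a fixed number of bytes at a time keeps memory flat no matter how large the source file is, which is the only way a standard workstation gets through terabytes at all.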

A large issue for many small companies (or groups) is probably going to be the storage. If you start with a simple "CSV" file and translate that to one or more db files, the overall storage will MORE than double. Your 6 terabytes of CSV will remain while generating more than 6 terabytes of MDB ([MSDE | Sql | ...]). Even finding a way to address that much storage on a standard workstation will be at least a minor chore.

It could be quite an interesting exercise, I hope your organization has some patience and is willing to invest in supporting the project.

MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
thanks MichaelRed, parsing was my plan, but without knowledge beyond Access or of the hardware needed to pull it off... we are definitely in the "feeling the elephant" phase. The initial plan is to reduce the large files (there are 9) into considerably smaller files that could be managed on a PC. I suspect less than 10% of an initial file will be used, but the entire file must be evaluated to pick the subject records and create the smaller files. We would also need to gather metrics on the larger file to identify the needed subsets. Can you help suggest how to get in touch with services that would support this initial step? This does sound like data-mining without resources, but it's a beginning and I do appreciate your thoughts.
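Since the whole file has to be read anyway, the subject-record filter and the metrics gathering can share one pass, so the 6-terabyte source only gets streamed once. A rough sketch, with a hypothetical selection rule and field layout:

```python
# Sketch: extract the subject records and gather per-category metrics
# in a single streaming pass, so the tape dump is only read once.
# The 'is_subject' rule and the CATEGORY field are placeholders.
from collections import Counter

RECORD_WIDTH = 120                 # assumed fixed width, as above
CATEGORY = slice(0, 4)             # hypothetical category field

def is_subject(record: bytes) -> bool:
    """Placeholder for the real selection rule (< 10% expected to match)."""
    return record[CATEGORY] == b"SUBJ"

def reduce_file(src_path: str, out_path: str) -> Counter:
    counts = Counter()             # metrics: record count per category
    with open(src_path, "rb") as src, open(out_path, "wb") as out:
        while record := src.read(RECORD_WIDTH):
            counts[record[CATEGORY]] += 1
            if is_subject(record):
                out.write(record)
    return counts
```

The returned counts are exactly the kind of file metrics mentioned above: they show how the records break down by category before anyone commits to which subsets are worth building.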
 
Most 'legal' jurisdictions (state/city/...) have some minority work programs, where larger contracts are required to include minority-owned business participation. The jurisdictions have (and usually publish on a Web site) the approved minority businesses and the 'codes' (work types) they are approved for. I have occasionally used this as a resource to locate potential businesses for specialty work. In many cases, they are able to provide services quite economically.

Other (obvious) sources are your local telephone book, the better business org(s) in your area, and internet searches. The issues for all of these (in my opinion) will be the same as if you do it in-house. Storage. 6 terabytes is not generally just lying there as temporary storage. Further, third parties will need a very thorough specification of the work to be done -BEFORE- any work is started. Without the specification, you will get into the finger-pointing loop (also known as "well, you didn't ask for that ... another pass through the data ...").

For me, it is ye olde Hobson's choice: pay now, pay later, or pay now AND later. What does the org expect to get from the exercise (down to specifics, such as the number of [customers | suppliers | transactions where ...])? What is it 'worth'? Someone needs to realistically decide how this 'helps' and, more importantly, HOW MUCH (in $$) it helps. Then, perhaps, you (or the org in general) can get some costs for the effort to generate the SPECIFIC recordsets desired from 'contractors', and also attempt to determine the in-house costs of dealing with the same issues that might be contracted.

I would expect the cost to set up SOMETHING as an in-house project to be not much different from the contract options. I would compare the contract cost to an in-house estimate with a few caveats in mind. The contract cost is a simple expense item to the organization, while an in-house project can add resources which improve the org's capability. But is that 'a good' thing? Equipment purchases can be depreciated, which (in a sense) reduces the cost. If personnel resources are required, it is a major commitment, especially if it is on a direct-hire basis. If it can be done by simply providing some educational assistance for current staff, it can (again) enhance the org's capability. Many such questions need to be asked, and answered.


MichaelRed
m.red@att.net

Searching for employment in all the wrong places
 
thanks for the followup MRed, your ideas are sound and parallel some of today's adventure. We have found there is city/state funding, as well as a community effort, to support small businesses such as ours in such an exercise. I agree that clearly stated expectations are paramount. We are developing our quote and considering the costs of the various phases of the project. The initial data management is beyond anything I have encountered in terms of size. There is no "in-house" option. Record management will be simulated in a PC database, and routines will be defined to manage the large files once loaded. There will likely be considerable dirty records identified as such. I realize datasets need to be defined for subsequent reporting, and I plan to move these onto our network. That is where I had hoped to use Access against an SQL database. In the meantime, the tapes of ASCII fixed-width and CSV records will sit for further use. thanks again, tball
 
