
Converting From Access DAO Objects to .NET


Macka1147 (Technical User)
Jun 16, 2009
Hi All

As a long-time Microsoft Access user working with DAO database objects, I thought it would be an easy matter to switch to VB.NET Express (free from Microsoft). Yes, the newer ADO and .NET data structures have more power, but my existing code works, and I am hoping to simplify the transition.

I have written an Access program that searches the data for certain patterns.
All easy really: it uses parameter queries to search where possible, and otherwise loops through the data table and collects the information iteratively. The program makes several passes through the data and then outputs to a new file. All the sub-steps are stored for auditing.
It is not possible to do it in one pass, as the information is spread throughout the file.

All easy stuff using Access 97 or Access 2003; I do not use the 2007 version.

OK, .NET uses a new paradigm: it is object oriented, it uses a “disconnected” data model, and it offers not one but a number of different data access methods, using SQL commands, DataSets, or LINQ and several variants.

My question is twofold:

1. What is the fastest method for crunching the data? (Speed is the main reason I am straying from Access.)

2. What is the simplest way to address the data? I normally have two tables open at once and, using parameter queries, derive data that I store in the second table on each pass.

I do not need to work over a network or the Internet, just locally.

So it is a question of efficiency and simplification.

Any help in guiding me on this path would be appreciated.

Cheers

Macka J
 
I think you'll need to elaborate a little more on what you are trying to do, but here are the rules of thumb I use:

1. I always use the standard ADO.Net data access libraries instead of classic ADO.

2. To get a single value from a query I use Command.ExecuteScalar() to return a value instead of filling a DataSet/DataTable.

3. To get a few values back from a query in which the values are more column-based than row-based, I will quite often use output parameters rather than filling a DataSet/DataTable.

4. If I need more than one record returned I will use a DataReader first, and if I cannot get by with a DataReader, I will use a DataSet/DataTable.

5. Always use Parameters instead of string concatenation for your queries.

Other than that, I haven't used LINQ much, so I can't give you any pointers there. However, I will caution you that if you plan to fill a DataTable with a generic query and do the processing on that table from within your .NET application, you can quickly eat up memory if it's a large table. In that case (and in most cases, in fact), it's often better to do the processing within the database engine.
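To make points 2, 4 and 5 concrete, here is a rough sketch in VB.NET. It assumes a SQL Server back end; the connection string, table and column names are made up for illustration only:

Imports System.Data.SqlClient

Module DataAccessSketch
    ' Example connection string only.
    Private Const ConnStr As String = "Data Source=.\SQLEXPRESS;Initial Catalog=Shares;Integrated Security=True"

    ' Point 2: a single value via ExecuteScalar (and point 5: parameters, never concatenation).
    Public Function CountBars(ByVal code As String) As Integer
        Using cn As New SqlConnection(ConnStr)
            Using cmd As New SqlCommand("SELECT COUNT(*) FROM Prices WHERE Code = @Code", cn)
                cmd.Parameters.AddWithValue("@Code", code)
                cn.Open()
                Return CInt(cmd.ExecuteScalar())
            End Using
        End Using
    End Function

    ' Point 4: many rows, read forward-only with a DataReader instead of filling a DataTable.
    Public Sub ScanBars(ByVal code As String)
        Using cn As New SqlConnection(ConnStr)
            Using cmd As New SqlCommand("SELECT TradeDate, ClosePrice FROM Prices WHERE Code = @Code ORDER BY TradeDate", cn)
                cmd.Parameters.AddWithValue("@Code", code)
                cn.Open()
                Using rdr As SqlDataReader = cmd.ExecuteReader()
                    While rdr.Read()
                        ' Process one row at a time; the whole table never sits in memory.
                        Dim tradeDate As Date = rdr.GetDateTime(0)
                        Dim closePrice As Decimal = rdr.GetDecimal(1)
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module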
 
I would recommend doing the old trick of using DAO through COM. If you program DAO well, it is by far the fastest way to manage the data. You should of course work with table-type recordsets as objects and look up records with the Seek method. Alas, ADO does not have these low-level access types, making it much slower. Disconnected recordsets are plain stupid for most applications, and especially for filesystem databases you have direct access to.

One reason for slowness in databases is SQL parsing. Precompiled parameter queries can overcome this somewhat, but the object approach needs no SQL whatsoever.

Both ADO and ADO.NET are made for server databases; just look at their object models. There is nothing in there for filesystem databases such as Access. This means that the filesystem database is both handled on the local machine AND emulated to behave like a server. This is stupid, as filesystem databases are light and directly accessible. So instead of using the lightweight storage you have, ADO and ADO.NET do so much housekeeping to keep up the server pretence that any advantage is lost, and worse.

Therefore, I would recommend using DAO. It is the last database engine Microsoft published that knows what a filesystem database is.
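To make the idea concrete, here is a minimal sketch of the table-type recordset and Seek approach from VB.NET, assuming a COM reference to the Microsoft DAO 3.6 Object Library has been added; the file, table, index and field names are just examples:

Module DaoSeekSketch
    ' File, table, index and field names are examples only.
    Public Sub LookUpBar()
        Dim eng As New DAO.DBEngine()
        Dim db As DAO.Database = eng.OpenDatabase("C:\Data\BHP.mdb")
        Dim rs As DAO.Recordset = db.OpenRecordset("DayData", DAO.RecordsetTypeEnum.dbOpenTable)
        Try
            rs.Index = "PrimaryKey"       ' table-type recordsets can use an index directly...
            rs.Seek("=", #6/16/2009#)     ' ...and jump straight to a key with Seek
            If Not rs.NoMatch Then
                Dim closePrice As Double = CDbl(rs.Fields("ClosePrice").Value)
            End If
        Finally
            rs.Close()
            db.Close()
        End Try
    End Sub
End Module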


+++ Despite being wrong in every important aspect, that is a very good analogy +++
Hex (in Darwin's Watch)
 
I would never recommend using COM in .NET if you have a choice not to; too many headaches for the effort. I would recommend investing your energy in developing efficient queries and algorithms rather than trying to eke out a few milliseconds of performance by choosing a non-native data access method. Then again, your best solution, as always, is to test different approaches and see what works best for you overall.
 
As RiverGuy said, I also do not recommend classic ADO/DAO in .NET applications. If they were enough, why did we migrate from VB6 to .NET? There are reasons, for sure.

Zameer Abdulla
 
Thank you everyone for your support.

My suspicions have proved correct.

Also found this thread in the Tech forum:
viewthread.cfm?qid=1310439 (web address removed)
Speed issues are mentioned there too.

It will be interesting to compare the speed of VB.NET and Access.
Microsoft Access currently takes about 30 seconds to get the job done on larger files: maybe 10,000 rows using 8 passes through the data (2.66 GHz Core Duo processor with 2 GB RAM).

The process has to be repeated for hundreds of codes, hence the need for speed.

For those wondering, I am processing share market data, and I will use ADO alone as the first test, trying to keep away from DataSets where possible.

I must admit I am right out of my comfort zone. Having worked with Access from version one, it is quite a change; before that it was a TRS-80, then a PC with QuickBasic, then early VB. I used Access to run a small business with no problems. Please excuse the evangelism, but before getting back on subject it is worth noting that, in my experience, software growth (bloat?) has outpaced processor speed for most of the journey.

Access has another problem that I did not mention in the previous post: the Office chart is buggy, and from time to time it corrupts the MDB file. The VB.NET path allows me to choose a third-party chart if necessary. I have tried the VB/Office chart and it appears to behave differently from the embedded Access chart; it was faster too.

If anyone wants to post what they believe is the lowest-level approach to editing and appending to a table, it would be most appreciated.

In the meantime I will start coding and testing.
I will post the results as soon as they are available.

Thank You.

Macka
 
As long as you are starting to stray from MS Access, you might also think about trying out SQL Server 2008 Express. It's free and fairly easy to install (as long as you meet all the prerequisites, e.g. PowerShell). There is a version that includes a GUI, which makes using it a lot easier. Just a thought.
 
Hi All

Just an update.

Results to hand indicate that VB.NET may not be the ideal approach.
Access 2003 (using DAO file access) appears to be a lot quicker than VB.NET working with SQL Server.

MDB files have a maximum size of 2 GB, far too small to contain share data for an entire market. However, for just one share code the size is adequate: about a year's tick data can be stored, in addition to many years of daily data.

This was always going to be a large database, with at least 2 terabytes of growth per year, and searching and indexing that was always going to be difficult to optimise (and that is only for a small subset of the total market).

The front-end application just switches MDB files with VBA and does the necessary analysis; the switching is easy and very fast, about 1 second, while the analysis itself takes longer. New files are created on the fly as needed by copying a master file and renaming it, which happens in a blink. The raw data comes in compressed form and is unpacked overnight.

There is an overhead of about 350 KB per master file when empty, which has to be added to the disk space overhead. The files will also have to be split over a number of drives; that can be addressed by a lookup table that points to the correct location.
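For anyone curious, the "copy a master file" step is roughly this. My real code is VBA, but here it is expressed in VB.NET, since that is where this thread started; the paths and names are examples only:

Imports System.IO

Module MdbSwitcher
    ' Paths and names are examples only.
    Private Const MasterPath As String = "D:\Shares\Master.mdb"

    ' Return the MDB for a share code, creating it from the empty master if it does not exist yet.
    ' The folder could come from the lookup table that maps codes to drives.
    Public Function GetCodeDatabase(ByVal code As String, ByVal folder As String) As String
        Dim target As String = Path.Combine(folder, code & ".mdb")
        If Not File.Exists(target) Then
            File.Copy(MasterPath, target)   ' a "new file" is just a copy of the compacted master
        End If
        Return target
    End Function
End Module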

So far I have not found a “gotcha”; everything is going to plan.

Reading a few posts in here has shown me I have a lot to learn, with many highly skilled programmers contributing. Please excuse my enthusiasm for this method; it may be unfounded.

It will take a couple of weeks to get the code tidied up and fully functional, but the results are encouraging.

Cheers

Macka

 
Hi All
A further Update

So far 714 MDB files have been created.
They contain just under 6 million rows of data, accumulated in a couple of weeks.

The largest has 250,000 rows

For market data or any other data where there is a clear division between data sets and you have to store large amounts of statistical data, this is a good option.

Data access is lightning fast using DAO.

Cheers

Macka
 
You're creating over 700 separate Access databases each month?

HarleyQuinn
---------------------------------
Carter, hand me my thinking grenades!

You can hang outside in the sun all day tossing a ball around, or you can sit at your computer and do something that matters. - Eric Cartman

Get the most out of Tek-Tips, read FAQ222-2244: How to get the best answers before posting.

 
Not every month.
The number will only grow if more market codes are added. The NYSE has thousands of listings; however, this project is only following a subset.

We are storing time series data for analysis.
An Access MDB file can store 2 GB max, and this sort of market data can generate many terabytes.
However, for one market code the data will fit nicely.

I did not want to use SQL Server, for cost reasons. MySQL would be an option; however, the overall speed with hundreds of millions of rows is not great.

I wonder if other members of this forum have tried this method, using multiple back-end MDB files? The front-end MDB stores no data, and all the queries are located in the back-end databases.

The interesting part will be developing the analysis algorithms.

Cheers

Macka

 
Macka1147 said:
Access 2003 (Using DAO File Access) appears to be a lot quicker than VB.Net working with SQL server.
I doubt that! The difference you perceive is probably due to your years of experience with DAO, while SQL Server is new to you (as I understand it). Optimizing a server-based database requires a different way of thinking than a file-based one. For example, are you using stored procedures to do operations on your database? If not, that is one way in which your SQL Server database is not optimized.

Personally, I would hate to have to manage a system with hundreds of MDB files. Here's one example why: what if you need to make a schema change to your database? You would have to make that change to hundreds of files.

I'm not saying your method is necessarily wrong; if it works, it works. I just suspect your comparison is between highly optimized DAO and an unoptimized server database. From what I've heard about MySQL (which I have never used), hundreds of millions of rows should not be a big problem if the table design and database operations are done correctly.
 
Thank you for your thoughts.

Currently there are 1,100 MDB files in the data directory; compacted, they are about 3 GB in overall size, with 15,227,799 records so far spread among the files. That total has accumulated in a couple of weeks.

I redo the schema by copying all the data into a new file that has been changed manually. I could write code to create the new master file (the one I copy into), but it is far less work to just create a new master manually and then edit the copy program, adding or removing fields. Surprisingly this is extremely fast, taking little more time than a normal file copy operation: around 15-20 minutes with the current data load. The added benefit is that the fresh copies come out compacted, which is also the ideal time to back up at the minimum size, so the whole process is rolled into the daily backup.

Compacting each day also gets rid of the Access MDB bloat problem. Well, almost: the files get so worked over by the analysis done on them, with multiple parameter queries, that several times the 2 GB limit has been exceeded and stopped the program. Adding data causes very little bloat; it is the queries that do the damage. I will have to include a file size check on the larger files and call a compact on the fly before continuing. They are not all the same size, and the bloat can be 10 times the size of the stored data. A scan of the net has so far not revealed a definitive cause.
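The size check and compact-on-the-fly would look something like this (a sketch only, again through the DAO COM library; the threshold, temp-file handling and paths are examples):

Imports System.IO

Module CompactOnTheFly
    ' Threshold and temp-file handling are examples; CompactDatabase itself is standard DAO.
    Private Const MaxBytes As Long = 1500L * 1024L * 1024L   ' compact well before the 2 GB wall

    Public Sub CompactIfNeeded(ByVal mdbPath As String)
        If New FileInfo(mdbPath).Length < MaxBytes Then Return

        Dim tempPath As String = mdbPath & ".compact"
        Dim eng As New DAO.DBEngine()
        eng.CompactDatabase(mdbPath, tempPath)   ' DAO writes a fresh, compacted copy
        File.Delete(mdbPath)
        File.Move(tempPath, mdbPath)
    End Sub
End Module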

Speed. Now things get really interesting: how do you chomp through a large database efficiently? Not so easy. So far most of the data grabbing is done with parameter queries rather than SQL commands built in VBA, i.e. actual saved queries that I base my recordsets on. Using the sieved data, I loop through the relatively small recordset returned, grabbing the information I want. The result of one query forms the basis for another, and often there are a number of recordsets open at the same time.
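For what it is worth, the parameter-query pattern looks roughly like this through the DAO COM library (the saved query, parameter and field names are examples only):

Module ParameterQuerySketch
    ' Saved query, parameter and field names are examples only.
    Public Sub RunPass(ByVal db As DAO.Database, ByVal code As String)
        Dim qd As DAO.QueryDef = db.QueryDefs("qrySignals")
        qd.Parameters("[pCode]").Value = code
        Dim rs As DAO.Recordset = qd.OpenRecordset()
        Do While Not rs.EOF
            ' Work on the relatively small, sieved recordset.
            Dim tradeDate As Date = CDate(rs.Fields("TradeDate").Value)
            rs.MoveNext()
        Loop
        rs.Close()
    End Sub
End Module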

Maybe I am just asking too much of the system, in particular one system. Certainly, using separate MDB files lends itself to splitting the processing of the files into groups handled by separate CPUs; or would a common SQL database handle that with several processors accessing it? I guess only trial and error will find the best method. SQL may not even be the best structure. This is not the transaction processing you would find in an accounting package; it is time series data, highly repetitive lists being passed through an elaborate filter that in the end ranks the data sets contained in each file. Maybe there is a better data structure?

Currently it takes several hours to process the data, and the file system is only a small part of the overall performance. At least the system itself is working fairly well. Access, as always, is a great way to get started on a project; I hope it can be tweaked enough to get the speed needed for real-time use.

Cheers

Macka
 
Your last post actually makes a good argument against using these multiple MDB files.

Macka1147 said:
15,227,799 records
That's nothing for an SQL Server database.

Macka1147 said:
it takes little more time than a normal file copy operation, taking around 15 - 20 minutes with the current data load
I think most SQL Server DBAs would be alarmed if anything took that long. In SQL Server, compacting and backing up the database would be part of the regular maintenance activities, usually scheduled during off-peak hours; even when done while users are still active, it can usually be accomplished without affecting them much, to the point where they usually wouldn't notice.

Macka1147 said:
multiple parameter queries that several times the 2gb limit has been exceeded and stopped the program
I would expect the users don't like that. Even the free version of SQL Server (Express) has a database size limit of 4 GB, and I suspect it would not grow as quickly as your MDBs. Although if you do hit the limit you will be faced with the same problem (i.e. everything stops).

Macka1147 said:
I will have to include a file size check on the larger size files and call a compact on the fly before continuing.
And that points out the problem with this multiple-MDB type of schema: all the extra code you have to write to maintain your database. You have to keep switching between MDB files, constantly checking file sizes, etc. That type of work is really what a database system is supposed to take care of for you, so you can concentrate on the business logic of your application. You are in fact creating your own database system which uses MDB files as a component, but you can never hope to match the efficiency and robustness of SQL Server, Oracle, or even MySQL.

Macka1147 said:
Now things get really interesting how do you chomp through a large database efficiently
In SQL Server, through views, stored procedures, and above all else good table design (i.e. putting indexes on critical fields). This is where SQL Server excels over Access, in that all the data "chomping" occurs on the server side (if you do things correctly), whereas Access has to load all data and do all processing on the client side.
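As a small illustration of pushing the "chomping" to the server, here is a sketch of calling a stored procedure from VB.NET rather than pulling the rows down; the procedure and parameter names are invented for the example:

Imports System.Data
Imports System.Data.SqlClient

Module ServerSideSketch
    ' Procedure and parameter names are invented; the point is CommandType.StoredProcedure.
    Public Sub RankCode(ByVal connStr As String, ByVal code As String)
        Using cn As New SqlConnection(connStr)
            Using cmd As New SqlCommand("dbo.usp_RankCode", cn)
                cmd.CommandType = CommandType.StoredProcedure
                cmd.Parameters.AddWithValue("@Code", code)
                cn.Open()
                cmd.ExecuteNonQuery()   ' all the passes and ranking happen inside the engine
            End Using
        End Using
    End Sub
End Module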

Macka1147 said:
Maybe there is a better Data structure?
Well, yes; this is basically the difference between a database designed for data mining and one designed for real-time processing. I'm by no means enough of an expert to say more on that topic.

Macka1147 said:
Currently it takes several hours to process the data the file system is only a small part of overall performance.
If that several hours is all automatic (i.e. not waiting for some manual data entry) then I suspect it is due largely to the file system and/or design of the database(s).

I'm not outright criticizing your system; perhaps from a cost-effectiveness view it is the best solution (i.e. it meets the requirements and any improved technical solution would be too costly). But from a technical view I think you have already met the limits of an Access file-based system, which you are propping up with a kind of uber-framework of your own design. If you want to improve the efficiency of your application, I think you will have to give up this file-based system and commit to the world of server-based databases.

 
You really, really need to start thinking about purchasing a full-blown RDBMS, i.e. SQL Server. For the amount of data you're handling, it's the ONLY way to go.

15 million records is nothing. I was responsible for a db with in excess of 80 million rows (several THOUSAND tables), and it was in a multi-user environment. Performance was well above acceptable. It also depends on the server; you really need a multi-core server with a fair amount of RAM.

And ditch the DAO; .NET's ADO is really the way you want to go. It normally works with a disconnected record set (you grab the data you need, then disconnect from the data source).
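The disconnected pattern looks roughly like this: fill a DataTable, let the adapter close the connection, then work on the rows locally (connection string, table and column names are examples only):

Imports System.Data
Imports System.Data.SqlClient

Module DisconnectedSketch
    ' Connection string, table and column names are examples only.
    Public Function LoadBars(ByVal connStr As String, ByVal code As String) As DataTable
        Dim table As New DataTable()
        Using da As New SqlDataAdapter("SELECT TradeDate, ClosePrice FROM Prices WHERE Code = @Code", connStr)
            da.SelectCommand.Parameters.AddWithValue("@Code", code)
            da.Fill(table)    ' opens, reads and closes the connection for you
        End Using
        Return table          ' work on the rows with no connection held open
    End Function
End Module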

The downsides to all this are: 1) Money. SQL Server isn't cheap, but then again you have to look at the long-term savings in terms of reduced workload, performance increases, data integrity, etc.
2) Time. It will take time to move all your data, install SQL Server, become familiar with the GUI, set up maintenance schedules, build new apps, etc.

Also, there is a FREE program from MS called Performance Dashboard (it's really a custom report) that will show you what indexes it thinks you need to help improve performance. We used it at my last job (which sucks, as I just recently got laid off... ARGHHH).

I think you have enough evidence to go to management and say, "We have a problem with the existing database setup; here is the solution. It's not cheap initially, but it will help the bottom line over time."

BTW, I haven't seen anyone state it, but IF your current Access tables reside on a central server, every time you run a query the ENTIRE table and/or database has to come over the network; it's the way Access works. And if you keep a copy of each Access db on each PC instead, that's very ineffective too.

VB Express is really a learning tool, and not really good for what you're doing; you really need a more robust version. Yes, again, it's pricey, but you can do so much more.
 
Hi River Guy, DonQuichote, ZmrAbdulla, PRPhx, HarleyQuinn, and JoeAtWork.

Gee, I am humbled by the time and attention you have all put in to help me.

When I started this project it was just a simple little idea. Now it has grown to 14 GB of data, and that just keeps increasing.

As you know the idea of using multiple MDB’s is novel to say the least, possibly a little crazy.

As mentioned before, there were issues with the files growing, and also with speed. But that was then. The bloat may have been caused by too many queries (some with sub-queries) being run; redesigning the program got rid of the bloat problem. It would be great if it were possible to pinpoint the exact cause, but so many changes were made that the real cause was lost. However, it is no longer a problem: the files do not grow.

Optimization of the code got rid of the speed issues. (I was really lucky to have a good friend with a masters in software engineering to help code the core algorithm. Thank you, "H".) It is amazing how he turned my spaghetti code into something really elegant.

The fact that the data size has increased fourfold has made no difference to the speed of the recoded program. I guess that makes sense, as the individual MDB file size is fairly small and the files are processed one at a time. It is not a networked, server-based system; the entire program resides on one PC, although you could use "n" machines if needed, processing the files in parallel. That may be a handy concept for some scientific research.

So what about using SQL Server? It is impossible to ignore the advice you guys have given me on its advantages. If any hurdle pops up that cannot be overcome, it will have to be done. The good news is that the system is working well now as is; if we hit the wall, it will be done.

Please accept my sincere thanks for your help. I will continue posting updates as the system evolves, and will be happy to answer any questions about the implementation.

Cheers

Macka
 
