
Query multiple databases from Visual Basic

Jul 23, 2009
I am using Visual Basic 6 as the front end and MS Access 2007 as the back end.

In MS Access 2007, I have 12 different databases for 12 different months (Jan to Dec '08). Each database is about 1.5 GB, so the total size of all databases is about 18 GB.
Each database has exactly one table, called "Daily_Trans", and all databases are structured exactly alike.
The Daily_Trans table holds the sales transactions of the company on a daily and hourly basis.
The fields in the table are Date, Time, price, quantity_sold.
For every day (for example Jan 1, 2008) it has transactions at 9:01, 9:02, 9:03 AM... until 5:00 PM; these are the sales transactions for each minute and hour of that day, with the price and quantity sold at that time.
The January database has all 31 days of January '08, the February database has all 28 days of February, and so forth.
At any given time I can query only one month, within a single database.
I need to get the aggregated sum of quantity, sum of price, maximum price, and average price.

The Visual Basic front end has the following fields
Start Date, Start Time, End Date, End Time.

I am querying the databases based on this information provided at the front end.

The current query I am writing for one month is:
SELECT Max(price), Avg(price), Sum(price) FROM Daily_Trans GROUP BY [Date]
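
For reference, a simplified sketch of how that one-month query might be built from the front-end fields (the control names, file name, and DAO usage are assumptions; time-of-day filtering is omitted):

Code:
' Minimal sketch: assumes .mdb files opened through DAO 3.6, text boxes named
' txtStartDate / txtEndDate holding US-format dates, and a made-up file name.
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim sSQL As String

Set db = DBEngine.OpenDatabase("C:\Sales\Jan2008.mdb")

sSQL = "SELECT [Date], Max(price) AS MaxPrice, Avg(price) AS AvgPrice, " & _
       "Sum(price) AS SumPrice, Sum(quantity_sold) AS SumQty " & _
       "FROM Daily_Trans " & _
       "WHERE [Date] BETWEEN #" & txtStartDate.Text & "# AND #" & txtEndDate.Text & "# " & _
       "GROUP BY [Date]"

Set rs = db.OpenRecordset(sSQL, dbOpenSnapshot)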

Now my manager wants to query across date ranges that span multiple months.
Is there a way I can query data from Jan 1, '08 to April 30 (or Feb 17 to Sep 9)?

That would mean I need to open the Jan, Feb, Mar and Apr databases to execute the first query.
Could someone suggest a method or VBA procedure code so that I could query across multiple databases? All my databases are in the "C:\Sales" folder.

How could I query multiple databases for dynamically changing dates?


Thanks
 
I'm a little confused. Are you saying that each minute is represented in the table just once? If so, how is the database so large? Even for a month with 31 days, there's only 44,640 minutes. Then... since you are only storing 9 to 5, there should only be 1/3 as many rows (14,880).

I guess what I'm getting at is.... if you can get all the data into one database, then the problem is trivial.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thank you

Actually we have transactions down to the second.
I am sorry that I did not mention seconds earlier.
Yes, as you suggested, it would have been easy if everything was in a single database.

It is a management decision that we have the data in multiple databases.
Any suggestions in this situation?
 
What I would do is...

1. Install SQL Server Express 2008 (it's a free download).
2. Create linked servers for each of the 12 databases.
3. Create a view that combines the data from all the linked servers.
4. Write my query against the view (rough sketch below).

Note: I am not suggesting that you import the data in to SQL Server Express. By using linked servers, you will have access to all the data stored in each of the Access databases.
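
Once steps 2 and 3 are in place, step 4 from the VB6 front end might look roughly like this over ADO (the instance, catalog, view, and column names here are assumptions, not anything that exists yet):

Code:
' Sketch of step 4 only. It assumes the 12 linked servers and a combining view
' such as
'   CREATE VIEW vwDailyTrans AS
'     SELECT * FROM JAN08...Daily_Trans
'     UNION ALL SELECT * FROM FEB08...Daily_Trans
'     ... (one branch per linked Access database) ...
' were created beforehand on the SQL Express side (e.g. in Management Studio).
' Requires a project reference to Microsoft ActiveX Data Objects 2.x.
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset

Set cn = New ADODB.Connection
cn.Open "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;" & _
        "Initial Catalog=SalesLink;Integrated Security=SSPI;"

Set rs = New ADODB.Recordset
rs.Open "SELECT [Date], MAX(price) AS MaxPrice, AVG(price) AS AvgPrice, " & _
        "SUM(price) AS SumPrice, SUM(quantity_sold) AS SumQty " & _
        "FROM vwDailyTrans " & _
        "WHERE [Date] BETWEEN '2008-01-01' AND '2008-04-30' " & _
        "GROUP BY [Date] ORDER BY [Date]", _
        cn, adOpenForwardOnly, adLockReadOnly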

Ultimately, I do think you would benefit by using SQL Server instead of Access, but I doubt you would get permission for that anyway.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I suspect you have wandered into a realm where a Jet MDB isn't the best datastore. I might go even further and suggest that most RDBMSs don't handle this kind of thing well.

I'd probably use one of the "hugefile" classes that wrap API calls to allow VB6 to break the 2GB limit. Then I'd store the data as fixed-length records using binary I/O against a single file.

Binary-search the large data file to locate "start" in the file. Then read from there through your "end" value (or EOF).

This seems practical because it sounds like you only write the data once, and then only access it sequentially after that within some arbitrary start and end range.

Writing a Max(), Mean(), and Sum() function is trivial.

If you do your sequential I/Os in blocks of 50 to 200 records (around a 32K block) this should utterly blow the doors off anything SQL Server is going to do for you. But if you need to do anything more complex it may not be a good alternative.

The downside is it will be more work to write and debug, but it shouldn't be that much more.
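
A rough sketch of that idea (leaving out the >2 GB wrapper and the block reads, and with an assumed record layout and file name):

Code:
' Sketch only: assumes the records were written in chronological order as
' fixed-length Date/Currency/Long records. The "hugefile" wrapper and the
' block-at-a-time reads described above are omitted for brevity.
Private Type TransRec
    Stamp As Date       ' date + time combined
    Price As Currency
    Qty As Long
End Type

Private Sub ScanRange(ByVal sFile As String, ByVal dtStart As Date, ByVal dtEnd As Date)
    Dim f As Integer, rec As TransRec
    Dim nRecs As Long, lo As Long, hi As Long, nMid As Long, i As Long
    Dim dSum As Double, dMax As Double, nCount As Long

    f = FreeFile
    Open sFile For Random Access Read As #f Len = Len(rec)
    nRecs = LOF(f) \ Len(rec)

    ' Binary search for the first record at or after dtStart
    lo = 1: hi = nRecs + 1
    Do While lo < hi
        nMid = (lo + hi) \ 2
        Get #f, nMid, rec
        If rec.Stamp < dtStart Then lo = nMid + 1 Else hi = nMid
    Loop

    ' Sequential read from there until dtEnd (or end of file)
    For i = lo To nRecs
        Get #f, i, rec
        If rec.Stamp > dtEnd Then Exit For
        dSum = dSum + rec.Price
        If rec.Price > dMax Then dMax = rec.Price
        nCount = nCount + 1
    Next

    Close #f
    If nCount > 0 Then Debug.Print "Sum:"; dSum; "Max:"; dMax; "Avg:"; dSum / nCount
End Sub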
 
I respectfully disagree.

One of the points I was trying to get at earlier... it doesn't seem like each database should be 1.5 gigs. Especially if there is a single table with 4 columns. It's possible that the file is somehow "bloated" with transaction data and/or tons of extra/useless indexes on the table. SQL Express has a 4 gigabyte limit. If the data can fit in a single database (and I suspect it would), then you could index the data and get some nice index seeks for the query.

With 4 columns: Date, Time, Price, & Quantity

In SQL Server, DateTime takes 8 bytes. You really only need one column for this, but let's say there are 2. Price is probably a number, so let's say 12 bytes. Quantity is probably also a number (integer?), so that's 4 bytes. Each row should be no more than 8 + 8 + 12 + 4 bytes. That's 32 bytes per row. With a 4 gig limit, you should be able to store approximately 130 million rows. Of course, you would probably want to index the date and/or time columns, so that would reduce the number of rows that fit, probably to around 100 million. If the total combined number of rows is less than this, then storing it all in a SQL Server database is probably the right thing to do.

dilletante said:
The downside is it will be more work to write and debug, but it shouldn't be that much more.

Downloading the SQL engine and the management studio (assuming a high speed internet connection) and installing it should take about an hour, tops. Creating 12 linked servers.... another 20 minutes. Creating the view.... 5 minutes. Running the query, hard to say.

If resumes123 took my advice, it's possible that the entire task is done already. Rolling your own code for this would take considerably more effort.

resumes123,

If you have any questions regarding my suggestion, please do not hesitate to ask.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
gmmastros,


Thanks for your post.
I had used 9 AM to 5 PM only as an example in my earlier post.
In fact, there is data round the clock, 24 hours a day.
There is data for every hour, minute and second.
Again, for each second there could be more than one record, because this company is selling commodities online to different vendors at the same time.
So a single second value might have several different transactions.

Hence, if you combine the round-the-clock transactions for all these values, you get close to 1.5 GB of data for a month.

The total row count across all the databases comes to about 143.6 million rows, which is more than 10 GB in size.

I was not sure whether SQL Server Express could handle this.

Hence I had loaded this data into multiple MS Access databases.
Now I am having a problem querying across these multiple databases.

Could you kindly suggest a remedy please?
 
SQL Express has a 4 GB limit. But... with my suggestion, the database would be EXTREMELY small, because you wouldn't be storing any data in it. You would only be using it as a "shell" that links the other databases together.

Alternatively, you may want to consider a "data warehouse" approach. A data warehouse is really just another database. With most transactional databases, you want to normalize the data as much as possible. With a data warehouse, the data is usually stored in a denormalized way, typically to accommodate reports.

In your case, you probably wouldn't want to use a different table structure, but....

What if you aggregated this data? Instead of 10 million rows for each table, you could store less data, with less granularity. So, instead of multiple rows per second, store sums and counts for each minute. This means you would store fewer rows, but you would also lose some functionality (being able to filter to the second, for example).
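
A sketch of what one such aggregation pass might look like from VB6/DAO (the summary table, its columns, and the file name are all made up; storing Sum and Count rather than an average means the true average can still be computed later as Sum / Count):

Code:
' Hypothetical one-off pass: collapse one month's per-second detail rows into one
' row per minute. Assumes a summary table Minute_Trans already exists.
Dim db As DAO.Database

Set db = DBEngine.OpenDatabase("C:\Sales\Jan2008.mdb")   ' made-up file name

db.Execute "INSERT INTO Minute_Trans ([Date], MinuteOfDay, SumPrice, MaxPrice, SumQty, TransCount) " & _
           "SELECT [Date], Format([Time], 'hh:nn'), Sum(price), Max(price), " & _
           "Sum(quantity_sold), Count(*) " & _
           "FROM Daily_Trans " & _
           "GROUP BY [Date], Format([Time], 'hh:nn')", dbFailOnError

db.Close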

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
I think it is hard to make recommendations without a full set of requirements. For example, so far we have a scenario of one record per sale and 3 fields: timestamp, price, and quantity. Sales seem to be aggregated to each second, which sounds a bit suspicious; perhaps the timestamps simply have only 1-second resolution.

I'd wonder if we really have more fields (e.g. commodity, customer) and that in theory a customer might buy a quantity of a given commodity more than once in a second? Then it seems clear that some "seconds" might have no sales at all.

The only post-update activity described is reporting on a sequential time range of records. But are separate figures desired by customer? By commodity? Any additional requirement might change which solution is most appropriate. You'd also want to know how often each of the access styles is required and what sort of response time each requires.


I'm scratching my head trying to imagine how one denormalizes a database consisting of exactly one flat table (no fields that are lists or sets). I may have missed something there.


The SQL Server overhead hasn't been discussed at all. In-row overhead for rows containing simple fixed-length scalar values like these runs something like one byte per field plus 6 or 7 bytes per row (depending on the SQL Server edition and version). Then you have another 256 bytes of overhead for each 256 pages of row data. Then there are other, less significant levels of table overhead. Even with no indexes at all, SQL Server can require around 30% storage overhead for short records like these. Add just a primary key on one field and you can be near 80% overhead for a short-record table.
 
dilletante said:
I'm scratching my head trying to imagine how one denormalizes a database consisting of exactly one flat table (no fields that are lists or sets). I may have missed something there.

Yeah, I was wondering that myself. Because right now the simplest solution to me would be to create a new database with one table, import everything over, and then you should be able to do the rudimentary reporting that is the OP's stated goal.

I've found when the old system is so poorly designed in the first place, a rewrite often takes less time than the temporary "fix" for the current problem (not to mention the inevitable follow-up fixes).

resumes123 said:
There is data for every hour, minute and second.
Again, for each second there could be more than one record, because this company is selling commodities online to different vendors at the same time.
So a single second value might have several different transactions.
If the company is so busy that they are doing sales every second of every day, it's time they invest in a real database system that can handle their requirements (i.e. get a standard version of SQL Server or other database system such as Oracle).
Trying to go the cheap route will lead to inevitable downtime and therefore lost sales.
 
Have you tried building a separate Access database in which the tables from all the other databases are only linked, instead of imported?

Then you can create Access queries that combine the tables on a common field.
Cut & paste the SQL into a VB6 DAO recordset feeding the data source of a common datagrid and there you are! (Edit out any subtle syntax differences that might apply, like time formats and handling of string parameters.)

You can change the datagrid's record source to any individual table or to the combined query, depending on the circumstances.

If Access will handle one big table like you have, I would think it should be able to handle them all, as long as they are only linked and not copied into the .mdb. It really only looks at a small portion of your tables at any one time.
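
A sketch of that approach, with assumed names throughout (linked tables such as Daily_Trans_Jan, a saved union query qryAllMonths, and a DAO Data control/grid pair on the form):

Code:
' Assumes a new .mdb containing only links to the 12 monthly tables, plus a saved
' Access query qryAllMonths along the lines of:
'   SELECT * FROM Daily_Trans_Jan
'   UNION ALL SELECT * FROM Daily_Trans_Feb
'   ... (one branch per linked table) ...
' All names here are made up for illustration.
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim sSQL As String

Set db = DBEngine.OpenDatabase("C:\Sales\Combined.mdb")

sSQL = "SELECT * FROM qryAllMonths " & _
       "WHERE [Date] BETWEEN #" & txtStartDate.Text & "# AND #" & txtEndDate.Text & "# " & _
       "ORDER BY [Date], [Time]"

Set rs = db.OpenRecordset(sSQL, dbOpenSnapshot)
Set Data1.Recordset = rs   ' feed the grid via the DAO Data control, as described above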
 
I think you guys were missing my point about denormalizing the database. Denormalizing a database involves changing the table structure. The reason for denormalizing is to make reports faster and/or easier. I mentioned denormalizing because the reasoning behind it may help here, too. I'm not suggesting a change to the table structure. I am suggesting that the data be aggregated and stored in another database. When the data is aggregated, there will be less of it, and all 12 months combined should fit nicely in an Access database.

From the sounds of it, the purpose of this whole project is for reporting. If resumes123 can aggregate the data so that each minute has exactly one row in the database, all 12 months combined would represent approximately 500,000 rows.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Well that makes sense. Using such a large volume of detail for "what if-ing" would probably be slow and clumsy anyway.

You'd just need a good definition of what groupings and aggregations might be needed.
 