
MySQL Literals


BigBrother (Programmer)
Jul 9, 2001
I'm trying to build a MySQL database around 2 years of historical stock quotes for all tickers on the NYSE. The problem is that some stocks use funny characters in their ticker symbols and company names (e.g., CD~ is one symbol). When I try to create table names and add records with these characters included, MySQL returns an error. I tried using the ' character to convert these to literals (e.g., CREATE TABLE 'CD~' (field_01 INT, field_02 CHAR(50)); ), but MySQL still doesn't like it. Any ideas?
 
As I understand it, MySQL supports only alphanumeric characters and underscores in table and column names.

My real question is: why are you trying to name tables after individual stocks? Generally, the more abstracted and normalized your database structure is, the more you can do with it. Rather than naming tables after stocks, have a key table that keeps records of individual stocks, with an id column, a name column (where your metacharacters will cause no problem), and perhaps a "description" column. Then you could have a large table that holds the histories of all your stocks, with each record connected to its "owner" by referencing the id column in your key table. This way, you can easily manipulate statistics on more than one stock at once.

I understand that your actual data will probably be a good deal more complicated than what I describe above, but the principle is the same: use your table names for abstract categories rather than for specific identifiers. Your identifiers should be in database records, to give you more freedom to manipulate them.
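For illustration, here's a rough sketch of the kind of schema I mean (the table and column names are just placeholders, not a prescription):

    CREATE TABLE tickers (
        ticker_ID   INT NOT NULL PRIMARY KEY,
        symbol      VARCHAR(10) NOT NULL,   -- 'CD~' is perfectly fine as data
        name        VARCHAR(60),
        description VARCHAR(255)
    );

    CREATE TABLE quotes (
        ticker_ID  INT NOT NULL,            -- matches tickers.ticker_ID
        quote_date DATE NOT NULL,
        price      DECIMAL(10,2)
    );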
 
rycamor: Thanks for your response. I like the idea of storing the ticker symbols, company names, and descriptions in a separate table. But don't you think consolidating 2 years of date, open, high, low, close data for all 3,000+ stocks into a single table would slow database operations?
 
rycamor: Here's a schematic of what I think you're proposing. Let me know if I'm right.

Thanks,
-Adam


tickers                 data
+-------------+         +------------+        dates
| ticker_ID --+---------+- ticker_ID |        +-----------+
| symbol      |         | date_ID  --+--------+- date_ID  |
| name        |         | open       |        +-----------+
| description |         | high       |
+-------------+         | low        |
                        | close      |
                        | volume     |
                        +------------+
 
Yes, that could be one way to do it. I'm not so sure about the need to keep a separate table for Date, but I'm also not too familiar with stock market data. The point is to have a one-to-many relationship between the "tickers" table and the "data" table. You can have as many other relationships as you want along with that, though. Bear in mind that MySQL does not (yet) enforce relationships through referential integrity. It's up to you in your programming methods to enforce that.
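For instance, a multi-stock query then becomes a single join against the two tables. A quick sketch (the symbols in the WHERE clause are just examples):

    SELECT t.symbol, d.open, d.close, d.volume
    FROM tickers t, data d
    WHERE t.ticker_ID = d.ticker_ID
      AND t.symbol IN ('IBM', 'GE');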

There is very little reason to worry about a performance loss with a large table. I have heard of MySQL dealing quite well with tables containing many millions of records. Definitely the benefits far outweigh any possible drawbacks.

There are also some steps you can take to optimize MySQL, if you really need to. Just search the online documentation for "optimize".
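For instance, one of the simplest steps is MySQL's own OPTIMIZE TABLE statement (using the data table from your diagram):

    -- Defragments the data file and re-sorts the indexes
    OPTIMIZE TABLE data;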

If you really need an open-source database that provides full referential integrity (foreign key constraints), you should check out PostgreSQL. It is a more difficult database to learn, but it is very advanced, offering most of the functionality of the "big boys", such as Oracle, DB2, MSSQL, etc.
 
rycamor: Thanks for the advice. I took your suggestion and redesigned my tables. I also came up with two alternate designs and tested all three in MySQL, using 2 years of data for 3,000+ stocks on the NYSE. Here's what I came up with.

The first model is based on your suggestion. I placed the ticker symbol, company name, and description in a separate Tickers table, and grouped date, open, high, low, close, and volume in a joined data table. The only problem is that the data table ends up being 250,000,000+ records long. The structure makes designing SELECT statements easy, but MySQL crashed on the query.

The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, Date_03, etc. These are then joined to the Tickers table. This works, but queries are very difficult, as each date has to be named separately in the SELECT statement. Importing data is also difficult.

The third model is to give each stock its own table, all joined to the Tickers table. This works too, but since there are 3,000+ stocks on the NYSE, I have to name each stock's table in the SELECT statement for a multi-stock query.

I get the feeling I'm missing something obvious. Any ideas?
 
Are you saying that for 3000 stocks, you have over 250 million unique pieces of information? Wow! Explain this to me like I'm a 4-year-old, or better yet, give me some sample "raw" records from your original data, so I can get a better handle on this.

I mean, can it be thought of in these terms: "On {date}, stock FOO opened at {num} and closed at {num}, with a high of {num}, a low of {num}, and a volume of {num}"? In other words, is it one record per stock per day? Obviously I am missing something, because to me that would mean 3000 (stocks) x 730 (days), or a little over 2 million records.

Am I painting the wrong word picture here? Or does tickerID refer to individual stock transactions? If so, then we have to restructure things a bit.

If you do indeed have 250 million unique pieces of information, then we're going to start having some fun! At that point it is probably time to upgrade your server, with some serious memory, and maybe even consider using PostgreSQL. Have you done any optimizing of MySQL for large tables yet?

Anyway, we will solve this. I just need to get a better picture of the information.
 
You're absolutely right; I misread my calculations. It is 2.5 million records. However, this is really just a sample data set used to test the design. When the full implementation is launched, it will hold over 30 years of historical price data from the NYSE, AMEX, and NASDAQ, and it will be updated daily!

It will also hold fundamental data, including EPS, 1-, 2-, and 3-quarter EPS estimates, plus balance sheet and income statement summaries.

And this will be integrated with economic time series: interest rates, GDP, national savings, consumption expenditures, etc.

(I planned to start small)

The big idea behind this project is to create a research platform that can be used to perform advanced studies in financial market theory.

You are also right about how the records are formed. All stock data is end-of-day, so in a single spreadsheet form, each record would look like: ticker, name, date, open, high, low, close, volume, eps, etc. (e.g., ABC, Acme Bell Company, 7/9/2001, 45.25, 47.50, 44.00, 45.50, 2345975, ...). Each study would be created as a separate set of queries and calculated fields, which will be tracked and updated by the client application.

Anyway, that's about it. But of course, before additional data sets and extended data can be added, I've got to resolve these design issues and make sure that the design selected really is scalable to the project's requirements.

I look forward to hearing what you think.
 
Whew! I'm glad you're starting small, otherwise we might be having all kinds of trouble, like crashing databases and... oh, never mind. ;-)

Well, it should be no trouble to get MySQL handling your 2+ million records, but it will take some tweaking and possibly a hardware upgrade. On the other hand, to handle your full 30 years of data, cross-indexed with all your other statistics, I think MySQL, or even a standard x86 server, might just be a bit out of its depth. One workaround might be to make a different table for each year. That at least would provide a certain level of abstraction to the data, while distributing your data among more tables. It would also allow you to distribute the data among different servers more easily. Obviously this will introduce some headaches too, but it might just be worth it. (This is just an off-the-cuff idea; the problem deserves more thought, examination of the data, etc.)

I think this sounds like a great project. I also think that for something of this nature, in the final version you would be better off with PostgreSQL as the database. If you are not familiar with Postgres, have a look at it, and check out the PostgreSQL forum here. MySQL is a great database, and it definitely makes fast prototyping of an application easy, but PostgreSQL is more of an enterprise-class database. Other than that, you might consider Sybase, DB2, or even Oracle, but that can get expensive.

The main thing is, with the kind of data you are dealing with, you will quickly get frustrated with MySQL's limited feature set and lack of data protection. You will want to use transactions, create views, have foreign key constraints, sub-selects, etc., all of which MySQL doesn't (yet) support, but PostgreSQL does. PostgreSQL's features would also make it easier to manage the data if we broke up the tables by year, as I described above. Also, you are going to need a serious machine or two to handle the data: a couple of quad Xeons with RAID 0+1, or even an AlphaServer or a Sun, something like that.
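Just to show the kind of data protection I mean: in PostgreSQL you can declare the relationship right in the table definition, and the server itself enforces it. A sketch using the names from your diagram (the column types are my guesses):

    CREATE TABLE data (
        ticker_ID  INTEGER NOT NULL REFERENCES tickers (ticker_ID),
        quote_date DATE NOT NULL,
        open       NUMERIC(10,2),
        high       NUMERIC(10,2),
        low        NUMERIC(10,2),
        close      NUMERIC(10,2),
        volume     INTEGER
    );
    -- The server now rejects any row whose ticker_ID
    -- has no match in the tickers table.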

I hope you don't mind me asking a couple of other questions, but I find this project fascinating: do you plan to make this available on the Web? If so, I would love to know the location. What programming language are you using (PHP, Perl, C++)? If you are interested in getting any help with this project, please feel free to email me at rick@brainscraps.com.
 
Thanks for all the good advice.

I actually got your first model with the single data table to work - it's a little slow, but it works. I did this by adding a data_ID field (primary key) and placing it in a separate join_prices table along with date_ID and tickers_ID.
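Roughly, the new layout looks like this (I've shortened the types for the post):

    CREATE TABLE data_prices (
        data_ID INT NOT NULL PRIMARY KEY,
        open    DECIMAL(10,2),
        high    DECIMAL(10,2),
        low     DECIMAL(10,2),
        close   DECIMAL(10,2),
        volume  INT
    );

    CREATE TABLE join_prices (
        data_ID    INT NOT NULL,   -- the primary key of a data_prices row
        date_ID    INT NOT NULL,
        tickers_ID INT NOT NULL
    );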

I started to experiment with splitting the data_prices table into data_prices_1998, data_prices_1999, etc., but then I realized that when I query these tables, the open, high, low, close, and volume fields from the 1999 and 2000 tables are going to appear as separate columns, distinct from those of the 1998 table. Is there a way to merge these in the query? Do you think this will add any speed to the queries?

-Adam
<BigBrother>
 
Actually, in reading the MySQL manual a little more, I think there is an excellent way to handle this problem: use the MERGE table syntax. From what I understand, MERGE will let you create a 'mapping' of several tables, so that you can treat them as one for SELECT, DELETE, and UPDATE queries.

See the MERGE table section of the MySQL manual. One of the example benefits they mention for the MERGE method is that it can:

"Give you more speed. You can split a big read-only table based on some criteria and then put the different table part on different disks. A MERGE table on this could be much faster than using the big table. (You can, of course, also use a RAID to get the same kind of benefits.)"

The whole point here is that each of your tables is only differentiated by the year. The actual column names, keys, etc... are identical. So if you make a different table for each year, you can query them individually, or you can use a MERGE to query them as one.
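Sketching from the manual's syntax (I haven't tried this myself, and I'm assuming your per-year tables all share the column list below):

    -- The per-year tables must be identical MyISAM tables.
    CREATE TABLE data_prices_all (
        data_ID INT NOT NULL,
        open    DECIMAL(10,2),
        high    DECIMAL(10,2),
        low     DECIMAL(10,2),
        close   DECIMAL(10,2),
        volume  INT
    ) TYPE=MERGE UNION=(data_prices_1998, data_prices_1999, data_prices_2000);

    -- One SELECT now covers all the years at once:
    SELECT COUNT(*) FROM data_prices_all;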

For full queries on all records, though, it will not theoretically be any faster than having one big table, unless you also place different tables on different physical disks. Now if you have 30 tables, I'm not saying you need 30 disks, but if you were to split them over 6 physical disks, then you have 5 tables per disk, so you should get a nice speed increase.

Of course this also requires some knowledge about how to map tables from the MySQL data directory to different disks. I am familiar with methods for doing this on Unix systems with symbolic links, but if you are using a Windows system, I am not sure if the "shortcut" method will do the same thing (hopefully, it's that simple).

There is also a mention of "The MySQL Compressed Read-only Table Generator", which apparently can optimize queries if you are doing read-only operations.
 
I have a suggestion: subscribe to TC2000, which has all the data you want and lets you extract it as you need it.

 