Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Storing Pics in blobs.. speed?

Status
Not open for further replies.

Dustman

Programmer
May 7, 2001
320
US
Has anybody ever benchmarked retrieving jpgs from mysql? I have thousands of pictures that are served up on websites all over the place.. currently I'm just storing the links. I'm looking into storing them all in blog fields. I'm not worried about the actual programming, I know what it takes to do it.. I just want to know if serving up the data would be faster or not.

-Dustin
Rom 8:28
 
I've asked this question just recently as well (not here, but at my job). The response I got was to generally NOT store images as blobs, as it will significantly increase server load. It is by far more efficient for the server to read from a file than to do a query to return a blob, and then for a scripting language (like PHP) to turn that into an image. The only instances they recommended to do that was if the images were stored in a seperate location, or if you didn't have any filespace available.

To summarize: I got a big NOOOOOO! as a response :). (Although I do kind of like the idea of having all the pics neatly packaged away.)

Take Care,
Mike
 
Yeah.. I've kinda had the same response from people that I've talked to but I've yet to have anybody say they've actually tried it. I'm not too concerned about server load either. I can run MySQL on a duel xeon 2.4 with 2 gig of ram.. thats all that machine will do is process mysql so who cares. I guess the question should be, is mysql faster at reading its tables than linux is at reading its filesystem? I think the answer would be yes because of indexing.

-Dustin
Rom 8:28
 
Well, with Windows XP, it indexes its filesystem as well, if I'm not mistaken. What you would have to do is write a little script that times the loading of the page from start to end. On one page you load the same image from the filesystem, on another page you load the same image from the database. Then compare load times, and you have your answer. :) Please share the results, I'd be most interested.

Check this link for some tips on measuring load time with PHP:
Take Care,
Mike
 
Yeah.. I figured I'd have to do my own benchmarking.. I was just hoping to compare results. I'll try it out sometime in the next week or so and post all my results. For now I'll be using an apple xserve, a windows 2000 server, and probably a redhat 9 machine. I'll post all the server details too.

-Dustin
Rom 8:28
 
NO.
u have it right.
refer to image location and never store them in DB. [period]

or u will face your customers with:
"Our DB is FULL and we cant store any mroe records....and there is only 2500 records...."

maybe those 800x800 images or PDFs that are "embeded" in DB could have caused it?! :)

as far as speed goes (in general) it is fater to pull a text reference string "G:\folder\image\image1.jpg"
then a xxx Kb of an image...and most definetly indexing on the fileserver should be on....
my 5 cents is in the bucket.
All the best!

> need more info?
:: don't click HERE ::
 
Well lebisol.. so far every test I've ran has disproved your last statement. I'll post all the results in a week or two when I'm done testing. I don't know what DB you normally use but if its getting full.. sheesh.. get mysql. I've got plenty of tables with well over a gig of data in them and have absolutely no problem. Some of them with millions of records too. You wanna see a good test, dump the US deathmaster db into mysql :).

As far as initial results, I'm finding them to be about the same speed. PHP overhead is increased and the execution time is longer but the actual time for the page to load is actually faster on the mysql version. My next few phases of testing will be on caching. I've got mysql caching turned on right now which speeds things up quite a bit. I'm not too sure how much is being cached on the filesystem side right now.

I'll post more info as I do it.

-Dustin
Rom 8:28
 
Okay, here's some real data. The two scripts are identical except for how the image is actually loaded. One passes an actual link to the img tag and the other passes a script that pulls the image out of mysql. Each test randomly pulled 200 picture names out of the mysql table then displayed a page with all of those pictures. The average size of each picture is 58k.

Test 1 and 2 were both on the same server. I just changed the entire folder from my scsi drive to my raid-5 drive and ran the scripts again. Currently, there are 16351 Pictures both in the database and on the filesystem (the same pics) and my mysql table is just over 1 gb.

TEST RESULTS

Microsoft Windows 2000 Server
Version 5.0.2195 Service Pack 3 Build 2195
Processor 1: Intex x86 Family 6 Model 11 Stepping 1 GuineIntel ~1396 Mhz
Processor 2: Intex x86 Family 6 Model 11 Stepping 1 GuineIntel ~1396 Mhz
Total Physical Memory 2,096,616KB
Available Physical Memory 1,592,520KB
Total Virtual Memory 6,132,488KB
Available Virtual Memory 5,269,308KB
Page File Space 4,035,872KB

Test 1
"SEAGATE ST336706LW SCSI Disk Device","PHP TIME","PAGE TIME"
"SEAGATE ST336706LW SCSI Disk Device",".00667","8.516"
"SEAGATE ST336706LW SCSI Disk Device",".00647","8.672"
"SEAGATE ST336706LW SCSI Disk Device",".00664","6.984"
"SEAGATE ST336706LW SCSI Disk Device",".00637","6.75"
"SEAGATE ST336706LW SCSI Disk Device",".00650","7.625"

"MYSQL 4.0.16-NT","PHP TIME","PAGE TIME"
"MYSQL 4.0.16-NT",".53605","7.078"
"MYSQL 4.0.16-NT",".54538","6.953"
"MYSQL 4.0.16-NT",".55737","7.328"
"MYSQL 4.0.16-NT",".54073","6.594"
"MYSQL 4.0.16-NT",".56006","7.516"

Test 2
"ADAPTEC RAID-5 SCSI Disk Device","PHP TIME","PAGE TIME"
"ADAPTEC RAID-5 SCSI Disk Device",".00644","8.984"
"ADAPTEC RAID-5 SCSI Disk Device",".00684","9.078"
"ADAPTEC RAID-5 SCSI Disk Device",".00642","9.688"
"ADAPTEC RAID-5 SCSI Disk Device",".00638","9.219"
"ADAPTEC RAID-5 SCSI Disk Device",".00642","8.015"

"MYSQL 4.0.16-NT","PHP TIME","PAGE TIME"
"MYSQL 4.0.16-NT",".55151","7.094"
"MYSQL 4.0.16-NT",".55048","6.891"
"MYSQL 4.0.16-NT",".54289","6.109"
"MYSQL 4.0.16-NT",".55297","6.812"
"MYSQL 4.0.16-NT",".53319","6.578"

My conclusion is that MySQL is definately the better choice. The php overhead is significantly greater but the actual page load time makes up for it. I'd much rather have them all in an easy to manage database than to rely on the OS's filesystem.

-Dustin
Rom 8:28
 
Thanks for the detailed testing Dustin!

I guess it comes down to this: you need to balance how many visitors will be accessing the site with how you want to store the pictures. If you have 100s or 1000s of visitors on your site each day accessing the pictures, I doubt that the MySQL storage solution would be a viable one, since the server resource usage would just be too great. However, if you only have 10s of visitors each day, or are developing for an in-house solution, then this might be a good solution.

But I believe the numbers speak for themselves here, just don't forget to take the number of visitors into account.

Nice work!

Take Care,
Mike
 
Yeah.. another thing that helps though is the MySQL caching. The server load will decrease as you get more of the same queries. If you build your system to pull each pic out with individual queries.. they will load faster because each one gets stored in memory... I'm about to dive into this full scale over the next few months. We'll be using a system like this for a site that gets several thousand hits a day. I'll keep updating this thread as more progress is made.

-Dustin
Rom 8:28
 
ur tests are a revelation. how did u calculate the php locad time and oage load time spartaely?



[ponder]
----------------
ur feedback is a very welcome desire
 
again, I stand behind my post that targets general approach to storing (and pulling) images from DB....
test this:
store a 100,000 records with each record having (huge image files/archives) avg. 3 images =total 1 MB...

unless YOUR_sql has some mega-compression rates...hm hm hm....u can't "blob your DB " - Regardless of your pulling speed based on hardware or connection types!
Another thing is that your setup is targetting PHP pages...which is a drop in the world of DBs...not all things run over the web and especially not over PHP-MySQL developed connection engine.....someone mentioned concurrent connections.....and you have tested this with 255 of them I assume? if not, should we just estimate and degrade your speed proportianlly with the number of possible querries and conncetions?...doesn't look so great anymore does it? :)

Not all DBs are there to store a single "thumb-like" image (58k) along with the record....
consider eg. archictecture company which needs to store 5,6,7 diffenret types of images + blue prints per record etc......
ofcourse u can turn arround and offer some other DB that is overpriced $1=1 record :) but that is not solution...

anyway....these are just my opinions derived from reading and talking to other developers....again, little tweaks (as caching or DB and pages) can help perfromance....but I would hate to be arround the "emebeded DB" when it blows up....text,date time are "easy" to recover.....try recovering a corroupted blob....it is as bad as trying to sabe a burning photo....
or I might be alone on this one :)
in any case you guys have Happy Holidays and enjoy your time off work :)
All the best!

> need more info?
:: don't click HERE ::
 
Some very interesting stuff here and my thanks to everyone who worked on it (especialy Dustman). I only ever store images in a DB (before Ingres could so such things you had to split the binary up into rows !). My belief is as soon as you use the file system you prevent scaling out in a web farm. Yes the db is not scaleable but you have to put your bottle necks somewhere. I have been involved in a system which supports in excessive og 60000 conected browsers with the images stored in Oracle, which the server is a big one it copes really well and is very fast.
 
Hi ingresman
\could u please explain why database becomes unscalable with BLOBS



[ponder]
----------------
ur feedback is a very welcome desire
 
"Yes the db is not scaleable but you have to put your bottle necks somewhere..."
Thank you!

Oracle vs. MySQL.....different levels of DB talk...I have seen Sybase 9 outperform MSSQL 2000 while over the web it was not the case.....I would still rather farm file system servers than hope that at the given moment (= your medium DB of 2+ GB or whatever is full) I can afford a "higher end" DB licence....on the other hand, hardware is not so expensive.....
anyhow, good posts...please post some more if anyone has to share your experiences as the never-ending DB search goes on... :)
All the best!

> need more info?
:: don't click HERE ::
 
Just to clarify on what I see as scale.

If your web site grows to such a state where you need more than one physical server to support it you will never know if you get to the same server on any particular requests that you have visited before. Unless you have fancy hardware you will have a random chance which server you hit. For the sake of argument I'll call these web server serverA and serverB.

Suppose that the original page that stores the image gets routed to serverA and puts the image into the local file system in a directory called /images. sometime later a page requests the image and gets routed to serverA the image is found with no problems. Next another page is requested for the same inage and gets routed to serverB. ServerB has no knowledge of the filestore that serverA has and so the image can not be found (even though a directory called /images may well exist.

It would be possible to have a common directory connected by nfs (or netbios etc) which looked local to each web server but actualy physicaly lives on ONE server. You now introduce network over heads and only one machine will get the physical storage, this is starting to look like a database solution.

The application code could record which filestore the image was located in by adding the server name eg serverA/images or serverB/images. The aplication code would have to allocate which directory got the image and also have the network infrastructrue to retrieve it e.g. HTTP or FTP etc.

This is starting to get complicated. So if we stored the images in one database all this routing and allocation would be handled by the database software e.g. mysql. Both serverA and serverB would point to a database (perhapbs on a third machine serverDB) shared around. Consider the allocation issues and network resoure required if we went to 5 web server (for example). A further issue is directory managment you cant keep on adding files into a directory for ever.

The techique of adding web servers is known as scaling-out or horizonal scaleing, you simply add more (cheap ??) servers to the farm. The alterative is to scale-up (vertical scaleing) which involves a bigger faster machine. (at some point you won't be able to get a machine big enough and fast enough, the limit if scaleability). Horizonal scaleing offers some redundancy as well.

Having decided on using a database we get to the next scale problem. we have to use a single box. We could use a fancy clustering system such as with DB2 which has shared disk shared processing etc but this is an expensive option just to get horizonal scale.

I didn't say that blobs don't scale it just that just about any database needs to verticaly scale to support increasing work loads, and often thats where the real work is done inside the SQL engine this is where the hit is.

Just as a side point if you use (the default) PHP sessions you are limited to one server as the session files are held in filestore. If you see a large farm you will have to use a database or allocate file store some other way within the session handlers.

I know this is getting off topic but I thought I'd just describer my view on scaleing and why I would nearly always choose a database to support files in a web server environment.

regards
 
Hi ingresman!
Thanx for the post, very informative!
I was wondering....whih is easier (in case of corruption)?
A- restoring DB with blobs
B- restoring reference stings to images
....having in mind that some images can not be re-created while you can re-refrence thier location! Would this hold the truth?
Thanx


> need more info?
:: don't click HERE ::
 
thread669-736326 was posted in General Database. Would make interesting reading i believe



[ponder]
----------------
ur feedback is a very welcome desire
 
Corruption is a thing to bear in mind. Im not an expert on mysql at code level some of the followong may be complerly wrong.

Most database systems let you spread you work load arroubnd differenct spindles/locations. Mysql seems to allow you to use only a single disc, because of this you are limited to the size of the partition that the server can give you, on top of that I would think that the file handling code in mysql only uses 32 bit pointers, which restricts the absolute physical size of a file to 4gb (or sometines 2gb) so you would have to spread your images around in different tables anyway. Backing up/restoring tables full of blobs should take less time that backing up individual images held in file store (backups devices like to stream) so you would be able to get back from coruptions much easier.
If I were pushed to come up with a robust solution I would use an index in the data base and server images held in file store. If is used SQLServer or Ingres to do id server from the database because it can distribute accross spindles for me.
I think you need to do a lot of soul searching here asd to your best way forward, no straight answer exists.
sorry to be vague on this. Hopefully someonw with more desing experience that me around mysql could contribute
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top