Advice wanted: flat-file, DBI or RDBMS 3

derekJon · Dec 6, 2001

I have a friend who is a photographer and I'm helping him get his images online. He says there will eventually be 40,000 images. The image data that visitors to his web site need to search is:

1. the filename: in the format "John Doe tackles Bill Smith 101001.jpg"; and
2. the JPG caption with a fuller description of the image (generally around 200 characters).

I'm new to Perl but have so far written the code to extract the caption from images and I'm wondering how it should be stored. I want him to be able to dump images into any subdirectory that he creates and have my perl script check for any new directories/images and add them to (??? a flat-text file/DBI_File/Relational Database ???). The user then navigates the directory tree.
I have started down the path of creating a flat-file in each directory which has new entries added to it when images are placed in the directory and deleted when images are removed, but I'm concerned about performance and how I'm going to eventually write the code to search it. Does anybody have any recommendations about how I should design this?

MikeLacey · Dec 7, 2001

Derek Hi,

We use a free database called PostgreSql which seems excellent. The DBD module for it seems to be fine.

There is support for storing binary data (your images) in the database - I've not used this but have no reason to think it wouldn't work.

Mike
Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

tsdragon · Dec 7, 2001

I've used MySQL (also free) and the perl DBI interface for it and had very good results. It's also pretty easy to learn how to use if you have any experience with SQL. Tracy Dryden
tracy@bydisn.com

http://www.bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.

derekJon · Dec 9, 2001

Thanks for your suggestions.
I have some experience with mySQL so I have considered that option. However, given that the data will only be 1 table with 2 fields, I thought something like the DBI_File module might be more useful. For example, I will be writing a simple search to extract the date from the filename in the first field and was thinking it might be easier to work with the hash DBI_File module. Do you think it would perform well with a search over, say, 40,000 entries with 2 fields, each approx 200 characters?

Any suggestions on performance appreciated.

MikeLacey · Dec 11, 2001

40,000 is quite a few - if you'd said 4,000 I'd say text file, no problem - but 40,000? Hmmm

Use MySql - it's free, it's simple - even I can use it. <smile> Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

derekJon · Dec 11, 2001

Mike

That was certainly my thinking, but given that data is essentially 1 table with 2 fields, I thought that something like DBM (basically a persistant hash) might be a better option. Have you had any experience using DBM?

thanks in advance
Derek

rycamor · Dec 11, 2001

I certainly hope you're not planning to actually store the binary image data itself in the database. There is really no need. Just store the text data, and use perl to relate the text data to the actual image.

MySQL is perfectly adequate to do searches through 40,000 rows of simple text data. Or even 400,000 rows.

But your table should have 3 columns: one autoincrement ID column as a primary key, and then your filename, and the JPG description text:

Code:

CREATE TABLE photo_data (
  id INT (11) NOT NULL auto_increment,
  filename VARCHAR (255) NOT NULL default '',
  jpg_descr TEXT, 
  PRIMARY KEY  (id),
  UNIQUE KEY id (id),
  KEY filename (filename)
  ) TYPE=MyISAM;

Indexing the "filename" column should make searching for images somewhat faster. The filename can be as long as 255 characters.

The only other part you need is a Perl script which runs repeatedly, looking for new images, which will then enter the filename, parse the JPG header for the description, and enter that into the database. You might have to do some special coding to deal with filenames with spaces in them, such as "John Doe tackles Bill Smith 101001.jpg" -------------------

Current reading --

http://www.wnd.com/news/article.asp?ARTICLE_ID=25456

derekJon · Dec 11, 2001

rycamor

Nope - the only info will be the path/filename and the caption details. I'm not certain whether a database is better than a DBM in this instance. A DBM allows quick pattern matching across the entries in the hash without the overhead of having to pull the info out of a database and then perform the search. My only concern with the DBM is how it will cope with 40,000 entries.

Does anybody have any experience with using DBM's such as the Berkley DB?

Derek

rycamor · Dec 11, 2001

If you use DBI/MySQL, you don't really have to "pull the info out of the database" before performing your search.

The whole point of an SQL database is that you only pull the subset that you are searching for into your programming environment. Thus, you don't "SELECT * FROM photo_data", and perform your search inside Perl; You do "SELECT * FROM photo_data WHERE {condition}", and receive a relatively small hash of records into Perl, where you can then do any additional manipulating you want.

Trust me, MySQL is made for this purpose. It will pull complex conditional searches from your table in milliseconds. I know that Perl has more pattern-matching capability than anything, but it doesn't have anything like MySQL's speed. Use MySQL to at least reduce your searchable recordset to a manageable size, and only use Perl for the fine-tuning.

I'm willing to bet you would be quite surprised at the conditional and pattern-matching capability of MySQL, though.

My the way, MySQL can now include the Berkely DB table type, as well as InnoDB table types, so you can have transactions and (with InnoDB) Foreign Keys. -------------------

Current reading --

http://www.wnd.com/news/article.asp?ARTICLE_ID=25456

MikeLacey · Dec 12, 2001

DBM's a fine way of doing some things. You can search for a specific record quite nicely. It's designed so that you a find a record very cheaply.

It's not good for searches that might pick up more than one record or for searches involving a 'wild card' (like '*' or '%').

DBM works in exactly the same way (to you or me that is) as manipulating a hash. Imagine having to search through a 40,000 elemement hash.... ick - don't go there.

Rycamor is quite right. My advice is also to use MySQL so that you can say stuff like:

Select * from table where description like '%Mike%';

And all of the work is passed to the database process (which will be *faster* than any perl script you or I could write) Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

derekJon · Dec 12, 2001

Mike

Thanks for your suggestion - that is exactly what I was interested in. I guess I was a little reluctant to move away from the DBM because of the ease of updating and removing entries. My perl script iterates through all subdirectories under a base directory building an array of JPG's and comparing it to the DBM file. If the JPG file name does not exist in the DBM file, I extract the IPTC metadata embedded in the JPG image and add it to the DBM hash. If the DBM file contains a reference to a file that does not exist in the array of JPG's, I delete it from the DBM hash. Using the hash made this very easy - pushing new entries into the DBM and popping them out where the file no longer exists (perhaps 1000's of updates each time the script is run). However this simplicity is really secondary to the speed of searching - so mySQL it is.

Thanks for all your help! While it is relatively easy to pick up the sytax of Perl, it is great to have people to bounce the bigger picture issues off.

Derek

Japh · Dec 12, 2001

Sorry, just one question:

ok, I have all the jpg's from MySql (I also think MySql must be the fastest aproach) but I need to throw them to the client browser and I need to know it's size, so I can put the convenient width and height atributes in the <img> tag in html. How can Perl get the width and height of the image without having to store that previously in the db (that would be two columns more for nothing).

Pimt · Dec 12, 2001

Hi DerekJon,

When you are using a Database with so many items in there I would suggest the following.
Create enough indexes. Besides the title also Sport, politics etc etc. Hardly anyone needs all pictures. Beside that you can divide that 1 table into several tables. Say Give them a unique number and only store 5000 pictures in one table. Automaticly create a new table when the previous is full. That way you will keep your database fast. Ofcourse the retrieving of the pictures will become more difficult. But hee you wanted speed not comfort.
Pim.

tsdragon · Dec 12, 2001

Somewhere I found a perl subroutine that will read a jpg file and extract the size information. I may still have a copy of it around if you need it. Let me know. Tracy Dryden
tracy@bydisn.com

http://www.bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.

goBoating · Dec 12, 2001

just my $.02, but,
I would use PostgreSQL rather than MySQL. PostgrSQL is also open source software and it does several things that MySQL does not, namely, triggers, sub-selects and stored procedures.

There is nothing about PostgreSQL that makes it any harder to setup and use than MySQL. There is a Perl DBD Module available via CPAN or symbolstone.org.

Good Luck If you are new to Tek-Tips, please use descriptive titles, check the FAQs,
and beware the evil typo.

tsdragon · Dec 12, 2001

The main reason I would suggest MySQL over others is that more ISPs will have MySQL installed than any other database software. If you're doing it on your own server, use whichever you like better. Tracy Dryden
tracy@bydisn.com

http://www.bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.

MikeLacey · Dec 12, 2001

the GD module allows you to say

($width,$height) = $image->getBounds()

to get height and width information out of an image

you can get GD, if you're not already using it, from

http://www.cpan.org

Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

tsdragon · Dec 13, 2001

Cool, Mike! I hadn't heard about that module either! Guess I'll have to go get it. Tracy Dryden
tracy@bydisn.com

http://www.bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.

MikeLacey · Dec 13, 2001

It's a good module, most of the Perl graphics stuff seems to be based upon it. Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Advice wanted: flat-file, DBI or RDBMS 3

Programmer

MIS

Programmer

Programmer

MIS

Programmer

Programmer

Programmer

Programmer

MIS

Programmer

Technical User

Programmer

Programmer

Programmer

Programmer

MIS

Programmer

MIS

Similar threads

Log in

Part and Inventory Search

Sponsor