Insert directory map into database

Status
Not open for further replies.

georgeocrawford

Technical User
Aug 12, 2002
GB
Hi,

My problem is this. On my computer (which can be accessed from my webserver - both machines are running OS X Server), I have a directory, 'Files', with a large number of subdirectories. I would like to recursively scan through the directory, recording the name and path of each file encountered. I would then like to enter the details for each file into a MySQL database in such a way that I can use a php script to graphically display a directory browser in a web page.

'Files' should be scanned periodically (i.e. with a cron job launching the script) and the database updated - preferably only with respect to the changed files (i.e. those moved, added or deleted since last scan).

I can't get my head round how to do this. My thinking so far is to get a php script to invoke a shell script (perl?) to scan the directory, and produce a text file with the resulting directory tree. The php script will then parse the text file, and somehow enter the details in a logical way into the database.

Problems -

1 - I don't know what the fastest method would be for this, or even whether I need a shell script or a text file at all - would php's file functions be as fast as shell? (see discussion at Thread434-682722)

2 - How can the search be stripped of all information except that relating to files moved, added or deleted since the last scan?

3 - I can't figure out the best way to record the path of each file in a database. How about two tables - FOLDERS with fields called 'Folder Name', 'Folder id' and 'Parent id', and FILES with fields called 'File name' and 'Parent id'?

My most pressing question at this stage is speed. The files are on an old G3 Mac, so I want the search to be as fast and processor-friendly as possible.

Thanks for all your help!

______________________

George
 
are you sharing all your files on the internet? seems like a lot of work just to use an alternative to the existing network file system.

make a php script that scans each directory and its files, compares each file's date/time against the database and updates as necessary. then, make a cron job that runs the php script periodically.
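A minimal sketch of the scan step described above, written in Python rather than PHP purely for illustration. The function name `scan_tree` and the choice of storing paths relative to the scanned root are assumptions, not something from this thread.

```python
import os

def scan_tree(root):
    """Return {relative_path: modification_time} for every file under root."""
    snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Store the path relative to root so the records survive
            # the whole tree being moved to another location.
            rel = os.path.relpath(full, root)
            snapshot[rel] = int(os.stat(full).st_mtime)
    return snapshot
```

The resulting mapping is what would be compared against the date/time already stored in the database on each cron run.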

i don't see why the php file system functions would be significantly slower than running some other kind of script.
 
oh yeah, and for the mysql, the basic tree structure of "id, name, parent_id, date" works fine for directories, and the files should include "id, name, folder_id, date".

and you will have to include the code in your php script to compare file / folder lists with the database and discover anything that has been moved or deleted.
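The comparison step could look roughly like this. It is a sketch in Python (not PHP), and it assumes each scan is held as a mapping of path to modification time; `diff_snapshots` and the sample data are made up for the example.

```python
def diff_snapshots(old, new):
    """Compare two {path: mtime} mappings from consecutive scans."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    # Present in both scans but with a different timestamp.
    modified = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, deleted, modified

# Hypothetical data: 'old' as stored in the database, 'new' from the latest scan.
old = {"a.txt": 100, "sub/b.txt": 200, "sub/c.txt": 300}
new = {"a.txt": 100, "sub/b.txt": 250, "d.txt": 400}

added, deleted, modified = diff_snapshots(old, new)
print(added, deleted, modified)
```

In this sketch a moved file simply shows up as one deleted path plus one added path; pairing the two up would need extra logic.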
 
Sounds like a roundabout way to reinvent the FTP protocol.

I'd root around in the various web based MP3 software and see if they have a solution you can use.

bv
 
I'll explain:

I'm running a Hotline server from one Mac, and a webserver from the other. My webserver includes a php-based forum for the Hotline community, and I'd like users to be able to browse and search the Hotline file list from within the web-based forum. There won't be any downloading of these files via http:// or ftp://, only from the Hotline server as normal.

miahmiah900 - by date in your posts do you mean the modification date of the file, or the date the record was added into the MySQL database?

I'm thinking of doing the directory scan as a shell script which will produce a text file of the file index, and compare it with the previous scan using a UNIX, Perl or C function. I am guessing this would be faster than php - am I wrong?

______________________

George
 
It depends on the *nix command or C function.

Use of the "find" command will probably be faster than PHP or perl. You'd be running compiled code instead of interpreted code.


To touch on the database schema question, your schema really depends on how you intend to use the data.

One way is to record the filename and parent item for everything in the database. For a directory layout that looks like:

[tt]main/
|
|-------subdir1/
        |
        |-------subdir2/
        |       |
        |       |-------file1
        |       |-------file2
        |
        |-------subdir3/
        |       |
        |       |-------file3
        |
        |-------file4
[/tt]

The database might look like:

[tt]objectID  objectname  objecttype  parent
1         subdir1     D           0
2         subdir2     D           1
3         file1       F           2
4         file2       F           2
5         subdir3     D           1
6         file3       F           5
7         file4       F           1[/tt]

Now, to retrieve the path of any object, you'd have to find that object and recursively fetch each of its parents in turn.
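As a rough illustration of that parent-by-parent lookup, here is a Python sketch using an in-memory SQLite database as a stand-in for MySQL. The table name `objects` and the helper names are assumptions; the rows are the ones from the table above, with parent 0 taken to mean the top-level directory.

```python
import sqlite3

ROWS = [
    (1, "subdir1", "D", 0),
    (2, "subdir2", "D", 1),
    (3, "file1", "F", 2),
    (4, "file2", "F", 2),
    (5, "subdir3", "D", 1),
    (6, "file3", "F", 5),
    (7, "file4", "F", 1),
]

def make_db():
    """Build the example table in memory (SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (objectID INTEGER PRIMARY KEY, "
        "objectname TEXT, objecttype TEXT, parent INTEGER)"
    )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", ROWS)
    return conn

def path_of(conn, object_id):
    """Walk parent links upward until parent 0 (the root) is reached."""
    parts = []
    while object_id != 0:
        row = conn.execute(
            "SELECT objectname, parent FROM objects WHERE objectID = ?",
            (object_id,),
        ).fetchone()
        if row is None:
            raise KeyError(object_id)
        parts.append(row[0])
        object_id = row[1]
    return "/".join(reversed(parts))

conn = make_db()
print(path_of(conn, 3))  # subdir1/subdir2/file1
```

Each level of depth costs one query here, which is the trade-off against storing the full path with every record.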

You could just use a flat system, where the full path is recorded for any item, or a combination.

Want the best answers? Ask the best questions: TANSTAAFL!!
 
I think the first directory structure is what I had in mind.

How about this for the directory-scan code:

Code:
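// Sorted listing of every file under the directory (comm expects sorted input).
$shell = `find /path/to/files_directory -type f -print | sort > /path/to/new.txt`;

// Lines only in old.txt were deleted; lines only in new.txt were added.
$deleted = `comm -23 /path/to/old.txt /path/to/new.txt`;
$added = `comm -13 /path/to/old.txt /path/to/new.txt`;

// This scan becomes the baseline for the next run.
$shell = `mv -f /path/to/new.txt /path/to/old.txt`;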

any comments?

One question - I'm a bit confused by the backtick operator (which I know is the same as shell_exec). Could I do:

Code:
`mv -f /path/to/new.txt /path/to/old.txt`;

instead of:

Code:
$shell = `mv -f /path/to/new.txt /path/to/old.txt`;

as I don't require any output in my script?

Thanks guys!

______________________

George
 
If you don't need to capture output, there is no point in assigning the return of the backtick to a variable.


That shell stuff should work. Just keep in mind that the "-type f" part of the find command will instruct find to only list files.

Want the best answers? Ask the best questions: TANSTAAFL!!
 
question to sleipnir214:

With your proposed directory structure, writing the php to generate a visual 'file browser' seems to be quite easy. When the contents of subdir1 are displayed, for example, the icon for subdir2 will link to something like "script.php?action=browse&parent=2" which starts a MySQL SELECT command and returns files 1 and 2.
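The browse query behind such a link might look like the following. This is a Python sketch with SQLite standing in for MySQL; the table name `objects` and the function `children` are assumptions, and the rows are those from the example table earlier in the thread.

```python
import sqlite3

def children(conn, parent_id):
    """Return (objectID, objectname, objecttype) rows directly under parent_id."""
    # The parent value is bound as a parameter, never pasted into the SQL,
    # since in a real page it would come straight from the URL.
    return conn.execute(
        "SELECT objectID, objectname, objecttype FROM objects "
        "WHERE parent = ? ORDER BY objecttype, objectname",
        (parent_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (objectID INTEGER PRIMARY KEY, "
    "objectname TEXT, objecttype TEXT, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [
        (1, "subdir1", "D", 0), (2, "subdir2", "D", 1), (3, "file1", "F", 2),
        (4, "file2", "F", 2), (5, "subdir3", "D", 1), (6, "file3", "F", 5),
        (7, "file4", "F", 1),
    ],
)
print(children(conn, 2))  # [(3, 'file1', 'F'), (4, 'file2', 'F')]
```

Ordering by objecttype puts directories ("D") ahead of files ("F"), which is usually what a file browser wants.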

However, what about the process of adding/deleting records in the database?

If I use my script as shown above, I'll get two strings, $deleted and $added, composed of paths to the files.

Say I get this result:

Code:
$added == 'main/subdir1/subdir3/file4'

I guess the first step is to split up the path into directories:

Code:
$added_array = explode('/', $added);

What's the best way to find the id of the main/subdir1/subdir3 directory, which I need for my MySQL INSERT command?

Can I combine all this (MySQL SELECT to find parent id, then MySQL INSERT to add record for this file) into one MySQL statement?
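One possible way to find the parent id, sketched in Python with SQLite standing in for MySQL: match one path component per query, starting from parent 0. It keeps the lookup and the INSERT as separate statements and does not attempt a single-statement version. The names `find_directory_id` and `add_file` are hypothetical, and paths are assumed to be relative to the top-level directory (so without the leading 'main/').

```python
import sqlite3

def find_directory_id(conn, dir_path):
    """Resolve e.g. 'subdir1/subdir3' to its objectID; None if a piece is missing."""
    current = 0  # 0 stands for the top-level directory
    for name in dir_path.split("/"):
        if not name:
            continue
        row = conn.execute(
            "SELECT objectID FROM objects "
            "WHERE parent = ? AND objectname = ? AND objecttype = 'D'",
            (current, name),
        ).fetchone()
        if row is None:
            return None
        current = row[0]
    return current

def add_file(conn, file_path):
    """Insert a file record under its (already existing) directory."""
    dir_path, _, filename = file_path.rpartition("/")
    parent_id = find_directory_id(conn, dir_path)
    if parent_id is None:
        raise LookupError("unknown directory: " + dir_path)
    cur = conn.execute(
        "INSERT INTO objects (objectname, objecttype, parent) VALUES (?, 'F', ?)",
        (filename, parent_id),
    )
    return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (objectID INTEGER PRIMARY KEY, "
    "objectname TEXT, objecttype TEXT, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?)",
    [(1, "subdir1", "D", 0), (2, "subdir2", "D", 1), (5, "subdir3", "D", 1)],
)
new_id = add_file(conn, "subdir1/subdir3/file5")
```

With the example directories loaded, 'subdir1/subdir3' resolves to id 5, and the new file record is stored with parent 5.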

______________________

George
 
you wrote:

Just keep in mind that the "-type f" part of the find command will instruct find to only list files.


I need to get the directories too for this kind of database structure, don't I...

Is this better:

Code:
`find /path/to/files_directory -print | sort > /path/to/new.txt`;

Can I change this further to only return visible files and directories?

______________________

George
 
It's all recursion.

To add a new record, you'll have to explode the listing. Then take each part of the path, looking for existing path records. When you don't find a path piece, start creating records.

To delete, start at the beginning of the array and search down through the database for each part of the path in turn. When you get to the end of the path, start deleting, working backwards out the way you came in.
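The add and delete steps above might be sketched like this, in Python with SQLite standing in for MySQL. All function names are hypothetical. The delete is one interpretation of "deleting backwards": find the record by walking down the path, then remove the deepest records first so no orphans are left behind.

```python
import sqlite3

def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (objectID INTEGER PRIMARY KEY, "
        "objectname TEXT, objecttype TEXT, parent INTEGER)"
    )
    return conn

def lookup(conn, parent, name):
    row = conn.execute(
        "SELECT objectID FROM objects WHERE parent = ? AND objectname = ?",
        (parent, name),
    ).fetchone()
    return row[0] if row else None

def add_path(conn, path, objecttype="F"):
    """Create a record for each missing piece of path; return the last piece's ID."""
    parts = [p for p in path.split("/") if p]
    parent = 0
    for i, name in enumerate(parts):
        found = lookup(conn, parent, name)
        if found is None:
            # Every piece except the last must be a directory.
            kind = objecttype if i == len(parts) - 1 else "D"
            found = conn.execute(
                "INSERT INTO objects (objectname, objecttype, parent) "
                "VALUES (?, ?, ?)",
                (name, kind, parent),
            ).lastrowid
        parent = found
    return parent

def delete_subtree(conn, object_id):
    """Delete children first, then the object itself (deepest records go first)."""
    rows = conn.execute(
        "SELECT objectID FROM objects WHERE parent = ?", (object_id,)
    ).fetchall()
    for (child_id,) in rows:
        delete_subtree(conn, child_id)
    conn.execute("DELETE FROM objects WHERE objectID = ?", (object_id,))

def delete_path(conn, path):
    """Walk down the path to find the record, then delete it and anything below it."""
    current = 0
    for name in (p for p in path.split("/") if p):
        current = lookup(conn, current, name)
        if current is None:
            return False  # path is not in the database
    if current == 0:
        return False  # empty path: refuse to wipe the whole tree
    delete_subtree(conn, current)
    return True

conn = open_db()
add_path(conn, "subdir1/subdir3/file3")
add_path(conn, "subdir1/file4")
delete_path(conn, "subdir1/subdir3")
remaining = [r[0] for r in conn.execute("SELECT objectname FROM objects ORDER BY objectname")]
print(remaining)  # ['file4', 'subdir1']
```

Calling add_path twice with the same path creates nothing new the second time, so the same code can be run over the whole "added" list without checking for duplicates first.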


Removing the "-type f" part will add directories to your list. I don't know whether find will eliminate files and directories beginning with "." or not. You may have to do this in your script. find does support regular expressions -- but getting the expression right that will eliminate dot-files may be tough.
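If the filtering ends up in the script, it could be as simple as dropping any line where some part of the path starts with a dot. A Python sketch follows (not PHP); the function names and the sample listing are made up for illustration.

```python
def is_hidden(path):
    """True if any component of path starts with a dot ('.' and '..' excluded)."""
    return any(
        part.startswith(".")
        for part in path.split("/")
        if part not in ("", ".", "..")
    )

def visible_only(listing):
    """Filter the text output of find, one path per line."""
    return [line for line in listing.splitlines() if line and not is_hidden(line)]

# Hypothetical find output.
listing = "Files/a.txt\nFiles/.DS_Store\nFiles/.hidden/b.txt\nFiles/sub/c.txt\n"
print(visible_only(listing))  # ['Files/a.txt', 'Files/sub/c.txt']
```

Checking every component rather than just the file name also removes ordinary files that sit inside a hidden directory.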


Want the best answers? Ask the best questions: TANSTAAFL!!
 