Need parsing suggestions on a LARGE text file.

Numbski · Jan 24, 2003

If you're familiar with MAME (http://www.mame.net) then you'll have an idea why I'm wanting this.

I need to grab a whole lot of information out of a datfile, and place it all into useful hashes and arrays. I'm looking for suggestions on how to go about it. Perl's a hobby for me, and while I have a couple of ideas, all of them are VERY inefficient. With this much data, I'd like to make it as efficient as possible. All that said, the first part of the text file looks like this:

Code:

clrmamepro (
	name MAME
	description "MAME v0.63"
	category "Multi Game Arcade Emulator"
	version 20030115
	author "Logiqx, http://www.logiqx.com/"
)

The clrmamepro is basically the name of the auditing app the file was written for. The information contained within the brackets I envision having in a hash like this:

Code:

$version_info{name}="MAME";
$version_info{description}="MAME v0.63";

etc

I can accomplish this with a series of foreach statements and splits by whitespace and newlines, but I can't help but have the feeling that there's a better way to go about it. My old QBasic days keep yelling that I should use a while loop here...

The rest of the file is a series of entries like this:

Code:

game (
	name puckmana
	description "PuckMan (Japan set 2)"
	year 1980
	manufacturer "Namco"
	cloneof puckman
	romof puckman
	rom ( name pacman.6e size 4096 crc c1e6ab10 )
	rom ( name pacman.6f size 4096 crc 1a6fb2d4 )
	rom ( name pacman.6h size 4096 crc bcdd1beb )
	rom ( name prg7 size 2048 crc b6289b26 )
	rom ( name prg8 size 2048 crc 17a88c13 )
	rom ( name chg1 size 2048 crc 2066a0b7 )
	rom ( name chg2 size 2048 crc 3591b89d )
	rom ( name pacman.5f merge pacman.5f size 4096 crc 958fedf9 )
	rom ( name 82s123.7f merge 82s123.7f size 32 crc 2fc650bd )
	rom ( name 82s126.4a merge 82s126.4a size 256 crc 3eb3a8e4 )
	rom ( name 82s126.1m merge 82s126.1m size 256 crc a9cc86bf )
	rom ( name 82s126.3m merge 82s126.3m size 256 crc 77245b66 )
)

There are quite literally THOUSANDS of game entries.

First I need to split out all the game entries to an array, then assign all the useful info into an array I suppose in a similar manner as above.

A single associative array doesn't seem like it would do the trick. It feels almost as if I need a tree of data to access it all instead of a single array. I could really use some insight on how to handle such a situation.
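For what it's worth, the "tree" described here maps fairly naturally onto a hash of hashes. The sketch below is only one possible layout, filled in by hand from the puckmana entry above; the variable name %game_info and the choice to key ROMs by their file name are assumptions, not something the datfile format dictates.

```perl
use strict;
use warnings;

# Hypothetical layout: game name => attributes, plus a nested hash of ROMs.
my %game_info = (
    puckmana => {
        description  => 'PuckMan (Japan set 2)',
        year         => 1980,
        manufacturer => 'Namco',
        cloneof      => 'puckman',
        romof        => 'puckman',
        rom          => {
            'pacman.6e' => { size => 4096, crc => 'c1e6ab10' },
            'pacman.5f' => { merge => 'pacman.5f', size => 4096, crc => '958fedf9' },
        },
    },
);

# Walking down the tree is just a chain of hash lookups.
print $game_info{puckmana}{rom}{'pacman.6e'}{crc}, "\n";

# Listing every ROM of one game means looping over the inner hash.
for my $rom (sort keys %{ $game_info{puckmana}{rom} }) {
    my $size = $game_info{puckmana}{rom}{$rom}{size};
    print "$rom: $size bytes\n";
}
```

Each level answers one question (which game, which ROM, which attribute), so no single flat array has to hold everything.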

Sorry if I sound like I'm dumping the task off, I'm trying to work it out for myself, but there's a dimension to this that I'm having a hard time bending my brain to. I'm used to writing much simpler scripts...this is a big one for me.

This information will be compared against zipfiles to check CRCs and file sizes. Here's what the lines mean:

name - the name of the zipfile. In my example, puckmana would be puckmana.zip

description - The real title of the game.

year - when the game was manufactured

manufacturer - who made it

cloneof - These files need files from another set. What is that set?

romof - I'm pretty sure this is always the same as above, need to check on this.

rom () - This contains the names and information for each individual file within the zipfiles I'll be checking. I need to parse this into the array as well.

Ugh, lot of information, isn't it. The suggestions said NOT to braindump. **sigh** help?

MikeLacey · Jan 25, 2003

while(<DATA>){
  next if /\(\s*$/; # skip the opening "clrmamepro (" line
  last if /\)\s*$/; # stop at the closing ")"
  chomp;
  s/^\s+//; # delete leading spaces
  /\s+/; # find first whitespace
  $key = $`; # $key = everything before that whitespace
  $_ = $'; # just save everything after that in $_
  s/"//g; # lose the " characters, if any
  $version_info{$key}=$_; # add to hash
}

__END__
clrmamepro (
name MAME
description "MAME v0.63"
category "Multi Game Arcade Emulator"
version 20030115
author "Logiqx, http://www.logiqx.com/"
)

This will do your first section. You will, of course, need to open the real file and not use the DATA filehandle. Mike
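As a rough sketch of the "open the real file" step: the version below reads from a lexical filehandle instead of DATA. So that it runs on its own, it first writes a small stand-in datfile with File::Temp; that temporary file and the variable names are assumptions for illustration, and in practice $path would simply point at the real datfile. It also swaps the $` and $' variables for capture groups, which is a substitution on my part rather than the method shown above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the real datfile so the sketch is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} <<'END_DAT';
clrmamepro (
	name MAME
	description "MAME v0.63"
	version 20030115
)
END_DAT
close $tmp or die "Cannot close $path: $!";

my %version_info;
open my $fh, '<', $path or die "Cannot open $path: $!";
while (my $line = <$fh>) {
    next if $line =~ /\(\s*$/;      # skip the opening "clrmamepro (" line
    last if $line =~ /^\)\s*$/;     # stop at the closing ")"
    next unless $line =~ /^\s*(\S+)\s+(.*?)\s*$/;   # key, then the rest
    my ($key, $value) = ($1, $2);
    $value =~ s/"//g;               # drop the quote characters
    $version_info{$key} = $value;
}
close $fh;

print "$_ = $version_info{$_}\n" for sort keys %version_info;
```

Reading line by line like this keeps memory use flat no matter how large the file is.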

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

It's like this; even samurai have teddy bears, and even teddy bears get drunk.

Numbski · Jan 25, 2003

Okay, working on reading your code in plain English. You've got some stuff in there I've not seen before (big shocker).

Code:

while(<DATA>){      #While we're going through your filehandle...

  next if /\(\s*$/; #next if?  match regex to ( and whitespace + $...what's the dollar sign?
  last if /\)\s*$/; #last if? same thing only match to )

Despite my lack of recognition, I'm presuming the next if and last if are there in order to limit us to a certain area. This wouldn't work on later entries because of the structure of game( rom()). Any hints on getting around this? Perhaps next if game\(, then qualify the last if against whether a matching rom\( was found?

Code:

  chomp;             #kill off the trailing newline
  s/^\s+//;  # delete leading spaces
  /\s+/;     # find first whitespace
  $key = $`; # $key = everything before that whitespace
  $_ = $';   # just save everything after that in $_
  s/"//g;    # lose the " characters, if any
  $version_info{$key}=$_; # add to hash
}

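On the dollar sign: inside a Perl regex, $ anchors the match at the end of the line (just before a trailing newline), so /\(\s*$/ means "a ( followed only by optional whitespace up to the end of the line". The small sketch below tries the two patterns against sample lines from the datfile, and also shows the same key/value split done with capture groups instead of $` and $'. The variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $line = "\tdescription \"MAME v0.63\"\n";

# Same split as the $` / $' version, done with capture groups.
my ($key, $value) = $line =~ /^\s*(\S+)\s+(.*?)\s*$/;
$value =~ s/"//g;
print "$key => $value\n";

# "$" anchors at the end of the line, so these only match
# lines that end in "(" or ")".
my $opens  = "clrmamepro (\n" =~ /\(\s*$/ ? 1 : 0;
my $closes = ")\n"            =~ /\)\s*$/ ? 1 : 0;
my $rom    = "\trom ( name prg7 size 2048 crc b6289b26 )\n" =~ /\)\s*$/ ? 1 : 0;
print "opens=$opens closes=$closes rom=$rom\n";
```

The rom line also ends in ")", so it matches the closing pattern too. That is the nesting problem raised above; anchoring on a ")" in the first column, as in /^\)/, is one way to tell the end of a game entry from the end of a rom line.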

MikeLacey · Jan 25, 2003

Ok - I got this far but then the voices told me to stop.....

use strict;
use warnings;

my $ingame = 0;
my $key;
my $game;
my %game_info;
my ($rom_name, $game_attr, $rom_key, $rom_val);

while(<DATA>){

  next unless (/^game/ or $ingame);
  $ingame = 1;
  next if /^game/;

  if(/^\)/){
    $ingame = 0;
    next;
  }

  chomp;

  unless(/rom $/){

    # header information
    s/^\s+//; # delete leading spaces
    /\s+/; # find first whitespace
    $key = $`; # $key = everything before that whitespace
    if($key eq 'name'){
      $game = $key;
    }
    $_ = $';
    s/"//g;
    $game_info{$game}{$key}=$_;
    # print "header: $_\n";
  } else {

    # game details
    /rom\s+\(\s+(.*?)\s+$/;
    $_=$1;
    print ".$_.\n";
    # print "detail: $_\n";
    /name\s+/; $_ = $'; # first word is 'name', so get rid
    print "..$_.\n";
    /\s+/; # look for more whitespace
    $rom_name = $`; # value for name is before that
    $_ = $'; # remove so we can move on
    print "..$rom_name.$_.\n";
    while($_){
      /\s+/; # find whitespace
      $rom_key = $`; # $rom_key is stuff before that
      $_ = $'; # remove $rom_key from $_
      if(/\s+/){ # find whitespace, again
        $rom_val = $`; # $rom_val is stuff before that
        $_ = $'; # remove $rom_val from $_
      } else {
        $rom_val = $_;
        $_='';
      }
      # print "$rom_key = $rom_val\n";
      $game_info{$game}{rom}{$rom_name}{$rom_key}=$rom_val;
      print ":$game, rom, $rom_name, $rom_key, $rom_val:\n";
    }
  }
}
Mike

Numbski · Jan 25, 2003

Oh man...LOL.

You got some good voices going there man.

E-mail me an address and I'll send you a 6-pack.

Bud. Coors. Coca-Cola, you pick.

Thanks.

MikeLacey · Jan 26, 2003

Have a look at file://C:\Perl\html\lib\Pod\perldsc.html or perldoc perldsc for some good examples and explanations of hashes of hashes. The code above doesn't work, but it might give you a starting point; I just don't have time to look at it properly at the moment.
Mike
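For anyone landing here later, below is a minimal sketch of the hash-of-hashes idea applied to the game entries. It is not the code from this thread, just one way it might be done, under these assumptions: every entry starts with a "game (" line, ends with a ")" in the first column, and each rom line holds whitespace-separated key/value pairs. The sample data is a shortened copy of the puckmana entry, read through an in-memory filehandle so the sketch runs on its own; the variable names are made up.

```perl
use strict;
use warnings;

my $dat = <<'END_DAT';
game (
	name puckmana
	description "PuckMan (Japan set 2)"
	year 1980
	rom ( name pacman.6e size 4096 crc c1e6ab10 )
	rom ( name pacman.5f merge pacman.5f size 4096 crc 958fedf9 )
)
END_DAT

my %game_info;
my (%entry, %roms);          # the game currently being read
my $in_game = 0;

open my $fh, '<', \$dat or die "Cannot open in-memory data: $!";
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^game\s*\(\s*$/) {            # start of an entry
        ($in_game, %entry, %roms) = (1);
    }
    elsif (!$in_game) {
        next;                                   # e.g. the clrmamepro header
    }
    elsif ($line =~ /^\)\s*$/) {                # ")" in column 0 ends the entry
        $game_info{ $entry{name} } = { %entry, rom => {%roms} }
            if defined $entry{name};
        $in_game = 0;
    }
    elsif ($line =~ /^\s*rom\s+\(\s*(.*?)\s*\)\s*$/) {
        my %rom = split ' ', $1;                # "name x size y ..." -> pairs
        my $rom_name = delete $rom{name};       # the rom name becomes the key
        $roms{$rom_name} = {%rom};
    }
    elsif ($line =~ /^\s*(\S+)\s+(.*?)\s*$/) {  # plain "key value" line
        my ($key, $value) = ($1, $2);
        $value =~ s/"//g;
        $entry{$key} = $value;
    }
}
close $fh;

for my $game (sort keys %game_info) {
    for my $rom (sort keys %{ $game_info{$game}{rom} }) {
        my $crc = $game_info{$game}{rom}{$rom}{crc};
        print "$game: $rom crc $crc\n";
    }
}
```

Only one game is held in the working hashes at a time, and it is copied into %game_info when its closing ")" is seen, so a half-read entry never ends up in the result.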

Numbski · Jan 27, 2003

I'd be happy to look at that file, but I'm on MacOS X, so I'm sure it doesn't exist.

I get what you mean, I'll go poking around in the docs. Thanks again.

Numbski · Jan 27, 2003

Heh, just thought of one more for you.

Let's just say I wind up with a hash full of crc values. Call it %crc for argument's sake. I'm going along through a directory full of zip files, take the first crc, then I need to find a match in %crc. Is the fastest way to do a foreach statement on %crc?

MikeLacey · Jan 27, 2003

Hi Numbski,

Hashes are good for searching if you have the right value in the hash key.

Let's say you've built the hash like this:

while ... {
$crc_for_file = crc_val_for_file($file);
$crc{$crc_for_file} = $file;
}

In this case, if you have a crc you can quickly find a file with that CRC like this.

$file = $crc{$crc};

Now - the other way around:

while ... {
$crc_for_file = crc_val_for_file($file);
$crc{$file} = $crc_for_file;
}

*Now* you can find the CRC for a file from the filename

$crc = $crc{$file};

Hope I understood your question correctly. Mike
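To make the first variant concrete, here is a small sketch using a few CRCs from the datfile excerpt earlier in the thread. With the CRC as the hash key there is no need for a foreach over %crc; a single exists test answers "is this CRC known?". The 'deadbeef' value is made up purely so that one lookup misses.

```perl
use strict;
use warnings;

# Keyed by CRC, so a lookup is one hash access rather than a loop.
my %crc = (
    'c1e6ab10' => 'pacman.6e',
    '1a6fb2d4' => 'pacman.6f',
    'bcdd1beb' => 'pacman.6h',
);

# Hypothetical CRCs as they might come out of a zip file;
# the second one is invented and should not be found.
for my $found ('bcdd1beb', 'deadbeef') {
    if (exists $crc{$found}) {
        print "$found matches $crc{$found}\n";
    } else {
        print "$found is not in the datfile\n";
    }
}
```

One caveat: a hash key can hold only one value, so if two files can share the same CRC, the later one overwrites the earlier one. Storing an array reference of file names per CRC would be one way around that.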
