Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need assistance inserting data from complex array/hash into MySQL 2

Status
Not open for further replies.

expert01

Technical User
Jul 19, 2002
26
US
I'm having trouble figuring out how to insert data from an array into several MySQL tables. My biggest problem is I can't figure out how to get $xtvdata->{programs}->{program}->__program_name_here__ and assign the program name as `pid` in the programs table, with the data contained in the program name array as other entries in the table.

Code:
[ignore]
My perl script so far:

#!/windows/web/server/perl/bin/perl

# use module
use XML::Simple;
#use DBI;
use Data::Dumper;

#Connect to MySQL
#my $dbh = DBI->connect("DBI:mysql:adp:127.0.0.1:50388", "root", "deskport",
#                       { RaiseError => 1, AutoCommit => 0 }
#       ) || die "Error connecting to the database: $DBI::errstr\n";

# create object
$xmltv = new XML::Simple;

# read XML file
$xtvdata = $xmltv->XMLin("gold_sample.xml");

#used temporarily so I can see the output
print "Content-type: text/html\n\n";
print "<HTML><BODY>";

#dumps out the array so I can see the structure
print Dumper($xtvdata);
#foreach $p($xtvdata->{programs}->{program})
#{
#
#Get these variables somehow
#my $pid = $xtvdata->{programs}->{program}->__program_name_here__;
#my $title = $xtvdata->{programs}->{program}->__program_name_here__->{title};
#my $sth = $dbh->prepare('insert into tvguide_programs(pid, title)
#values $pid, $title)');
#$sth->execute();
#}

#used temporarily so I can see the output
print "</BODY></HTML>"
[/ignore]

Code:
[ignore]
Table makeup:

CREATE TABLE `tvguide_stations` (
  `id` varchar(12) NOT NULL,
  `name` carchar(12) NOT NULL,
  `affiliate` varchar(25) NOT NULL,
  `callsign` varchar(10) NOT NULL,
  `fccchannelnumber` varchar(8) NOT NULL,
  `name` varchar(40) NOT NULL,
  `channel` varchar(40) NOT NULL,
  `genremain` varchar(30) NOT NULL,
  `genre0` varchar(30),
  `genre1` varchar(30),
  `genre2` varchar(30),
  `genre3` varchar(30),
  `genre4` varchar(30),
  `genre5` varchar(30),
  PRIMARY KEY  (`id`)
);

CREATE TABLE `tvguide_lineups` (
  `pid` int(11) NOT NULL auto_increment,
  `device` varchar(30) NOT NULL,
  `id` varchar(12) NOT NULL,
  `location` varchar(28) NOT NULL,
  `name` varchar(100) NOT NULL,
  `postalcode` tinyint(6),
  `type` varchar(20) NOT NULL,
  `userLineupName` varchar(50),
  PRIMARY KEY  (`pid`)
);

CREATE TABLE `tvguide_programs` (
  `pid` varchar(12) NOT NULL,
  `title` varchar(120) NOT NULL,
  `subtitle` varchar(150),
  `colorcode` varchar(20),
  `description` varchar(255),
  `mpaaRating` int(11),
  `originalAirDate` varchar(10),
  `runtime` varchar(8),
  `series` varchar(12),
  `showtype` varchar(30),
  `starRating` varchar(5),
  `syndicatedEpisodeNumber` varchar(20),
  `year` tinyint(4),
  `advisory0` varchar(30),
  `advisory1` varchar(30),
  `advisory2` varchar(30),
  `advisory3` varchar(30),
  `advisory4` varchar(30),
  `advisory5` varchar(30),
  PRIMARY KEY  (`pid`)
);

CREATE TABLE `tvguide_schedules` (
  `pid` int(11) NOT NULL,
  `sid` int(11) NOT NULL,  
  `duration` varchar(8) NOT NULL,
  `time` varchar(20) NOT NULL,
  `closecaptioned` enum('true', 'false') default 'true',
  `dolby` enum('none', 'Dolby', 'Dolby Digital') default 'none',
  `hdtv` enum('true', 'false') NOT NULL default 'none',
  `partnumber` tinyint(2),
  `totalnumber` tinyint(2),
  `repeat` enum('true', 'false') default 'false',
  `stereo` enum('true', 'false') default 'false',
  `subtitled` enum('true', 'false') default 'false',
  `tvRating` enum('TV-Y', 'TV-Y7', 'TV-G', 'TV-PG', 'TV-14', 'TV-MA', 'null') default 'null',
  PRIMARY KEY  (`pid`)
);

CREATE TABLE `tvguide_production_crew` (
  `id` int(11) NOT NULL auto_increment,
  `pid` varchar(11) NOT NULL,
  `givenname` varchar(20) NOT NULL,
  `surname` varchar(20) NOT NULL,
  `role` varchar(30) NOT NULL,
  PRIMARY KEY  (`id`)
);
[/ignore]

Code:
[ignore]
Sample XML file:

<?xml version='1.0' encoding='utf-8'?>
<xtvd from="2005-06-03T00:00:00Z" to="2005-06-04T00:00:00Z" xmlns="urn:TMSWebServices" xmlns:xsi="[URL unfurl="true"]http://www.w3.org/2001/XMLSchema-instance"[/URL] xsi:schemaLocation="urn:TMSWebServices [URL unfurl="true"]http://docs.tms.tribune.com/tech/xml/schemas/tmsxtvd.xsd"[/URL] schemaVersion="1.3">
<stations>
<station id="14755">
<callSign>BLOOM</callSign>
<name>Bloomberg Television</name>
<affiliate>Satellite</affiliate>
</station>
<station id="10142">
<callSign>CNN</callSign>
<name>Cable News Network</name>
<affiliate>Satellite</affiliate>
</station>
<station id="17125">
<callSign>ENCOREW</callSign>
<name>ENCORE  - Encore (Pacific)</name>
<affiliate>PAY</affiliate>
</station>
</stations>
<lineups>
<lineup id="DITV602:-" name="DIRECTV Chicago" location="Chicago" type="Satellite" postalCode="60611">
<map station="10142" channel="202" from="2000-06-26"></map>
<map station="14755" channel="353" from="2000-06-26"></map>
<map station="17125" channel="527" from="2000-06-26"></map>
</lineup>
</lineups>
<schedules>
<schedule program="SH6434270000" station="14755" time="2005-06-03T01:00:00Z" duration="PT03H00M"></schedule>
<schedule program="SH5789380000" station="14755" time="2005-06-03T04:00:00Z" duration="PT03H00M"></schedule>
<schedule program="SH6063000000" station="10142" time="2005-06-03T00:00:00Z" duration="PT01H00M" closeCaptioned="true"></schedule>
<schedule program="SH4058630000" station="10142" time="2005-06-03T02:00:00Z" duration="PT01H00M" closeCaptioned="true"></schedule>
<schedule program="MV0318610000" station="17125" time="2005-06-03T01:00:00Z" duration="PT02H00M" stereo="true" closeCaptioned="true"></schedule>
<schedule program="MV1400930000" station="17125" time="2005-06-03T03:00:00Z" duration="PT02H30M" stereo="true" closeCaptioned="true"></schedule>
<schedule program="EP0809550108" station="11097" time="2005-06-03T21:00:00Z" duration="PT01H00M" tvRating="TV-14" stereo="true" closeCaptioned="true">
<part number="2" total="2"></part>
</schedule>
</schedules>
<programs>
<program id="EP0014574180">
<title>Entertainment Tonight</title>
<description>Actors Russell Crowe and Renée Zellweger ("Cinderella Man"); celebrity splurges.</description>
<showType>Series</showType>
<series>SH001457</series>
<syndicatedEpisodeNumber>6189</syndicatedEpisodeNumber>
<originalAirDate>2005-06-02</originalAirDate>
</program>
<program id="EP0039850080">
<title>Star Trek</title>
<subtitle>The Cage</subtitle>
<description>When the Enterprise answers a distress call, Capt. Pike (Jeffrey Hunter) encounters manipulative aliens. Original series pilot.</description>
<showType>Series</showType>
<series>SH003985</series>
<colorCode>Color and B &amp; W</colorCode>
<originalAirDate>1966-11-17</originalAirDate>
</program>
<program id="EP0191100009">
<title>The Twilight Zone</title>
<subtitle>Perchance to Dream</subtitle>
<description>A man with recurring nightmares fears sleep.</description>
<showType>Series</showType>
<series>SH019110</series>
<colorCode>B &amp; W</colorCode>
<syndicatedEpisodeNumber>7609</syndicatedEpisodeNumber>
<originalAirDate>1959-11-27</originalAirDate>
</program>
<program id="EP2254210069">
<title>Stargate SG-1</title>
<subtitle>Crystal Skull</subtitle>
<description>Daniel vanishes during an exploration.</description>
<showType>Series</showType>
<series>SH225421</series>
<advisories>
<advisory>Adult Situations</advisory>
<advisory>Violence</advisory>
</advisories>
<syndicatedEpisodeNumber>3021</syndicatedEpisodeNumber>
<originalAirDate>2000-03-03</originalAirDate>
</program>
</programs>
<productionCrew>
<crew program="EP0039850080">
<member>
<role>Actor</role>
<givenname>William</givenname>
<surname>Shatner</surname>
</member>
<member>
<role>Actor</role>
<givenname>Leonard</givenname>
<surname>Nimoy</surname>
</member>
</crew>
<crew program="EP0191100009">
<member>
<role>Actor</role>
<givenname>Richard</givenname>
<surname>Conte</surname>
</member>
</crew>
</productionCrew>
<genres>
<programGenre program="EP0014574180">
<genre>
<class>Talk</class>
<relevance>0</relevance>
</genre>
<genre>
<class>Entertainment</class>
<relevance>1</relevance>
</genre>
</programGenre>
<programGenre program="EP0039850080">
<genre>
<class>Science Fiction</class>
<relevance>0</relevance>
</genre>
<genre>
<class>Fantasy</class>
<relevance>1</relevance>
</genre>
</programGenre>
<programGenre program="EP0191100009">
<genre>
<class>Science Fiction</class>
<relevance>0</relevance>
</genre>
<genre>
<class>Fantasy</class>
<relevance>1</relevance>
</genre>
<genre>
<class>Suspense</class>
<relevance>2</relevance>
</genre>
</programGenre>
</genres>
</xtvd>
[/ignore]
 
I'm not going to be much help with the DB/MySQL portion of this, so somoene else might be able to help you out with that.

As far as parsing the xml file, hopefully this will give you a start:
Code:
use strict;
use warnings;
use XML::Simple;

my $xs = new XML::Simple;
my $xtvdata = $xs->XMLin('gold_sample.xml');

foreach my $pid (sort keys %{$xtvdata->{'programs'}->{'program'}}) {
    my $ref = \%{$xtvdata->{'programs'}->{'program'}->{$pid}};
    print "PID: $pid\n";
    print "\tName: $ref->{'title'}\n";
    print "\tSubtitle: $ref->{'subtitle'}\n" if $ref->{'subtitle'};
    print "\tFirst Aired: $ref->{'originalAirDate'}\n";
}
Which yeilds the output:
Code:
PID: EP0014574180
	Name: Entertainment Tonight
	First Aired: 2005-06-02
PID: EP0039850080
	Name: Star Trek
	Subtitle: The Cage
	First Aired: 1966-11-17
PID: EP0191100009
	Name: The Twilight Zone
	Subtitle: Perchance to Dream
	First Aired: 1959-11-27
PID: EP2254210069
	Name: Stargate SG-1
	Subtitle: Crystal Skull
	First Aired: 2000-03-03
 
That is quite a help (seems no matter what language I try, I just can't work with arrays very well). I should be able to use this code to do the necessary MySQL queries (wonder if MySQL will take 40,000 queries in a couple minutes...).
 
I've got the basics down, it now inserts the entire Programs table. Just have to add the other tables and I'll be set :D BTW, this uses data from Zap2It (labs.zap2it.com) to populate a MySQL DB with TV listings. There is a perl script available there that downloads the listings XML File for your zip code if you (anyone reading this) would like to do something with this script. Beware, the XML file you get from them is about 30MB.

Code:
[literal]
#!/windows/web/server/perl/bin/perl

# Modules used
use XML::Simple;
use DBI;
#use strict;
use warnings;

#Connect to MySQL
my $dbh = DBI->connect("DBI:mysql:adp:127.0.0.1:50388", "root", "deskport",
                       { RaiseError => 1, AutoCommit => 0 }
       ) || die "Error connecting to the database: $DBI::errstr\n";

# Create new XML object
$xs = new XML::Simple;

# read XML file
$xtvdata = $xs->XMLin("gold_sample.xml");

# Remove old TV listings
$dbh->do('DELETE FROM tvguide_programs WHERE 1');

# Generate beginning of query
my $query = 'INSERT INTO tvguide_programs (pid, showtype, series, ';
$query .= 'title, subtitle, description, mpaaRating, starRating, ';
$query .= 'originalAirDate, runtime, syndicatedEpisodeNumber, year, ';
$query .= 'colorcode, advisory0, advisory1, advisory2, advisory3, ';
$query .= 'advisory4, advisory5) VALUES ';

foreach my $pid (sort keys %{$xtvdata->{'programs'}->{'program'}})
{
    my $ref = \%{$xtvdata->{'programs'}->{'program'}->{$pid}};
    my $pid2 = $dbh->quote($pid);
    my $showType = "''";
    $showType = $dbh->quote($ref->{'showType'}) if $ref->{'showType'};
    my $series = "''";
    $series = $dbh->quote($ref->{'series'}) if $ref->{'series'};
    $title = $dbh->quote($ref->{'title'});
    my $subtitle = "''";
    $subtitle = $dbh->quote($ref->{'subtitle'}) if $ref->{'subtitle'};
    my $description = "''";
    $description = $dbh->quote($ref->{'description'}) if $ref->{'description'};
    my $mpaaRating = "''";
    $mpaaRating = $dbh->quote($ref->{'mpaaRating'}) if $ref->{'mpaaRating'};
    my $starRating = "''";
    $starRating = $dbh->quote($ref->{'starRating'}) if $ref->{'starRating'};
    my $originalAirDate = "''";
    $originalAirDate = $dbh->quote($ref->{'originalAirDate'}) if $ref->{'originalAirDate'};
    my $runtime = "''";
    $runtime = $dbh->quote($ref->{'runtime'}) if $ref->{'runtime'};
    my $syndicatedEpisodeNumber = "''";
    $syndicatedEpisodeNumber = $dbh->quote($ref->{'syndicatedEpisodeNumber'}) if $ref->{'syndicatedEpisodeNumber'};
    my $year = "''";
    $year = $dbh->quote($ref->{'year'}) if $ref->{'year'};
    my $colorCode = "''";
    $colorCode = $dbh->quote($ref->{'colorCode'}) if $ref->{'colorCode'};
    my $advisory0 = "''";
    my $advisory1 = "''";
    my $advisory2 = "''";
    my $advisory3 = "''";
    my $advisory4 = "''";
    my $advisory5 = "''";
    my $count = 0;
foreach my $adv (@{$xtvdata->{'programs'}->{'program'}->{$pid}->{'advisories'}->{'advisory'}})
{
	$advisory0 = $dbh->quote($adv) if $count == 0;
	$advisory1 = $dbh->quote($adv) if $count == 1;
	$advisory2 = $dbh->quote($adv) if $count == 2;
	$advisory3 = $dbh->quote($adv) if $count == 3;
	$advisory4 = $dbh->quote($adv) if $count == 4;
	$advisory5 = $dbh->quote($adv) if $count == 5;
	$count++;
}
$query .= "($pid2, $showType, $series, $title, $subtitle, $description, ";
$query .= "$mpaaRating, $starRating, $originalAirDate, $runtime, ";
$query .= "$syndicatedEpisodeNumber, $year, $colorCode, $advisory0, ";
$query .= "$advisory1, $advisory2, $advisory3, $advisory4, $advisory5), \n";
}

# Add this to query because I haven't figured out how to get rid of the final ", "
$query .= "('0', '', '', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')";

$dbh->do($query);

# Delete the junk entry we added
$dbh->do('DELETE FROM tvguide_programs WHERE pid = "0"');

# Disconnect
$dbh->disconnect();

# Print to the browser so it doesn't complain
# Remove this if you use this solely from the command line
print "Content-type: text/html\n\n";
print "<HTML><BODY>";
print "Success!";
print "</BODY></HTML>";
[/literal]
 
I'm having problems now with getting the Genres. XML::Simple turns thegenre part of the XML data file into:

Code:
genres(hash)
  programgenre(array)
    genre(arrar if more than one, hash if only one)
      relevance (hash)
      class (hash)
    program (hash)

Here's the code I'm trying to use, but I don't really know what to do since the secon genre is either a hash or an array

Code:
#!/windows/web/server/perl/bin/perl

# Modules used
use XML::Simple;
use DBI;
#use strict;
use DATA::Dumper;
use warnings;
no warnings 'deprecated';
use UNIVERSAL;
#Connect to MySQL
my $dbh = DBI->connect("DBI:mysql:adp:127.0.0.1:50388", "root", "deskport",
                       { RaiseError => 1, AutoCommit => 0 }
       ) || die "Error connecting to the database: $DBI::errstr\n";

# Create new XML object
my $xs = '';
$xs = new XML::Simple;

# read XML file
my $xtvdata = '';
$xtvdata = $xs->XMLin("gold_sample.xml");


# Genre hash
[blue]
my $genre;
%$genre = ('Adventure', 'Action/Adventure', 
'Action', 'Action/Adventure', 
'Adults only', 'Adults only', 
'Animated', 'Animation', 
'Anime', 'Animation', 
'Arts/crafts', 'Arts', 
'Ballet', 'Arts', 
'Performing arts', 'Arts', 
'Theater', 'Arts', 
'Dance', 'Arts', 
'Art', 'Arts', 
'Opera', 'Arts', 
'Cooking', 'Arts', 
'Children', 'Children', 
'Children-music', 'Children', 
'Children-special', 'Children', 
'Children-talk', 'Children', 
'Comedy-drama', 'Comedy', 
'Comedy', 'Comedy', 
'Standup', 'Comedy',
'Interview', 'Documentary', 
'Docudrama', 'Documentary', 
'Documentary', 'Documentary', 
'Drama', 'Drama', 
'Suspense', 'Drama', 
'Crime drama', 'Drama', 
'Mystery', 'Drama', 
'Crime', 'Drama', 
'Anthology', 'Educational', 
'Consumer', 'Educational', 
'Biography', 'Educational', 
'Debate', 'Educational', 
'Environment', 'Educational', 
'Educational', 'Educational', 
'Entertainment', 'Entertainment', 
'Game show', 'Game Show', 
'Reality', 'Game Show', 
'Historical drama', 'History', 
'History', 'History', 
'Holiday special', 'Holiday', 
'Holiday music special', 'Holiday', 
'Holiday-children', 'Holiday', 
'Holiday music', 'Holiday', 
'Holiday-children special', 'Holiday', 
'Holiday', 'Holiday', 
'Horror', 'Horror', 
'Ringuette', 'International', 
'Pelota vasca', 'International', 
'Spanish', 'International', 
'French', 'International', 
'Music talk', 'Music', 
'Music special', 'Music', 
'Music', 'Music', 
'Musical comedy', 'Musical', 
'Musical', 'Musical',
'Miniseries', 'Miniseries', 
'Parade', 'Misc', 
'Awards', 'Misc', 
'Extreme', 'Misc', 
'Auction', 'Misc', 
'Community', 'Misc',
'Collectibles', 'Misc', 
'Misc', 'Misc', 
'Agriculture', 'Health/Nature', 
'Nature', 'Health/Nature', 
'Animals', 'Health/Nature', 
'House/garden', 'Health/Nature', 
'Health', 'Health/Nature', 
'Horse', 'Health/Nature', 
'Medical', 'Health/Nature', 
'Dog show', 'Health/Nature', 
'Parenting', 'Health/Nature', 
'Self improvement', 'Health/Nature', 
'Bus./financial', 'News', 
'News', 'News', 
'Newsmagazine', 'News', 
'Public affairs', 'News', 
'Talk', 'News', 
'Politics', 'News', 
'Religious', 'Religious', 
'Romance', 'Romance', 
'Romance-comedy', 'Romance', 
'Computers', 'Science/Technology', 
'Science', 'Science/Technology', 
'Aviation', 'Science/Technology', 
'Fantasy', 'Science Fiction', 
'Scienve fiction', 'Science Fiction', 
'Sitcom', 'Sitcom', 
'Soap special', 'Soaps', 
'Soap talk', 'Soaps', 
'Soap', 'Soaps', 
'Special', 'Special', 
'Event', 'Special', 
'Home improvement', 'Special Interest', 
'Shopping', 'Special Interest', 
'Variety', 'Special Interest', 
'Travel', 'Special Interest', 
'War', 'Special Interest', 
'Gay/lesbian', 'Special Interest', 
'Fundraiser', 'Special Interest', 
'Skeleton', 'Special Interest', 
'Paranormal', 'Special Interest', 
'How-to', 'Special Interest', 
'Law', 'Special Interest', 
'Fashion', 'Special Interest', 
'Aerobics', 'Sports', 
'Archery', 'Sports', 
'Arm wrestling', 'Sports', 
'Auto', 'Sports', 
'Auto racing', 'Sports', 
'Badminton', 'Sports', 
'Baseball', 'Sports', 
'Basketball', 'Sports', 
'Beach soccer', 'Sports', 
'Beach volleyball', 'Sports', 
'Biathlon', 'Sports', 
'Bicycle', 'Sports', 
'Bicycle racing', 'Sports', 
'Billiards', 'Sports', 
'Boat', 'Sports', 
'Boat racing', 'Sports', 
'Bobsled', 'Sports', 
'Bodybuilding', 'Sports', 
'Bowling', 'Sports', 
'Boxing', 'Sports', 
'Bullfighting', 'Sports', 
'Canoe', 'Sports', 
'Cheerleading', 'Sports', 
'Cricket', 'Sports', 
'Curling', 'Sports', 
'Darts', 'Sports', 
'Diving', 'Sports', 
'Dog racing', 'Sports', 
'Dog sled', 'Sports', 
'Drag racing', 'Sports', 
'Equestrian', 'Sports', 
'Exercise', 'Sports', 
'Fencing', 'Sports', 
'Field hockey', 'Sports', 
'Figure skating', 'Sports', 
'Fishing', 'Sports', 
'Football', 'Sports', 
'Gaelic football', 'Sports', 
'Golf', 'Sports', 
'Gymnastics', 'Sports', 
'Handball', 'Sports', 
'Hockey', 'Sports', 
'Hunting', 'Sports', 
'Hurling', 'Sports', 
'Hydroplane racing', 'Sports', 
'Indoor soccer', 'Sports', 
'Intl basketball', 'Sports', 
'Intl hockey', 'Sports', 
'Intl soccer', 'Sports', 
'Kayaking', 'Sports', 
'Lacrosse', 'Sports', 
'Luge', 'Sports', 
'Martial arts', 'Sports', 
'Motorcycle', 'Sports', 
'Motorcycle racing', 'Sports', 
'Motorsports', 'Sports', 
'Mountain biking', 'Sports', 
'Olympics', 'Sports', 
'Outdoors', 'Sports', 
'Playoff sports', 'Sports', 
'Polo', 'Sports', 
'Pool', 'Sports', 
'Pro wrestling', 'Sports', 
'Racquet', 'Sports', 
'Rodeo', 'Sports', 
'Roller derby', 'Sports', 
'Rowing', 'Sports', 
'Rugby', 'Sports', 
'Running', 'Sports', 
'Sailing', 'Sports', 
'Shooting', 'Sports', 
'Skateboarding', 'Sports', 
'Skating', 'Sports', 
'Skiing', 'Sports', 
'Snooker', 'Sports', 
'Snowboarding', 'Sports', 
'Snowmobile', 'Sports', 
'Soccer', 'Sports', 
'Softball', 'Sports', 
'Speed skating', 'Sports', 
'Sports event', 'Sports', 
'Sports non-event', 'Sports', 
'Sports talk', 'Sports', 
'Squash', 'Sports', 
'Sumo wrestling', 'Sports', 
'Surfing', 'Sports', 
'Swimming', 'Sports', 
'Table tennis', 'Sports', 
'Tennis', 'Sports', 
'Track/field', 'Sports', 
'Triathlon', 'Sports', 
'Volleyball', 'Sports', 
'Water polo', 'Sports', 
'Water skiing', 'Sports', 
'Watersports', 'Sports', 
'Weightlifting', 'Sports', 
'Wrestling', 'Sports', 
'Yacht racing', 'Sports', 
'Weather', 'Weather', 
'Western', 'Western');
[/blue]
# Remove old TV listings
$dbh->do('DELETE FROM tvguide_programs WHERE 1');

# Generate beginning of query
my $query1 = 'INSERT INTO tvguide_programs (pid, showtype, series, ';
$query1 .= 'title, subtitle, description, mpaaRating, starRating, ';
$query1 .= 'originalAirDate, runtime, syndicatedEpisodeNumber, year, ';
$query1 .= 'colorcode, advisory0, advisory1, advisory2, advisory3, ';
$query1 .= 'advisory4, advisory5) VALUES ';

foreach my $pid (sort keys %{$xtvdata->{'programs'}->{'program'}})
{
    my $ref = \%{$xtvdata->{'programs'}->{'program'}->{$pid}};
    my $pid2 = $dbh->quote($pid);
    my $showType = "''";
    $showType = $dbh->quote($ref->{'showType'}) if $ref->{'showType'};
    my $series = "''";
    $series = $dbh->quote($ref->{'series'}) if $ref->{'series'};
    my $title = '';
    $title = $dbh->quote($ref->{'title'});
    my $subtitle = "''";
    $subtitle = $dbh->quote($ref->{'subtitle'}) if $ref->{'subtitle'};
    my $description = "''";
    $description = $dbh->quote($ref->{'description'}) if $ref->{'description'};
    my $mpaaRating = "''";
    $mpaaRating = $dbh->quote($ref->{'mpaaRating'}) if $ref->{'mpaaRating'};
    my $starRating = "''";
    $starRating = $dbh->quote($ref->{'starRating'}) if $ref->{'starRating'};
    my $originalAirDate = "''";
    $originalAirDate = $dbh->quote($ref->{'originalAirDate'}) if $ref->{'originalAirDate'};
    my $runtime = "''";
    $runtime = $dbh->quote($ref->{'runtime'}) if $ref->{'runtime'};
    my $syndicatedEpisodeNumber = "''";
    $syndicatedEpisodeNumber = $dbh->quote($ref->{'syndicatedEpisodeNumber'}) if $ref->{'syndicatedEpisodeNumber'};
    my $year = "''";
    $year = $dbh->quote($ref->{'year'}) if $ref->{'year'};
    my $colorCode = "''";
    $colorCode = $dbh->quote($ref->{'colorCode'}) if $ref->{'colorCode'};
    my $advisory0 = "''";
    my $advisory1 = "''";
    my $advisory2 = "''";
    my $advisory3 = "''";
    my $advisory4 = "''";
    my $advisory5 = "''";
    my $count = 0;
foreach my $adv (@{$xtvdata->{'programs'}->{'program'}->{$pid}->{'advisories'}->{'advisory'}})
{
	$advisory0 = $dbh->quote($adv) if $count == 0;
	$advisory1 = $dbh->quote($adv) if $count == 1;
	$advisory2 = $dbh->quote($adv) if $count == 2;
	$advisory3 = $dbh->quote($adv) if $count == 3;
	$advisory4 = $dbh->quote($adv) if $count == 4;
	$advisory5 = $dbh->quote($adv) if $count == 5;
	$count++;
}
$query1 .= "($pid2, $showType, $series, $title, $subtitle, $description, ";
$query1 .= "$mpaaRating, $starRating, $originalAirDate, $runtime, ";
$query1 .= "$syndicatedEpisodeNumber, $year, $colorCode, $advisory0, ";
$query1 .= "$advisory1, $advisory2, $advisory3, $advisory4, $advisory5), \n";
}

# Add this to query because I haven't figured out how to get rid of the final ", "
$query1 .= "('0', '', '', '0', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')";

#$dbh->do($query1);

# Delete the junk entry we added
#$dbh->do('DELETE FROM tvguide_programs WHERE pid = "0"');
print "Content-type: text/html\n\n";
print "<HTML><BODY>";
[red]
$count = 0;
foreach my $gen (@{$xtvdata->{'genres'}->{'programGenre'}})
{
	print $gen;
    my $ref = \%{$xtvdata->{'genres'}->{'programGenre'}->[$count]};
    my $program = $dbh->quote($ref->{'program'});
    my $genremain = '';
    my $genre0 = "'Misc'";
    my $genre1 = "''";
    my $genre2 = "''";
    my $genre3 = "''";
    my $genre4 = "''";
    my $genre5 = "''";

$count2 = 0;
	foreach my $rel (@{$xtvdata->{'genres'}->{'programGenre'}->[$count]->{'genre'}})
{
	#print "Dumper($rel)";
	my $ref2 = \%{$xtvdata->{'genres'}->{'programGenre'}->[$count]->{'genre'}->[$rel]};
	$genre0 = $dbh->quote($ref2{'class'}) if $ref2{'relevance'} == 0;
	$genre1 = $dbh->quote($ref2{'class'}) if $ref2{'relevance'} == 1;
	$genre2 = $dbh->quote($ref2{'class'}) if $ref2{'relevance'} == 2;
	$genre3 = $dbh->quote($ref2{'class'}) if $ref2{'relevance'} == 3;
	$genre4 = $dbh->quote($ref2{'class'}) if $ref2{'relevance'} == 4;
	$genre5 = $dbh->quote($ref2{'class'}) if $ref2{'relevance'} == 5;
	$count++;
}

  $genremain = $dbh->quote($genre->{$genre0}) or $genremain = "'Misc'";
    
    
my $query2 = "UPDATE tvguide_programs SET genremain=$genremain, ";
$query2 .= "genre0=$genre0, genre1=$genre1, genre2=$genre2, ";
$query2 .= "genre3=$genre3, genre4=$genre4, genre5=$genre5 WHERE pid = $program";
#$dbh->do($query2);
#print "$query2";
$count++;
}
[/red]
# Disconnect
$dbh->disconnect();

# Print to the browser so it doesn't complain
# Remove this if you use this solely from the command line

print "Success!";
print "</BODY></HTML>";
 
I meant to post this before - thanks for the star!

Now, the problem with the genres - the data structure is pretty ugly, like you alluded to. See if this helps:
Code:
foreach my $pid (sort keys %{$xtvdata->{'programs'}->{'program'}}) {
    my $ref = \%{$xtvdata->{'programs'}->{'program'}->{$pid}};
    my %program_genres;
    print "PID: $pid\n";
    print "\tName: $ref->{'title'}\n";
    foreach (@{$xtvdata->{'genres'}->{'programGenre'}}) {
        if ($_->{'program'} eq $pid) {
            foreach my $my_genre (@{$_->{'genre'}}) {
                $program_genres{"genre".$my_genre->{'relevance'}} = $my_genre->{'class'};
            }
            last;
        }
    }
    foreach (sort keys %program_genres) {
        print "\t$_: $program_genres{$_}\n";
    }
}
 
Ahh, I see your problem. This should deal with the array/hash problem.
Code:
use XML::Simple;
use LWP::Simple;

my $content = get("[URL unfurl="true"]http://docs.tms.tribune.com/tech/tmsdatadirect/zap2it/gold_sample.xml");[/URL]
my $xs = new XML::Simple;
my $xtvdata = $xs->XMLin($content);

foreach my $pid (sort keys %{$xtvdata->{'programs'}->{'program'}}) {
    my $ref = \%{$xtvdata->{'programs'}->{'program'}->{$pid}};
    my %program_genres;
    print "PID: $pid\n";
    print "\tName: $ref->{'title'}\n";
    
    foreach (@{$xtvdata->{'genres'}->{'programGenre'}}) {
        if ($_->{'program'} eq $pid) {
            if (ref($_->{'genre'}) eq 'ARRAY') {
                foreach my $my_genre (@{$_->{'genre'}}) {
                    $program_genres{"genre".$my_genre->{'relevance'}} = $my_genre->{'class'};
                }
            } elsif (ref($_->{'genre'}) eq 'HASH') {
                my $my_genre = $_->{'genre'};
                $program_genres{"genre".$my_genre->{'relevance'}} = $my_genre->{'class'};
            }
            last;
        }
    }
    foreach (sort keys %program_genres) {
        print "\t$_: $program_genres{$_}\n";
    }
}
 
Thanks, I've adapted that to my code (and my code to that). It works on smaller files, but on a test file of full TV listings (almost 30MB) it takes a long time (in fact, I don't know if it even completes). Is there a file size limit?
 
There shouldn't be a file size limit - are you getting an out of memory error? If you're downloading a 30 MB file using LWP::Simple code, it going to wait for the download to finish before doing the processing.

If you're using a slow connectionn it could take a while.
 
The file is downloaded (I've got another script doing that). It's parsing the XML file that is taking so long.

Script to download XML file:
Site to sign up to get XML listings: Certificate Code: "ADPX-9K33-MLX8"

I must say that this is a great project for anyone who wants their own cable guide on their computer.

SQL:
Code:
CREATE TABLE `tvguide_stations` (
  `sid` varchar(12) NOT NULL,
  `affiliate` varchar(25),
  `callsign` varchar(10) NOT NULL,
  `fccchannelnumber` varchar(8),
  `name` varchar(40),
  `channel` varchar(40) NOT NULL,
  PRIMARY KEY  (`sid`)
);

CREATE TABLE `tvguide_lineups` (
  `id` varchar(12) NOT NULL,
  `device` varchar(30),
  `location` varchar(28) NOT NULL,
  `name` varchar(100) NOT NULL,
  `postalcode` tinyint(6),
  `type` varchar(20) NOT NULL,
  `userLineupName` varchar(50),
  PRIMARY KEY  (`id`)
);

CREATE TABLE `tvguide_programs` (
  `pid` varchar(12) NOT NULL,
  `showtype` varchar(30),
  `series` varchar(12),
  `title` varchar(120) NOT NULL,
  `subtitle` varchar(150),
  `description` varchar(255),
  `mpaaRating` varchar(11),
  `starRating` varchar(5),
  `originalAirDate` varchar(10),
  `runtime` varchar(8),
  `syndicatedEpisodeNumber` varchar(20),
  `year` varchar(4),
  `colorcode` varchar(20),
  `advisory0` varchar(30),
  `advisory1` varchar(30),
  `advisory2` varchar(30),
  `advisory3` varchar(30),
  `advisory4` varchar(30),
  `advisory5` varchar(30),
  `genremain` varchar(30),
  `genre0` varchar(30),
  `genre1` varchar(30),
  `genre2` varchar(30),
  `genre3` varchar(30),
  `genre4` varchar(30),
  `genre5` varchar(30),
  PRIMARY KEY  (`pid`)
);

CREATE TABLE `tvguide_schedules` (
  `id` int(11) NOT NULL auto_increment,
  `sid` int(11) NOT NULL,  
  `duration` varchar(8) NOT NULL,
  `time` varchar(20) NOT NULL,
  `closecaptioned` enum('true', 'false') default 'true',
  `dolby` enum('none', 'Dolby', 'Dolby Digital') default 'none',
  `hdtv` enum('true', 'false') NOT NULL default 'false',
  `partnumber` tinyint(2),
  `totalnumber` tinyint(2),
  `repeat` enum('true', 'false') default 'false',
  `stereo` enum('true', 'false') default 'false',
  `subtitled` enum('true', 'false') default 'false',
  `tvRating` enum('TV-Y', 'TV-Y7', 'TV-G', 'TV-PG', 'TV-14', 'TV-MA', 'null') default 'null',
  PRIMARY KEY  (`id`)
);

CREATE TABLE `tvguide_production_crew` (
  `id` int(11) NOT NULL auto_increment,
  `pid` varchar(11) NOT NULL,
  `givenname` varchar(20) NOT NULL,
  `surname` varchar(20) NOT NULL,
  `role` varchar(30) NOT NULL,
  PRIMARY KEY  (`id`)
);
 
Ahh, I've got you - any of the XML parsing modules that create an exact representation of the document take a while with big files.

If you want to speed everything up, you could take a shot at it with XML::parser but it is significantly more complicated - hence using XML::Simple. Using the Simple module is a bit of a trade off, but does your script really need to be fast?

I assume, if you're keeping a cable guide in a database, you probably don't need to run your script more than once a day (probably on some kind of schedule/cron.) If that is the case, personally, I wouldn't worry about trying to speed it up as long as the code you have now works.
 
The perl script uses up to 130MB of ram and 99% processor. I read at CPAN (
If you are working with very large XML files, XML::Simple's approach of representing the whole file in memory as a 'tree' data structure may not be suitable.

Might there be another XML function that would work better?

I've uploaded the XML file and a test script that just attempts to load the XML file and prints out the time it took.

 
This is a long way from being complete, and I'm sure it can be cleaned up, but it might give you a start with using XML::parser. I've not used the module much, so I'm sure there are better way to go about doing this - but it's a start. And it may reduce your processing time.

Also, one note - this only prints the 'stations' data.

Code:
use Data::Dumper;
use XML::Parser;

open OUTPUT, ">temp.txt" or die;

my $file = 'xtvdata.xml';
my ($record, $context, %records, $record_id);

my $parser = XML::Parser->new( Handlers => {
        Start =>    \&handle_elem_start,
        End =>      \&handle_elem_end,
        Char =>     \&handle_char_data,
                                             });

$parser->parsefile($file);

sub handle_elem_start {
    my ($expat, $name, %atts) = @_;
	if ( defined $expat->{Context}[1] ) {
		if ($expat->{Context}[1] eq 'stations') {
			$context = $name;
			$record = {} if ($name eq 'station');
			$record->{id} = $atts{id} if $name eq 'station';
		}
	}
}

sub handle_char_data {
    my ($expat, $text) = @_;
	if ( defined $expat->{Context}[1] ) {
		if ($expat->{Context}[1] eq 'stations') {
			if ($context) {
				$text = trim($text) if $text;
				$record->{$context} .= $text;
			}
		}
	}
}

sub handle_elem_end {
    my ($expat, $name) = @_;
	if ( defined $expat->{Context}[1] ) {
		if ($expat->{Context}[1] eq 'stations') {
			return unless ($name eq 'station');
			print OUTPUT Dumper \{$record->{'id'} => $record};
			# Instead of printing, your DB update code could go here.
		}
	}
}

sub trim {
	my @out= @_;
	for (@out) {
		s/^\s+//;
		s/\s+$//;
	}
	return wantarray ? @out : $out[0];
}
close OUTPUT;
 
I'll work on this more, but I might end up just using existing scripts and XMLTV.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top