
Getting Rid of Redundant Data

Status
Not open for further replies.

nanohurtz

Programmer
Mar 25, 2002
15
US
I need to modify this code to eliminate any duplicate 'words' it picks up from a flat file

THE CODE
#!/usr/bin/perl -w

open(IN, "< scdreader.txt") or die "Cannot open file for read\n";
open(OUT, "> scdreaderout.txt") or die "Cannot open file for write\n";
my ($scdsrv, $scdname, $scdon, $scdjob);

while (<IN>) {
    chomp;

    if (m/SCHEDULE (.*)#(.*)$/) {
        ($scdsrv, $scdname) = ($1, $2);
    } elsif (m/ON (.*)$/) {
        $scdon = $1;
        if ($scdon !~ /REQUEST$/) {
            $scdon = "ACTIVE";
        }
    } elsif (m/([A-Z]{3}[0-9]{3}[A-Z]{2})/i) {
        $scdjob = $1;
        print OUT "$scdjob|$scdsrv|$scdname|$scdon\n";
    } elsif (m/END$/) {
        ($scdsrv, $scdname, $scdon, $scdjob) = ("", "", "", "");
    }
}
close(IN);
close(OUT);

SAMPLE IN (flat .txt file)

SCHEDULE RSWS555A#BLUMERGE
ON MO, TU, WE, TH
:
POP555BT
III777CT
POP555BT
END
*more stuff
SCHEDULE RCCS919A#WHTMERGE
ON REQUEST
:
EXCEPT UDE888QT
RXX818WT
END
*even more stuff

SAMPLE OUT (flat .xls file)

POP555BT|RSWS555A|BLUMERGE |ACTIVE
POP555BT|RSWS555A|BLUMERGE |ACTIVE <-- redundant
III777CT|RSWS555A|BLUMERGE |ACTIVE
UDE888QT|RCCS919A|WHTMERGE|REQUEST
RXX818WT|RCCS919A|WHTMERGE|REQUEST

Can someone help me figure this out? I think @rrays might be needed here.
 
I meant to say: 'getting rid of duplicate records it may dynamically create while reading a flat file'.
 
Perhaps the easiest method is to store each line in a hash; then, when you read a line, check whether it's already in the hash before printing:

my %stored;

while (<IN>) {
    chomp;
    next if $stored{$_};
    $stored{$_} = 1;

    # etc ....

}

Barbie.
Leader of Birmingham Perl Mongers
 
Thanks, Barbie.

Can I purge the hash for every SCHEDULE ... END block? I want the hash to handle duplicates within each schedule; it's OK if a duplicate job turns up in another schedule. How do I refine this?
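One way to do that is to clear the hash whenever an END line is reached, so duplicates are only suppressed within a single SCHEDULE ... END block. Here is a minimal self-contained sketch along those lines (the sample data is inlined via an in-memory filehandle so it runs as-is; the `%seen` and `@out` names are illustrative, and in the real script you would read from scdreader.txt and print to scdreaderout.txt as before):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inline sample so the sketch is self-contained; note POP555BT appears
# twice in the first schedule and again in the second one.
my $sample = <<'EOF';
SCHEDULE RSWS555A#BLUMERGE
ON MO, TU, WE, TH
:
POP555BT
III777CT
POP555BT
END
SCHEDULE RCCS919A#WHTMERGE
ON REQUEST
:
EXCEPT UDE888QT
RXX818WT
POP555BT
END
EOF
open my $in, '<', \$sample or die "Cannot open sample: $!";

my ($scdsrv, $scdname, $scdon) = ('', '', '');
my %seen;    # jobs already emitted for the *current* schedule
my @out;

while (my $line = <$in>) {
    chomp $line;
    if ($line =~ m/SCHEDULE (.*)#(.*)$/) {
        ($scdsrv, $scdname) = ($1, $2);
    } elsif ($line =~ m/ON (.*)$/) {
        $scdon = $1;
        $scdon = 'ACTIVE' if $scdon !~ /REQUEST$/;
    } elsif ($line =~ m/([A-Z]{3}[0-9]{3}[A-Z]{2})/) {
        my $scdjob = $1;
        next if $seen{$scdjob}++;    # skip duplicates within this block
        push @out, "$scdjob|$scdsrv|$scdname|$scdon";
    } elsif ($line =~ m/END$/) {
        %seen = ();                  # purge: next schedule starts fresh
        ($scdsrv, $scdname, $scdon) = ('', '', '');
    }
}
close $in;
print "$_\n" for @out;
```

With this data the second POP555BT in the first schedule is dropped, but the POP555BT in the second schedule still comes through, which matches what you described.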
 