cjtucker1976
Technical User
Many years ago we had a Perl script created that parses an AP news feed, grabs just the headlines, and creates a clean newswire.html file. It has been working for over 10 years, but the way we receive the feed has changed and the script no longer works. I am a newbie, so any help is appreciated. Below is the Perl script, followed by an example of the text file it works on.
SCRIPT:
use Cwd;
$curr= cwd();
# Test
# print "Current working directory is ", $curr, "\n\n";
$PresentTime = time;
# Begin WHYY content
print "WHYY Content";
chdir "d:\\signfiles" ;
$now = localtime;
$tday = substr($now,0,3) ;
$tmo = substr($now,4,3) ;
$tdate = substr($now,8,2) ;
$ttime = substr($now,11,5) ;
$contentfile = $tday. "_". $tmo. "_". $tdate. ".txt" ;
$apchk = "start" ;
$spchk = "specs" ;
#open(CONFILELIST, $contentfile) || die "cannot opendir. $!";
open(CONFILELIST, $contentfile) or open (CONFILELIST, "whyydef.txt");
print "In whyy content";
$timeskip = "no" ;
$skip = "yes" ;
$spec = "no" ;
$goodtogo = "no" ;
while (<CONFILELIST>) {
$a = $_ ;
$headline = substr($a,0,5) ;
$headline =~ tr/A-Z/a-z/ ;
if ($apchk eq $headline) {
$timeskip = "yes" ;
$chktime = substr($a,6,5) ;
if ($ttime ge $chktime) {
$skip = "no" ;
$spec = "no" ;
@pcontent = () ;
$ct = 0 ;
} else {
$skip = "yes" ;
}
}elsif ($spchk eq $headline) {
$timeskip = "yes" ;
$chktime = substr($a,6,5) ;
if ($ttime ge $chktime) {
$skip = "no" ;
$spec = "yes" ;
@pcontent = () ;
$ct = 0 ;
} else {
$skip = "yes" ;
}
}
if ($skip eq "no") {
if ($timeskip eq "no") {
$pcontent[$ct] = $a ;
$ct = $ct + 1 ;
}
}
$timeskip = "no" ;
$oldchktime = $chktime
}
close CONFILELIST;
chdir "C:\\aperl\\bin" ;
open(FILELIST, "newswire.txt") or $spec = "yes" ;
#print filelist ;
print " :$spec: ";
if ($spec eq "yes") {
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Cannot create newswire.html: $!";
print " Once more in content";
goto FINISHLINE ;
}
# End WHYY content
print "End WHYY Content";
#open(FILELIST, "newswire.txt") || die "cannot opendir. $!";
$chk = "^AP Top Headlines" ;
$chkus = "^AP Top U.S. News" ;
$chkyes = "no" ;
$apchk = "AP" ;
$brkchk = "\^" ;
#print $chk ;
$fg = "not" ;
$skip = "no" ;
$newsct = 0 ;
while (<FILELIST>) {
#check for news content
$goodtogo = "yes" ;
$headline = "" ;
$a = $_ ;
#$a =~ s/\W// ; #This gets rid of the ^ on other lines
$headline = substr($a,11,2) ;
#print $headline ;
#$wait = <STDIN> ;
if ($apchk eq $headline) {
$fg = "not" ;
#print $fg ;
}
$grab = substr($a,0,17) ;
if ($grab eq $chk) {
$chkyes = "yes" ;
$grab =~ s/\W// ; #Delete First Character of line
#print HTML "... " ;
#print HTML $grab ;
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Cannot create newswire.html: $!";
print HTML "The Top Headlines From WHYY" ; # to include the time, add " at " to this string and uncomment the print below
@timeparts = localtime(time) ;
#print HTML $timeparts[2], ":", $timeparts[1] ;
$fg = "yes" ;
$skip = "yes" ;
} elsif ($grab eq $chkus) {
if ($chkyes eq "no") {
$grab =~ s/\W// ; #Delete First Character of line
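#print HTML "... " ; # disabled: HTML is reopened just below, so this text never reached the file
#print HTML $grab ;
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Cannot create newswire.html: $!";
print HTML "The Top Headlines From WHYY" ; # to include the time, add " at " to this string and uncomment the print below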
@timeparts = localtime(time) ;
#print HTML $timeparts[2], ":", $timeparts[1] ;
$fg = "yes" ;
$skip = "yes" ;
}
}
if ($grab ne $chk) {
if ($fg eq "yes") {
$brk = substr($a,0,1) ;
if ($skip eq "no") {
if ($brk eq $brkchk) {
print HTML "... " ;
}
#} else {
#print HTML " " ;
#}
if ($brk eq $brkchk) {
$a =~ s/\W// ; #Delete First Character of line
}
print HTML $a ;
}
} elsif ($grab ne $chkus) {
if ($chkyes eq "no") {
if ($fg eq "yes") {
$brk = substr($a,0,1) ;
if ($skip eq "no") {
if ($brk eq $brkchk) {
print HTML "... " ;
}
#} else {
#print HTML " " ;
#}
if ($brk eq $brkchk) {
$a =~ s/\W// ; #Delete First Character of line
}
print HTML $a ;
}
$skip = "no" ;
}
}
}
$skip = "no" ;
}
}
print HTML "... " ;
close FILELIST;
#close HTML;
FINISHLINE:
if ($goodtogo eq "no") {
unlink ("newswire.html") ;
open (HTML, ">newswire.html") or die "Cannot create newswire.html: $!";
}
for ($nct = 0; $nct < $ct; $nct++) {
print HTML $pcontent[$nct] ;
print HTML " " ;
}
close HTML;
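For reference, here is a rough Perl sketch of what the first (WHYY) half of the script does. It assumes the daily schedule file contains marker lines such as "start 06:00" or "specs 09:00" (keyword in the first five columns, HH:MM starting at column 7) followed by content lines. The file name is built from localtime (for example Fri_Sep_13.txt; single-digit days come out space-padded, exactly as substr() on the localtime string gives them), with whyydef.txt as the fallback, and the last block whose time has already passed is kept. The sub names schedule_filename and pick_block are hypothetical, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: build the day's schedule file name the way the original
# does with substr() on scalar localtime, e.g. "Fri_Sep_13.txt". Pass a
# localtime() list to test; defaults to the current local time.
sub schedule_filename {
    my @t = @_ ? @_ : localtime;
    my @days   = qw(Sun Mon Tue Wed Thu Fri Sat);
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    # %2d space-pads single-digit days, matching substr($now, 8, 2)
    return sprintf('%s_%s_%2d.txt', $days[$t[6]], $months[$t[4]], $t[3]);
}

# Hypothetical helper: return ($is_specs, @lines) for the last "start HH:MM"
# or "specs HH:MM" block whose time is at or before $now_hhmm. Times are
# compared as strings, as in the original, so both must be zero-padded HH:MM.
sub pick_block {
    my ($now_hhmm, @lines) = @_;
    my ($keep, $is_specs, @block) = (0, 0);
    for my $line (@lines) {
        my $tag = lc substr($line, 0, 5);
        if ($tag eq 'start' || $tag eq 'specs') {
            # Assumed layout: the time starts at column 7 (offset 6)
            my $at = length($line) > 6 ? substr($line, 6, 5) : '';
            if ($now_hhmm ge $at) {
                $keep     = 1;
                $is_specs = ($tag eq 'specs') ? 1 : 0;
                @block    = ();      # a newer due block replaces the old one
            } else {
                $keep = 0;           # not due yet: ignore its lines
            }
            next;
        }
        push @block, $line if $keep;
    }
    return ($is_specs, @block);
}

# Usage sketch (file names from the original script):
# my $file = schedule_filename();
# open(my $fh, '<', $file) or open($fh, '<', 'whyydef.txt')
#     or die "Cannot open $file or whyydef.txt: $!";
# my ($is_specs, @content) = pick_block(substr(scalar localtime, 11, 5), <$fh>);
```

A "specs" block corresponds to the original's $spec = "yes" path, where the AP headlines are skipped and only the schedule text goes into newswire.html.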
Example of NEWSWIRE TEXT File:
Niall Horan (One Direction) is 20. Actor Mitch Holleman (``Reba'') is 18.
^
'' _ Charlotte Bronte (BRAWN'-tee), English author (1816-1855).
^
(Above Advance for Use Friday, Sept. 13)
^
Copyright 2013, The Associated Press. All rights reserved.
^
AP-WF-09-13-13 0401GMT<
0403-----
r a BC-US--People-Kidman 09-13 0501
^*1402< ^AP-US-People-Kidman,118<
^Kidman says she's OK but shaken after collision<
^AP Photo NYET105<
^Eds: APNewsNow. With AP Photos.<
Calvin Klein event, she said she was ok.
Kidman added: ``I'm up, I'm walking around, but I was shaken.''
AP-WF-09-13-13 0411GMT<
0401-----
r a BC-US-TEC--Twitter-IPO-T 09-13 0916
^BC-US-TEC--Twitter-IPO-Tweet Facts,492<
^Tweetable facts about Twitter's IPO<
^AP Photo NYBZ147<
^Eds: With AP Photos.<
^By SCOTT MAYEROWITZ=
^AP Business Writer=
a Tweet.
e limit of tweets.
a planned IPO.
announcement tweet, 7,872 people retweeted the message.
_ The public offering comes at a time of heightened investor interest
in the IPO market _ 131 IPOs have priced so far this year.
_ Is (at)Twitter trying to avoid (at)Facebook's May 2012 IPO (hash)fa
il? Well, company is keeping details secret for now. (hash)TwitterIPO
_ The company hasn't said if it makes a profit or how much revenue it
takes in. (hash)FadOrFuture? Wonder if (at)WarrenBuffett will buy stock.
_ Most of Twitter's revenue comes from advertising. (at)eMarketer est
imates $582.8 million this year, up from $288.3 million in 2012.
_ Compare: In latest quarter, Facebook had $1.6 billion in ad revenue
. By 2015, Twitter's annual ad revenue is expected to hit $1.3 billion.
_ 2013 (hash)Superbowl performance by (at)Beyonce had 268 million twe
ets per minute, more than any other event in past two years.
_ Not everybody on (at)Twitter is who they claim to be. (at)United Ai
rlines CEO Jeff Smisek has to put up with (at)FakeUnitedJeff
_ Sometimes even missing zoo animals get their own Twitter accounts.
And they can be funny. Just read (at)BronxZoosCobra
.''
AP-WF-09-13-13 0411GMT<
0407-----
r a BC-US--NuclearSpending 09-13 0578
^*1110< ^AP-US-Nuclear-Spending,130<
^Nation's bloated nuclear spending comes under fire<
^AP Photo LA104, LA103<
^Eds: APNewsNow. Will be expanded. With AP Photos.<
^By JERI CLAUSING and MATTHEW DALY=
^Associated Press=
sitive nuclear bomb-making facilities doesn't work.
ms that include a redesign to raise the roof so equipment can fit inside.
tic budget increases for nuclear contractors.
uld be overhauled.
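In the sample above, the "^AP Top Headlines" and "^AP Top U.S. News" lines that the script looks for ($chk and $chkus) do not appear. If they are really gone from the feed, $fg never becomes "yes" and newswire.html is never reopened, so nothing new gets written. In the sample, each story instead seems to start with a caret slug line ending in a word count (^*1402< ^AP-US-People-Kidman,118< or ^BC-US-TEC--Twitter-IPO-Tweet Facts,492<), with the headline on the very next caret line. A minimal sketch, assuming that layout holds (it is inferred from this one sample, not from any AP documentation), could pull the headlines out and write them in the old "... " style. extract_headlines, headlines_html and html_escape are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: minimal HTML escaping (an addition; the original
# script copied feed text into newswire.html unescaped).
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical helper: pull headlines out of the newer feed layout.
# Assumption (from the sample only): a caret "slug" line containing AP-... or
# BC-... and ending in ",<word count><" is followed immediately by a caret
# line holding the headline, e.g. "^Kidman says she's OK but shaken after collision<".
sub extract_headlines {
    my @lines = @_;
    my @headlines;
    my $after_slug = 0;
    for my $line (@lines) {
        $line =~ s/\s+\z//;                    # strip newline, CR, trailing blanks
        if ($line =~ /^\^.*\b(?:AP|BC)-[^<]*,\d+<$/) {
            $after_slug = 1;                   # slug seen: headline should be next
            next;
        }
        if ($after_slug) {
            push @headlines, $1 if $line =~ /^\^(.+)<$/;
            $after_slug = 0;                   # only the very next line counts
        }
    }
    return @headlines;
}

# Hypothetical helper: title, then "... headline" per line, then a final "... ",
# roughly mirroring what the old script wrote.
sub headlines_html {
    my ($title, @headlines) = @_;
    my $html = html_escape($title);
    $html .= '... ' . html_escape($_) . "\n" for @headlines;
    return $html . '... ';
}

# Usage sketch (file names from the original script):
# open(my $in, '<', 'newswire.txt') or die "Cannot open newswire.txt: $!";
# my @heads = extract_headlines(<$in>);
# close $in;
# open(my $out, '>', 'newswire.html') or die "Cannot create newswire.html: $!";
# print $out headlines_html('The Top Headlines From WHYY', @heads);
# close $out;
```

Escaping &, < and > is an addition; the original script copied the feed text into the HTML unchanged.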