Perl Beginner - Need Help

Status
Not open for further replies.

Bob6739

Programmer
Jan 2, 2023
I'm just learning Perl and can't figure out why this code does not work.

#!/usr/bin/env perl
use strict;
use warnings;

open(my $configfile, "<", 'PoliceLog.txt') or die "Could not read file\n";

my @configdata = split 'zz', $configfile;

print "$configdata[0]\n";

close $configfile;

I expect it to print the first line of the PoliceLog.txt file but instead I get this:

new-host-6:perl bobgreen$ perl TestB.pl
GLOB(0x12280bc10)
new-host-6:perl bobgreen$
 
Hi

Because there you [tt]split()[/tt] the filehandle itself, not the file content read from the filehandle. Used as a string, a filehandle shows up as GLOB(0x...), which is what got printed.

Code:
[b]my[/b] [navy]@configdata[/navy] [teal]=[/teal] [i][green]<$configfile>[/green][/i][teal];[/teal]

( No idea what you tried with the [tt]split()[/tt] there, so I skipped it. Reading from a filehandle into an array does the line splitting automatically. )
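
A minimal sketch of the complete script with that change, in case it helps. To keep it runnable without the real file, it reads from an in-memory string; the sample content is invented for illustration and is not taken from PoliceLog.txt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for PoliceLog.txt, so the sketch runs on its own.
my $sample = "first line\nsecond line\nthird line\n";
open(my $configfile, '<', \$sample) or die "Could not read sample: $!\n";

# List context: the read operator returns every line of the file at once.
my @configdata = <$configfile>;
close $configfile;

# Each element still ends with its newline; chomp strips it from all of them.
chomp @configdata;

print "$configdata[0]\n";
print scalar(@configdata), " lines read\n";
```

To use it on the real file, replace [tt]\$sample[/tt] in the [tt]open[/tt] call with the file name.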


Feherke.
feherke.github.io
 
I thought one split command would split an entire file but it appears it returns when it does one split so you have to keep cycling through it? I found an example online and changed it to use my filename. Here it is -

#!/usr/bin/perl

# perl split function example

$filename = 'PoliceLog.txt';

open(FILE, $filename) or die "Could not read from $filename, program halting.";
while(<FILE>)
{
# get rid of the pesky newline character
chomp;

# read the fields in the current record into an array
@fields = split('zz', $_);

# print the first field
print "$fields[0]\n";
print "One Pass";
}

print "$fields[0]\n";
print "$fields[1]\n";
print "$fields[2]\n";
print "$fields[3]\n";

close FILE;

I also inserted the (print "One Pass";) command and found it cycles through the while loop for each line in the file. IE, the split command returns after each line not just when it finds the zz delimiter?? I added the 4 print commands at the end to see what is in the array and nothing prints?? Very frustrating trying to figure out how this thing works.
 
Hi

There would be one thing to mention : in Perl there is a function like nowhere else[sup](*)[/sup], the [tt]wantarray[/tt]. It tells the function in which it is called whether the caller expects a list ( for example to assign it to an array ) or a single scalar value. Reading from a filehandle behaves differently depending on that context.

In your original code, the filehandle itself was passed to [tt]split[/tt], so nothing was read and the handle's string form ( GLOB(0x...) ) was split instead. Had you passed [tt]<$configfile>[/tt] there, it would have been used in a scalar context, so the file reading would read and return one line. If that line contained the substring "zz", then @configdata would contain the line split on that. Otherwise @configdata would contain the read line as its only item.

In my modified code, the value read from file was assigned directly to the @configdata array, a list context. So the file reading would read the entire file content, split it on the [tt]$/[/tt] variable's value ( by default "\n" ) and place each piece in a separate array item.

In your new code, the value read from file is used as [tt]while[/tt] condition, again a scalar context. So the file reading will read the file line by line, and for each line execute the block that splits the line and prints the first item resulting from the split. Then, after the file is exhausted, the first 4 fields of the last read line are printed. If that last line is empty, [tt]split[/tt] returns an empty list, which would explain why those final prints show nothing.
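
A small sketch of the three cases above may make the contexts easier to see. The sub name [tt]context[/tt] and the sample data are made up for illustration; the file is again replaced by an in-memory string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# wantarray reports the context the enclosing sub was called in.
sub context {
    return wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
}
my @as_list   = context();   # called in list context
my $as_scalar = context();   # called in scalar context
print "$as_list[0] / $as_scalar\n";

# Hypothetical three-line sample instead of the real PoliceLog.txt.
my $sample = "azzb\nczzd\nezzf\n";

open(my $fh, '<', \$sample) or die "Could not read sample: $!\n";
my $one_line = <$fh>;    # scalar context: reads a single line
my @the_rest = <$fh>;    # list context: reads all remaining lines
close $fh;

open(my $again, '<', \$sample) or die "Could not read sample: $!\n";
# split evaluates its second argument in scalar context,
# so only the first line is read and split here.
my @fields = split /zz/, <$again>;
close $again;

print "one line: $one_line";
print "rest: ", scalar(@the_rest), " lines\n";
print "fields from first line: ", scalar(@fields), "\n";
```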

There would be one more way to read the file : slurp in the entire content without splitting it automatically. For that you simply [tt]undef [navy]$/[/navy][teal];[/teal][/tt][sup](**)[/sup], then no more automatic splitting is performed and you are free to process the unaltered file content.
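
As a rough sketch of that slurp approach, assuming "zz" really is a record separator that can sit anywhere in the file ( the sample content below is invented ) : once the whole content is in one scalar, a single [tt]split[/tt] does work across line boundaries.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical content in which "zz" separates records that may span lines.
my $sample = "record one\nmore of onezzrecord two\nzzrecord three\n";
open(my $configfile, '<', \$sample) or die "Could not read sample: $!\n";

# With $/ undefined inside the block, a single read returns the whole file.
my $text = do { local $/; <$configfile> };
close $configfile;

# One split call now cuts the entire content on the delimiter.
my @records = split /zz/, $text;

print scalar(@records), " records\n";
print "[$_]\n" for @records;
```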

BTW, it is strongly encouraged to use a variable for the filehandle, as you did in your original code. $configfile will live only in the current block, while FILE is global : if closing is accidentally skipped, the open file stays reachable from other parts of your script, where FILE may be expected to access another file.

Unfortunately I do not really get your goal. Maybe if you show us a sample of that PoliceLog.txt file and explain which parts from it you need.

_____
[sup](*)[/sup] As far as I know.
[sup](**)[/sup] Well, [tt]undef[/tt]ing it is ugly. In practice you temporarily override it with [tt]local[/tt] inside a [tt]do[/tt] block : [tt][navy]$everything[/navy] [teal]=[/teal] do [teal]{[/teal] local [navy]$/[/navy][teal];[/teal] [green]<$configfile>[/green] [teal]};[/teal][/tt]


Feherke.
feherke.github.io
 
Hi Feherke,
Thanks for your help. Let me step back and explain what I'm trying to do and see if it makes sense.
I'm a retired engineer working with several other volunteers to publish a weekly senior citizens email newsletter for our town. The newsletter popularity is growing and is now sent to about 450 people. Part of the newsletter is a police log which we get weekly from the town police department. The police department emails us a PDF which we convert to text and then edit with WORD. The log consists of several hundred incidents such as a traffic ticket stop. The editing takes 2-3 hours and I would like to automate this as much as possible to save time. I have attached a file (PoliceLogEx.txt) containing examples, where the first part of the file is what we get from the police department and the last part is the final edited log we publish.

Many of the source incidents are deleted since they would be of no interest to senior citizens. The remainder are reformatted to make them more readable. My plan is to split the source text file into an array with each incident being an array element. I can then inspect each element to see if I want to delete it or reformat it. I thought it would be easier to work on each incident if they were isolated. At the end I can then combine the reformatted array elements back into a single text file.

I have experience with machine language programming, C, C++ and Visual Basic. I currently work on a Mac but have worked in Unix environments with Unix command line tools. I chose Perl for this job because I read it was the best language for text processing. I'm finding it hard to learn and am thinking of looking for an easier language. I did some javascript at one time and had no trouble with it. Do you have any recommendations for an easy solution?

Bob
 
 https://files.engineering.com/getfile.aspx?folder=167b283b-4878-4d0f-9ef5-1bb20309fce3&file=PoliceLogEx.txt
Hi

Bob said:
I currently work on a Mac but have worked in Unix environments with Unix command line tools.
My relation with Mac was never good, so I may ask the dumbest thing now : is the GNU coreutils package available there ? Because if yes, then it has the right tool for you : [tt]csplit[/tt]
Code:
csplit --digits=3 PoliceLog.txt '%^[0-9]%' '/^[0-9]/' '{*}'

For the first part of the file you attached, the above command will create 111 files with names from xx000 to xx110, each containing one log entry. ( Assuming I got it right that each entry starts with a digit at the beginning of the line. )
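
If [tt]csplit[/tt] is not at hand, roughly the same cut can be sketched in Perl. This is only an approximation under the same assumption ( each entry starts with a digit at the beginning of the line ) ; the sample text is invented and the files go to a temporary directory so the sketch cleans up after itself.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical sample; the real log layout may differ.
my $text = <<'END';
Report header
1 first entry
   detail
2 second entry
3 third entry
   detail
END

# Drop everything before the first line that starts with a digit
# (the role of the '%^[0-9]%' argument), then cut before each further such line.
$text =~ s/\A.*?^(?=[0-9])//ms;
my @entries = split /^(?=[0-9])/m, $text;

# Write each entry to its own file, named the way csplit names them.
my $dir = tempdir(CLEANUP => 1);
my @names;
for my $i (0 .. $#entries) {
    my $name = sprintf 'xx%03d', $i;
    open(my $out, '>', "$dir/$name") or die "Could not write $name: $!\n";
    print {$out} $entries[$i];
    close $out;
    push @names, $name;
}
print "@names\n";
```

For real use you would read $text from the log file and write to a normal directory instead of the temporary one.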

Having those files I would use Midnight Commander like this :
[ul]
[li]In the left panel navigate to the directory containing the entry files[/li]
[li]Go to the Left menu's Listing format... command and in the Listing format dialog choose Brief file list: and type 1 to columns. Then OK.[/li]
[li]Go to the Options menu's Layout... command and in the Layout dialog's Panel split group uncheck Equal split and change the left side to the minimum 12. Then Ok.[/li]
[li]Go to the Left menu's Quick view command ( or press [kbd]Ctrl[/kbd]+[kbd]x[/kbd], [kbd]q[/kbd] )[/li]
[/ul]
Then you can traverse the entry files in the left panel while in the right panel you will have the content of the currently selected entry file. If you find it interesting, you can mark/unmark the currently selected and viewed entry file with the [kbd]Ins[/kbd] key. When finished, you will have all interesting entries marked. With the [kbd]*[/kbd] key on the numeric block you can invert the marks, so that the uninteresting ones are marked. Then you can delete them all with [kbd]F8[/kbd] and be left with only the interesting ones.


Well, assuming Midnight Commander is available on Mac.

Feherke.
feherke.github.io
 
Hi

Bob said:
I chose Perl for this job because I read it was the best language for text processing.
Yes, Perl is good for text processing, however it has a relatively steep learning curve.
A standup comedian's words about a girl apply to Perl too : "It's that kind of beauty which needs some accommodation".

I tried to write a script for your log file, but after a while I realized the format is less standard than it seemed at first glance. So it will need some more thinking before going further with this :
Perl:
[gray]#!/bin/perl[/gray]

[b]use[/b] strict[teal];[/teal]
[b]use[/b] warnings[teal];[/teal]

[gray]# slurp it all[/gray]
[b]open my[/b] [navy]$file[/navy][teal],[/teal] [i][green]'<'[/green][/i][teal],[/teal] [i][green]'PoliceLog.txt'[/green][/i] or [b]die[/b] $[teal]!;[/teal]
[b]my[/b] [navy]$text[/navy] [teal]=[/teal] [b]do[/b] [teal]{[/teal] [b]local[/b] [navy]$/[/navy][teal];[/teal] [i][green]<$file>[/green][/i] [teal]};[/teal]
[b]close[/b] [navy]$file[/navy][teal];[/teal]

[gray]# split on entries[/gray]
[b]my[/b] [navy]@entry[/navy] [teal]=[/teal] [b]split[/b] [fuchsia]/^(?=\w)/[/fuchsia][b]m[/b][teal],[/teal] [navy]$text[/navy][teal];[/teal]

[gray]# take out the header from the array and store it separately[/gray]
[b]my[/b] [navy]$header[/navy] [teal]=[/teal] [b]shift[/b] [navy]@entry[/navy][teal];[/teal]

[gray]# count entries containing either "accidental" or "misdial" ( case insensitive )[/gray]
[b]my[/b] [navy]$nr_accidental[/navy] [teal]=[/teal] [purple]0[/purple][teal];[/teal]
[b]foreach my[/b] [navy]$entry[/navy] [teal]([/teal][navy]@entry[/navy][teal]) {[/teal]
  [navy]$nr_accidental[/navy][teal]++[/teal] [b]if[/b] [navy]$entry[/navy] [teal]=~[/teal] [fuchsia]/accidental|misdial/[/fuchsia][b]i[/b][teal];[/teal]
[teal]}[/teal]

[gray]# extract "Key: Value" pairs from the header[/gray]
[b]my[/b] [navy]%piece[/navy][teal];[/teal]
[navy]$piece[/navy][teal]{[/teal][navy]$1[/navy][teal]} =[/teal] [navy]$2[/navy] [b]while[/b] [navy]$header[/navy] [teal]=~[/teal] [b]m[/b][fuchsia]/(\w+):\s+(\S+)/[/fuchsia][b]g[/b][teal];[/teal]

[gray]# print output header[/gray]
[b]print[/b] [i][green]"Acton Police Log [navy]$piece[/navy][teal]{[/teal][navy]From[/navy][teal]}[/teal] to [navy]$piece[/navy][teal]{[/teal][navy]Thru[/navy][teal]}[/teal]

NOTE[teal]:[/teal] Police respond to every 911 call including accidental 911 calls.
During the past week Acton Police responded to [navy]$nr_accidental[/navy] accidental 911 calls.
These calls are not included in this report.

"[/green][/i][teal];[/teal]

[gray]# print output content[/gray]
[b]foreach my[/b] [navy]$entry[/navy] [teal]([/teal][navy]@entry[/navy][teal]) {[/teal]
    [b]if[/b] [teal]([/teal][navy]$entry[/navy] [teal]=~[/teal] [b]m[/b][fuchsia]/^For Date:/[/fuchsia][teal]) {[/teal]
        [b]print[/b] [navy]$entry[/navy][teal];[/teal]
    [teal]}[/teal] [b]elsif[/b] [teal]([/teal][navy]$entry[/navy] [teal]=~[/teal] [b]m[/b][fuchsia]/^\d+/[/fuchsia][teal]) {[/teal]
        [navy]$entry[/navy] [teal]=~[/teal] [b]s[/b][fuchsia]/^\s+//[/fuchsia][b]mg[/b][teal];[/teal]
        [b]print[/b] [navy]$entry[/navy][teal],[/teal] [i][green]"\n"[/green][/i][teal];[/teal]
    [teal]}[/teal]
[teal]}[/teal]
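
To see what the split and the header extraction above do, here is a cut-down sketch on an invented miniature log. The sample lines are hypothetical and do not come from PoliceLogEx.txt. The [tt]grep[/tt] step that drops the accidental 911 calls is also an assumption about the next step, since the script above so far only counts them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical miniature of the log; the real file's layout may differ.
my $text = <<'END';
Dispatch Log From: 12/26/2022 Thru: 01/01/2023
For Date: 12/26/2022
22-1001 0012 MOTOR VEHICLE STOP
    Location: MAIN ST
22-1002 0105 911 MISDIAL
    Location: ELM ST
END

# Cut before every line that starts with a word character;
# indented continuation lines stay attached to their entry.
my @entry  = split /^(?=\w)/m, $text;
my $header = shift @entry;

# Collect "Key: Value" pairs from the header line.
my %piece;
$piece{$1} = $2 while $header =~ m/(\w+):\s+(\S+)/g;

# Assumed next step: keep only entries that are not accidental 911 calls.
my @kept = grep { !/accidental|misdial/i } @entry;

# Remove the leading indentation inside each kept entry.
s/^\s+//mg for @kept;

print "From $piece{From} to $piece{Thru}\n";
print @kept;
```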

Feherke.
feherke.github.io
 