
Using ActiveState Perl For Windows to Parse 3

pmking (IS-IT--Management, US), Mar 1, 2006
Hello All,

I have a parsing question. I have a formatted report that I need to parse, extracting certain pieces of the text file.

First of all, I am new to Perl, and I am not sure where to begin.

What I would like to do is:

1. Create a GUI interface so users can input where their text file that needs parsing is located on their pc.

2. Have a Perl script (created via ActiveState Perl for Windows) run against the text file, extract only what is needed, and output it to an Excel spreadsheet. I am not sure whether output to an Excel spreadsheet is possible.

I am not sure where to begin with the Perl script, since the text file that I need to parse is formatted already. There are tabs, spaces, colons, and dashes throughout this text file.

Am I on the right track by using Perl? And if so, can somebody please, please help me get a start?

Thanks.


 
If you want a GUI, then Tk is the most likely choice. The Tk modules come with ActiveState Perl for Windows, so you can look them up in the ActiveState documentation. The rest also sounds possible: there are Excel modules on CPAN that you can install with ppm on your Windows box.
 
Yes, you are precisely in the right place. The key to what you are trying to accomplish is Regular Expressions. I recommend the following site:


Regular Expressions are found in other scripting languages too. I first started a similar project in VBScript using Windows Script Host; VBScript can also apply Regular Expressions using various objects.

But Regular Expressions are most completely and naturally applied in Perl. You might as well start here, although if you are new to scripting, expect a fairly steep learning curve and documentation that is hard to navigate.

If Perl proves too much to handle right away and you are in a Windows environment with Office, try Visual Basic for Applications, the scripting language that runs inside Office programs like Word and Excel. You can find Regular Expressions there. For an even simpler introduction, Search and Replace in MS Word applies "wildcards", which are rooted in Regular Expressions.

But, really, you should orient yourself around returning to Perl, which is vastly more powerful than VBS.

Regular Expressions are applied everywhere in computing: in compilers, for example.
 
Thank you SparceMatrix..

I am comfortable with regex, since I have been reading about it for the past 3 days. I get the concept, and I do feel that it could assist me in parsing through my 'formatted text file'.

I am stuck and, I am embarrassed to say, I don't know how to start my Perl script off. I am reading a ton of posts and such, but for some reason my brain is locked from knowing where to start.

Thanks.

 
Hi

Some comments


1. Create a GUI interface so users can input where their text file that needs parsing is located on their pc.

Tk is a great module with a lot of examples on the net. You could create a window easily.

2. Have a Perl script (created via ActiveState Perl for Windows) run against the text file, extract only what is needed, and output it to an Excel spreadsheet. I am not sure whether output to an Excel spreadsheet is possible.

To match characters, just use regular expressions; they are well documented too. For output to Excel you could use Win32::OLE or the Spreadsheet::WriteExcel module.
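
If installing an Excel module is a hurdle at first, one dependency-free stopgap is to write a comma-separated (CSV) file, which Excel opens directly. Note that this is a swapped-in technique, not the Win32::OLE or Spreadsheet::WriteExcel route named above. The csv_line helper, the column names, and the report.csv file name are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of values into one CSV line.
# Fields containing a comma, quote, or newline are wrapped in double
# quotes, and embedded quotes are doubled.
sub csv_line {
    my @fields = @_;
    for my $field (@fields) {
        $field = '' unless defined $field;
        if ($field =~ /[",\r\n]/) {
            $field =~ s/"/""/g;
            $field = qq{"$field"};
        }
    }
    return join(',', @fields) . "\n";
}

# Example rows; the values are taken from sample report lines in this thread.
my @rows = (
    ['LOCATION', 'SUBSYSTEM', 'OCCURRENCES'],
    ['DDFIF0B',  'IF1B',      1534],
);

# 'report.csv' is an assumed output name.
open(my $out, '>', 'report.csv') or die "Can't write report.csv: $!";
print {$out} csv_line(@$_) for @rows;
close($out) or die "Can't close report.csv: $!";
```

Once the parsing works, the same rows could be handed to one of the Excel modules instead.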

Cheers.


dmazzini
GSM System and Telecomm Consultant

 
New to perl? Suggest you write the read-parse-output part first to get your feet wet before diving into Tk for the GUI.

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
Thank you to all who have assisted..

I have a question: how do I code a prompt in Perl? I want my script to ask me which file to read from; I guess you would call that a Windows dialog box.

I am trying to use:

$pid = open(INPUTFILE, "C:\MYFILE |")
#while (<INPUTFILE>) {
#my ($loacation, $subsystem, $fromtime, $totime,
# $primauth, $planname, $average, $applcl1,
# $db2cl2, $cputime, $occurrences, $commits,
# $rollbbacks, $deadlocks) =
#/^LOCATION.\s^.$
}
close(INPUTFILE)

NOTE: I have commented out the variables above to see if my $pid would work

---------END OF SAMPLE CODE I AM SCREWING UP---------



Okay, I want the script when executed to pop up a windows dialogue box and ask for the input text file.

Then I want to be able to parse through this file using REGEX to find the data that I need from the input file.

My confusion/questions are:

1. Can somebody suggest a good perl book that will 'GERBER' feed this stuff to me?

2. How do I code in Perl so that I will be prompted for the input file? That is what I was trying to do with the $pid = open. But according to the "Perl Cookbook" (O'Reilly), $pid is interacting with a program, and I thought that Microsoft's Notepad (text file) would be considered another program (please no laughs:))

3. I am learning REGEX and it is simple, I guess enough practice will be like speaking a third language, eh?

In summary here are the steps I have taken and will be taking.

1. I am learning regex on the fly.

a. Using Komodo (I feel comfortable with it)
b. I have set up my projects already.
c. Now I am just staring at the edit window:)
d. I have manually (with a red pen) sorted through the formatted report that I want to parse so I can identify how I want to utilize regex against the formatted text file.
e. I started to code in Perl the manipulation of the formatted text via regex.

I apologize for this verbose thread.

Any help will be greatly appreciated and it is needed.... Please.. Please.. Please..

I truly hope this makes sense. I know this forum is a stickler for 'asking the right questions, and posting the right amount of info to receive help'...



 
Rather than interacting with the console, you may find it easier to use a command line argument:
Code:
use strict;
use warnings;

my $file = $ARGV[0]; # 1st command line argument
open(REPORT, $file) or die "Can't open file $file, $!";
Then when you write your GUI front end you just need to invoke your base script with a parameter.
Code:
perl myscript.pl report.txt
for example.
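
A slightly more forgiving sketch of the same idea: take the file name from the command line and, if none was given, ask for it on the console. The get_report_name helper is hypothetical, and it reads from whatever input handle it is given so it can be tried without a keyboard; the path typed in the demonstration is invented. Also note that a plain file is opened for reading with '<', whereas a trailing | in open runs a program and reads its output.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first argument if there is one,
# otherwise prompt on $out and read one line from $in.
sub get_report_name {
    my ($args, $in, $out) = @_;
    return $args->[0] if @$args;

    print {$out} "Report file to parse: ";
    my $name = <$in>;
    die "No file name given\n" unless defined $name;
    $name =~ s/^\s+|\s+$//g;   # trim whitespace and the trailing newline
    die "No file name given\n" unless length $name;
    return $name;
}

# Demonstration with an in-memory "keyboard" and "screen" so the
# sketch runs anywhere. The path is an invented example.
my $typed  = "C:\\reports\\db2.txt\n";
my $screen = '';
open(my $fake_in,  '<', \$typed)  or die "Can't open in-memory input: $!";
open(my $fake_out, '>', \$screen) or die "Can't open in-memory output: $!";

my $file = get_report_name([], $fake_in, $fake_out);
print "Would open: $file\n";
```

In a real script the call would be get_report_name(\@ARGV, \*STDIN, \*STDOUT), followed by open(my $report, '<', $file) or die "Can't open file $file, $!";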

Steve

 
Thank you Stevexff,

Can you please help me here.

[blue]use strict;[/blue] [red]--->what does this do?[/red]
[blue]use warnings;[/blue] [red]--->what does this do?[/red]

[red]How is the below supposed to work? I am getting an error while in debug mode; debug is telling me no such file or directory.[/red]


[red]Below there is open(REPORT), I have not identified REPORT, so do I need to identify it somehow?[/red]
[blue]my $file = $ARGV[0]; # 1st command line argument
open(REPORT, $file) or die "Can't open file $file, $!";
[/blue]
[red]I am assuming I put this on the top of my script?[/red]

Thanks for all the help so far...



 
use strict; tells perl to be strict about lax programming on your part, for example you have to declare all variables before use (hence 'my $file'). Without this perl will assume that if a variable doesn't exist you want it created on the fly. While this can be convenient, if you make a typo in a variable name later in the program it can be tricky to debug. It checks a few other things as well. It's good practice to use it.

use warnings; gives you more debugging info if things go wrong. Again, not compulsory, but good practice.
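
To see what use strict buys you, here is a small sketch that feeds Perl a snippet containing a misspelled variable. The snippet is kept in a string and compiled with eval only so that the demonstration itself still runs; $loaction is a deliberate typo for $location, and the values are just samples.

```perl
use strict;
use warnings;

# The snippet is kept in a string so that this demo itself still compiles;
# $loaction is a deliberate misspelling of $location.
my $snippet = <<'END_SNIPPET';
use strict;
my $location = "DDFIF0B";
$loaction = "IF1B";
1;
END_SNIPPET

my $ok    = eval $snippet;   # undef, because the snippet fails to compile
my $error = $@;              # the compile error reported by strict
print $ok ? "compiled fine\n" : "strict refused it: $error";
```

Without use strict in the snippet, Perl would quietly create a second variable and $location would keep its old value.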

When you open a file, you have to give it a handle. I chose REPORT because you said you were parsing a report file. So, to extend the snippet to a full example:
Code:
use strict;
use warnings;

my $file = $ARGV[0]; # 1st command line argument
open(REPORT, $file) or die "Can't open file $file, $!";

while (<REPORT>) {
   chomp; # remove newline
   next if (/^\s*$/); #skip blank lines
   # your stuff goes here
}

close(REPORT);

Steve

 
Thanks so much Steve.. When I am done I will post my end result...
 
Regex question.

I am having issues with the following:

The report reads this:

[blue]LOCATION: DDFIF0B[/blue]

I tried to regex the above with:

/LOCATION:/^\w$ [red]is this correct?[/red]

The report read this:

[blue]SUBSYSTEM: IF1B[/blue]

I tried to regex the above with:

/SUBSYSTEM:/^w$ [red]is this correct?[/red]

What am I doing wrong with my REGEX?

I have been reading the O'Reilly Regular Expression Pocket Reference. It makes sense in the book, but when I try it, my Komodo editor underlines the line and says I am in an error state.

The report has 2000+ lines at times; the format is constant, but the size isn't. All I need from the report are:

NOTE: this is the text format as it appears, including spaces and the pound sign (#), but the lines below are scattered throughout the report.

[red]
LOCATION: DDFIF0B

SUBSYSTEM: IF1B

INTERVAL FROM: 12/06/05 22:03:17.59

TO: 12/07/05 01:59:59.01

#OCCURRENCES : 1534

DEADLOCKS 0.00 0[/red]

[red]When I write a regex for the above, do I make it one continuous line, or do I break it up, and how do I break it up?[/red]

I have concluded from my O'Reilly Perl readings that the result from a regex (captured submatches) for LOCATION will be in the form of $1, for SUBSYSTEM will be in the form of $2, and so on. So, once I am further into the code and ready to 'print' the output, I would reference each submatch with its appropriate $1, $2, right?

Any help you can give me is greatly appreciated...







 
Code:
my $location;

while (<REPORT>) {
   chomp;
   if (/^LOCATION:\s*(\w+)$/) {
      $location = $1;
   }
   if (/^SUBSYSTEM:\s*(\w+)$/) {
      print "$location $1\n";
   }
}

* untested as I don't have perl on this machine *

The parentheses around the \w+ in the regex force capture into $1, $2 etc.

Note that $location is defined outside the loop otherwise it won't 'remember' the location from the previous iteration...
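
To make the numbering concrete: captures are numbered per regex, not per report field. Each successful match sets its own $1, and a regex with two groups sets $1 and $2, so copy the values into named variables straight away. A small sketch using two of the sample lines quoted in this thread (the exact spacing in the real file is an assumption):

```perl
use strict;
use warnings;

# Sample lines copied from the report excerpt in this thread; the exact
# spacing in the real file is an assumption.
my @lines = (
    'LOCATION: DDFIF0B',
    'INTERVAL FROM: 12/06/05 22:03:17.59',
);

my ($location, $date, $time);
for my $line (@lines) {
    if ($line =~ /^LOCATION:\s*(\w+)$/) {
        $location = $1;              # one group, so only $1 is set
    }
    elsif ($line =~ m{^INTERVAL FROM:\s*(\d\d/\d\d/\d\d)\s+(\d\d:\d\d:\d\d\.\d\d)$}) {
        ($date, $time) = ($1, $2);   # two groups, so $1 and $2
    }
}

print "$location $date $time\n";
```

The m{...} form is used for the second pattern only so that the slashes in the date need no escaping.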

Steve

 
Thanks again Steve,

This is what I have so far:

#!/usr/bin/perl -w

use File::Basename;
use strict;
use warnings;

my $file = $ARGV[0]; # 1st command line argument
open(REPORT, $file) or die "Can't open file $file, $!\n";

my $location;

while (<REPORT>) {
chomp;
[red]how it appears in the formatted report: LOCATION: DDFIF0B[/red]
if (/^LOCATION:\s*(\w+)$/) {
$location = $1;
}
[red]how it appears in the formatted report: SUBSYSTEM: IF1B IF1B changes, never the same[/red]
if (/^SUBSYSTEM:\s*(\w+)$/) {
$location = $1;
}
[red]how it appears in the formatted report: PRIMAUTH: MTSCLMS1 PLANNAME: PUL0B NOTE: primauth and planname change[/red]
if (/^PRIMAUTH:\sPLANNAME:\s*(\w+)$/) {
$location = $1;
}
[red]how it appears in the formatted report: INTERVAL FROM: 12/06/05 22:03:17.59
[/red]
if (/^INTERVAL\sFROM:\s*(\w+)$/) { [red]NOTE: I'm working on how i can regex the dd/mm/yy hh:mm:ss:ms[/red]
$location = $1;
}
[red]how it appears in the formatted report: TO: 12/07/05 01:59:59.01
[/red]
if (/^TO:\s*(\w+)$/) { [red]NOTE: I'm working on how i can define the dd/mm/yy hh:mm:ss:ms[/red]
$location = $1;
}
[red]how #occurrences appears in the formatted report: #OCCURRENCES : 1534 [/red]
if (/^#OCCURRENCES\s:\s*(\d+)$/) {
$location = $1;
}
[red]how deadlock appears in the report: DEADLOCKS 0.00 0[/red]

if (/^DEADLOCKS\s*(\d+\s\d+)$/) {
$location = $1;
}

close(REPORT);


All the code above in black is actual code; all the stuff in red are my comments, with hopes of giving the Perl genius who assists me a better picture of what is what. Hope this made sense...

Am I on the right track??
 
It would be better to use if/elsif blocks instead of a series of separate if blocks. In an if/elsif chain, Perl runs the first branch whose condition is true and skips the rest. With separate if statements, Perl must evaluate every condition, even though only one can be true for any given line in what you are doing. Without commenting on your regexps:

Code:
while (<REPORT>) {
   chomp;
   #how it appears in the formatted report: LOCATION: DDFIF0B
   if (/^LOCATION:\s*(\w+)$/) {
      $location = $1;
   }
   #how it appears in the formatted report: SUBSYSTEM: IF1B IF1B changes, never the same
   elsif (/^SUBSYSTEM:\s*(\w+)$/) {
      $location = $1;
   }
   #how it appears in the formatted report: PRIMAUTH: MTSCLMS1  PLANNAME: PUL0B NOTE: primauth and planname change
   elsif (/^PRIMAUTH:\sPLANNAME:\s*(\w+)$/) {
      $location = $1;
   }
   #how it appears in the formatted report: INTERVAL FROM: 12/06/05 22:03:17.59

   elsif (/^INTERVAL\sFROM:\s*(\w+)$/) { #NOTE: I'm working on how i can regex the dd/mm/yy hh:mm:ss:ms
      $location = $1;
   }
   #how it appears in the formatted report:   TO: 12/07/05 01:59:59.01

   elsif (/^TO:\s*(\w+)$/) {  #NOTE: I'm working on how i can define the dd/mm/yy hh:mm:ss:ms
      $location = $1;
   }
   #how #occurrences appears in the formatted report:  #OCCURRENCES    :     1534  
   elsif (/^#OCCURRENCES\s:\s*(\d+)$/) {
      $location = $1;
   }
   #how deadlock appears in the report:     DEADLOCKS                  0.00        0

   elsif (/^DEADLOCKS\s*(\d+\s\d+)$/) {
      $location = $1;
   }
   else {
      next;
   }
}
close(REPORT);

You could use '#' to denote a comment in your code; any Perl coder will know it's a comment.
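
One thing worth noting about the loop above: every branch stores its capture in $location, so each match overwrites the previous one. Below is a sketch of one way to keep the fields apart, using a hash keyed by field name. The sample text is assembled from lines quoted in this thread and its spacing is an assumption; treating DEADLOCKS as the last line of a group is also an assumption, based only on the order of the excerpt. The patterns are loosened slightly (optional leading whitespace, no $ anchor) to tolerate spacing differences.

```perl
use strict;
use warnings;

# Sample text assembled from lines quoted in this thread (spacing assumed).
my $sample = <<'END_REPORT';
LOCATION: DDFIF0B
SUBSYSTEM: IF1B
INTERVAL FROM: 12/06/05 22:03:17.59
TO: 12/07/05 01:59:59.01
#OCCURRENCES : 1534
DEADLOCKS 0.00 0
END_REPORT

# Read the sample through an in-memory filehandle instead of a real file.
open(my $report, '<', \$sample) or die "Can't read sample: $!";

my (%record, @records);
while (my $line = <$report>) {
    chomp $line;
    if    ($line =~ /^\s*LOCATION:\s*(\w+)/)            { $record{location}    = $1 }
    elsif ($line =~ /^\s*SUBSYSTEM:\s*(\w+)/)           { $record{subsystem}   = $1 }
    elsif ($line =~ /^\s*INTERVAL FROM:\s*(\S+\s+\S+)/) { $record{from}        = $1 }
    elsif ($line =~ /^\s*TO:\s*(\S+\s+\S+)/)            { $record{to}          = $1 }
    elsif ($line =~ /^\s*#OCCURRENCES\s*:\s*(\d+)/)     { $record{occurrences} = $1 }
    elsif ($line =~ /^\s*DEADLOCKS\s+(.*\S)/) {
        $record{deadlocks} = $1;     # kept as raw text, e.g. "0.00 0"
        push @records, {%record};    # store a copy of the finished group
        %record = ();                # start the next group empty
    }
}
close($report);

# Print one tab-separated line per group.
for my $r (@records) {
    print join("\t", @{$r}{qw(location subsystem from to occurrences deadlocks)}), "\n";
}
```

Lines that match none of the patterns are simply skipped, which is what the else { next; } branch does in the version above.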
 
I have been trying to figure out why I am getting the following error when I run this program..

syntax error at C:\Program Files\ActiveState Komodo 3.5\Parse-DB2 Accounting.txt line 22, near "while"
Missing right curly or square bracket at C:\Program Files\ActiveState Komodo 3.5\Parse-DB2 Accounting.txt line 66, at end of line
Execution of C:\Program Files\ActiveState Komodo 3.5\Parse-DB2 Accounting.txt aborted due to compilation errors.

Below is the code again, cleaned up:

#use File::Basename;

use strict;
use warnings;

my $file = $ARGV[0]; # 1st command line argument
open(REPORT, $file) or die "Can't open file $file, $!\n";

my ($location, $subsystem, $fromtime, $totime,
$primauth, $planname, $average, $applcl1,
$db2cl2, $cputime, $occurrences, $commits,
$rollbbacks, $deadlocks) =

while (<REPORT>) {
chomp;
if (/^LOCATION:\s*(\w+)$/) {
$location = $1;
}
elsif (/^SUBSYSTEM:\s*(\w+)$/) {
$location = $1;
}
elsif (/^PRIMAUTH:\sPLANNAME:\s*(\w+)$/) {
$location = $1;
}
elsif (/^INTERVAL\sFROM:\s*(\w+)$/) {
$location = $1;
}
elsif (/^TO:\s*(\w+)$/) {
$location = $1;
}
elsif (/^#OCCURRENCES\s:\s*(\d+)$/) {
$location = $1;
}
elsif (/^ELAPSED\sTIME\s\s*(\d+)/) {
$location = $1;
}
elsif (/^CPU\sTIME\s\s*(\d+)/) {
$location = $1;
}
elsif (/^DEADLOCKS\s*(\d+\s\d+)$/) {
$location = $1;
}
else {
next;
}

close(REPORT);



 
You need a closing curly bracket:

Code:
    else {
      next;
   }
}#<-- here is the closing bracket you are missing
close(REPORT);
 
Alternative, using more generic parsing

Code:
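use Data::Dumper;
my %perfdata;

while (<REPORT>) {
   chomp;
   # split on the first colon only, so times like 22:03:17.59 stay intact
   my ($key, $value) = split(/\s*:\s*/, $_, 2);
   $perfdata{$key} = $value if defined $value; # ignore lines without a colon
   if (/^\s*DEADLOCKS\b/) { # the sample DEADLOCKS line has no colon
      print Dumper(\%perfdata); # pass a reference so it is dumped as one hash
   }
}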
This buffers all the values in the %perfdata hash until you hit some trigger, in this case I have assumed that DEADLOCKS is the last statistic in a group. You can then print them out in any way you like. I've used Dumper as an easy way to show you what you have. You'll have to format it in the way that makes the most sense to your needs.

Steve

 
Thanks for the replies.

I am still getting errors that just make me want to throw my coffee all over the screen:)

--------start of error--------------------------------
syntax error at C:\Program Files\ActiveState Komodo 3.5\Parse-DB2 Accounting.txt line 22, near "while"
syntax error at C:\Program Files\ActiveState Komodo 3.5\Parse-DB2 Accounting.txt line 54, near "}"
Execution of C:\Program Files\ActiveState Komodo 3.5\Parse-DB2 Accounting.txt aborted due to compilation errors.

--------end of error----------------------------------

I am confused as to why the 'while' statement is an issue. So any help will be greatly appreciated..


Stevexff, thanks for the suggestion, but I am not sure how to put all of it together. So I will go with the long way for now, just for my lack of grasping the flow. I read, and read these Perl Cookbooks and I am just not getting it. Is learning a new coding language always this way? Meaning, one day I will wake up and say, "I finally get it!!!"..

Anybody know why I am getting the error above?

Thanks.
 
Can you post some lines of the report file as a sample? Two or three groups should be sufficient. Might make it easier to help you.
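
In the meantime, one likely cause of the error near "while", judging from the code as posted: the my (...) list ends in = with nothing after it, so Perl tries to read the while loop as the value being assigned. Ending the declaration with a semicolon should clear that. A minimal sketch, with a shortened variable list and an in-memory stand-in for the report (both assumptions made for illustration):

```perl
use strict;
use warnings;

# Declaration ends with a semicolon; there is no trailing "=".
my ($location, $subsystem, $occurrences);

# In-memory stand-in for the report file.
my $sample = "LOCATION: DDFIF0B\nSUBSYSTEM: IF1B\n#OCCURRENCES : 1534\n";
open(my $report, '<', \$sample) or die "Can't read sample: $!";

while (<$report>) {
    chomp;
    if (/^LOCATION:\s*(\w+)$/) {
        $location = $1;
    }
    elsif (/^SUBSYSTEM:\s*(\w+)$/) {
        $subsystem = $1;
    }
    elsif (/^#OCCURRENCES\s*:\s*(\d+)$/) {
        $occurrences = $1;
    }
}   # closes the while loop
close($report);

print "$location $subsystem $occurrences\n";
```

Each capture goes into its own variable here, rather than all of them into $location.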

Steve

 