
Processing and not just parsing a file -- Please help !

Status
Not open for further replies.

johnsbn

Technical User
Mar 23, 2010
12
0
0
US
Hello all,

I have a file similar to this (about 1 GB):

==================================================
startpoint: AAA
endpoint: BBB
# body for every start/end combo
a/Xxxxxx yyy num1 num2 num3 & num4
.
.
.
a/ttttttt fff num4 num5 num6 * num722
b/ttttttt fff num44 num50 non_zero * num734
.
.
.
b/yyyyyy fff num43 num52 num65 num745

startpoint: CCC
endpoint: DDD

zzzzzz yyy numa numb numc & num4
.
.
.
a4/nnnnn yyy num4 num5 num6 * num44
bb/nnnnn yyy num4 num5 num6 * num9
bb/kkkkk yyy num4 num5 non_zero * num8

.
.
.
===================================================================
For every startpoint/endpoint combo, I need to go into the body and find the line with “*”. (As I have pointed out, not all lines will have a “*” in their 6th column.) There will be multiple lines inside the body with “*” in column 6, but I need to pick the last one of them. This last line holds a non-zero number in column 5. The rest of the lines with “*” in column 6 will have zeros in column 5.

Finally, print the startpoint/endpoint combo along with its “non_zero” column 5 value and column 7. Here is another complication for me: I also need to print column 1 of the line just before the last “*” line (the (n-1)th line) for every start/end combo. For instance, in the 2nd start/end combo in the above example, I want to print “bb/nnnnn”.

Output file format:

Startpoint endpoint column5 column7 column1(of previous line)

I’m a complete newbie to Perl, so any help will be appreciated!

Thanks a lot,
John




 
Try this:

Code:
#!/usr/bin/perl -w
use strict;

my $startpoint = "";
my $endpoint = "";
my @a;
my ($col1,$col5,$col7,$prevcol1);

sub printem() {
        if ($startpoint ne "") {
                print "$startpoint $endpoint $col5 $col7 $prevcol1\n";
        }
        $startpoint=$endpoint=$col5=$col7="";
}

while (<>) {
        chomp;
        # skip blank lines or comments
        next if (/^#/ || /^[[:space:]]*$/);
        my @a=split;
        # print the previous record (if there was one) when we
        # hit a start point
        if (/^startpoint:/) { &printem; $startpoint = $a[1]; next; }
        if (/^endpoint:/) { $endpoint = $a[1]; next; }
        # save the 5th and 7th columns if the 6th is a *
        if ($a[5] eq '*') { $col5 = $a[4]; $col7 = $a[6]; }
        # save the 1st column (and its previous value)
        $prevcol1 = $col1;
        $col1 = $a[0];
}

# print out the final record
&printem;

I'm assuming that you want the first column from the (n-1)th line, regardless of whether the sixth column of that line is an asterisk. If not, just move the $col1 and $prevcol1 assignments up into the previous if clause.

Annihilannic.
 
Thanks a lot Annihilannic.

When I execute your code, I get the following error at this line:
% if ($a[5] eq '*') { $col5 = $a[4]; $col7 = $a[6]; }

Error:
Use of uninitialized value in string eq at parse.pl line 43, <IP_FILE> line 50070.

Thanks again.
John
 
What's on the 50070th line of the input file? I'm guessing it may contain fewer than 6 fields... if there are a few like that and it is okay to ignore them you can add the following:

Code:
        if (/^endpoint:/) { $endpoint = $a[1]; next; }
        # skip any lines with incorrect number of fields
        next if (@a != 7);
        # save the 5th and 7th columns if the 6th is a *

Annihilannic.
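
For readers following along, here is a minimal sketch of a slightly different guard than the exact field-count test above: it only requires that there are enough fields to read column 7, so lines with extra trailing fields are still considered. The helper name star_columns and the sample lines are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Return (col5, col7) when the 6th whitespace-separated field is '*',
# or an empty list otherwise. Short lines are simply not matched, so
# $a[5] is never read when it does not exist.
sub star_columns {
    my ($line) = @_;
    my @a = split ' ', $line;
    return () unless @a >= 7;      # need at least 7 fields to read $a[6]
    return () unless $a[5] eq '*';
    return ($a[4], $a[6]);
}

# Hypothetical sample lines: one match, one short line, one non-'*' line.
for my $line ('b/t fff 1 2 3 * 734', 'too short', 'a/x yyy 1 2 3 & 4') {
    my @got = star_columns($line);
    print @got ? "match: @got\n" : "skipped: $line\n";
}
```

Whether ">= 7" or an exact count is the right test depends on the real data, which is why seeing the actual file matters.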
 
There are a lot of lines with an incorrect number of fields. I'm not getting any errors after adding @a != 8 in the script, but there is no output either.

Here is the actual file,

Startpoint: A/XX
(rising edge-triggered flip-flop clocked by clk)
Endpoint: B/YY
(rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Point Cap Trans Derate Incr Path
-------------------------------------------------------------------------------------------------------------------------
Clock clk (rise edge) 0.027 0.000 0.000
clock network delay (ideal) 0.000 0.000
A/CP (D1) 0.027 1.000 0.000 0.000 r
A/Q (D1) 0.001 0.012 1.000 0.056 0.056 r
A/A2 (D2) 0.012 1.000 0.000 0.056 r
A/Z (D2) 0.002 0.008 1.000 0.021 0.077 r
A_d2 (rrhdl) 0.000 1.000 0.000 0.077 r
A/r_d2 (andr) 0.000 1.000 0.000 0.077 r
A/I (BUF) 0.008 1.000 0.000 0.077 r
A/Z (BUF) 0.009 0.023 1.000 0.022 & 0.099 r
Ar/d2 (main) <- 0.000 1.000 0.000 * 0.099 r
Ar_d2 (main2) 0.000 1.000 0.000 * 0.099 r
Ar_d2 (logic) 0.000 1.000 0.000 * 0.099 r
A/D (D8) 0.023 1.000 0.191 * 0.290 r
data arrival time 0.290

clock clk (rise edge) 0.027 0.444 0.444
clock network delay (ideal) 0.000 0.444
clock reconvergence pessimism 0.000 0.444
clock uncertainty -0.051 0.393
library setup time 1.000 -0.015 0.378
data required time 0.378
-------------------------------------------------------------------------------------------------------------------------
data arrival time -0.290
-------------------------------------------------------------------------------------------------------------------------
slack (MET) 0.089

====================================================

Also, one more thing I forgot to mention: I want to print the data arrival time and its corresponding value. Note that the data arrival time occurs twice in the file. I just want to capture the positive number and print it along with the rest of the fields mentioned earlier.

So for this example my output file should contain:
A/XX B/YY 0.191 0.290 Ar_d2 0.290


Thanks again for your help,
John
 
It always helps to see some of the actual input data. In this case it shows that the problem is much more complex than you originally described. For example, what you describe as the "7th field" is actually at a varying field position if you split up by spaces as I did, because things like "Ar/d2", "(main)" and "<-" count as fields as well, and they don't appear on every line. And because the fields are arranged in columns, with some of them being blank, it's impossible to tell which column they actually belong to. Even when I paste it into a text editor with fixed-width font, they don't line up. The apparently random mix of spaces and tabs probably doesn't help.

Are there some other criteria you can use to identify the lines you are interested in? Can you go by the "Point" name of the row, or does that vary for each set of data?

Annihilannic.
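
To illustrate the point about varying field positions, here is a small sketch that splits three rows copied from the report excerpt above on whitespace and reports the field count and where the "*" lands. The helper name star_index is hypothetical and only used for this demonstration.

```perl
use strict;
use warnings;

# Three "Point" rows copied from the report excerpt in this thread.
my @rows = (
    'A/Z (BUF) 0.009 0.023 1.000 0.022 & 0.099 r',
    'Ar/d2 (main) <- 0.000 1.000 0.000 * 0.099 r',
    'A/D (D8) 0.023 1.000 0.191 * 0.290 r',
);

# Return the 0-based index of the first field that is exactly '*', or -1.
sub star_index {
    my ($line) = @_;
    my @f = split ' ', $line;
    for my $i (0 .. $#f) {
        return $i if $f[$i] eq '*';
    }
    return -1;
}

for my $row (@rows) {
    my @f = split ' ', $row;
    printf "%d fields, '*' at index %d: %s\n",
        scalar(@f), star_index($row), $row;
}
```

The "<-" marker shifts the "*" one position to the right on the second row, so a fixed index such as $a[5] only works for rows that happen to have the same layout.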
 
The formatting for some reason gets messed up when I paste it here. My apologies.

Here's what we can do.

Instead of the column 1 value of the (n-1)th row, we can grep for the line with "<-" and store its column 1 value.

For column 5 and column 7 values,

You are right about the spaces and tabs. But the line of interest will always have a "*" or an "H" in column 6, and that row will always have 8 columns if we split by spaces. So for a row with a * or H, we need to look at its column 5. If the value equals 0.000, ignore that row and move to the next. If it's a non-zero value, store column 5 and column 7.

Thanks,
John
 
Ann,

I got it to work. I tried using the way I described and was able to get it working. Thanks a lot to you. Really appreciate your help.

- John
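
John did not post his final script. What follows is only a minimal sketch of the revised approach he described (column 1 of the "<-" row, the last 8-field row with "*" or "H" in column 6 and a non-zero column 5, and the positive data arrival time), not his actual code. The sub name parse_report, the hash keys, and the trimmed in-memory sample are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Parse a timing report from a filehandle; return one hash ref per path.
sub parse_report {
    my ($fh) = @_;
    my (@records, $rec);
    while (my $line = <$fh>) {
        if ($line =~ /^\s*Startpoint:\s*(\S+)/i) {
            push @records, $rec if $rec;    # flush the previous path
            $rec = { start => $1, end => '', col5 => '', col7 => '',
                     point => '', arrival => '' };
            next;
        }
        next unless $rec;                   # ignore text before the first path
        if ($line =~ /^\s*Endpoint:\s*(\S+)/i) { $rec->{end} = $1; next; }
        if ($line =~ /^\s*data arrival time\s+(\S+)/) {
            # the value appears twice; keep only the positive one
            $rec->{arrival} = $1 if looks_like_number($1) && $1 > 0;
            next;
        }
        my @f = split ' ', $line;
        next unless @f;
        # remember column 1 of the row flagged with "<-"
        $rec->{point} = $f[0] if grep { $_ eq '<-' } @f;
        # 8-field rows with '*' or 'H' in column 6 and a non-zero column 5;
        # later matches overwrite earlier ones, so the last one wins
        if (@f == 8 && $f[5] =~ /^[*H]$/
            && looks_like_number($f[4]) && $f[4] != 0) {
            @{$rec}{qw(col5 col7)} = @f[4, 6];
        }
    }
    push @records, $rec if $rec;            # flush the final path
    return @records;
}

# Trimmed copy of the sample report from this thread, read from memory.
my $sample = <<'END';
Startpoint: A/XX
(rising edge-triggered flip-flop clocked by clk)
Endpoint: B/YY
A/Z (BUF) 0.009 0.023 1.000 0.022 & 0.099 r
Ar/d2 (main) <- 0.000 1.000 0.000 * 0.099 r
Ar_d2 (main2) 0.000 1.000 0.000 * 0.099 r
A/D (D8) 0.023 1.000 0.191 * 0.290 r
data arrival time 0.290
data arrival time -0.290
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
for my $r (parse_report($fh)) {
    print join(' ', @{$r}{qw(start end col5 col7 point arrival)}), "\n";
}
close $fh;
```

On the trimmed sample this prints "A/XX B/YY 0.191 0.290 Ar/d2 0.290". The name column comes from the "<-" row here, so it differs from the (n-1)th-row value in the expected output John gave earlier. For a real report, the filehandle would be opened on the file itself; the input is read line by line, so only the collected records are held in memory.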
 