matching on multiple lines

howdthattaste · Aug 7, 2007

hello,

I have an interesting question which, knowing Perl, is probably already solved by some nice, easy-to-use function hammered out ages ago.

The question (which isn't that helpful on its own) is: how do I match across multiple lines?

For example, if we have the lines:

Code:

0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and 
0001235 - wraps to the next line]
0001236 - [another line that doesn't wrap]
...

(Obviously, your browser's wrapping could make this look weirder; try maximizing the window to get the full effect.)

As we can see, the provider has supplied a line-number sequence and has marked where each record starts and ends with brackets. I want to read through this file and pull out the 'complete' lines: the source file contains 4 lines of data, but in reality there are only 3 records. I just want those 3, without the sequence numbers.

Can anyone help?

I step through files like this:

Code:

while (<FILE>)
{
   # do some processing on $_
   # print some output
}

...but since the data I need to process is split across two lines, how can I get them together?

P.S. There is a newline character at the end of each line in the source file.

----------------------------------
"Not New York..., Kansas

travs69 · Aug 7, 2007

This works, but I would think there's probably a better way.

Code:

@file = ("0001233 - [this is text which fits on one line]",
	"0001234 - [this is text which has a really really long sentence and ",
	"0001235 - wraps to the next line]",
	"0001236 - [another line that doesn't wrap]");

for $line (@file) {
	# Drop the sequence number, keep everything after " - "
	(undef, $rest) = split /\s+\-\s+/, $line;
	# Walk the text one character at a time
	@bulk = split //, $rest;
	for $char (@bulk) {
		if ($char eq "]") {
			# Closing bracket: the record is complete
			push @out, $completeline;
			$completeline = "";
		}
		elsif ($char eq "[") {
			next;
		}
		else {
			$completeline .= "$char";
		}
	}
}

for $line (@out) {
	print "$line\n";
}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;

MillerH · Aug 7, 2007

I came up with two solutions for this. I'll give the better one first, and will throw in the second just for amusement. Both rely on regular expressions, but the second is much more hard-nosed about it.

Code:

use strict;

my $wrapped_line = '';

while (<DATA>) {
	chomp;

	# Captures: sequence number, optional "[", text, optional "]"
	die "Invalid line format: $_"
		if !m{^ (\d+) \s+-\s+ (\[?) (.*?) (\]?) $}x;

	my ($id, $is_start, $content, $is_end) = ($1, $2, $3, $4);

	if ($is_end) {
		# Closing bracket seen: prepend whatever was buffered and emit
		$content = $wrapped_line . $content;
		$wrapped_line = '';

		print "$id -> $content\n";

	} else {
		# No closing bracket yet: buffer this piece
		$wrapped_line .= $content;
	}
}

__DATA__
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and 
0001235 - wrap ends here]
0001236 - [another line that doesn't wrap]
0001237 - [this is text which has a really really long sentence and 
0001238 - wraps to the next line 
0001239 - wraps ends]
0001240 - [another line that doesn't wrap]

Or the original version, playing with tokenization:

Code:

use strict;

my $wrapped_line;

while (<DATA>) {
	chomp;
	my $line = $_;

	# Parse out ID
	my $id;
	if ($line =~ /^(\d+)\s+-\s+/gc) {
		$id = $1;
	} else {
		die "Invalid line format: $line";
	}

	# Parse out Content
	my $content;
	if ($wrapped_line) { # Continuing Wrapped Content
		if ($line =~ m{ \G (.*) \] \z }xgc) { # End of Wrap
			$content = $wrapped_line . $1;
			$wrapped_line = '';

		} elsif ($line =~ m{ \G (.*) \z }xgc) { # Continuation of Wrap
			$wrapped_line .= $1;
		}

	} elsif ($line =~ m{ \G \[ (.*) \] \z }xgc) { # Single line
		$content = $1;

	} elsif ($line =~ m{ \G \[ (.*) \z }xgc) { # Start of Wrap
		$wrapped_line = $1;

	} else {
		die "Invalid line format: $line";
	}

	if ($content) {
		print "$id -> $content\n";
	}
}

__DATA__
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and 
0001235 - wrap ends here]
0001236 - [another line that doesn't wrap]
0001237 - [this is text which has a really really long sentence and 
0001238 - wraps to the next line 
0001239 - wraps ends]
0001240 - [another line that doesn't wrap]

And the output from either:

Code:

>perl scratch.pl
0001233 -> this is text which fits on one line
0001235 -> this is text which has a really really long sentence and wrap ends here
0001236 -> another line that doesn't wrap
0001239 -> this is text which has a really really long sentence and wraps to the next line wraps ends
0001240 -> another line that doesn't wrap

- Miller

brigmar · Aug 7, 2007

Short and simple:

Code:

while ( <DATA> ) {
  chomp;
  s/^\d+ - \[?//;
  $_ .= ' '; 
  s/\] /\n/; 
  print;
}

MillerH · Aug 7, 2007

Obligatory:

Code:

>perl -pe "chomp;s/.*? - \[?//;s/\]$/\n/;" file.txt
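
For the literal question in the title, a regex can also match across line breaks once the whole input sits in a single string. What follows is only a sketch under a few assumptions: the file is small enough to read into memory, record text never contains a "]" of its own, and `complete_records` and `$sample` are made-up names for illustration. It strips the sequence prefixes first, then lets a negated character class run across the newlines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes the whole input as one
# string and returns the bracketed records, without sequence numbers.
sub complete_records {
    my ($text) = @_;
    $text =~ s/^\d+[ \t]+-[ \t]+//mg;    # /m lets ^ match at the start of every line
    my @records;
    while ($text =~ /\[([^\]]*)\]/g) {   # [^\]] matches newlines too, so this spans lines
        my $record = $1;
        $record =~ s/\s*\n\s*/ /g;       # rejoin wrapped pieces with a single space
        push @records, $record;
    }
    return @records;
}

my $sample = <<'END';
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and
0001235 - wraps to the next line]
0001236 - [another line that doesn't wrap]
END

# An in-memory filehandle stands in for the real file here.
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my $text = do { local $/; <$fh> };       # with $/ undefined, <$fh> reads everything at once
my @records = complete_records($text);
print "$_\n" for @records;
```

Unlike the substitution versions, this leaves each record in `@records`, so it can be processed further instead of only printed.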

howdthattaste · Aug 8, 2007

Thanks guys, I believe the $wrapped_line script is going to do it... the replace/one-line versions simply "display" what I'm looking for, but I really need to process that entire line.

I probably didn't make that too clear anyway.

But thanks!
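
If the goal is to process each complete line rather than just print it, one option is to wrap the $wrapped_line idea in a sub that hands every finished record to a callback. This is a rough sketch, not MillerH's code: `each_record`, `$handler` and the data in `$sample` are hypothetical, and it assumes wrapped pieces should be rejoined with a single space.

```perl
use strict;
use warnings;

# Hypothetical helper: reads lines from $fh and calls $handler once per
# complete record, passing the rejoined text without the sequence number.
sub each_record {
    my ($fh, $handler) = @_;
    my $pending = '';                        # pieces of a record still waiting for "]"
    while (my $line = <$fh>) {
        $line =~ s/\s+$//;                   # drop the newline and any trailing spaces
        $line =~ s/^\d+\s+-\s*//             # strip the "0001234 - " prefix
            or die "Invalid line format: $line";
        $pending = $pending eq '' ? $line : "$pending $line";
        if ($pending =~ /^\[(.*)\]$/) {      # both brackets present: record is complete
            $handler->($1);
            $pending = '';
        }
    }
    warn "Unterminated record: $pending\n" if $pending ne '';
}

my $sample = <<'END';
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and
0001235 - wraps to the next line]
0001236 - [another line that doesn't wrap]
END

open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @records;
each_record($fh, sub { push @records, uc $_[0] });   # uc is a stand-in for real processing
print "$_\n" for @records;
```

Swapping the in-memory handle for one opened on the real file should be the only change needed to run it against actual data.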

