matching on multiple lines

Status
Not open for further replies.

howdthattaste

Programmer
Apr 12, 2007
17
US
hello,

I have an interesting question which, knowing Perl, is probably already solved by some nice, easy-to-use function hammered out ages ago.

The question (which isn't that helpful on its own) is: how do I match across multiple lines?

for example if we have the lines:
Code:
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and 
0001235 - wraps to the next line]
0001236 - [another line that doesn't wrap]
...
(Obviously, your browser's wrapping could make this look weirder; try maximizing the window to get the full effect.)

As we can see, the provider has supplied a line-number sequence, and each complete line starts and ends with brackets. I want to read through this file and pull out the 'complete' lines. That is, the source file above contains 4 lines of data, but in reality there are only 3. I just want those 3, without the sequence numbers.

Can anyone help?

I step through files like this:
Code:
while (<FILE>)
{
   # do some processing on $_
   # print some output
}
...but since the data I need to process is split across two lines, how can I get them together?

P.S. There is a newline character at the end of each line in the source file.


----------------------------------
"Not New York..., Kansas
 
This works, but I would think there's probably a better way.

Code:
@file = ("0001233 - [this is text which fits on one line]",
	"0001234 - [this is text which has a really really long sentence and",
	"0001235 - wraps to the next line]",
	"0001236 - [another line that doesn't wrap]");


for $line (@file) {
	(undef, $rest) = split /\s+\-\s+/, $line;
	@bulk = split //, $rest;
	for $char (@bulk) {
		if ($char eq "]") {
			push @out, $completeline;
			$completeline = "";
		}
		elsif ($char eq "[") {
			next;
		}
		else {
			$completeline .= "$char";
		}
	}
}

for $line (@out) {
	print "$line\n";
}


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
 
I came up with two solutions for this. I'll give the better one first, and will throw in the second just for amusement. Both rely on regular expressions, but the second is much more hard-nosed about it.

Code:
use strict;

my $wrapped_line = '';

while (<DATA>) {
	chomp;

	# $1 = ID, $2 = opening bracket (if any), $3 = content, $4 = closing bracket (if any)
	die "Invalid line format: $_"
		if !m{^ (\d+) \s+-\s+ (\[?) (.*?) (\]?) $}x;

	my ($id, $is_start, $content, $is_end) = ($1, $2, $3, $4);

	if ($is_end) {
		# A closing bracket completes the record: prepend anything buffered so far
		$content = $wrapped_line . $content;
		$wrapped_line = '';

		print "$id -> $content\n";

	} else {
		# No closing bracket yet: keep buffering
		$wrapped_line .= $content;
	}
}

__DATA__
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and 
0001235 - wrap ends here]
0001236 - [another line that doesn't wrap]
0001237 - [this is text which has a really really long sentence and 
0001238 - wraps to the next line
0001239 - wraps ends]
0001240 - [another line that doesn't wrap]

Or the original version. Playing with tokenization:

Code:
use strict;

my $wrapped_line;

while (<DATA>) {
	chomp;
	my $line = $_;

	# Parse out ID
	my $id;
	if ($line =~ /^(\d+)\s+-\s+/gc) {
		$id = $1;
	} else {
		die "Invalid line format: $line";
	}

	# Parse out Content
	my $content;
	if ($wrapped_line) { # Continuing Wrapped Content
		if ($line =~ m{ \G (.*) \] \z }xgc) { # End of Wrap
			$content = $wrapped_line . $1;
			$wrapped_line = '';

		} elsif ($line =~ m{ \G (.*) \z }xgc) { # Continuation of Wrap
			$wrapped_line .= $1;
		}

	} elsif ($line =~ m{ \G \[ (.*) \] \z }xgc) { # Single line
		$content = $1;

	} elsif ($line =~ m{ \G \[ (.*) \z }xgc) { # Start of Wrap
		$wrapped_line = $1;

	} else {
		die "Invalid line format: $line";
	}

	if ($content) {
		print "$id -> $content\n";
	}
}

__DATA__
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and 
0001235 - wrap ends here]
0001236 - [another line that doesn't wrap]
0001237 - [this is text which has a really really long sentence and 
0001238 - wraps to the next line
0001239 - wraps ends]
0001240 - [another line that doesn't wrap]

And the output from either:

Code:
>perl scratch.pl
0001233 -> this is text which fits on one line
0001235 -> this is text which has a really really long sentence and wrap ends here
0001236 -> another line that doesn't wrap
0001239 -> this is text which has a really really long sentence and wraps to the next line wraps ends
0001240 -> another line that doesn't wrap

- Miller
 
Short and simple:
Code:
while ( <DATA> ) {
  chomp;
  s/^\d+ - \[?//;
  $_ .= ' '; 
  s/\] /\n/; 
  print;
}
 
Obligatory:

Code:
>perl -pe "chomp;s/.*? - \[?//;s/\]$/\n/;" file.txt
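
For a literal take on the thread title, a different technique is to slurp the whole file into one string and let a single regex match across the line breaks. The following is only a rough sketch: the sub name extract_records is made up for illustration, and it assumes the text itself never contains a "]" and that the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the posts above): takes the whole file as
# one string and returns the bracketed records, with the sequence numbers
# and the embedded line breaks removed.
sub extract_records {
    my ($text) = @_;

    # /m makes ^ match at the start of every physical line, so this
    # drops each "NNNNNNN - " prefix in one pass.
    $text =~ s/^\d+[ \t]+-[ \t]+//mg;

    my @records;
    # /s lets "." match newlines, so one [ ... ] pair may span several lines.
    while ($text =~ /\[(.*?)\]/sg) {
        my $record = $1;
        $record =~ s/\s*\n\s*/ /g;    # rejoin wrapped pieces with one space
        push @records, $record;
    }
    return @records;
}

my $sample = join '',
    "0001233 - [this is text which fits on one line]\n",
    "0001234 - [this is text which has a really really long sentence and \n",
    "0001235 - wraps to the next line]\n",
    "0001236 - [another line that doesn't wrap]\n";

print "$_\n" for extract_records($sample);
```

With a real file, the string could be read in one go by localizing $/ to undef before reading from the filehandle.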
 
Thanks guys, I believe the $wrapped_line script is going to do it. The replace/one-line versions simply "display" what I'm looking for, but I really need to process each complete line.

I probably didn't make that too clear anyway.

but thanks!
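
If the goal is to process each complete line rather than print it, the same buffering idea can be wrapped in a sub that hands back a list of whole records. This is a minimal sketch and not anyone's posted code: the name read_records is hypothetical, fragments are rejoined with a single space (an assumption about how the wrapped text should read), and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: reads numbered lines from a filehandle and returns
# one string per complete [ ... ] record, without the sequence numbers.
sub read_records {
    my ($fh) = @_;
    my (@records, @parts);

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                  # drop newline and trailing blanks
        $line =~ s/^\d+\s+-\s*//
            or die "Invalid line format: $line\n";
        $line =~ s/^\[// unless @parts;      # "[" opens a new record
        my $is_end = ($line =~ s/\]\z//);    # "]" closes the current one
        push @parts, $line;
        if ($is_end) {
            push @records, join ' ', @parts; # assumption: rejoin with one space
            @parts = ();
        }
    }
    die "Unterminated record at end of input\n" if @parts;
    return @records;
}

# In-memory filehandle used here in place of the real file.
my $data = <<'END';
0001233 - [this is text which fits on one line]
0001234 - [this is text which has a really really long sentence and
0001235 - wraps to the next line]
0001236 - [another line that doesn't wrap]
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
for my $record (read_records($fh)) {
    # Each $record is a whole logical line, ready for further processing.
    printf "%3d chars: %s\n", length $record, $record;
}
close $fh;
```

Returning a list keeps the joining logic in one place, so whatever per-line processing is needed can sit in the loop at the bottom.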

----------------------------------
"Not New York..., Kansas
 