Extracting text between two tags in a very long string

Status
Not open for further replies.

czarj

Technical User
Apr 22, 2004
130
US
I have a file containing about 30,000 characters on 3-4 lines. I need to extract the text between two tags, <TAG1> and <TAG2>. I've tried a couple of things, but nothing yields the results I'm looking for. Below are an example of the text and the last code snippet I tried.

Text Example:
Code:
/i""","customAttributes":{},"alt":null,"href":null},{<TAG1>http://i.mg.com/00/s/MTIwMFgxNjAw/z/~5UAAOxyoA1RThdw/$T2eC16d,!ygE9s7HKL!RBRThdwL2fQ~~60_3.JPG<TAG2>","customAttributes":{},"alt":null,"href":null},{<TAG1>http://i.mg.com/00/s/MTIwMFgxNjAw/z/FLQAAOxy14VRTheJ/$T2eC16Z,!zEE9s3!YlV+BRTheJOOu!~~60_3.JPG<TAG2>","customAttributes":{},"alt":null,"href":null},{<TAG1>http://i.mg.com/00/s/MTIwMFgxNjAw/z/BMoAAMXQySpRTheV/$T2eC16N,!yEE9s5jE,jcBRTheUuoz!~~60_3.JPG<TAG2>"ew vjo.darwin.core.thumbnailgrid.ThumbnailGrid({"instId":"th_js_vv4-38","cmpId":"vv4-38"

Perl example:
Code:
#!/usr/bin/perl
my $input = "<TAG1>"; #Search string
my $end = "<TAG2>"; #End string
open my $DATA, '<', 'web1' or die $!;

undef $/;
$_=<$DATA>;
close $DATA;

@list = m/\b$input\b(.*?)\b$end\b/mg;
map { print "found : $_\n" } @list;

Desired Result:
Code:
http://i.mg.com/00/s/MTIwMFgxNjAw/z/~5UAAOxyoA1RThdw/$T2eC16d,!ygE9s7HKL!RBRThdwL2fQ~~60_3.JPG
http://i.mg.com/00/s/MTIwMFgxNjAw/z/FLQAAOxy14VRTheJ/$T2eC16Z,!zEE9s3!YlV+BRTheJOOu!~~60_3.JPG
http://i.mg.com/00/s/MTIwMFgxNjAw/z/BMoAAMXQySpRTheV/$T2eC16N,!yEE9s5jE,jcBRTheUuoz!~~60_3.JPG

--- You must not fight too often with one enemy, or you will teach him all your tricks of war.
 
Hi

man perlre said:
"\b" still means to match at the boundary between "\w" and "\W"
In your string the word (\w) and non-word (\W) characters are spread like this:

    null    },{<      TAG1    >         http    ://
    word    non-word  word    non-word  word    non-word

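Right before "<TAG1>" both neighbouring characters, "{" and "<", are non-word characters. There is no change from word to non-word, so there is no word boundary and \b cannot match. The same applies right after "<TAG2>", where ">" is followed by a double quote.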
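
To see the boundary rule in action, here is a minimal sketch run against a fragment of the sample text. The helper name matches() and the shortened sample string are only for illustration and are not part of the original script.

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $pattern matches somewhere in $text, else 0.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

# Fragment of the sample text from the question.
my $sample = 'null},{<TAG1>http://';

# '{' and '<' are both non-word characters: no boundary before the tag.
print 'leading \b before <TAG1> : ', matches($sample, qr/\b<TAG1>/), "\n";

# 'l' (word) is followed by '}' (non-word): a boundary exists here.
print 'boundary after null      : ', matches($sample, qr/null\b/), "\n";

# '>' (non-word) is followed by 'h' (word): a boundary exists after the tag.
print 'trailing \b after <TAG1> : ', matches($sample, qr/<TAG1>\b/), "\n";
```

Only the first pattern fails to match, which is why the original regex finds nothing.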

Just remove the \b assertions and it will work.
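
With the \b assertions dropped, the extraction could look roughly like the sketch below. The sub name extract_between() and the shortened example.com URLs are assumptions made to keep the example self-contained; the real script would read the text from the file instead. The \Q...\E quoting is an extra precaution, not something the original code used, in case the tags ever contain regex metacharacters.

Code:
```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every piece of text found between
# $start and $end in $text (call it in list context).
sub extract_between {
    my ($text, $start, $end) = @_;
    # \Q...\E makes the tags match literally; (.*?) is non-greedy so each
    # match stops at the nearest end tag; /s lets "." cross newlines.
    return $text =~ /\Q$start\E(.*?)\Q$end\E/sg;
}

# Shortened stand-in for the file contents (made-up URLs).
my $sample = '{<TAG1>http://example.com/a.JPG<TAG2>","alt":null},'
           . '{<TAG1>http://example.com/b.JPG<TAG2>","alt":null}';

print "found : $_\n" for extract_between($sample, '<TAG1>', '<TAG2>');
```

The non-greedy (.*?) matters here: a greedy .* would swallow everything from the first <TAG1> to the last <TAG2> in one match.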


Feherke.
feherke.github.com/
 
Perfect, thanks!


 