Formatting URLs from text file - not easy for me!

Status
Not open for further replies.

ednit

Technical User
Jan 31, 2006
16
US
I am attempting to reformat the structure of URLs inside a text file, but I am about to pull my hair out.

The urls are like this inside a text file:

Code:
http://www.domain.com/http://www.somesite.com/index.phphttp://www. . .

In other words, every URL is butted up next to the other without a space in between. I want to return the text into an easy to use format like so:

Code:
http://domain.com
http://www.example.com
http://site.com/folder/
etc. . .

The code I'm using is this:

Code:
open(DATA, "data.txt") or die "you stupid script";
while (<DATA>) {
@links = split(/^http+$/, $_);
print "$links[0]\n";
}
close DATA;

$links[0] prints out the whole @links - not only the first link.

I'm not a programmer really, more of a hack that just knows a few things. . . but this has had me stopped up for over 2 hours. Any help would be appreciated.

Thanks.
 
Your problem is in the regular expression. Try this one instead:

Code:
@links = split '(?=http://)', $_;
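
To see why the original pattern returned the whole line, here is a minimal sketch comparing the two patterns. The sample line is hypothetical, made up to resemble the data described in the question, and the third URL in it is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sample line, shaped like the data described in the question.
my $line = 'http://www.domain.com/http://www.somesite.com/index.phphttp://www.example.com';

# Original pattern: ^ and $ anchor the match to the entire string, and
# "http+" means "htt" followed by one or more "p". The line never matches,
# so split finds no separator and returns the line as a single element.
my @broken = split /^http+$/, $line;
print scalar(@broken), " element(s) from the original pattern\n";

# Lookahead pattern: matches the empty position just before each "http://",
# so the line is cut between URLs without removing any characters.
my @links = split m{(?=http://)}, $line;
print "$_\n" for @links;
```

The m{...} delimiters are used here only to avoid escaping the slashes; the quoted-string form shown above should behave the same way.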
 
Thank you so much! That worked perfectly, and I've never seen something formatted like that before. . .I might never have figured it out.

The finished code, in case someone else needs to use it:

Code:
open(DATA, "data.txt") or die "you stupid script";
while (<DATA>) {
@links = split '(?=http://)', $_;
foreach $links (@links) { 
print "$links<br>\n"; 
}
}
close DATA;

So maybe I can understand it & learn from it, what does the equal sign & the question mark mean in this instance? Basically what does '?=matchtext' mean?

This part is not imperative to the code; I just don't understand the why or what of the above regular expression that fixed my issue.

An explanation is not necessary, but a big thank you again.
 
Thank you again, and especially for that link. I've never heard or read a reference of 'zero width positive lookahead assertion' before that I know of.


-Rob
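
For anyone else new to the term, a rough sketch of what "zero-width" means in practice: (?=...) only checks that the text follows and matches no characters itself. The sample string below is hypothetical and is there only to contrast an ordinary separator with a lookahead.

```perl
use strict;
use warnings;

# Hypothetical sample string of concatenated URLs.
my $text = 'http://a.example/http://b.example/x';

# Ordinary pattern: the matched "http://" is consumed as the separator,
# so it disappears from the output and a leading empty field is produced.
my @consumed = split m{http://}, $text;

# Lookahead: (?=...) checks that "http://" follows but matches no characters,
# so the split happens between URLs and every prefix is kept.
my @kept = split m{(?=http://)}, $text;

print "consumed: [$_]\n" for @consumed;
print "kept:     [$_]\n" for @kept;
```

With the ordinary pattern the "http://" prefixes would have to be glued back on afterwards, which is the work the lookahead saves.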
 
Hello MillerH,

Your code works perfectly with the look-ahead extension.

I was trying to add the 'i' modifier to make it case insensitive. It works if I use '/' as the delimiter, but that leads to awkward code.

I understand that a prefix operator is required if we do not use / as the delimiter. Could you suggest how to do this with the single quote as the delimiter and the i modifier?

This is what I've got so far:

Code:
$line="...";
#@array=split '(?=http://)', $line; # case sensitive
@array=split /(?=http:\/\/)/i, $line; # case insensitive
#@array=split '(?=http://)'i, $line; # compilation error
foreach $i (@array){printf("%s\n",$i);}
 
I love it when people answer their own questions. It's just soo .... efficient. :)

I believe that you came up with the best solution for your case-insensitivity problem. However, just to introduce an alternate method, it also would have been possible to embed the modifier in the regex.

Code:
my $line = "http://www.abc.comHTTP://www.def.comhttp://www.ghi.com";
my @array = split '(?i)(?=http://)', $line;
print "$_\n" foreach @array;

You can read more about embedded modifiers in perldoc perlre.
 
While answering one's own question sounds a little... stupid, for lack of a better word, it sure is efficient, as you said. It saves everyone else from having to work on a problem that already has an answer.

I vote your answer with the embedded modifier as the best, as it can be varied, turned on and off through the expression. Thanks again.
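
As a sketch of both options discussed above: the explicit m prefix allows a trailing i flag with single-quote delimiters, and the (?i:...) form limits case insensitivity to one part of the expression. The variable names are illustrative only, and the sample string is the one from the earlier post.

```perl
use strict;
use warnings;

# Sample string taken from the post above.
my $line = 'http://www.abc.comHTTP://www.def.comhttp://www.ghi.com';

# Trailing modifier with a single-quote delimiter: the explicit "m" prefix
# marks this as a match operator, so the "i" after the closing quote is
# parsed as a regex flag instead of causing a syntax error.
my @by_flag = split m'(?=http://)'i, $line;

# Scoped inline modifier: only the part inside (?i:...) ignores case;
# the rest of the pattern stays case sensitive.
my @by_scope = split '(?=(?i:http)://)', $line;

print "$_\n" for @by_flag;
```

Both calls should yield the same three URLs for this input. The scoped form matters more in longer patterns where only one piece should ignore case.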
 