Parsing a string using regex

Status
Not open for further replies.

scareyman

Technical User
Apr 28, 2004
7
US
I have a string that I want to extract fields from.

The string is awkward (to me anyway).
It is as follows:
018:000000000000100778,010:abcdefghij,004:xx,xx,003:bbb,017:abcdefghijklmnopq,001:x

Each 3-digit number before the : is the size of the field that follows it, i.e. 018:000000000000100778 means 000000000000100778 is 18 chars long.

Unfortunately, I can't control the format my data source spews out.

I was wondering if there is a regex that would let me replace the 3 digits before each : with nothing, i.e. 018: would become :
That way I could parse the string easily.

The values I would want from the above example would be:
000000000000100778
abcdefghij
xx,xx
bbb
abcdefghijklmnopq
x

I thought I had it working using rindex etc, but it was messy and didn't work properly.

Any help would be appreciated.
 
If I understood your pattern correctly, then you have a typo. To me, your string should look like this:

018:000000000000100778,010:abcdefghij,004:xxxx,003:bbb,017:abcdefghijklmnopq,001:x


I think there are multiple ways to do this. Here are two examples:

Code:
my $str = '018:000000000000100778,010:abcdefghij,004:xxxx,003:bbb,017:abcdefghijklmnopq,001:x';
$str =~ s/\d\d\d\://g;

OR

Code:
my $str = '018:000000000000100778,010:abcdefghij,004:xxxx,003:bbb,017:abcdefghijklmnopq,001:x';
$str =~ s/\d+://g;

 
Hi
thanks for replying.

No Typo I'm afraid.

If there were no commas inside the fields, then I'd be able to use the comma as a delimiter.

However, the field with xx,xx is one field i.e. the value is "xx,xx".
 
However, I think I can do it as follows, based on your example:
s/^\d\d\d://g;
s/(,\d\d\d:)|(\d\d\d:)/:/g;

This will give me
000000000000100778:abcdefghij:xx,xx:bbb:abcdefghijklmnopq:x

Thanks for your help.
 
This may work for you too:
Code:
my $str = '018:000000000000100778,010:abcdefghij,004:xx,xx,003:bbb,017:abcdefghijklmnopq,001:x';
my (@lengths, @values); 

foreach (split /,(?=\d{3}\:)/, $str) {
	my ($length, $value) = split /:/, $_, 2;  # limit of 2 keeps a colon inside the value
	push @lengths, $length;
	push @values, $value;
}
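
For reference, here is a small sketch of what the lookahead split above produces on the original sample string. The helper name split_fields is made up for illustration, and the sketch assumes that no value contains a comma directly followed by three digits and a colon.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are followed by a
# three-digit length prefix, so commas inside a value are left alone.
sub split_fields {
    my ($str) = @_;
    my @pairs;
    foreach my $chunk (split /,(?=\d{3}:)/, $str) {
        # Limit of 2 keeps any colon inside the value intact.
        my ($length, $value) = split /:/, $chunk, 2;
        push @pairs, [$length, $value];
    }
    return @pairs;
}

my $sample = '018:000000000000100778,010:abcdefghij,004:xx,xx,003:bbb,017:abcdefghijklmnopq,001:x';
printf "%s => %s\n", @$_ for split_fields($sample);
```

The third pair comes out as 004 => xx,xx, because the comma inside xx,xx is not followed by a length prefix and so is not treated as a separator.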
 
Hi,

I think this is what you are expecting. It splits the string into fields, each on a new line.

Code:
use strict;
my $str = '018:000000000000100778,010:abcdefghij,004:xx,xx,003:bbb,017:abcdefghijklmnopq,001:x';
$str =~ s/(,\d\d\d\:)/\n/g;
$str =~ s/^\d\d\d://;  # only the leading prefix is left; anchor so field contents are untouched
print $str;

Output:
-------
000000000000100778
abcdefghij
xx,xx
bbb
abcdefghijklmnopq
x
 
I think the best approach for you is to follow the structure of your data, using the length information. This is because any other method would fail if a group of three digits followed by a colon could be present inside a field.
However, you should explain why the field pointed out by whn is given a length of 4 characters instead of 5. Assuming the correct value is 5, you could go this way:
Code:
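my $str = '018:000000000000100778,010:abcdefghij,005:xx,xx,003:bbb,017:abcdefghijklmnopq,001:x';
my (@values, $len);
while (length $str) {
  $len = substr($str, 0, 3);            # three-digit length prefix
  push @values, substr($str, 4, $len);  # the field starts right after the colon
  # skip prefix, colon, field and trailing comma; the last field has no comma
  $str = length($str) > $len + 5 ? substr($str, $len + 5) : '';
}
$str = join ',', @values;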
Note that some error checking on the integrity of your data should be added to the above code.
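
As a rough idea of what such error checking could look like, here is a minimal sketch of the same length-driven walk with a few integrity checks added. The helper name parse_fields and the error messages are made up for illustration, and it assumes every prefix is exactly three digits followed by a colon.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the string using the declared lengths and
# die with a message as soon as the data does not match the format.
sub parse_fields {
    my ($str) = @_;
    my @values;
    my $pos = 0;
    while ($pos < length $str) {
        # Expect a three-digit length followed by a colon.
        substr($str, $pos) =~ /^(\d{3}):/
            or die "No length prefix at offset $pos\n";
        my $len = $1;   # numeric use below ignores the leading zeros
        $pos += 4;
        die "Field at offset $pos is shorter than $len characters\n"
            if $pos + $len > length $str;
        push @values, substr($str, $pos, $len);
        $pos += $len;
        # Every field except the last must be followed by a comma.
        if ($pos < length $str) {
            substr($str, $pos, 1) eq ','
                or die "Expected a comma at offset $pos\n";
            $pos++;
            die "Trailing comma at end of data\n" if $pos == length $str;
        }
    }
    return @values;
}

# A value that itself contains ",123:" is still read correctly.
print "$_\n" for parse_fields('005:xx,xx,008:a,123:bc,001:x');
```

With the sample as originally posted (004:xx,xx), this sketch dies at the third field with an "Expected a comma" message instead of silently returning misaligned values.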

Franco
 
Not sure if this is any better or worse, but it sure is more cryptic looking:

Code:
my $str = '018:000000000000100778,010:abcdefghij,005:xx,xx,003:bbb,017:abcdefghijklmnopq,001:x';
my @l;  # lengths, collected as each prefix is stripped
push @l,$1 while ($str =~ s/,?(\d+)://);
my @fields = unpack((join "", map {"A$_"} @l), $str);
print "$_\n" for @fields;

This also assumes the length of the third field should be 5 instead of 4.

------------------------------------------
- Kevin, perl coder unexceptional!
 
Hi
thanks for the replies.
Now solved as follows:
Code:
          s/^\d\d\d://g;  #  remove 1st 'length' field
          s/(,\d\d\d:)|(\d\d\d:)/:/g; # remove middle 'length' fields
          s/:$//g; # remove last 'length' field

As for the field xx,xx: it was originally xxxx (and therefore 4 in length). I added a comma to show that a comma could not be used as a delimiter, but forgot to change the length to 5.
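
For anyone landing here later, a minimal sketch of how the three substitutions above might be wrapped up and followed by a split on the colon. The helper name fields_from_line is made up, the middle pattern is written as ,?\d\d\d: (which matches the same text as the alternation above), and the whole approach assumes no field value contains a colon or three digits followed by a colon.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the three substitutions.
# Assumes no field value contains a colon or looks like "ddd:".
sub fields_from_line {
    my ($line) = @_;
    $line =~ s/^\d\d\d://;       # remove the first length prefix
    $line =~ s/,?\d\d\d:/:/g;    # turn the remaining prefixes into ':'
    $line =~ s/:$//;             # remove a trailing ':' if present
    return split /:/, $line;
}

print "$_\n" for fields_from_line('004:xx,xx,003:bbb,001:x');
```

Because the declared lengths are thrown away rather than used, this version does not care whether the prefix says 004 or 005.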
 