Reg Ex Parsing a string with variable length fields

Status
Not open for further replies.

RileyCat

Programmer
Apr 5, 2004
124
US
I have a string that might look something like this ...

'8110003099704499441399110600308123191010'

Out of this string I need to parse out whatever defined components may be there. The string is dynamic so it will change from one instance to the next. Further, each string may contain a different set of components.

To complicate matters, each component of the string has a variable length. For instance, this part . . .

"0030997".

The leading "0" tells how many digits beyond the base 6 the component that follows contains. In this example that digit is 0, so we know we only need to read the next 6 digits to get the component value. If that digit were 1, we would have to read the next 7 (6 + 1) digits to get the component value . . . and so on.

Other than the first four digits of this string ("8110"), the balance of the string is formatted in the manner I described above.

Anyway, I am totally and 100% stumped on how to even remotely parse a string of this nature using Regular Expressions.

Anybody have any suggestions, clues, or code snippets I can copy from that will allow me to parse a string of this nature?

Any and all help is greatly appreciated.

Thank you!

Stay Cool Ya'll! [smile2]

-- Kristin
 
I think you can accomplish this with a couple of loops. After taking (and removing) the first four digits, loop until the end of the string is reached. Inside this loop, read the first character, then run an inner loop that repeats 6 + (that character cast to an integer) times, reading one character from the string each time.

Maybe I can provide an example later.

|| ABC
 
So the first 4 characters are static, and after that they are dynamic? Simple string indexing should do it.
Code:
using System;
using System.Collections.Generic;

public class ComponentParser
{
   public string[] Parse(string componentsText)
   {
      List<string> components = new List<string>();
      int index = 4; // skip the first 4 characters
      while (index < componentsText.Length)
      {
           // the prefix digit says how many digits beyond 6 the component has
           int length = 6 + (componentsText[index] - '0');
           index++; // step past the prefix digit
           // clamp so a short trailing component does not throw
           length = Math.Min(length, componentsText.Length - index);
           components.Add(componentsText.Substring(index, length));
           index += length;
      }
      return components.ToArray();
   }
}
Untested, so double-check the indexing, but this is the general idea.

Jason Meckley
Programmer
Specialty Bakers, Inc.
 
[0] >Other than the first four digits of this string ("8110"), the balance of the string is formatted in the manner I described above.
That description cannot be complete: it leaves the trailing digits of the sample string uninterpreted.

[1] Unfortunately, a backreference cannot be used inside the braces of a quantifier, as in this figurative pattern:
[tt] (\d)\d{6}\d{\1} - figurative only[/tt]
but that's the idea.

[2] Since there are only ten possible prefix digits, like it or not, you can enumerate them like this.
[tt]
// requires: using System.Text.RegularExpressions;
string s="8110003099704499441399110600308123191010";
string pattern;
string[] a=new string[11];
a[0]=@"^\d{4}";
for (int i=1; i<a.Length; i++) {
    // alternative for prefix digit (i-1): the digit, 6 digits, then (i-1) more
    a[i]=(i-1)+@"\d{6}\d{"+(i-1)+"}";
}
pattern="("+String.Join("|",a)+")";
foreach (Match m in Regex.Matches(s, pattern)) {
    System.Console.WriteLine(m.ToString());
}
[/tt]
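
As a complement to point [1]: because the quantifier cannot take its count from a captured digit, one option is to let the regex read only the prefix digit and compute the length in code. The following is a minimal sketch under that assumption, not a definitive implementation. The class name PrefixedFieldReader and its Read method are hypothetical names introduced for illustration, and stopping at an incomplete trailing field is an assumed policy, since the thread does not say how the tail should be handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the thread.
public static class PrefixedFieldReader
{
    // Reads fields of the form <prefix digit><6 + prefix digits>, starting at 'start'.
    public static List<string> Read(string text, int start)
    {
        var fields = new List<string>();
        // \G anchors the match at the position passed to Match(text, index).
        var prefix = new Regex(@"\G\d");
        int index = start;
        while (index < text.Length)
        {
            Match m = prefix.Match(text, index);
            if (!m.Success) break;                        // next character is not a digit
            int length = 6 + (m.Value[0] - '0');          // base 6 plus the prefix digit
            if (index + 1 + length > text.Length) break;  // assumed policy: stop at an incomplete tail
            fields.Add(text.Substring(index + 1, length));
            index += 1 + length;                          // step past the prefix and the value
        }
        return fields;
    }
}
```

Calling PrefixedFieldReader.Read(s, 4) on the sample string yields 030997, 449944, 3991106 and 030812, and leaves the trailing "3191010" unread because its prefix 3 asks for 9 digits while only 6 remain.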
 