SeriesConsumer
Programmer
Hello,
I have a very simple input file to parse, but so far I haven't managed to build a Type Tree that can parse it. The input looks like this:
[tt]
ED03_06
ABC09_07
FEX301_12
A303_03
_05_06
[/tt]
Each line starts with a symbolic name of arbitrary length (minimum 1 character), followed by two digits for the month, a literal underscore, and finally two digits for the year.
The problem is the symbol field: I don't know how to make it non-greedy, i.e. how to stop it from also consuming the month. I have already set month and year to a fixed size. To represent the underscore, I introduced a separate field with a value restriction (an inclusion rule for just the character '_'). Using component rules didn't help either.
For demonstration purposes, here is a small Perl script that does exactly what I want (using a non-greedy regular expression):
[tt]
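use strict;
use warnings;

# Read records from STDIN or from the files named on the command line.
while (<>) {
    chomp;
    # (.+?) keeps the symbol as short as possible; the $ anchor forces
    # the last five characters to be two digits, '_' and two digits.
    if (/^(.+?)(\d\d)_(\d\d)$/) {
        my ($symbol, $month, $year) = ($1, $2, $3);
        print "symbol=$symbol, month=$month, year=$year\n";
    } else {
        print "can't parse $_\n";
    }
}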
[/tt]
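Put another way: month, underscore and year together always occupy exactly the last five characters, and the pattern is anchored at the end of the line, so there is only one possible split point. The symbol is simply everything before those five characters. The sketch below (plain Perl, not Mercator; `parse_record` is just a hypothetical helper name) expresses the split that way, counting from the right instead of relying on non-greediness. I haven't verified whether a Mercator type tree can express "all but the last five characters", so treat this only as a restatement of the requirement, not as a Mercator solution.

[tt]
use strict;
use warnings;

# Split one record into (symbol, month, year) by counting from the right:
# the tail "MM_YY" is always exactly five characters, so the symbol is
# whatever precedes it. Returns an empty list if the record doesn't fit.
sub parse_record {
    my ($line) = @_;
    return () if length($line) < 6;       # need 1+ symbol chars plus "MM_YY"
    my $symbol = substr($line, 0, -5);    # all but the last five characters
    my $tail   = substr($line, -5);       # the last five characters
    return () unless $tail =~ /^(\d\d)_(\d\d)$/;
    return ($symbol, $1, $2);
}

# Sample records from above; for a real file, loop over <> and chomp each line.
for my $line (qw(ED03_06 ABC09_07 FEX301_12 A303_03 _05_06)) {
    if (my ($symbol, $month, $year) = parse_record($line)) {
        print "symbol=$symbol, month=$month, year=$year\n";
    } else {
        print "can't parse $line\n";
    }
}
# prints:
# symbol=ED, month=03, year=06
# symbol=ABC, month=09, year=07
# symbol=FEX3, month=01, year=12
# symbol=A3, month=03, year=03
# symbol=_, month=05, year=06
[/tt]

For the sample data this gives the same split as the regex version, including the tricky cases FEX301_12 and A303_03, where the symbol itself ends in a digit.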
Does anyone know how to achieve this behaviour in Mercator 6.7?