"Case of the Mondays" Regexp Problem

Status
Not open for further replies.

tktuffkid

Programmer
Dec 12, 2006
11
US
It's Monday and I'm sure I am doing something stupid with my regexp. But because it's Monday, I am looking for someone to tell me what I am doing wrong.

In essence, I am trying to parse out a Linux component string from its version string. Because the component name can contain one or more digits, I see

Code:
[jdoe:cheech ~]$ tclsh
% set rpm foo123-0.7.8-9
foo123-0.7.8-9
% regexp {(.*?)-(\d.*)} $rpm match rpmName rpmVers
1
% puts $rpmName
foo123
% puts $rpmVers
0
% puts $match
foo123-0
%
 
Sorry, I perhaps submitted too early. If you notice above, I am only able to capture '0' from '0.7.8-9'.
 
Hi

And what is your goal ?

If you want to get "foo123" in $rpmName and "0.7.8-9" in $rpmVers, I would change the expression to :
Code:
regexp {(.*?)-(.*)$} $rpm match rpmName rpmVers

Feherke.
feherke.github.com/
 
Code:
set rpm "foo123-0.7.8-9"
if {[regexp {(.*)\-(\d+\..*)} $rpm match rpmName rpmVers]} {
  puts "\$rpmName = '$rpmName'"
  puts "\$rpmVers = '$rpmVers'"
  puts "\$match   = '$match'"
}
Output:
Code:
$rpmName = 'foo123'
$rpmVers = '0.7.8-9'
$match   = 'foo123-0.7.8-9'
 
Hi

Just a minor note.

I am not familiar with the RPM package naming schemes. I use Frugalware, and there the packages keep the original name and version as far as possible. So there are packages called erlang-15B01-1, fontforge-20110222-1 and j2re-6-31, for which mikrom's regular expression would fail. There are also packages called alsa-firmware-1.0.25-1, c-ares-1.7.5-1 and gnome-desktop-3.4.2-1, for which my regular expression would fail. And of course there are packages like ca-certificates-20090814-6, systemd-sysvinit-44-1 and terminus-font-ttf-33-3, for which both regular expressions would fail.
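
The failure cases listed above can be checked directly. The sketch below runs both expressions from this thread over three of the package names; the helper name splitWith is made up for illustration, and an empty result stands for "no match".

```tcl
# Hypothetical helper (not from the thread): apply one expression to a
# package string and return {name version}, or an empty list on no match.
proc splitWith {re pkg} {
    if {[regexp -- $re $pkg -> name vers]} {
        return [list $name $vers]
    }
    return {}
}

set mikrom  {(.*)\-(\d+\..*)}   ;# needs "-<digits>." to start the version
set feherke {(.*?)-(.*)$}       ;# splits at the first hyphen

foreach pkg {erlang-15B01-1 alsa-firmware-1.0.25-1 ca-certificates-20090814-6} {
    puts [format "%-28s mikrom: %-26s feherke: %s" $pkg \
        "{[splitWith $mikrom $pkg]}" "{[splitWith $feherke $pkg]}"]
}
```

Run in tclsh, the erlang line should show an empty mikrom result, and the alsa-firmware line should show the name cut down to alsa by the second expression.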

So I would try to ask rpm to extract and provide that information, then parse its output:
Code:
set rpm "erlang-R15B-01.2.fc18.x86_64.rpm"

set data [exec rpm -qip $rpm]

foreach one [split $data "\n"] {
  if {[lindex $one 0]=="Name"} { set rpmName [lindex $one 2] }
  if {[lindex $one 0]=="Version"} { set rpmVers [lindex $one 2] }
}

puts "\$rpmName = '$rpmName'"
puts "\$rpmVers = '$rpmVers'"
Code:
$rpmName = 'erlang'
$rpmVers = 'R15B'
In this example I used a local file as there is no RPM database on my machine. To make it query your database instead, it seems you only have to remove the -p switch.
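
As a sketch of the parsing step, the fields could also be pulled out with a line-anchored regexp instead of lindex, which avoids treating each output line as a Tcl list. The proc name rpmField and the sample text are made up for illustration; real rpm -qi output has more fields and its layout may differ between rpm versions.

```tcl
# Hypothetical helper: return the value of a "Field : value" line from
# rpm -qi style text, or an empty string if the field is absent.
proc rpmField {data field} {
    # -line makes ^ match at the start of every line of $data.
    # $field is assumed to be a plain word such as Name or Version.
    if {[regexp -line -- "^${field}\[ \t\]*:\[ \t\]*(\\S+)" $data -> value]} {
        return $value
    }
    return ""
}

# Illustrative sample only, shaped like the Name/Version lines used above.
set data "Name        : erlang\nVersion     : R15B\nRelease     : 01.2.fc18"

puts "\$rpmName = '[rpmField $data Name]'"
puts "\$rpmVers = '[rpmField $data Version]'"
```

With real data, the sample string would be replaced by the result of the exec rpm call shown earlier.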

Feherke.
feherke.github.com/
 
Thanks for all the replies. Feherke, your last post is the best approach since it is foolproof. Thank you very much.

I wasn't very clear in my post and I apologize. I am not a Linux packaging guy whatsoever, but I noticed that on the product I'm working on there are multiple different patterns in the RPM names. Determining the name vs. the version is not trivial (since the name may contain digits, underscores, and hyphens). The only pattern I could recognize manually was the hyphen followed by a digit (which is why my original regexp had {-\d}).

However, I guess I would still be curious as to why .7.8-9 is lost below. Although I will use the 'rpm -qip', it still will bother me as to why the following regexp does not work as expected for foo123-0.7.8-9.

regexp {(.*?)-(\d.*)} foo123-0.7.8-9 match rpmName rpmVers

 
Hi

tktuffkid said:
it still will bother me as to why the following regexp does not work as expected for foo123-0.7.8-9.
Sorry, I have no proper answer for this. There is a sentence in re_syntax which raises curiosity:
re_syntax said:
The matching rules for REs containing both normal and non-greedy quantifiers have changed since early beta-test versions of this package.
But I found no details on this.

Actually Tcl uses ARE (Advanced Regular Expressions), not the usual PCRE (Perl Compatible Regular Expressions), so maybe our expectation is wrong. But while that ? clearly influences the greediness, I would expect identical behavior.

My test indicates that if the first quantifier of the regular expression is non-greedy, all quantifiers become non-greedy. Making a quantifier other than the first one non-greedy has no effect. Strange.
Code:
# command                                                          $g1 $g2 $g3 $g4
regexp {(.*)-(.*)-(.*)-(.*)}     foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz baz
regexp {(.*)-(.*)-(.*)-(.*?)}    foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz baz
regexp {(.*?)-(.*)-(.*)-(.*)}    foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz
regexp {%*(.*)-(.*)-(.*)-(.*)}   foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz baz
regexp {%*?(.*)-(.*)-(.*)-(.*)}  foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz
regexp {(.*)-(.*)%*-(.*)-(.*)}   foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz baz
regexp {(.*)-(.*)%*?-(.*)-(.*)}  foo-bar-fiz-baz m g1 g2 g3 g4  ;# foo bar fiz baz


Feherke.
feherke.github.com/
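
For what it is worth, the behavior seen in the tests above seems to follow from the MATCHING section of re_syntax: a branch takes its preference (longest or shortest match) from the first quantified atom in it that has one, and the subexpressions are then matched inside whatever overall match that preference selects. With (.*?) first, the whole expression prefers the shortest match, so it stops at foo123-0. A common workaround, sketched below, is to anchor the expression so that even the shortest overall match has to reach the end of the string. The variable names rpmName2 and rpmVers2 are only there to keep the two results apart.

```tcl
set rpm foo123-0.7.8-9

# Unanchored: the leading non-greedy (.*?) makes the whole RE prefer the
# shortest overall match, so the version group only gets its first digit.
regexp {(.*?)-(\d.*)} $rpm match rpmName rpmVers
puts "unanchored: name='$rpmName' vers='$rpmVers' match='$match'"

# Anchored: ^ and $ force the overall match to span the whole string;
# (.*?) still stops at the first hyphen that is followed by a digit.
regexp {^(.*?)-(\d.*)$} $rpm match rpmName2 rpmVers2
puts "anchored:   name='$rpmName2' vers='$rpmVers2' match='$match'"
```

The anchored form still relies on the hyphen-then-digit convention, so ca-certificates-20090814-6 would split as intended, but erlang-R15B-01.2.fc18 would not; asking rpm remains the safer route.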
 