You think you know Regex?

Status
Not open for further replies.

zackiv31

Programmer
May 25, 2006
148
US
I need to match these strings and split each one into a filename and its arguments. Been trying for a while now, somewhat stuck on the last one :-/

Code:
"C:\PRO GRAM\SOM DIR\ABC.Exe" /abc
C:\Program\ab fil\blah.abc /abc
C:\blah\hmm.huh /okthen
hello /crap
hi
C:\Program\ab fil\blah.abc C:\blah blah\crap.doh

The more general the better (I don't know if I have all the cases in here). Dunno if it can be done in one regex, but functionality obviously takes precedence.

If you're wondering, I'm parsing uninstall strings in Windows,
HK\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*
 
Which part is the filename and which part is the arguments? What have you tried so far?

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Filename is the first 'part' of the line. It doesn't have to be followed by an argument at all. The filename may or may not have double quotes surrounding it. It doesn't necessarily need an extension to mark the end of the filename... (although maybe it does, except for system functions). The argument can contain basically anything. Both *can* have spaces.. :-/

I'm doing it in C#, and so far I'm at something like this:

Code:
Regex r = new Regex("(\"+[^\"]+\"|\\S+)\\s*(.*)");
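
To see where that pattern stands against the sample lines, here is a minimal sketch that transcribes it to Python's `re` module (the thread's code is C#; the Python port and the helper name `split_first_attempt` are only for illustration). It handles the quoted path, but an unquoted path with a space is cut at the first space.

```python
import re

# Same pattern as the C# snippet above, written as a Python raw string.
FIRST_ATTEMPT = re.compile(r'("+[^"]+"|\S+)\s*(.*)')

def split_first_attempt(line):
    """Return (filename, arguments) as this pattern sees them, or None."""
    m = FIRST_ATTEMPT.match(line)
    return (m.group(1), m.group(2)) if m else None

# Quoted path: split where we want it (the quotes stay in group 1).
print(split_first_attempt(r'"C:\PRO GRAM\SOM DIR\ABC.Exe" /abc'))
# Unquoted path with a space: \S+ stops at the first space, so the
# "filename" comes back as C:\Program\ab and the rest becomes the argument.
print(split_first_attempt(r'C:\Program\ab fil\blah.abc /abc'))
```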
 
Code:
C:\Program\ab fil\blah.abc /abc
How do you decide whether the spaces are in the filename or in the argument? Until the criteria you're applying can be fully described in non-programming terms, you won't be able to write a regexp that works correctly for you.
 
Filename is the first 'part' of the line

clear as mud I'm afraid.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Well, the Windows uninstaller must be able to parse them, so there must be a pattern in there somewhere (of course, I've had stuff that won't uninstall from Windows on several occasions - perhaps this is why?). We could assume that paths containing spaces have to be enclosed in double quotes; that's a fairly normal Windows practice.

Steve

[small]"Every program can be reduced by one instruction, and every program has at least one bug. Therefore, any program can be reduced to one instruction which doesn't work." (Object::perlDesignPatterns)[/small]
 
The example you provided initially does not follow that assumption. Is the assumption correct or is the example correct?

[red]"... isn't sanity really just a one trick pony anyway?! I mean, all you get is one trick, rational thinking, but when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick[/red]
 
I guess the whole point was for you guys to give your input on it... looking at the list of 6 items, I think we can all clearly determine which part is the filename and which is the argument... no? The argument is simply a switch or install argument for the filename... Beyond the examples I've given... it's hard to name a determining factor for separating the two.


As for clear as mud... ya it is, but looking at it, at least for me I can tell which is which... there is only one filename, everything after it is the argument. But the filename and argument can have many different forms...
 
I can tell which is which too, but that is because my brain handles fuzzy logic relatively well. For you to create a regex, you need to define hard-and-fast rules for what constitutes the filename portion and what constitutes the argument.

[red]"... isn't sanity really just a one trick pony anyway?! I mean, all you get is one trick, rational thinking, but when you are good and crazy, oooh, oooh, oooh, the sky is the limit!" - The Tick[/red]
 
I agree with you, maybe my title is off... Because I know how to write regexes... I just don't know the particular logic that would work on all of these.... was just asking for suggestions/ideas.
 
These are all simple to parse, no need to even discuss them:

"C:\PRO GRAM\SOM DIR\ABC.Exe" /abc
C:\Program\ab fil\blah.abc /abc
C:\blah\hmm.huh /okthen
hello /crap


these are not:

hi
C:\Program\ab fil\blah.abc C:\blah blah\crap.doh

You would have to write a set of regexps that check for various indications as to what the filename is and the argument is. "hi" if it is even valid looks pretty easy, but that last one is a doozy.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
lol.. glad we're in agreement Kevin... I had a feeling it was gonna be a multi test kind of thing... and ya, that last one is a doozy.
 
The solution might be clearer if we could see some of the real lines.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
Honestly, if you could make it run on that input above, it would suffice... those are basically the same as what I've run into, having just substituted letters..

one more:
Code:
Rundll32 C:\stuff here\ohboy a\abc.exe /haha

Rundll32 being the filename, and the rest the argument
 
Scratch that last part... I don't think Rundll32 is the only part of the filename.. I think it might include the next part of the line as well... hmmm
 
The real lines might provide a solution to the problem. I can't see any way to parse that last line without applying all sorts of validation schemes first, and since I don't know all the possible permutations that might occur, I can't even begin to guess.

------------------------------------------
- Kevin, perl coder unexceptional! [wiggle]
 
I think I've narrowed it to two cases..

1. It must start with a drive letter, then a colon, then any number of characters up to a .extension.

2. A windows system exe. "MsiExec.exe" or "Rundll32"


here's my regex for the first case:

Code:
^\s*(([a-z]|%):.+\.\S+)\s*(.*)

Seems to work well.. let me know if anyone has any insight as to why it may not work.

This is a particular Rundll32 line that is giving me problems... partly because I don't know which is the filename and which is the argument.. :-/ (I think my regex issues are over?)
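
One possible answer to the "why it may not work" question, sketched in Python rather than the thread's C#: the `.+` is greedy, so it runs to the last dot on the line. That is fine when the argument has no dot, but on the "doozy" line the whole string ends up in the filename group. The sketch assumes case-insensitive matching (otherwise `[a-z]` would not match an upper-case `C`), and `LAZY` is a hypothetical variant, not something from the thread.

```python
import re

# The pattern from the post, assuming the IgnoreCase equivalent is switched on.
GREEDY = re.compile(r'^\s*(([a-z]|%):.+\.\S+)\s*(.*)', re.IGNORECASE)
# Hypothetical variant: identical except ".+" is made lazy (".+?"),
# so the filename ends at the FIRST ".ext" instead of the last one.
LAZY = re.compile(r'^\s*(([a-z]|%):.+?\.\S+)\s*(.*)', re.IGNORECASE)

def split_with(pattern, line):
    """Return (filename, arguments), or None if the line does not match."""
    m = pattern.match(line)
    return (m.group(1), m.group(3)) if m else None

doozy = r'C:\Program\ab fil\blah.abc C:\blah blah\crap.doh'
print(split_with(GREEDY, doozy))  # whole line lands in the filename group
print(split_with(LAZY, doozy))    # split after blah.abc
```

The lazy form has its own trade-off: a dot in a directory name (say a folder called `my.dir`) would end the filename too early. Neither form matches the quoted line, since that one starts with `"` rather than a drive letter, so it still needs its own test.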

Code:
RunDll32 C:\PROGRA~1\COMMON~1\INSTAL~1\engine\6\INTEL3~1\Ctor.dll,LaunchSetup "C:\Program Files\InstallShield Installation Information\{C6FA39A7-26B1-480A-BC74-6D17531AC222}\Setup.exe" -l0x9 UNINSTALL
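
For what it's worth, Rundll32 command lines normally take the shape `rundll32 <dll>,<entry point> <arguments>`, which suggests treating the DLL and entry point as belonging with Rundll32 and everything after them as the argument. A rough Python sketch under that assumption (the function name and the dictionary keys are made up for illustration):

```python
import re

# Assumed layout: rundll32 <dll>,<entry point> <arguments>
# The DLL path is taken lazily up to the first comma.
RUNDLL32 = re.compile(
    r'^\s*(rundll32(?:\.exe)?)\s+(.+?),(\S+)\s*(.*)$',
    re.IGNORECASE,
)

def split_rundll32(line):
    """Split a Rundll32 line into its parts, or return None if it doesn't fit."""
    m = RUNDLL32.match(line)
    if not m:
        return None
    return {
        "program": m.group(1),
        "dll": m.group(2),
        "entry_point": m.group(3),
        "arguments": m.group(4),
    }

line = (r'RunDll32 C:\PROGRA~1\COMMON~1\INSTAL~1\engine\6\INTEL3~1\Ctor.dll,LaunchSetup '
        r'"C:\Program Files\InstallShield Installation Information'
        r'\{C6FA39A7-26B1-480A-BC74-6D17531AC222}\Setup.exe" -l0x9 UNINSTALL')
print(split_rundll32(line))
```

A line with no comma after the DLL (like the made-up `Rundll32 C:\stuff here\ohboy a\abc.exe /haha` example) returns None here and would have to be handled by some other rule.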
 


...

Took me 5 separate regexes (in a specific order) to determine them the way I want.. certainly not pretty, but I think fairly necessary.

Thanks for the insight.
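
The five regexes themselves were never posted, so the following is only a guess at what an ordered, first-match-wins chain could look like, written in Python rather than C#. The four patterns and the name `split_uninstall_string` are hypothetical, pieced together from the cases discussed in the thread (quoted path, system exe, drive-letter path with extension, bare first word).

```python
import re

# Tried in order; the first pattern that matches decides the split.
# These are NOT the poster's five patterns, just one plausible ordering.
PATTERNS = [
    # 1. Quoted filename: everything inside the first pair of double quotes.
    re.compile(r'^\s*"([^"]+)"\s*(.*)$'),
    # 2. Windows system exe named without a path (case 2 in the thread).
    re.compile(r'^\s*((?:msiexec|rundll32)(?:\.exe)?)\s+(.*)$', re.IGNORECASE),
    # 3. Drive letter, colon, then up to the first ".ext" (case 1, lazy form).
    re.compile(r'^\s*([a-z]:.+?\.\S+)\s*(.*)$', re.IGNORECASE),
    # 4. Fallback: the first whitespace-delimited word.
    re.compile(r'^\s*(\S+)\s*(.*)$'),
]

def split_uninstall_string(line):
    """Return (filename, arguments) from the first matching pattern, else None."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return m.group(1), m.group(2)
    return None

for sample in [
    r'"C:\PRO GRAM\SOM DIR\ABC.Exe" /abc',
    r'C:\Program\ab fil\blah.abc /abc',
    r'hello /crap',
    r'C:\Program\ab fil\blah.abc C:\blah blah\crap.doh',
]:
    print(split_uninstall_string(sample))
```

Order matters: the fallback has to come last, because `\S+` would otherwise claim `C:\Program\ab` before the drive-letter rule gets a chance.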
 