Remove varying length prefix from filenames/string

Status
Not open for further replies.

peedepup

Technical User
Aug 21, 2011
1
US
I have a bunch of extracted files with a prefix that I want to remove (see the examples below). The prefixes vary in length, so I cannot just script the removal of a fixed number of characters. Every prefix is followed by " - " just before the part of the file name that I want to keep. Some files have no prefix at all. How do I script the deletion of everything up to and including the first " - ", while leaving alone the files that have no prefix but may still have a " - " in the name? I've looked at regular expressions, thinking they may be the way to go (because they can match patterns), but I just don't grasp them, and I cannot find any other way of doing this.

DISRAD2-01 - Beautiful Soul - McCartney, Jesse (Want to delete "DISRAD2-01 - ")
CB5110-01-01 - Smoke On The Water - Deep Purple (Want to delete "CB5110-01-01 - ")
EKI54-09 - Script, The-For The First Time (Want to delete "EKI54-09 - ")
CB5110-01-06 - I Shot The Sheriff - Clapton, Eric
PR1443-05 - Dead Or Alive - You Spin Me Round

Thanx a bunch,
Biff
 
Check out the InStr, Left, Right, and Mid functions.
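For anyone who hasn't used these functions, here is a small sketch of what each one returns for one of the sample names. It assumes the name really does contain " - "; if InStr returns 0, the Left and Right calls below would raise an error, so a real script should check for that first.

```vbscript
Option Explicit

Dim strName, pos
strName = "EKI54-09 - Script, The-For The First Time"

' InStr returns the 1-based position of the first match, or 0 if there is none.
pos = InStr(strName, " - ")

' Left takes characters from the start; here, everything before the separator.
WScript.Echo "Before separator: " & Left(strName, pos - 1)

' Mid with no length argument returns everything from the given position on.
' pos + 3 skips the three characters of " - ".
WScript.Echo "After separator:  " & Mid(strName, pos + 3)

' Right takes characters from the end; this is the same text as the Mid call.
WScript.Echo "Same via Right:   " & Right(strName, Len(strName) - pos - 2)
```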

 
and the example in the posting right before yours :)

-Geates
 
Pseudocode might look something like this:


Code:
For each filename
   pos = the position of the first occurrence of " - "
   If pos = 0 Then
      there is no prefix, leave the filename alone
   Else
      Prefix = the part of filename before position pos
      Songname = the part of filename after the " - " that starts at pos
      If Prefix is a valid prefix name Then
         filename = Songname
      Else
         There is no prefix, leave the filename alone
      End If
   End If
Next

The catch is determining whether the part before the first " - " is actually a prefix or is part of the song name. Maybe if it contains spaces, consider it part of the name, but if it has no spaces, remove it?
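A minimal sketch of that pseudocode in VBScript, using only InStr, Left, and Mid. The function name StripPrefix is made up for illustration, and the "no spaces means it is a prefix" test is just the heuristic suggested above, not a guaranteed rule.

```vbscript
Option Explicit

' Hypothetical helper: removes everything up to and including the first " - "
' when the leading text contains no spaces; otherwise returns the name unchanged.
Function StripPrefix(strName)
    Dim pos, strPrefix
    StripPrefix = strName                      ' default: leave the name alone
    pos = InStr(strName, " - ")                ' 0 means no separator at all
    If pos > 1 Then
        strPrefix = Left(strName, pos - 1)
        If InStr(strPrefix, " ") = 0 Then      ' heuristic: prefixes have no spaces
            StripPrefix = Mid(strName, pos + 3)
        End If
    End If
End Function

WScript.Echo StripPrefix("DISRAD2-01 - Beautiful Soul - McCartney, Jesse")
WScript.Echo StripPrefix("Smoke On The Water - Deep Purple")
```

Note the weak spot: a file with no prefix and a one-word title before the first " - " has no spaces there either, so this heuristic would strip the title.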
 
Here's a quick example of one way to do it with regular expressions:
Code:
' http://msdn.microsoft.com/en-us/library/ms974570.aspx
' http://msdn.microsoft.com/en-us/library/yab2dx62.aspx
' http://www.regular-expressions.info/index.html

Dim re, strInput, strOutput
Set re = New RegExp

strInput = "CB5110-01-06 - I Shot The Sheriff - Clapton, Eric"

' Prefix: letters/digits, then one or more "-digits" groups, then " - "
re.Pattern = "^[a-zA-Z0-9]+(-\d+)+ - "
re.IgnoreCase = True
re.Global = False	' replace only the first match

If re.Test(strInput) Then
	' Replace the matched prefix with an empty string
	strOutput = re.Replace(strInput, "")
	WScript.Echo "Input:" & vbCrLf & strInput & vbCrLf & "Output:" & vbCrLf & strOutput
Else
	WScript.Echo strInput & vbCrLf & "No match found"
End If

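It follows guitarzan's logic of "if there is no space, remove it", in a stricter form: the prefix must be a run of letters and digits followed by one or more hyphen-plus-digits groups.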

Rework it into a function so that you can pass in the string to parse as a parameter and it will be useful in a larger script.
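One possible way to do that rework is sketched below. The function name RemovePrefix is my own choice; the pattern is the same one as above. It relies on Replace returning the input unchanged when nothing matches, so no separate Test call is needed.

```vbscript
Option Explicit

' Hypothetical function name; the pattern is the one from the example above.
Function RemovePrefix(strInput)
    Dim re
    Set re = New RegExp
    re.Pattern = "^[a-zA-Z0-9]+(-\d+)+ - "
    re.IgnoreCase = True
    re.Global = False
    ' Replace returns the input unchanged when there is no match.
    RemovePrefix = re.Replace(strInput, "")
End Function

Dim arrNames, strName
arrNames = Array( _
    "CB5110-01-06 - I Shot The Sheriff - Clapton, Eric", _
    "PR1443-05 - Dead Or Alive - You Spin Me Round", _
    "Beautiful Soul - McCartney, Jesse")

' The first two names lose their prefix; the third has none and is echoed as-is.
For Each strName In arrNames
    WScript.Echo RemovePrefix(strName)
Next
```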

Here's how the pattern breaks down:
"^[a-zA-Z0-9]+(-\d+)+ - "
^ means the expression must match at the beginning of the string; it will not count as a match if found anywhere else in the string.

[a-zA-Z0-9] the square brackets along with the hyphen allow you to specify a range of characters to match. In this instance: any lower case letter, any upper case letter, or any digit.
Pattern so far: "^[a-zA-Z0-9]"
Examples of matches: p, L, 8, etc.

[a-zA-Z0-9]+ the following plus sign means 1 or more of the specified characters must be present.
Pattern so far: "^[a-zA-Z0-9]+"
Examples of matches: t, Y, 7, asdf, aSdF36, 1a2B3c, etc.

Skipping ahead just a little: \d is shorthand for any digit, so -\d+ means a hyphen followed by 1 or more digits. The parentheses turn this into a group, and the following + means there must be 1 or more of this entire group.
Pattern so far: "^[a-zA-Z0-9]+(-\d+)+"
Examples of matches: a-3, asdf-321, aSdFjKl-01-02-003-12345 etc.

The last part of the pattern means to exactly match literal " - " (<space>hyphen<space>) characters.
Pattern so far: "^[a-zA-Z0-9]+(-\d+)+ - "
Examples of matches: "a-3 - ", "asdf-321 - ", "aSdFjKl-01-02-003-12345 - ", "CB5110-01-06 - " etc. Quotes used to clearly show the spaces, matches would be only what is contained within the quote marks.

The Replace method replaces the match with whatever you specify; in the case of this script, an empty string.
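To tie this back to the original question about files, here is a rough sketch that applies the same pattern to every file in a folder. The folder path is a placeholder I made up, and the script renames files for real, so try it on a copy of the folder first.

```vbscript
Option Explicit

Const FOLDER_PATH = "C:\Karaoke"   ' hypothetical folder; change to suit

Dim fso, re, objFile, strNewName, dictRenames, strOldPath
Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^[a-zA-Z0-9]+(-\d+)+ - "

' Collect the planned renames first so the Files collection
' is not modified while it is being enumerated.
Set dictRenames = CreateObject("Scripting.Dictionary")
For Each objFile In fso.GetFolder(FOLDER_PATH).Files
    strNewName = re.Replace(objFile.Name, "")
    If strNewName <> objFile.Name Then dictRenames.Add objFile.Path, strNewName
Next

For Each strOldPath In dictRenames.Keys
    strNewName = dictRenames(strOldPath)
    If fso.FileExists(fso.BuildPath(FOLDER_PATH, strNewName)) Then
        ' Two prefixed files could map to the same song name; do not overwrite.
        WScript.Echo "Skipped (name already in use): " & strOldPath
    Else
        fso.GetFile(strOldPath).Name = strNewName
        WScript.Echo "Renamed: " & strOldPath & " -> " & strNewName
    End If
Next
```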
 