
Stripping invalid characters from filenames

Status
Not open for further replies.

Mike Lewis

Programmer
Jan 10, 2003
17,516
Scotland
Some years ago, I wrote a little function that strips invalid characters from filenames. Why would you need to do that? You might need the user to specify the name of an output file. Or you might want to generate a filename from some other string, such as a customer or product name. In both those cases, you have to be sure that the name doesn't contain question marks, colons, backslashes, or other characters that aren't allowed in filenames.

So here is my function. As you can see, it simply replaces each instance of an invalid character with an underscore.

Code:
FUNCTION StripInvalidChars

* Removes all invalid characters from a filename (excluding extension)
* and replaces them with underscores.

LPARAMETERS tcIn

LOCAL lcBadChars

lcBadChars = [<>:"/\|?*]

RETURN CHRTRAN(tcIn, lcBadChars, REPLICATE("_", LEN(lcBadChars)))

Most of the time, this works well enough. But it's not perfect. One issue is that it doesn't distinguish plain filenames (stem plus extension) from full path designations (including drive and/or directories). It would be easy to modify it to retain colons and backslashes, but it would be harder to deal with those characters in the "wrong" place (e.g. more than one consecutive colon or backslash). There are a few other similar minor issues.
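
For instance, a path-tolerant variant might look like the sketch below. The name StripInvalidCharsKeepPath is made up for illustration. It keeps colons and backslashes but, as noted above, does nothing about those characters turning up in the "wrong" place.

Code:
FUNCTION StripInvalidCharsKeepPath

* Sketch only (hypothetical name): same as StripInvalidChars, except that
* colons and backslashes are retained so that drive and path survive.

LPARAMETERS tcIn

LOCAL lcBadChars

lcBadChars = [<>"/|?*]

* "c:\data\abc?.dbf" becomes "c:\data\abc_.dbf"
* "c::\\abc.dbf" is returned unchanged, i.e. still invalid
RETURN CHRTRAN(tcIn, lcBadChars, REPLICATE("_", LEN(lcBadChars)))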

Then I discovered what looks like a much easier solution: a FoxTools function named CleanPath() which appears to do just what its name suggests.

To use it, you must first open the FoxTools library, like this:
Code:
SET LIBRARY TO (HOME(1) + "foxtools")

after which you can call CleanPath() just like any other function. You pass it the "raw" input string, and it returns the cleaned-up version. In this case, the invalid characters are completely removed (in my function, they are replaced by underscores).
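
As a quick illustration, a test program might look like the sketch below. The results shown in the comments are the ones from my comparison test further down; the variable name lcRaw is only for illustration.

Code:
* ADDITIVE keeps any libraries that are already open
SET LIBRARY TO (HOME(1) + "foxtools") ADDITIVE

LOCAL lcRaw
lcRaw = "abc?def.dbf"

? CleanPath(lcRaw)            && ABCDEF.DBF  (the ? is removed, result is upper-cased)
? StripInvalidChars(lcRaw)    && abc_def.dbf (the ? is replaced by an underscore)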

However, this too is not perfect. Its biggest problem is that it doesn't recognise embedded spaces in filenames (probably because it was written in MS-DOS days). If a filename contains spaces, it simply removes them. On the other hand, it does seem to handle drives and paths correctly (including, for example, removing duplicate backslashes), at least in most cases.

I hope the above information will be useful for anyone who has this requirement. To help you decide between the two methods, here are the results of a quick comparison test of the two functions:

Code:
Input               Own function        CleanPath()       Comment (CP = CleanPath)

abc.dbf             abc.dbf             ABC.DBF           As expected
abc def.dbf         abc def.dbf         ABCDEF.DBF        CP removes space
abc?def.dbf         abc_def.dbf         ABCDEF.DBF        Both correctly remove ?
c:\abc.dbf          c__abc.dbf          C:\ABC.DBF        CP handles full path OK
c:\data\abc.dbf     c__data_abc.dbf     C:\DATA\ABC.DBF   ditto
c:\data\\abc.dbf    c__data__abc.dbf    C:\DATA\ABC.DBF   CP removes double backslash
c::\\abc.dbf        c____abc.dbf        ABC.DBF           CP loses (invalid) drive
(empty string)      (empty string)      (empty string)    As expected
?//*|               _____               (empty string)    Own function returns 5 underscores

I'd welcome your comments or suggestions re the above.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips and downloads
 
The first special case that I can think of:
What about UNC paths starting with \\?

To answer myself:
Code:
? CLEANPATH("\\server\share") && returns "\server\share"

So the FoxTools implementation isn't perfect with paths either. I'd take it as good enough for many cases. To ask the user for a result file name, I'd perhaps set a standard output directory under the user's Documents folder, or let the user pick one via GETDIR(). That solves the path issue, so the function would only be concerned with the filename stem, and a simple CHRTRAN() is good enough for that case.

As in FORCEEXT(ADDBS(basedir)+StripInvalidChars(stemname),"extension"), where you are already sure that basedir is a valid output directory and "extension" is the required extension. CLEANPATH(FORCEEXT(ADDBS(basedir)+stemname,"extension")) would work too, though the imperfection regarding UNC paths means that basedir shouldn't be a UNC path.
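
Spelled out, that could look like the following sketch. The variable names, the sample stem and the "txt" extension are only for illustration, and StripInvalidChars() is Mike's function from the first post.

Code:
* Sketch only: names and sample values are illustrative
LOCAL lcBaseDir, lcStem, lcFile

lcBaseDir = GETDIR()              && empty string if the user cancels

IF NOT EMPTY(lcBaseDir)
    lcStem = "Sales: Q1/Q2?"
    * The stem becomes "Sales_ Q1_Q2_"; the directory part is left untouched
    lcFile = FORCEEXT(ADDBS(lcBaseDir) + StripInvalidChars(lcStem), "txt")
    ? lcFile
ENDIF

* Caution: FORCEEXT() replaces whatever follows the last period,
* so a stem that itself contains a period would be cut short.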

So all in all I don't see much benefit of using CLEANPATH.


Chriss
 
I'd use JustStem() to pull out the file stem, clean that with your existing function, and then put the whole file path back together using ForcePath() and ForceExt():
Code:
* Assuming original is in cFileWithPath
LOCAL cCleanPath, cExt, cPath, cStem
cExt = JustExt(m.cFileWithPath)
cPath = JustPath(m.cFileWithPath)
cStem = JustStem(m.cFileWithPath)

cCleanPath = ForceExt(ForcePath(StripInvalidChars(m.cStem), m.cPath), m.cExt)

Tamar
 
Nice one Mike, Chris and Tamar!


Best Regards,
Scott
MSc ISM, MIET, MASHRAE, CDCAP, CDCP, CDCS, CDCE, CTDC, CTIA, ATS, ATD

"I try to be nice, but sometimes my mouth doesn't cooperate."
 
Chris and Tamar,

Thank you both for your comments. All good points.

When I wrote StripInvalidChars(), I only intended it to be used with the filename stem, so I had no reason to think about UNC paths or any other issues concerning drive and path designations. It was only when I started experimenting with CleanPath() that I thought about those cases. I hope that my little comparison chart will be of interest to anyone thinking of using either function.

My original function was part of an error-logging routine. This generated a text file containing certain error information. The filename stem was a concatenation of the user name, date and time. I had no control over the characters the user name could contain, hence the need to strip invalid characters.
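
Something along these lines, as a rough sketch rather than the original routine. GETENV("USERNAME"), the ".log" extension and the variable names are assumptions made for illustration.

Code:
* Sketch only: not the original error-logging code
LOCAL lcUser, lcStem, lcLogFile

lcUser = GETENV("USERNAME")

* TTOC(..., 1) returns the date and time as yyyymmddhhmmss
lcStem = StripInvalidChars(lcUser + "_" + TTOC(DATETIME(), 1))

* Append the extension by hand: FORCEEXT() would cut the stem
* at the last period if the user name happened to contain one
lcLogFile = lcStem + ".log"

* Third parameter 1 = append to the file if it already exists
STRTOFILE("Error details go here" + CHR(13) + CHR(10), lcLogFile, 1)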

Mike

 
Tamar,

JustPath("\\server\\share\\filename.dbf") will still be an erroneous path.

Another validity check could be determining whether DIRECTORY(JUSTPATH(m.cCleanpath),1) is .t. before using it.
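
A minimal sketch of that check, assuming cCleanPath already holds the reassembled file name:

Code:
* Sketch only: cCleanPath is assumed to exist at this point
IF DIRECTORY(JUSTPATH(m.cCleanPath), 1)
    * The folder exists (the flag 1 also accepts hidden folders),
    * so the file can be created there
    STRTOFILE("", m.cCleanPath)
ELSE
    MESSAGEBOX("Folder not found: " + JUSTPATH(m.cCleanPath))
ENDIF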

As far as I can see, neither CleanPath() nor JustPath() nor a simple STRTRAN() that changes double backslashes to single ones will handle the start of a path correctly when UNC is involved. So if you want to get to the bottom of this, I think you still need a function that takes care of UNC paths, and perhaps also changes slashes to backslashes or makes similar corrections where you can assume what was meant.
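
Such a function could start out like the sketch below. The name NormalizePath is made up, and it only deals with slashes and repeated backslashes; it doesn't check that the server, share or folders exist.

Code:
FUNCTION NormalizePath

* Sketch only (hypothetical name): turns slashes into backslashes and
* collapses repeated backslashes, but keeps a leading UNC "\\" intact.

LPARAMETERS tcPath

LOCAL lcPath, lcPrefix

lcPath = CHRTRAN(tcPath, "/", "\")
lcPrefix = ""

IF LEFT(lcPath, 2) == "\\"
    * Set the UNC prefix aside, together with any surplus leading backslashes
    lcPrefix = "\\"
    DO WHILE LEFT(lcPath, 1) == "\"
        lcPath = SUBSTR(lcPath, 2)
    ENDDO
ENDIF

* Collapse the remaining runs of backslashes to single ones
DO WHILE "\\" $ lcPath
    lcPath = STRTRAN(lcPath, "\\", "\")
ENDDO

* "\\server\\share\\file.dbf" gives "\\server\share\file.dbf"
* "c:/data//abc.dbf"          gives "c:\data\abc.dbf"
RETURN lcPrefix + lcPath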

There's JUSTDRIVE(), which again doesn't cater for UNC paths, but it does pick out the drive letter and colon, so it can be used to detect the normal drive-letter paths.

I suggested a basedir from config that you can trust because it has been checked as an existing path. But I see why you suggest your solution: if the only errors are expected in the stem name, then decomposing the full name, correcting the stem and putting it all back together works fine. So does using the basedir.

Chriss
 