regex to find invalid characters in filename

Status
Not open for further replies.

grazinggoat

Programmer
Mar 12, 2008
41
0
0
US
I am trying to move files that have invalid characters out
of a directory, but the regex I am using is also catching
the good files that I want to keep in log_dir.

Files can look like this:
bill-0001.log
BILL-0120-.log
Bill-A-1234-Nov.log

The problem is that those files are still being moved.
Can someone tell me what I am doing wrong with my regex?

Thanks in advance!

[pre]for FILENAMES in `ls log_dir`
do
    if [[ "$FILENAMES" == ^[a-zA-Z0-9.-_]+$ ]] ; then
        #do nothing file is good
        :
    else
        #badfile name
        print "Found invalid file ${FILENAMES}"
        mv "${FILENAMES}" /tmp/
    fi
done
[/pre]
 
Hi

[tt][a-zA-Z0-9.-_][/tt] means a character that falls in one of the intervals :
[ul]
[li]a-z ( any of a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y and z )[/li]
[li]A-Z ( any of A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y and Z )[/li]
[li]0-9 ( any of 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 )[/li]
[li].-_ ( any of ., /, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, :, ;, <, =, >, ?, @, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \, ], ^ and _ )[/li]
[/ul]
Are you sure this is what you want ?
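
To see the interval in action, here is a minimal bash sketch. It assumes bash with the [tt]=~[/tt] operator and forces the C locale so that ranges follow plain byte order; the function name [tt]in_range[/tt] is made up for this illustration.

```shell
#!/usr/bin/env bash
# Force byte-order ranges so the result does not depend on locale collation.
LC_ALL=C

# Keep the pattern in a variable so the shell passes it to =~ unchanged.
range_re='^[.-_]$'

# in_range CHAR: succeeds if the single character falls inside the .-_ interval.
# (Hypothetical helper name, used only for this demonstration.)
in_range() {
    [[ $1 =~ $range_re ]]
}

for ch in '-' '.' '/' ':' '@' '_' '#' 'a'; do
    if in_range "$ch"; then
        printf '%s  inside .-_\n' "$ch"
    else
        printf '%s  outside .-_\n' "$ch"
    fi
done
```

Note that the dash itself sorts just before the dot, so it ends up outside the interval, which would explain why names containing a dash get rejected.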


Feherke.
feherke.ga
 
Hello Feherke,

I want to be able to evaluate the FILENAME and, if it sees anything in any position
other than "-", ".", a digit 0-9, or a letter A-Z, mark the filename as invalid.

I thought the syntax for my regex would work, but I get mixed results.
I also tried escaping the "-":

if [[ "$NAMES" =~ ^[a-zA-Z0-9.\-_]+$ ]] ; then

So when my script runs, it should catch this as invalid because of the "#"
in the name:

Found invalid file bill01#pp.txt (which works)

But it fails here:

Found invalid file bill583-20151008104804-RETURN-GA1.txt (which should be a valid filename)
 
Basically, I want filenames to follow the POSIX "fully portable filenames" standard,
which lists A–Z a–z 0–9 . _ - as acceptable in filenames; everything else should be invalid.
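
For reference, that rule can also be written without a regex at all. The sketch below uses a [tt]case[/tt] glob pattern instead of [tt]=~[/tt], which is a different technique from the one in my script; [tt]is_portable_name[/tt] is a hypothetical helper name, and the C locale is assumed so that the letter ranges mean plain ASCII.

```shell
#!/usr/bin/env bash
# Assumption: C locale, so A-Z, a-z and 0-9 are plain ASCII ranges.
LC_ALL=C

# is_portable_name NAME: succeeds only if NAME is non-empty and consists
# solely of A-Z a-z 0-9 . _ - (hypothetical helper, not from any library).
is_portable_name() {
    case $1 in
        '')                 return 1 ;;  # an empty name is rejected
        *[!A-Za-z0-9._-]*)  return 1 ;;  # at least one character outside the set
        *)                  return 0 ;;
    esac
}

for name in 'bill-0001.log' 'bill01#pp.txt'; do
    if is_portable_name "$name"; then
        printf 'valid:   %s\n' "$name"
    else
        printf 'invalid: %s\n' "$name"
    fi
done
```

The dash sits last inside the brackets, so it is taken literally rather than as part of a range.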
 
Hi

Well, escaping with a backslash ( \ ) does not work as usual in the shell's regular expressions.

However, the old simple trick works : move the dash ( - ) to the end of the character class so that it is not interpreted as an interval :

Code:
if [[ "$NAMES" =~ ^[a-zA-Z0-9._-]+$ ]] ; then
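
Put together, the whole loop might look like the sketch below. It is only one possible arrangement, assuming bash and the C locale: the function name [tt]move_invalid_names[/tt] and its two arguments are invented for the example, the glob replaces the [tt]ls[/tt] call, and the full path is passed to [tt]mv[/tt] so the move does not depend on the current directory.

```shell
#!/usr/bin/env bash
# Assumption: C locale, so the ranges in the class are plain ASCII.
LC_ALL=C

# Dash moved to the end of the class so it is literal, not an interval.
valid_re='^[a-zA-Z0-9._-]+$'

# move_invalid_names SRC DEST (hypothetical helper):
# moves every entry of SRC whose name has a character outside the class to DEST.
move_invalid_names() {
    local src=$1 dest=$2 path name
    for path in "$src"/*; do
        [[ -e $path ]] || continue      # empty directory: the glob stays literal
        name=${path##*/}                # strip the directory part
        if [[ $name =~ $valid_re ]]; then
            continue                    # good name, leave it in place
        fi
        printf 'Found invalid file %s\n' "$name"
        mv -- "$path" "$dest"/
    done
}

# Example call with the directories mentioned in this thread:
# move_invalid_names log_dir /tmp
```

Note that the [tt]*[/tt] glob skips names starting with a dot, so hidden files are left alone in this sketch.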

Feherke.
feherke.ga
 