I need to parse a log file and extract all filenames with a given file extension. I have something ugly that works, but I know there has to be a better way. I need to exclude records that are searches for filenames with extensions (hence the grep -v /search). I tried to avoid listing each URL-encoded hex code (%2F, %3B, and so on) during the strip-down process, but I couldn't get gawk to work with a regular expression. I tried something like: gawk '{ /%[0-9][A-F]/; print $(NF)}'. Any help would be greatly appreciated! Below is what I have so far, followed by the desired results. Below that is the sample data I have been using.
gawk '{print $7|"sort"}' httplog.txt|grep '\.rar'|grep -v '\/search'|gawk -F \.rar '{print $1 ".rar"}'|gawk -F \/ '{print $(NF)}'|gawk -F \= '{print $(NF)}'|gawk -F %2F '{print $(NF)}'|gawk -F %3B '{print $(NF)}'|gawk -F %252B '{print $(NF)}'|gawk -F %2B '{print $(NF)}'|gawk -F + '{print $(NF)}'|gawk -F html '{print $(NF)}'|sort|uniq
Replace.Studio.Business.Edition.v7.5.Retail-FOSI.rar
Replace.Studio.Pro.v7.5.Retail-FOSI.rar
test data
-----------httplog.txt---------------
-----------httplog.txt---------------
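For what it's worth, the chain of gawk calls above could probably be collapsed into a single awk program. The following is only a sketch under some assumptions: the request path is field 7 (as in the pipeline above), every percent escape (%XX, or double-encoded %25XX) can be treated as a separator, and the function name extract_names is made up for illustration. It uses only POSIX awk features, so gawk should run it unchanged.

```shell
# Hypothetical helper: print the unique filenames ending in a given
# extension that appear in the request path (field 7) of a log file.
#   $1 = extension without the dot (e.g. rar)
#   $2 = log file
extract_names() {
    awk -v ext="$1" '
    # Skip search requests, mirroring the grep -v /search step.
    $7 !~ /\/search/ {
        url = $7
        # Keep everything up to and including the first ".ext".
        if (match(url, "\\." ext)) {
            url = substr(url, 1, RSTART + RLENGTH - 1)
            # Split on any percent escape (%XX or %25XX), on "/", "=", "+",
            # and on the literal "html", then keep the last piece.
            n = split(url, parts, "%(25)?[0-9A-Fa-f][0-9A-Fa-f]|[/=+]|html")
            print parts[n]
        }
    }' "$2" | sort -u
}
```

It would be called as extract_names rar httplog.txt. One caveat: because every percent escape is treated as a separator, a filename that itself contains an encoded character such as %20 would be cut short at that point, and the same goes for a filename containing the text "html".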