
Assistance Needed - Read file and alter data 2

Status
Not open for further replies.

Cybex1

Technical User
Sep 3, 2011
33
US
Normally I would think this to be an easy task but I guess I am having a bad week... I have searched but did not find a solution, hence the post.

Ok, so I have 40 or so files with data I need to clean up and build a report from. I am thinking of pulling the data I need from the files via awk. I will need to step through the files in the directory one at a time; I am thinking "ls" into an array and then a for loop. In the loop I need to read each line via sed or awk to find "Adding" at the beginning of the line. For each matching line I need to print the file name contained in the string and append the date, which is a variable pulled from the source file name.


Files:
09-13-11-05-49-35-PM.txt
09-14-11-04-43-45-PM.txt
09-16-11-03-22-14-PM.txt

Sample lines:
(09-13-11-05-49-35-PM.txt)
Adding y:\path\path1\path2\path3\file name 1.docm 0% 1% OK
Adding y:\path\path1\path2\path3\file number 2.docm 1% 2% 3% OK
(09-14-11-04-43-45-PM.txt)
Adding y:\path\path1\path2\path3\word doc 1.doc 3% OK
(09-16-11-03-22-14-PM.txt)
Adding y:\path\path1\path2\path3\path 4\path 5\long filename used here.doc 3% OK

Desired output:
report.rpt
file name 1.docm 09/13/11
file number 2.docm 09/13/11
word doc 1.doc 09/14/11
long filename used here.doc 09/16/11

Code so far…
Code:
#!/bin/bash
	TFILES=( $(ls *-PM.txt) )

for (( i=0; i<${#TFILES[@]}; i++ ));
	do
		FDATE=`basename "${TFILES[$i]}" .txt | awk -F"-" '{print $1"/"$2"/"$3;}'`	
		FNAME= awk -F"    " '/Adding/ {print $2;}' "${TFILES[$i]}" | awk -F"\\" '{print $NF}' 
		#FNAME= awk -F"    " '/Adding/ {print $2;}' "${TFILES[$i]}" | awk -F"\\" '{print $NF}' | sed 's/$/$FDATE/'
		FDATA=$FNAME"     "$FDATE		

		#printf $FNAME \t $FDATE  # >> report.rpt
		#echo $FDATA
		#echo $FNAME | awk '{print $0 "     " $DATE}'

	done

#echo $FDATA
#echo $FNAME   # >> report.rpt
echo $FNAME

I can't seem to get the date variable appended to each file name pulled from a given source file. I only get it once, at the end of the list.

Any help and/or tweaks are greatly appreciated. Please feel free to suggest better ways to do things as I enjoy learning and I know my methods are barbaric at best...

Thanks,
Cybex
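
One likely cause of the symptom above: in `FNAME= awk ...` the space after `=` makes the shell run awk with an empty FNAME in its environment, so awk prints straight to the terminal and nothing is captured. A minimal sketch of the same loop with the date handed to awk via `-v` follows. The function name `report_loop` is hypothetical, and the four-space field separator is an assumption carried over from the script above.

```shell
# Sketch only: report_loop is a hypothetical name, and the four-space
# separator is assumed from the original script.
report_loop() {
    for file in "$@"; do
        # 09-13-11-05-49-35-PM.txt -> 09/13/11
        fdate=$(basename "$file" .txt | awk -F- '{ print $1 "/" $2 "/" $3 }')
        # -v makes the shell value visible inside awk as the variable d.
        awk -F'    ' -v d="$fdate" '
            /^Adding/ {
                n = split($2, parts, /\\/)   # break the path on backslashes
                print parts[n], d            # last piece is the file name
            }
        ' "$file"
    done
}
```

It would be called as `report_loop *-PM.txt > report.rpt`, which also avoids parsing the output of ls.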
 
Hi

I would prefer this Awk code instead :
Code:
awk 'FNR==1{f=substr(FILENAME,1,8);gsub(/-/,"/",f)}/^Adding/{sub(/^Adding[ \t]+/,"");sub(/.*\\/,"");sub(/[ \t]+[0-9]+%.*/,"");print $0,f}' *-PM.txt > report.rpt
Note that the above has to be run in the directory containing the input files. If you want to run it from elsewhere :
Code:
awk 'FNR==1{f=FILENAME;sub(/.*\//,"",f);f=substr(f,1,8);gsub(/-/,"/",f)}/^Adding/{sub(/^Adding[ \t]+/,"");sub(/.*\\/,"");sub(/[ \t]+[0-9]+%.*/,"");print $0,f}' path/to/*-PM.txt > report.rpt
Tested with gawk and mawk.
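
For anyone reading the one-liner without an awk book at hand, here is the same logic spread over several lines with comments. It is only a sketch: the wrapper name `build_report` is hypothetical, and the sample file names in the comments come from the question.

```shell
# Sketch only: build_report is a hypothetical wrapper around the awk code.
build_report() {
    awk '
        # First line of each input file: derive the date from the file name.
        FNR == 1 {
            f = FILENAME
            sub(/.*\//, "", f)     # drop any leading directory part
            f = substr(f, 1, 8)    # "09-13-11-05-49-35-PM.txt" -> "09-13-11"
            gsub(/-/, "/", f)      # "09-13-11" -> "09/13/11"
        }
        # Lines starting with "Adding": reduce them to the bare file name.
        /^Adding/ {
            sub(/^Adding[ \t]+/, "")      # strip the leading keyword
            sub(/.*\\/, "")               # strip up to the last backslash
            sub(/[ \t]+[0-9]+%.*/, "")    # strip the percentages and the OK
            print $0, f
        }
    ' "$@"
}
```

Usage would be `build_report path/to/*-PM.txt > report.rpt`. FNR restarts at 1 for every input file, which is why the date in `f` is refreshed per file.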


Feherke.
 
Feherke,

Wow, that is great. I need to pull out the awk book to see what's going on here. However, it does work! I am encountering some lines that I don't want and that I wasn't aware of until this morning. There are some directory listings without files; see below:

Adding p:\FY2011 - FY202 Stuff\Location\Non-PDF\Old Revisions OK

Is there a way to exclude these directory listings?
 
Hi

Is the presence of the percent sign ( % ) reliable? If yes, I would use that as the condition:
Code:
awk 'FNR==1{f=substr(FILENAME,1,8);gsub(/-/,"/",f)}!/%/{next}/^Adding/{sub(/^Adding[ \t]+/,"");sub(/.*\\/,"");sub(/[ \t]+[0-9]+%.*/,"");print $0,f}' *-PM.txt > report.rpt
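
The effect of the percent-sign test can be seen in isolation with a small sketch. Here the test is folded into the pattern as `/^Adding/ && /%/` instead of a separate `next` rule, which should behave the same as long as only "Adding" lines are ever printed. The function name `files_only` is hypothetical and the date handling is left out for brevity.

```shell
# Sketch only: files_only is a hypothetical name; the date part is omitted.
files_only() {
    awk '
        # Keep "Adding" lines that also carry a percent sign, i.e. real files.
        /^Adding/ && /%/ {
            sub(/^Adding[ \t]+/, "")
            sub(/.*\\/, "")
            sub(/[ \t]+[0-9]+%.*/, "")
            print
        }
    ' "$@"
}
```

A directory line such as the "Old Revisions OK" one has no percent sign, so it is skipped.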

Feherke.
 
That worked like a charm! Now, in the output file, I am getting some odd characters that show up in gedit and other apps but not on the command line. They appear as four rectangles with 4 small squares in each rectangle. When I try to import the file, it creates a separate field for these characters. Any idea what they are or how to get rid of them? They were being stripped out earlier in my script because I was only pulling the fields in awk between "four spaces", thus excluding them. However, now we are printing $0 and they are coming through... Any thoughts?


Cybex
 
Hi

Use a tool to find out those characters' codes and use their codes to remove them.

On Frugalware GNU/Linux I have the hexdump ( util-linux package ) and od ( coreutils package ) tools, which are suitable for finding out the character codes. Besides those, Midnight Commander's editor and Vim display the code of the character under the cursor. But certainly there are other ways too.

For further assistance please post a dump of such an input line or upload a short fragment of a file somewhere.
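
A possible way to get such a dump with od, one byte at a time so that the bytes are not grouped into 16-bit words, is sketched below. The function names `byte_codes` and `char_codes` are hypothetical; `-An`, `-tx1` and `-c` are standard od options.

```shell
# Sketch only: byte_codes and char_codes are hypothetical helper names.

# Hex value of every byte, without the offset column.
byte_codes() {
    od -An -tx1 "$@"
}

# Same bytes shown as characters, with escapes such as \b and \n.
char_codes() {
    od -An -c "$@"
}
```

Either one takes a file name, for example `byte_codes report.rpt`, or reads standard input when called without arguments.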


Feherke.
 
I took 4 lines and deleted everything except the characters and ran hexdump and od. See the results below.


gentoo64 # od '/mnt/data/test.csv'
0000000 004010 004010 004012 004010 005010 004010 004010 004012
0000020 004010 005010
0000024
gentoo64 # hexdump '/mnt/data/test.csv'
0000000 0808 0808 080a 0808 0a08 0808 0808 080a
0000010 0808 0a08
0000014


Here is the output from one line only:

gentoo64 # hexdump '/mnt/data/test.csv'
0000000 0808 0808 000a
0000005
gentoo64 # od '/mnt/data/test.csv'
0000000 004010 004010 000012
0000005


Does this help?
 
od -c or cat -vet will give you output that's easier to interpret.

In the "one line" example those are 4 backspace (Control-H or \b) characters, followed by a new line. gsub(/\b/,"") should clean 'em up I think.
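
As a rough sketch of that clean-up step on its own (the function name `strip_backspaces` and the sample line in the usage note are made up for illustration):

```shell
# Sketch only: strip_backspaces is a hypothetical helper name.
strip_backspaces() {
    # In awk regular expressions \b is the backspace character (octal 010),
    # not a word boundary as in some other tools.
    awk '{ gsub(/\b/, ""); print }' "$@"
}
```

For example, `printf 'word doc 1.doc\b\b\b\b 09/14/11\n' | strip_backspaces` should print the line with the four backspaces removed. The same gsub call can go at the top of the `/^Adding/` block in the report one-liner.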

Annihilannic
tgmlify - code syntax highlighting for your tek-tips posts
 
Annihilannic,

Thank you! That was it exactly and the added gsub cleared it up.

Thanks,
Cybex
 