Need help pulling and formatting text with awk

Status
Not open for further replies.

jfarmerjr

Programmer
Mar 10, 2010
21
US
I have a text file (actually many of them) that among other things has lines of text like the following:

AVCA   Advocat Q4 2009 Advocat Earnings Release $ 0.14  n/a  $ 0.21  9-Mar AMC

or

ACAD   ACADIA PHARMACEUTICALS INC Q4 2009 ACADIA PHARMACEUTICALS INC Earnings Release -$ 0.19  n/a  -$ 0.38  9-Mar AMC

Each line begins with a tab (or maybe just 4 spaces). There is a tab between the stock symbol and the company name, but all other fields are separated by 2 spaces.
What I need is the stock symbol, company name, estimate, actual, previous, and date as CSV. Using the above examples, what I'm looking for is:

AVCA,Advocat,0.14,n/a,0.21,9-Mar AMC

ACAD,ACADIA PHARMACEUTICALS INC,-0.19,n/a,-0.38,9-Mar AMC

I think awk is the right tool for the job (the awk command will go into a bash script that does this every day).
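
For reference, the request above can be sketched as a single awk call. This is only a minimal sketch, reconstructed from the commands quoted later in this thread and assuming the layout described (leading tab, tab after the symbol, two or more spaces between the other fields). The function name to_csv and the exact spacing of the sample line are assumptions for illustration.

```shell
# to_csv is a hypothetical helper: earnings lines on stdin, CSV on stdout.
# Assumed layout: leading tab, tab after the symbol, and two or more
# spaces between the remaining fields.
to_csv() {
  awk -F '\\t|  +' -v OFS=, '{
    gsub(/\$ /, "")               # drop "$ " so "-$ 0.19" becomes "-0.19"
    print $2, $3, $5, $6, $7, $8  # symbol, company, estimate, actual, previous, date
  }'
}

# Sample line with the assumed spacing.
printf '\tAVCA\tAdvocat  Q4 2009 Advocat Earnings Release  $ 0.14  n/a  $ 0.21  9-Mar AMC\n' | to_csv
# prints: AVCA,Advocat,0.14,n/a,0.21,9-Mar AMC
```

Field 1 is the empty string before the leading tab and field 4 is the event description, which is why both are skipped.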
 
One issue I have is that there are 2 spaces after each symbol, before the comma, on every line after running the command.
 
Hi

I do not understand this. I compared my output with the one you posted and found no difference. Please post the output you get and use TGML tags to mark in color what should not be there.

By the way, which awk implementation are you using?

Feherke.
 
I've attached a copy of an output file (named awktest.txt). I'm using whatever awk comes standard on Ubuntu; I'm guessing it's gawk. The spaces issue really isn't a big deal, but is there any way to exclude all the lines where the symbol (first field) has a '.' in it? Included in the share link is a file called converted.txt, which is a sample of the converted HTML I'm starting with.
 
https://secure.filesanywhere.com/fs/v.aspx?v=897069895a5e71b99d99
 
Hi

jfarmerjr said:
is there any way to exclude all the lines where the symbol (first field) has a '.' in it?
Code:
awk -F '\\t|  +' -vOFS=, '$2!~/\./{gsub(/\$ /,"");print$2,$3,$5,$6,$7,$8}' converted.txt
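
To see why the dot is escaped, here is a small sketch on made-up input (XYZ.A is a hypothetical symbol and keep_plain_symbols a hypothetical name). An unescaped /./ matches any character, so it would reject every non-empty symbol; /\./ matches only a literal dot.

```shell
# keep_plain_symbols is a hypothetical helper: print the symbol (field 2)
# of every line whose symbol contains no literal dot.
keep_plain_symbols() {
  awk -F '\t' '$2 !~ /\./ { print $2 }'
}

# Made-up sample: three tab-separated records, one with a dotted symbol.
printf '\tAVCA\tAdvocat\n\tXYZ.A\tExample Corp\n\tACAD\tACADIA PHARMACEUTICALS INC\n' |
  keep_plain_symbols
# prints:
# AVCA
# ACAD
```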
Regarding the "spaces issue", are you talking about output lines that look like this?

,,,,,
,,,,,

I guess you would like to eliminate them:
Code:
awk -F '\\t|  +' -vOFS=, '$2!~/\./&&NF>=8{gsub(/\$ /,"");print$2,$3,$5,$6,$7,$8}' converted.txt
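
The NF>=8 guard is easier to follow with the field counts printed out. The sketch below uses the same separator on three made-up lines: a data line with the assumed spacing, a blank line, and a hypothetical header line. count_fields is a hypothetical name.

```shell
# count_fields is a hypothetical helper: print the number of fields awk
# sees on each input line when splitting on a tab or 2+ spaces.
count_fields() {
  awk -F '\\t|  +' '{ print NF }'
}

# Made-up sample: one full data line, one blank line, one short header line.
printf '\tAVCA\tAdvocat  Q4 2009 Advocat Earnings Release  $ 0.14  n/a  $ 0.21  9-Mar AMC\n\n  Earnings Calendar\n' |
  count_fields
# prints:
# 8
# 0
# 2
```

Without the guard, the blank line still triggers the print statement; its six empty fields joined by commas give the ",,,,," output.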

Feherke.
 
The last one is essentially perfect, thank you kindly!!! Regarding the spaces issue, every line has 2 spaces after the symbol, before the comma.

example:

ACAD[color=red]  [/color],ACADIA PHARMACEUTICALS INC,-0.19 ,n/a ,-0.38 ,9-Mar AMC
 
It also seems to be adding a single space to the end of every field but the second (company name), but that's not a major issue. I'd like to thank you again for your exceptional help, so... Thank you!
 
Hi

Ah. There are also some characters with code 160 (hexadecimal a0), i.e. non-breaking spaces.
Code:
awk -F '\\t|[ \\xa0][ \\xa0]+' -vOFS=, '$2!~/\./&&NF>=8{gsub(/\$ /,"");print$2,$3,$5,$6,$7,$8}' converted.txt
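
If stray padding still shows up in the CSV, a different option is to clean the output afterwards rather than change the separator. The sketch below is not the command above. It is a hypothetical post-processing step (trim_fields is a made-up name) that assumes the file is in a single-byte encoding where byte 160 (octal 240) is the non-breaking space, and that no field contains a comma.

```shell
# trim_fields is a hypothetical helper for comma-separated input: turn each
# byte 160 (octal 240) into a space, then strip spaces around every field.
# LC_ALL=C makes awk treat the input as single bytes.
trim_fields() {
  LC_ALL=C awk -F, -v OFS=, '{
    for (i = 1; i <= NF; i++) {
      gsub(/\240/, " ", $i)    # byte 160 -> ordinary space
      gsub(/^ +| +$/, "", $i)  # trim leading and trailing spaces
    }
    print
  }'
}

# Made-up sample mimicking the padded output shown earlier in the thread.
printf 'ACAD\240\240,ACADIA PHARMACEUTICALS INC,-0.19\240,n/a\240,-0.38\240,9-Mar AMC\n' |
  trim_fields
# prints: ACAD,ACADIA PHARMACEUTICALS INC,-0.19,n/a,-0.38,9-Mar AMC
```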

Feherke.
 
Feherke, You Are Awesome! Works exactly as I need it to! Thanks for applying your obviously genius mind to my problem.
 