Need some help in awk

Status
Not open for further replies.

nguyenb9

Technical User
Apr 1, 2003
55
US
I have a file in which, for every line that starts with the letter "A", I need to take fields 7-9 and use them to replace fields 34-36.

Before:
A103058888880700000WC AET01A008940 123
B01030588888807000008888880700 0F08000
A103058800880700000WC AET01A008570 123
B01030588888807000008888880700 0F08000
A103058800880700000WC AET01A008678 123
B01030588888807000008888880700 0F08000
A103058888880700000WC AET01A008555 123
B01030588888807000008888880700 0F08000

After:
A103058888880700000WC AET01A008940888 123
B01030588888807000008888880700 0F08000
A103058800880700000WC AET01A008570800 123
B01030588888807000008888880700 0F08000
A103058800880700000WC AET01A008678800 123
B01030588888807000008888880700 0F08000
A103058888880700000WC AET01A008555888 123
B01030588888807000008888880700 0F08000

Thanks for your help
 
Your 'before' and 'after' don't jibe with the given description.

This is the code based on your [bold]description[/bold]:

nawk -f ng.awk file.txt

here's ng.awk:

Code:
# On lines starting with "A", overwrite columns 34-36 with columns 7-9.
/^A/ { $0 = substr($0, 1, 33) substr($0, 7, 3) substr($0, 37) }
{ print }

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Code:
/^A/ { $0 = substr($0, 1, 34) substr($0, 8, 3) substr($0, 38) }
1
Output:
A103058888880700000WC AET01A008940888 123
B01030588888807000008888880700 0F08000
A103058800880700000WC AET01A008570800 123
B01030588888807000008888880700 0F08000
A103058800880700000WC AET01A008678800 123
B01030588888807000008888880700 0F08000
A103058888880700000WC AET01A008555888 123
B01030588888807000008888880700 0F08000
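A quick check shows where the description and the sample output part ways. This is only a sketch using the second "A" line from the question; the variable names are made up for illustration. The result would be consistent with the description counting columns from 0, though the original poster never says so.

```shell
# Second "A" line from the question (whitespace as shown in the thread).
line='A103058800880700000WC AET01A008570 123'

# Columns 7-9, as the description says (awk's substr() counts from 1).
by_description=$(printf '%s\n' "$line" | awk '{ print substr($0, 7, 3) }')

# Columns 8-10, which is what the "After" listing actually appends.
by_sample=$(printf '%s\n' "$line" | awk '{ print substr($0, 8, 3) }')

echo "$by_description $by_sample"
```

Only the second slice agrees with the "570800" shown in the "After" listing.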
 
Since [tt]substr()[/tt] is so klugey, let's do without it.

Code:
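/^A/ {
  # Split on "the first 7 characters" or "9 characters followed by WC ",
  # which leaves characters 8-10 in f[2].
  split($0, f, /^.......|.........WC /)
  $2 = $2 f[2]   # append them to the 2nd whitespace-delimited field
}
# Any non-zero constant is a true pattern, so every line is printed.
3.1415926536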


For an introduction to Awk, see faq271-5564.
 
I don't think that futurelet meant that the gawk implementation of substr() is a kludge; I think he means that hardcoding string indices is not good practice ;)
 
marsd said:
the gawk implementation of substr() is a kludge,
...
the hardcoding of string indices is not good practice

Why is that?

 
marsd, you can't convince him. He's part of the Perl congregation, which worships ugliness.

Seriously, my code without substr() was contrived, but I found it satisfying to get rid of substr($0,8,3).
 
futurelet said:
... you can't convince him.

On the contrary, I'm willing to be convinced, which was why I posted the questions. I'm not here to denigrate Awk, or try to convince you that you shouldn't use it. I've been using it for years. (Less often lately.) The questions were meant sincerely. I don't know why (or if) "the gawk implementation of substr() is a kludge," and don't see why "the hardcoding of string indices is not good practice," when the requirement of the original post was to operate on specific columns.

futurelet said:
He's part of the Perl congregation, which worships ugliness.

It's possible to write ugly code in any language. I don't find Perl ugly. But beauty (and ugliness), as is so often said, are in the eye of the beholder. I'm not here as a missionary.
 
To make code more pleasing (and more robust), one avoids or at least hides gory details and operates at a higher level of abstraction.

It is preferable to say "take field 4" rather than to say "start at the 39th character and take the next 7 characters".

When solving this problem, it was impossible to avoid substr() entirely (or something just as ugly). However, "$2=$2 substr($0,8,3)" is preferable to "$0 = substr($0, 1, 34) substr($0, 8, 3) substr($0, 38)".
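For reference, the field-based one-liner quoted above might be run like this. It is a sketch that assumes the default FS (whitespace-delimited fields) and uses two sample lines as they appear in this thread; the variable names are illustrative. Note that assigning to $2 makes awk rebuild the record with single spaces between fields.

```shell
# Two sample lines from the question: one "A" record and one "B" record.
input='A103058888880700000WC AET01A008940 123
B01030588888807000008888880700 0F08000'

# On "A" lines, append columns 8-10 of the record to the 2nd field; print every line.
result=$(printf '%s\n' "$input" | awk '/^A/ { $2 = $2 substr($0, 8, 3) } { print }')

printf '%s\n' "$result"
```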


As to Perl, it is impossible to avoid ugliness when using it; the syntax demands the use of too many irritating symbols.

One Perl user's website says:
Perl
The ugly hacking language


I like Perl, I really do. I can get a lot done with it fast. But it is a HACKING language and it is uglier than hell. So in many ways I don't like it. It is kinda like C++, I'm good at it, but don't like doing it necessarily. I do think it needs some improvements, but they are "improvements" only in the philosophical sense.
 
futurelet said:
To make code more pleasing (and more robust), one avoids or at least hides gory details and operates at a higher level of abstraction.

I agree.

futurelet said:
It is preferable to say "take field 4" rather than to say "start at the 39th character and take the next 7 characters".

When solving this problem, it was impossible to avoid substr() entirely (or something just as ugly). However, "$2=$2 substr($0,8,3)" is preferable to "$0 = substr($0, 1, 34) substr($0, 8, 3) substr($0, 38)".

I don't agree. This isn't "more robust." It depends on the value of FS and will break if it is changed to something other than the default, while the solution using only substr will not. Hence the latter is more robust.

futurelet said:
As to Perl, it is impossible to avoid ugliness when using it; the syntax demands the use of too many irritating symbols.

Whether or not the symbols are irritating or "ugly" is a matter of opinion. I like being able to tell at a glance whether something is a scalar, an array or a hash (associative array). I find them helpful rather than irritating. This is a matter of habituation to some extent: the symbols of any language may seem irritating or ugly until you're familiar with the language.

As for "one Perl user's website," it's possible to find almost any opinion about almost any topic expressed somewhere on the web. I don't see why any particular weight should be given to the one quoted here.

I don't need to be convinced of the merits of Awk. I'm not on a mission to convince anyone of the merits of Perl. If you don't like it, fine. Leave it alone. I won't bother you about it. The world is big enough for both, and lots more besides.

 
mikevh said:
It depends on the value of FS and will break if it is changed to something other than the default, while the solution using only substr will not. Hence the latter is more robust.
Why would one deliberately make a change to FS in this program that is already working perfectly? And if he did change FS, why would he expect $2 to be the same as it was before? A program can't be immune to sabotage.

Your solution depends on the numbers
1, 34, 8, 3, 38
If a single one of them is changed, your program will break.

Consider the possibility that we weren't shown all the forms that the data can conceivably take. Perhaps, in addition to[tt]
A103058888880700000WC AET01A008940 123
[/tt]a line such as[tt]
A103058888880700000WC AET01A0089402 123
[/tt]could occur. It seems most likely that the poster would want the 3 digits to be appended to the 2nd field. If so, my program would continue to work and yours would break.
If a line such as[tt]
A103058888880700000WC-AET01A008940 123
[/tt]should occur, mine would break instead of yours, but it seems to me that all of the lines would probably have fields delimited by whitespace.
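The two hypothetical lines above can be run through both programs to see which one misbehaves on which input. This is a sketch, not a verdict: the run of blanks before "123" is an assumption (the listings in this thread appear to have had their whitespace collapsed, and the column-based program only makes sense with padding there), and the shell variable names are made up.

```shell
# Hypothetical variants from the post above, with assumed padding before "123".
longer='A103058888880700000WC AET01A0089402   123'   # 13-character 2nd field
hyphen='A103058888880700000WC-AET01A008940    123'   # no blank after "WC"

by_field='/^A/ { $2 = $2 substr($0, 8, 3) } { print }'
by_column='/^A/ { $0 = substr($0, 1, 34) substr($0, 8, 3) substr($0, 38) } { print }'

longer_field=$(printf '%s\n' "$longer" | awk "$by_field")
longer_column=$(printf '%s\n' "$longer" | awk "$by_column")
hyphen_field=$(printf '%s\n' "$hyphen" | awk "$by_field")
hyphen_column=$(printf '%s\n' "$hyphen" | awk "$by_column")

printf '%s\n' "$longer_field" "$longer_column" "$hyphen_field" "$hyphen_column"
```

On the longer line the column version overwrites the trailing "2" of the second field, while on the hyphenated line the field version appends the digits to "123" instead.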

Saying "overlay xyz at column 35" is more specific and less abstract than saying "the fields are separated by whitespace; append xyz to the 2nd field". Therefore, I prefer the latter. And it relieves one from having to figure out the correct column number!
 
futurelet said:
Why would one deliberately make a change to FS in this program that is already working perfectly?

Suppose someone feeds in a different value for FS on the command line with the -F option. Why would someone do this? Well, 'cause he's an idiot, basically: doesn't know awk, doesn't know the significance of what he's doing and is only doing it 'cause someone told him to do that once before with another script, so he assumes that's how it's always done. Now the meaning of $2 has changed. But assuming the same data, the substr version still works.
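The -F scenario can be tried directly. A minimal sketch, assuming the input had padding before "123" (an assumption, since the thread's listings seem to have lost it) and using made-up variable names:

```shell
line='A103058888880700000WC AET01A008940    123'   # padding before "123" is assumed

by_field='/^A/ { $2 = $2 substr($0, 8, 3) } { print }'
by_column='/^A/ { $0 = substr($0, 1, 34) substr($0, 8, 3) substr($0, 38) } { print }'

# The field-based program with the default FS, then with a stray -F: added.
field_default=$(printf '%s\n' "$line" | awk "$by_field")
field_colon=$(printf '%s\n' "$line" | awk -F: "$by_field")

# The column-based program with the same stray -F:.
column_colon=$(printf '%s\n' "$line" | awk -F: "$by_column")

printf '%s\n' "$field_default" "$field_colon" "$column_colon"
```

With -F: the whole line becomes $1, so the field version creates a new second field at the end of the record, while the column version produces the same result as before.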

If the input data is changed, all bets are off. Probably neither version would produce the correct result. My point was that there is nothing inherently more robust about the "$2" version.

 
The [tt]substr[/tt] method is the only one that does what the original poster asked for (making some assumptions about the English).

That doesn't mean that's the best way to get the job done, or even that it works in all required cases. The input data format isn't described well enough to know the "best" way of processing it.


If the original request is accurate, then the [tt]substr[/tt] method is a simple, elegant way of solving the problem.

In other words, if the English description of the problem is "Replace characters 34-36 with characters 7-9 in records where the first field starts with 'A'," then using [tt]substr[/tt] is certainly not a kludge. The problem calls for something that works with substrings at given character positions; that's what [tt]substr[/tt] does.
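If the problem really is stated in character positions, the column arithmetic can at least be kept in one place. The following is a sketch only: overlay() and overlay_columns are hypothetical helpers that do not appear anywhere in this thread, and the padding before "123" in the test line is assumed.

```shell
overlay_columns() {
    # Arguments: first source column, first destination column (both 1-based).
    awk -v src="$1" -v dst="$2" '
        # Return s with repl written over the characters starting at column start.
        function overlay(s, start, repl) {
            return substr(s, 1, start - 1) repl substr(s, start + length(repl))
        }
        /^A/ { $0 = overlay($0, dst, substr($0, src, 3)) }
        { print }
    '
}

line='A103058888880700000WC AET01A008940    123'   # padding before "123" is assumed

literal=$(printf '%s\n' "$line" | overlay_columns 7 34)   # columns as described
shifted=$(printf '%s\n' "$line" | overlay_columns 8 35)   # columns matching "After"

printf '%s\n' "$literal" "$shifted"
```

Passing the columns as arguments makes it easy to compare the literal description (7 and 34) with the values that reproduce the sample output (8 and 35).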


Now, it's possible that the original poster oversimplified the problem, and a proper description of the problem would be more like "When the first character of the first field is 'A', append characters 7-9 of the first field to the second field." In that case, a [tt]substr[/tt]-based solution probably wouldn't cut it, and a field-based one would be better.


futurelet said:
To make code more pleasing (and more robust), one avoids or at least hides gory details and operates at a higher level of abstraction.

Unless the problem can only be stated in terms of gory details, in which case dealing with gory details is necessary.

futurelet said:
It is preferable to say "take field 4" rather than to say "start at the 39th character and take the next 7 characters".

Not if we trust the original problem description. Again, if it is accurate, then the field-based solution may not be preferable, and may even be incorrect. (This would leave the input format open to criticism, but that's beside the point).

futurelet said:
Saying "overlay xyz at column 35" is more specific and less abstract than saying "the fields are separated by whitespace; append xyz to the 2nd field". Therefore, I prefer the latter. And it relieves one from having to figure out the correct column number!

That's a good way of thinking, and if it can be applied to this problem, then it probably should be.

Suggesting the field-based method was a great idea as it gives the original poster a chance to consider whether this may in fact be a better way to solve the problem.


However, arguing about which solution is "preferable" is futile when the input format, and therefore the problem itself, hasn't been stated clearly.

Effectively, you're arguing that the solution to one problem is better than another solution to a different problem, and you have no evidence that the two problems are the same problem in the first place.


What I believe you want to be saying is that the input format should have been designed so that a field-based method would be sufficient for processing it, and that if it has indeed been designed so that the column number matters, then it has been poorly designed.

Most people would agree with you there. However, we unfortunately have to solve the problems that happen, not the ones that should have happened.


Or perhaps you mean to suggest that the original poster should consider restating the problem so that it involves fields instead of character positions.

That's probably a good idea, too.

Plus, if you got confirmation that a field-based method would solve the problem as effectively as a [tt]substr[/tt]-based one, you'd be in a position to have an argument about which solution is better.


Arguments are fine, but they never go anywhere unless you actually argue about the same thing.
 