Sampling at regular interval 2

hill007 · Sep 20, 2004

I have a series of large data sets, up to a gigabyte in size. Each data set has an X value, a Y value, and a Z value.

X and Y are coordinate locations and Z is the depth value at that location.
I want to sample the Y values at every 50-foot interval, based on the X values.
For example, my data set looks like this:

X Y Z
100 150 1.1
120 250 1.2
100 155 1.1
100 200 1.5
100 250 1.3
120 260 1.4
120 300 1.2
...

After sampling every 50 feet, my final data set should look like this:

X Y Z
100 150 1.1
100 200 1.5
120 250 1.2
120 300 1.2
...

Thanks for any help.

futurelet · Sep 20, 2004

> I want to sample each of the Y value at every 50 feet
> interval based on the X values.

Shouldn't it be every 50 feet based on the y values?

First sort the file on the second column.

Then use this:

$2 >= threshold { print; threshold = int(($2 + 50)/50) * 50 }

hill007 · Sep 20, 2004

Hi Futurelet,

Yes, you are right, it should be every 50 feet of Y values.
How do I sort in awk?
Thanks.

hill007 · Sep 20, 2004

Hi futurelet,

Could you also please tell me what the code above does? I am using awk95. Where does my file go in the code?
Thanks.

futurelet · Sep 20, 2004

I would create a new sorted file using the sort program that comes with the operating system.

Under DOS, if the y-values begin in the 6th column:

sort /+6 original.dat >sorted.dat

This will work correctly only if all of the y-values have the same number of digits or if they are right justified (padded with blanks on the left).
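To make that caveat concrete, here is a small sketch. It uses a POSIX-style sort rather than the DOS one, and the two data lines are made up for illustration. A plain text comparison puts 150 ahead of 90 because "1" sorts before "9"; a numeric comparison gets the order right.

```shell
# Two hypothetical records whose y-values have different numbers of digits.
data='100 90 1.0
100 150 1.1'

# Text comparison on field 2: "150" sorts before "90".
lexical=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2)

# Numeric comparison on field 2: 90 sorts before 150.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n)

printf 'text order:\n%s\n\nnumeric order:\n%s\n' "$lexical" "$numeric"
```

With the text ordering, the awk rule would see 150 first, raise the threshold to 200, and silently skip the 90 record.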

Unix has a more sophisticated sort program, I believe.

Save the awk code as file "sample50.awk".

Run with

awk95 -f sample50.awk sorted.dat >out.dat
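As for what the code does, here is a rough walk-through, offered as a sketch rather than anything definitive. It assumes a Unix-style shell and an awk on the path; the function name sample50 and the inline data (the rows from the question, already sorted on the second column) are only for illustration. The rule prints the first record whose y-value reaches the current threshold, then moves the threshold to the next multiple of 50 above that y-value.

```shell
# Hypothetical wrapper around the one-line awk program.
sample50() {
    awk '
        # threshold is unset at first, which awk treats as 0,
        # so the first record always prints.
        $2 >= threshold {
            print
            # Next multiple of 50 strictly above the current y:
            # y=150 -> 200, y=155 -> 200, y=200 -> 250.
            threshold = int(($2 + 50) / 50) * 50
        }
    ' "$@"
}

# The rows from the question, sorted on column 2.
printf '%s\n' \
    '100 150 1.1' \
    '100 155 1.1' \
    '100 200 1.5' \
    '120 250 1.2' \
    '100 250 1.3' \
    '120 260 1.4' \
    '120 300 1.2' | sample50
```

On that input it prints the four rows 100 150 1.1, 100 200 1.5, 120 250 1.2 and 120 300 1.2. Note that it keeps the first record at or past each 50-foot boundary, which need not fall exactly on the boundary. The data file is not named inside the awk code; it goes on the command line.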

mikevh · Sep 20, 2004

On *nix:

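sort -k 2,2n original.dat | awk95 -f sample50.awk > out.dat
(-k 2,2 = sort on the second field; n = compare as numbers. The older form "sort +1 -n", which counts fields beginning with 0, is obsolete and may be rejected by newer sort programs.)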

or, for less work on the command line:

Code:

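BEGIN {
    # Sort the input file numerically on the second field (y) and read the result.
    input = ARGV[1]
    sortcmd = "sort -k 2,2n " input
    # "> 0" ends the loop on end of input and also on a read error (-1).
    while ((sortcmd | getline) > 0) {
        # Keep the first record at or past each 50-foot boundary.
        if ($2 >= threshold) {
            print
            threshold = int(($2 + 50)/50) * 50
        }
    }
    close(sortcmd)
}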

and invoke as
awk95 -f sample50.awk original.dat > out.dat
