Display consumer/service line with smallest time value

Status
Not open for further replies.

marm1

Technical User
Nov 12, 2020
Hello All,

I have a file with three comma-delimited fields:
field1 is a consumer name
field2 is the service used by a given consumer
field3 is a time value

I would like to display, for each consumer/service pair, the line with the earliest time, i.e. the smallest time value.

Sample input file:
Code:
$ cat data
arx-count-consumer,arx-count-service,300
arx-count-consumer,arx-count-service,500
peg-if-consumer,peg-if-service,1100
tin-mock-consumer,tin-mock-service,1500
arx-count-consumer,arx-count-service,101
tin-mock-consumer,tin-mock-service,4500
pipe-mock-consumer,pipe-mock-service,50
pipe1-mock-consumer,pipe-mock-service,510

Desired output:
Code:
arx-count-consumer,arx-count-service,101
peg-if-consumer,peg-if-service,1100
tin-mock-consumer,tin-mock-service,500
pipe-mock-consumer,pipe-mock-service,50
pipe1-mock-consumer,pipe-mock-service,510

Any assistance would be greatly appreciated. Thanks in advance.
 
For storing minimum values, you can use an associative array, e.g. min_time_values[key], where key is a string concatenated from the consumer and the service.

Then, for every line of your file, you could do the following:
- if the array element min_time_values[key] does not exist yet, create it by setting min_time_values[key] = time_value

- if the array element min_time_values[key] already exists and time_value is less than min_time_values[key], update it by setting min_time_values[key] = time_value
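The two rules above can be sketched as a small shell function around awk. This is only a minimal sketch: the function name min_times is made up for illustration, the key is joined with a comma (an assumption, so that it can be printed back unchanged), and `$3 + 0` is used to make sure the times are compared as numbers.

```shell
# Hypothetical helper: print the smallest time per consumer/service pair.
# Reads the files given as arguments, or standard input if there are none.
min_times() {
  awk -F, '
    {
      key = $1 "," $2          # consumer and service joined into one string
      time_value = $3 + 0      # force a numeric value
      # Rule 1: no element for this key yet -> create it.
      # Rule 2: element exists and the new time is smaller -> overwrite it.
      if (!(key in min_time_values) || time_value < min_time_values[key])
        min_time_values[key] = time_value
    }
    END {
      # The order of a for-in loop is unspecified in awk.
      for (key in min_time_values)
        print key "," min_time_values[key]
    }
  ' "$@"
}

# Usage with three lines taken from the sample input:
printf '%s\n' \
  'arx-count-consumer,arx-count-service,300' \
  'arx-count-consumer,arx-count-service,500' \
  'arx-count-consumer,arx-count-service,101' | min_times
# prints: arx-count-consumer,arx-count-service,101
```

The `key in array` test checks for existence directly, so a stored time of 0 is not mistaken for a missing element.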
 
By the way, with the data you provided, the result for tin-mock-consumer,tin-mock-service could not be
tin-mock-consumer,tin-mock-service,500
but
tin-mock-consumer,tin-mock-service,1500
 
You can post the code you have tried so far and say where you got stuck.
 
Many thanks for your assistance Mikrom. Great approach to the problem.

Code:
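$ cat get_min_time.awk
BEGIN { FS="," }
{
        # key combines consumer and service
        key=sprintf("%s_%s", $1, $2)
        # first time this pair is seen: store its time value
        # ("in" tests existence, so a time of 0 is handled too)
        if(!(key in min_time_value)) {
                min_time_value[key] = $3
        }

        # a smaller time value replaces the stored one
        if(min_time_value[key] > $3) {
                min_time_value[key]=$3
        }
}

END {
        # note: the order of "for (i in ...)" is unspecified
        for (i in min_time_value) {
          printf("%s,%s\n", i, min_time_value[i])
        }
}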

Code:
$ awk -f get_min_time.awk data
peg-if-consumer_peg-if-service,1100
pipe-mock-consumer_pipe-mock-service,50
tin-mock-consumer_tin-mock-service,1500
pipe1-mock-consumer_pipe-mock-service,510
arx-count-consumer_arx-count-service,101
 
Yes, I would have done the same.
Maybe a small improvement: at the end you can use the split() function to restore the original values of consumer name and service:
Code:
...
END {
        for (i in min_time_value) {
          split(i, keys, "_")
          consumer = keys[1]
          service = keys[2]
          printf("%s,%s,%s\n", consumer, service, min_time_value[i])
        }
}
Output:
Code:
peg-if-consumer,peg-if-service,1100
pipe-mock-consumer,pipe-mock-service,50
tin-mock-consumer,tin-mock-service,1500
pipe1-mock-consumer,pipe-mock-service,510 
arx-count-consumer,arx-count-service,101
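
Splitting on "_" relies on consumer and service names never containing an underscore. As a hedged alternative, awk's built-in SUBSEP can serve as the joiner: a subscript written as [$1, $2] is stored as $1 SUBSEP $2, and SUBSEP defaults to a control character that is unlikely to occur in the data. The function name and the two sample lines below are made up for illustration.

```shell
# Hypothetical helper: same minimum search, but with a SUBSEP-joined key.
min_times_subsep() {
  awk -F, '
    {
      t = $3 + 0               # force a numeric value
      if (!(($1, $2) in min_time_value) || t < min_time_value[$1, $2])
        min_time_value[$1, $2] = t
    }
    END {
      for (k in min_time_value) {
        split(k, keys, SUBSEP) # restore consumer and service
        printf("%s,%s,%s\n", keys[1], keys[2], min_time_value[k])
      }
    }
  ' "$@"
}

# Invented names containing underscores, to show that they survive:
printf '%s\n' 'my_app,my_service,7' 'my_app,my_service,5' | min_times_subsep
# prints: my_app,my_service,5
```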
 
Thank you Mikrom for your excellent advice.
 
If there was a further field with a token value, is there an easy method to retrieve the entire line? I am interested in the token associated with the consumer/service pair with the smallest time value. For brevity, I have supplied a subset of data.

Code:
$1 consumer
$2 service
$3 token value
$4 time value

$ cat data
arx-count-consumer,arx-count-service,1234567890abcde,300
arx-count-consumer,arx-count-service,1234567890abcdf,500
arx-count-consumer,arx-count-service,1234567890abcdg,101

Desired output:
arx-count-consumer,arx-count-service,1234567890abcdg,101

Any advice would be greatly appreciated.
 
Sorry, I explained the new requirement poorly. The new objective is to get the consumer/service pair with the lowest time value ($4) and its associated token ($3).

I have the correct result; however, I got it by accidentally introducing a typo. Is someone able to explain why the line below works:
Code:
min_time_value[key] = $0 # $0 is a typo but it works. $4 is the correct field to use i.e time

Is there a better method for my required purpose?

Code:
$ cat get_min_time2.data
arx-count-consumer,arx-count-service,AF1001,300
arx-count-consumer,arx-count-service,DF7001,500
peg-if-consumer,peg-if-service,WS9000,1100
tin-mock-consumer,tin-mock-service,YU8000,1500
arx-count-consumer,arx-count-service,YP8000,101
tin-mock-consumer,tin-mock-service,RR5030,4500
pipe-mock-consumer,pipe-mock-service,ZP1020,50
pipe1-mock-consumer,pipe-mock-service,PL9090,510

Code:
$ cat get_min_time2.awk

BEGIN { FS="," }
{
        key=sprintf("%s_%s", $1, $2)
        if(!min_time_value[key]) {				
                min_time_value[key] = $0 # $0 is a typo but it works. $4 is the correct field to use i.e time
        }

        if(min_time_value[key] > $4) {
                min_time_value[key]=$4
                token_value[key]=$3
        }
}

END {
        for (i in min_time_value) {
          if(i in token_value) {
            token=sprintf("%s", token_value[i])
          }
          split(i, keys, "_")
          consumer = keys[1]
          service = keys[2]
          printf("%s,%s,%s,%s\n", consumer, service, token, min_time_value[i])
        }
}

Code:
$ awk -f get_min_time2.awk get_min_time2.data
peg-if-consumer,peg-if-service,WS9000,1100
pipe-mock-consumer,pipe-mock-service,ZP1020,50
tin-mock-consumer,tin-mock-service,YU8000,1500
pipe1-mock-consumer,pipe-mock-service,PL9090,510
arx-count-consumer,arx-count-service,YP8000,101
 
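I think it is because $0 > $4 always holds here (it is a lexicographical string comparison, and a line starting with a letter sorts after a string of digits), so on the first line of each consumer/service pair your program always executes the body of the second if, which stores both the time and the token: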
Code:
        if(min_time_value[key] > $4) {
                min_time_value[key]=$4
                token_value[key]=$3
        }
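
A quick way to check this explanation is to print the result of both comparisons for one line of the data. This is only a sketch: compare_demo is a made-up name, and LC_ALL=C is set as an assumption so that the string comparison uses plain byte order.

```shell
# Hypothetical demo: compare the whole line, and the time itself, with $4.
compare_demo() {
  printf '%s\n' 'arx-count-consumer,arx-count-service,AF1001,300' |
  LC_ALL=C awk -F, '{
    # $0 does not look like a number, so this is a string comparison.
    print "line > time:", (($0 > $4) ? "true" : "false")
    # With $4 stored instead of $0, the same test is numeric and fails.
    print "time > time:", (($4 > $4) ? "true" : "false")
  }'
}

compare_demo
# prints:
# line > time: true
# time > time: false
```

With $4 in the first if, the second if would not fire on the first line of a pair, so token_value[key] would only be set when a later line has a smaller time.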
 
To print entire lines, I would do something like this:
Code:
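BEGIN {
  FS = ","
}

{
  consumer = $1
  service = $2
  time_value = $4 + 0   # force a numeric comparison
  key = consumer"#"service
  # first line seen for this consumer/service pair
  # ("in" tests existence, so a time of 0 is handled too)
  if (!(key in min_time_values)) {
    min_time_values[key] = time_value
    line[key] = $0
  }
  else {
    # a smaller time value replaces the stored time and line
    if (time_value < min_time_values[key]) {
      min_time_values[key] = time_value
      line[key] = $0
    }
  }
}

END {
  for (key in min_time_values) {
      print line[key]
  }
}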

Output:
Code:
$ cat marm1.txt
arx-count-consumer,arx-count-service,1234567890abcde,300
arx-count-consumer,arx-count-service,1234567890abcdf,500
arx-count-consumer,arx-count-service,1234567890abcdg,101

$ awk -f marm1.awk marm1.txt
arx-count-consumer,arx-count-service,1234567890abcdg,101
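
If sorted output is acceptable, a different technique (not the array approach above) is to let sort order the lines by consumer, service and then numeric time, and have awk keep only the first line of each pair. This is a sketch under those assumptions; min_lines_sorted is a made-up name, and LC_ALL=C is used only to make the sort order predictable.

```shell
# Hypothetical helper: sort by consumer, service, numeric time (field 4),
# then print the first line seen for each consumer/service pair.
min_lines_sorted() {
  LC_ALL=C sort -t, -k1,1 -k2,2 -k4,4n "$@" |
  awk -F, '!seen[$1 FS $2]++'
}

# Usage with the three lines from marm1.txt:
printf '%s\n' \
  'arx-count-consumer,arx-count-service,1234567890abcde,300' \
  'arx-count-consumer,arx-count-service,1234567890abcdf,500' \
  'arx-count-consumer,arx-count-service,1234567890abcdg,101' | min_lines_sorted
# prints: arx-count-consumer,arx-count-service,1234567890abcdg,101
```

Unlike the for-in loop, this prints the pairs in sorted order, at the cost of sorting the whole file first.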
 
Thank you mikrom.
Your solution is a clean and simpler approach to the problem posed.
A big thank you once again - excellent coding.
 