
random partitions

Status
Not open for further replies.

ser81 (Programmer, BE)
Feb 24, 2002
Hi,

I'm looking for a script that generates 10 random partitions of my datafile and writes these partitions to 10 different files.

Thanks
 
Have you tried these built-in functions?
Code:
rand()   random number r, where 0 <= r < 1
srand()  set the seed for rand() from the time of day
srand(x) x is new seed for rand()
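To turn rand() into something you can use for picking a partition, the usual idiom is int(1 + rand() * n), which gives a whole number from 1 to n because rand() is always below 1. A small sketch, assuming a POSIX awk on the PATH; the function name rand_ints is just for illustration:

```shell
#!/bin/sh
# rand_ints N COUNT: print COUNT random integers in the range 1..N.
rand_ints() {
    awk -v n="$1" -v count="$2" 'BEGIN {
        srand()                              # seed from the time of day
        for (k = 1; k <= count; k++)
            print int(1 + rand() * n)        # rand() is in [0,1), so this is 1..n
    }'
}

rand_ints 10 5
```

Without the srand() call, awk starts from the same default seed on every run, so each run would print the same numbers.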

Hope this helps.
PH.
 
Something like this?

Code:
function genrand(n) {
    return int(1 + rand() * n)            # random integer in 1..n
}

BEGIN {
    n = 0
    while ((getline line < ARGV[1]) > 0)
        filearray[++n] = line             # filearray[1..n] holds the records
    close(ARGV[1])
    srand()

    for (x = 1; x <= 10; x++) {
        start = genrand(n); end = genrand(n)
        if (start > end) { tmp = start; start = end; end = tmp }
        # each random range of lines goes to its own file, partition.1 .. partition.10
        out = "partition." x
        for (j = start; j <= end; j++)
            print filearray[j] > out
        close(out)
    }
}
 
I don't think marsd's script is exactly what I want. I actually want 10 random samples from my datafile, each one tenth of the datafile. I use this script to select a random sample from a datafile:

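Code:
BEGIN {
    srand()             # seed from the time of day; otherwise every run picks the same sample
    select = 100        # records still to pick for the sample
    remaining = 1000    # records not yet examined (total number of records)
}
{
    # keep this record with probability select/remaining
    if (rand() < select/remaining) {
        print $0
        select--
    }
    remaining--
}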
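The script above hard-codes the record count (1000). A minimal sketch, assuming a POSIX shell and awk, counts the records in a first pass and passes both numbers in with -v, so the same selection-sampling logic works for any file; the function name sample_lines is illustrative:

```shell
#!/bin/sh
# sample_lines K FILE: print K distinct lines of FILE chosen at random,
# in their original order, using the select/remaining rule from the post.
sample_lines() {
    total=$(awk 'END { print NR }' "$2")     # first pass: count the records
    awk -v select="$1" -v remaining="$total" '
        BEGIN { srand() }
        {
            # keep this record with probability select/remaining
            if (rand() < select / remaining) {
                print
                select--
            }
            remaining--
        }' "$2"
}
```

For example, sample_lines 100 datafile > sample.txt. When select equals remaining the probability is 1, so the sample always ends up with exactly K lines (or the whole file if it has fewer than K).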

But if I just run this script 10 times, some records will end up in more than one sample and others in no sample at all. Each sample should be removed from the data before the next one is drawn.
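What is described here (every record in exactly one sample, samples of equal size) is a random partition. A minimal sketch, assuming a POSIX awk: read all records into an array, shuffle it with a Fisher-Yates shuffle, then write consecutive tenths of the shuffled array to ten files. The function name partition10 and the PREFIX.1 to PREFIX.10 file names are illustrative choices, not something from the thread:

```shell
#!/bin/sh
# partition10 FILE PREFIX: split FILE into 10 random, disjoint parts named
# PREFIX.1 .. PREFIX.10 whose sizes differ by at most one record.
partition10() {
    awk -v prefix="$2" '
        { rec[NR] = $0 }                     # keep every record in memory
        END {
            srand()
            n = NR
            # Fisher-Yates shuffle of rec[1..n]
            for (i = n; i > 1; i--) {
                j = int(1 + rand() * i)      # random index in 1..i
                tmp = rec[i]; rec[i] = rec[j]; rec[j] = tmp
            }
            # shuffled record k goes to part int((k-1)*10/n) + 1
            for (k = 1; k <= n; k++) {
                p = int((k - 1) * 10 / n) + 1
                print rec[k] > (prefix "." p)
            }
            for (p = 1; p <= 10; p++)
                close(prefix "." p)
        }' "$1"
}
```

For example, partition10 datafile datafile.part writes datafile.part.1 to datafile.part.10. If the input has fewer than 10 lines, some parts receive no records and those files are simply not created.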
 
To remove records as they are picked, you need to keep them in an array.

Something like this, maybe. I'm sorry, but I don't see the value of this exercise; it looks a lot like homework:
Code:
function genrand(n) {
    return int(1 + rand() * n)               # random integer in 1..n
}

BEGIN {
    n = 0
    while ((getline line < ARGV[1]) > 0)
        var[++n] = line                      # var[1..n] holds the records
    close(ARGV[1])
    srand()

    for (gg = 1; gg <= 10; gg++) {
        out = "outfile." gg
        # part sizes differ by at most one and add up to n
        size = int(gg * n / 10) - int((gg - 1) * n / 10)
        for (i = 1; i <= size; ) {
            ind = genrand(n)
            if (ind in var) {                # not picked yet: use it, then remove it
                print var[ind] > out
                delete var[ind]
                i++
            }
        }
        close(out)
    }
}
 
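You could create all ten files in one go by using the rand function to pick the output filename extension for each record:

awk 'BEGIN{srand()} {print $0 > ("outfile." int(1+rand()*10))}' infile

Each record goes to exactly one file, but the file sizes only average one tenth of the input rather than being exactly equal. Using > rather than >> means each file is truncated when awk first opens it, so a second run does not append to the output of the first.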
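Because each record picks its file independently, it is worth checking the result. A small sketch, assuming a POSIX shell and that it is run in a directory with no other outfile.* files; split_random wraps the one-liner's approach and report_parts is a hypothetical helper that prints each part's size and confirms the parts together hold exactly the input's lines:

```shell
#!/bin/sh
# split_random FILE: send each record of FILE to outfile.1 .. outfile.10,
# chosen independently at random (the one-liner's approach).
split_random() {
    awk 'BEGIN { srand() } { print $0 > ("outfile." int(1 + rand() * 10)) }' "$1"
}

# report_parts FILE: print the line count of every part, then compare the
# sorted union of the parts with the sorted input.
report_parts() {
    for f in outfile.*; do
        echo "$f: $(awk 'END { print NR }' "$f") lines"
    done
    if [ "$(cat outfile.* | sort | cksum)" = "$(sort "$1" | cksum)" ]; then
        echo "OK: the parts together hold exactly the input lines"
    else
        echo "MISMATCH: records lost or duplicated"
    fi
}
```

Run split_random infile and then report_parts infile; the per-file counts will scatter around one tenth of the input.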
 