Cornell University CISER

CISER Computing

Taking Random Samples of Observations from a SAS Data Set

Q.  I am looking for a program that will let me take a random sample from a very large data set (for example, a sample of 300 observations from a data set of 10000).

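A.  One way to select a random sample from a data set is to first use a DATA step to generate a random number for each observation (a random vector), then use PROC SORT to reorder the data by that random vector, and finally keep the first k observations. Because the sort order is random, the first k observations form a simple random sample drawn without replacement. Below is a sample program.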

DATA dummy ; /* CREATE A DATA SET */
input var1 @@;
cards;
2.1 3.1 4 6 2.2 4.9 4 5 3 3.3 4 5 3 4.3 2.3 4 5 7 3 3 9 11 2
;
run;

%let k=10;   /* DEFINE SAMPLE SIZE */

DATA dummy ;
  SET dummy ;
  random=RANUNI(-1);   /* GENERATE A RANDOM VECTOR */
run;

PROC SORT DATA=dummy;
  BY random;   /* SORT OBSERVATIONS BY THE RANDOM VECTOR */
run;

DATA sample;
  SET dummy(drop=random);
  IF _N_ le &k;   /* SELECT THE FIRST K OBSERVATIONS */
run;

PROC PRINT DATA=sample;   /* PRINT THE SAMPLE */
run;
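
For the sizes mentioned in the question (a sample of 300 from 10000 observations), the same sort-based approach might look like the sketch below. It is only an illustration under stated assumptions: the data set name big, the variable id, and the seed value 12345 are all hypothetical placeholders, and the first DATA step merely fabricates 10000 observations to stand in for a real file. Two choices differ from the program above. First, a fixed positive seed is passed to RANUNI so that the same sample is drawn on every run, whereas a seed of zero or less takes the seed from the system clock. Second, the OBS= data set option is used in place of the subsetting IF, so the final DATA step stops reading after k observations.

```sas
%let n=10000;      /* HYPOTHETICAL POPULATION SIZE, FROM THE QUESTION */
%let k=300;        /* SAMPLE SIZE, FROM THE QUESTION */
%let seed=12345;   /* HYPOTHETICAL FIXED POSITIVE SEED FOR REPRODUCIBILITY */

DATA big;          /* PLACEHOLDER DATA SET WITH &n OBSERVATIONS */
  DO id=1 TO &n;
    OUTPUT;
  END;
run;

DATA big;
  SET big;
  random=RANUNI(&seed);   /* SAME SEED GIVES THE SAME RANDOM VECTOR EACH RUN */
run;

PROC SORT DATA=big;
  BY random;              /* SORT OBSERVATIONS BY THE RANDOM VECTOR */
run;

DATA sample;
  SET big(drop=random obs=&k);   /* READ ONLY THE FIRST K OBSERVATIONS */
run;

PROC PRINT DATA=sample(obs=10);  /* PRINT THE FIRST 10 SAMPLED OBSERVATIONS */
run;
```

To use this with a real file, drop the first DATA step and replace big with the name of the existing data set.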