Cornell University CISER

CISER Computing

Sorting Very Large Datasets with SAS

Q.  I'm trying to sort a very large SAS dataset (4.49G) and I'm getting the message that SAS is "out of resources".  What can I do?

A.  Here are some helpful tips passed along to us from other users as well as from SAS:

From an experienced user:

  • You can better diagnose where you're running into resource problems by setting two system options: MSGLEVEL=I and FULLSTIMER. You need to determine whether you're running out of RAM or disk space. It's often the latter.
  • With regard to memory -- if, for example, you have 750M of RAM, try setting the MEMSIZE option to 700M and the SORTSIZE option to 650M. You need to leave enough room for the operating system and for SAS overhead. *** Make sure you're not running any other processes ***.
  • About disk space -- a common cause of problems! If you are sorting a temporary (WORK) data set, you need room for ***4 copies*** of the dataset in your WORK library on Windows (5 copies on Unix). If you're sorting a permanent dataset (two-level name), you need room for 1 copy in the source library, 1 in the destination, and 2 in WORK.
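As a rough sketch of the settings above: the 700M and 650M values simply follow the 750M example, and myjob.sas is a hypothetical program name. MEMSIZE can only be set when SAS starts, so it appears here as a comment rather than in an OPTIONS statement.

```sas
/* Diagnostics: extra informational notes in the log, plus full       */
/* memory and timing statistics for every step.                       */
options msglevel=i fullstimer;

/* SORTSIZE can be changed inside a session. 650M follows the example */
/* of a machine with 750M of RAM; adjust it to your own hardware.     */
options sortsize=650M;

/* MEMSIZE is a startup-only option. Set it on the command line or in */
/* the SAS configuration file, for example:                           */
/*     sas -memsize 700M -sortsize 650M myjob.sas                     */
/* (myjob.sas is a hypothetical program name)                         */

/* Confirm the values that are actually in effect. */
proc options option=memsize;
run;
proc options option=sortsize;
run;

/* Print the physical location of WORK so you can check how much free */
/* disk space it has before sorting.                                  */
%put NOTE: WORK library is at %sysfunc(pathname(work));
```

With FULLSTIMER on, the log for the failing PROC SORT step reports memory use, which helps tell a memory shortage from a full disk.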

Some more tricks -

  • Make your data set smaller: eliminate all unnecessary variables and set the LENGTH of variables to be no more than necessary (e.g., 3 for dummy variables, 4 for integers, etc.). These steps reduce the size of the file to be sorted dramatically and should always be done (if you haven't already). Since you only have 1.7 million records but the file is 4.5 GB, I'm guessing you might not have done this yet (or else the file is inherently very wide -- see below). With data files of this size you need to think hard about how to reduce file size.
  • Try sorting subsets of the data & recombining them --
    • If the file is very 'wide', split it into multiple files which all contain the sort variables but only contain some of the other variables; you can then do a MERGE/BY to recombine them.
    • If the file is very 'long', try splitting it (half the observations in one file, half in another, say), sort the pieces separately, and then interleave them in a DATA step.
  • (The obvious) -- Try to avoid the sort altogether: index the file or rethink the job sequence.
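The 'long' file approach can be sketched as follows. This is only an illustration under assumed names: MYLIB.BIG stands for the large permanent dataset and ID for the sort variable, and neither name comes from the original question. It also assumes MYLIB has already been assigned with a LIBNAME statement.

```sas
/* Hypothetical names: MYLIB.BIG is the input, ID is the sort key. */

/* 1. Split the file roughly in half in a single pass.             */
/*    NOBS= gives the observation count of a native SAS dataset.   */
data work.half1 work.half2;
   set mylib.big nobs=n;
   if _n_ <= n / 2 then output work.half1;
   else output work.half2;
run;

/* 2. Sort each half on its own; each sort needs much less         */
/*    scratch space than sorting the whole file at once.           */
proc sort data=work.half1;
   by id;
run;

proc sort data=work.half2;
   by id;
run;

/* 3. Interleave: SET with a BY statement reads the sorted inputs  */
/*    in key order and writes a single sorted dataset.             */
data mylib.big_sorted;
   set work.half1 work.half2;
   by id;
run;

/* 4. Remove the intermediate pieces to free WORK space. */
proc datasets library=work nolist;
   delete half1 half2;
quit;
```

The same pattern extends to more than two pieces if halves are still too large. For a 'wide' file the split would instead use KEEP= lists that each include the sort variables, with MERGE and BY in place of the interleaving SET.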

Important tip: 
I can't stress enough how much reducing the 'width' of your file by dropping unnecessary variables and setting LENGTHs properly can help. Some combination of the other methods should get you 'sorted out'.
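A minimal sketch of trimming the width, assuming a hypothetical dataset MYLIB.BIG with variables ID, YEAR, FEMALE and INCOME (none of these names come from the original question). One caution: on Windows and Unix a numeric length of 3 stores integers exactly only up to 8,192, and a length of 4 only up to 2,097,152, so shorten only variables whose values fit. Non-integer values should normally stay at the default length of 8.

```sas
/* Hypothetical names: MYLIB.BIG, ID, YEAR, FEMALE, INCOME. */
data work.slim;
   /* Placing LENGTH before SET works for both numeric and       */
   /* character variables.                                       */
   /* FEMALE is a 0/1 dummy, YEAR is a small integer.            */
   length female 3 year 4;

   /* KEEP= on the input drops unneeded variables before they    */
   /* are ever read into the DATA step.                          */
   set mylib.big (keep=id year female income);
run;

/* Compare the stored lengths and observation length with the    */
/* original dataset.                                             */
proc contents data=work.slim;
run;
```

Sorting WORK.SLIM instead of the original then needs correspondingly less memory and scratch disk space.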

From SAS:

Is the drive formatted as FAT32 or NTFS? A FAT32 drive would have a file size limit of 2.1 GB, and this could be causing the error.