More About CRADC
The Cornell Restricted Access Data Center (CRADC) was established in October 1999 as an NSF-sponsored pilot site for providing secure access to confidential research data. In May 2005, the Office of the Vice-Provost for Research designated CRADC as the University custodian of restricted access data sets.
The data placed on the CRADC computing system may be public use, licensed, restricted access, or HIPAA-designated Protected Health Information (PHI). Restricted access data housed on the CRADC system are provided to Cornell under agreements that are monitored by the Office of Sponsored Programs and by the Institutional Review Board for Human Participants. Protected Health Information consists of medical, treatment, clinical, and insurance records generated by medical, wellness, benefit, and insurance providers (including private employers). As a general policy, public use and licensed data are not placed on the CRADC computing system; however, they may occasionally be placed there for the convenience of CRADC users.
[Definitions: "Restricted Access Data" - data provided by an authorized supplier that must be used with formal confidentiality protections specified in a data provider agreement. "Licensed data" - data provided by an authorized supplier under a data provider agreement that is used to protect the provider's intellectual property (copyright).]
Assistance in negotiating restricted-use data agreements with data providers
CRADC has been instrumental in the acquisition of several restricted-use data sets for use by Cornell researchers. CRADC staff serve as the custodians of all data housed on its computing systems and implement all measures necessary to maintain the security of both the original data and the reports generated from these data (all reports are disclosure-proofed before their release to a researcher for distribution beyond CRADC). Some of the data sets currently housed at CRADC are as follows:
- U.S. Bureau of Labor Statistics National Longitudinal Surveys of Youth
- U.S. Equal Employment Opportunity Commission data
- European Community Household Panel data
- NICHD Study of Early Child Care
- National Longitudinal Study of Adolescent Health (Add Health)
- China Health and Nutrition Survey
- Panel Study of Income Dynamics
- ICPSR Community Tracking Study Physician Survey
- New York Mutual Fire Insurance Co. data
- Fragile Families and Child Wellbeing Study Data
- Nairobi Stock Exchange data
- Healthcare Cost and Utilization Project (NIS and SID) data
- The Health and Retirement Study
Access to a secure computing facility for housing restricted-use data
CRADC maintains a secure computing system, a Windows domain that exceeds the U.S. Defense Department C-2 standards for trusted computing environments. The system is remotely accessible using a Remote Desktop Connection or a Terminal Services Client, and the domain controller employs user-based authentication. The system enforces strict guidelines for selecting user passwords and requires users to change their passwords periodically.
The system does not permit connection to the outside world via FTP, e-mail, Web, print, or disk mapping facilities except when specifically allowed by the data provider. The system's Data Custodians may remove non-confidential summary data and programming at the request of an authorized user after verifying that the files to be removed comply with the data use agreement governing the confidential data used. All CRADC users, system administrators, and custodians are required to sign an appropriate CRADC Computing System Data User Agreement. Data providers are required to certify their authority to provide the data and to formalize the relationship with CRADC by executing a data provider agreement.
Access to sophisticated statistical computing tools in a secure environment
The compute nodes in the CRADC system include a number of sophisticated software packages for data analysis, as well as other tools for organizing researchers' work. Currently installed software packages include multi-processor-enabled versions of SAS, Matlab, Compaq Visual Fortran V6, and MPIPro; single-processor versions of aML, Atlas-ti, Stata SE, SPSS, Limdep/NLogit, GLIM, Genstat, Gauss, and eViews; data conversion software (StatTransfer); and other tools such as TextPad, Microsoft Office, Scientific Workplace, and Adobe Acrobat.
All software is installed so that temporary files created by an application are saved on the data user's private disk and not in areas to which unauthorized users may have access. CRADC computing accounts are limited to those using restricted-use data for their research. To apply for an account, contact the CRADC Manager.
- Note: Researchers who wish to use these tools with non-restricted data may obtain accounts on CISER's general computing system.