dbGaP is a controlled-access data repository that serves as a central portal to submit, locate and request access to genomic and associated phenotypic data. It is a valuable and rapidly growing resource with over 750 studies available for access, representing over 1.2 million unique research participants. Users of dbGaP have access to a wide range of data types such as microarray, genome-wide association study, whole and targeted genomic, transcriptomic, epigenomic, and metagenomic data.
Over the years, dbGaP users have shared their feedback, and many have expressed frustration with the difficulty of navigating the data submission and access processes. To address these concerns, NIH has made a number of improvements to dbGaP (see Box 1). To best serve the needs of the research community and enable robust and responsible data sharing, it is imperative that new resources, tools, and data management models be developed to make the system as user-friendly and efficient as possible and to increase its utility.
With this in mind, NIH today released a Request for Information (RFI) seeking public comments on the data submission and access processes for dbGaP, and on the management of data within dbGaP, in order to consider options to improve and streamline these processes.
To view the RFI and find instructions on how to comment, please visit NIH Request for Information on Processes for database of Genotypes and Phenotypes (dbGaP) Data Submission, Access, and Management.
For more information on the NIH Genomic Data Sharing Policy, please visit NIH Genomic Data Sharing.
We invite all stakeholders within the genomics community to comment, so that NIH can take your thoughts and ideas into account while working to increase the utility of dbGaP as a data sharing tool. Comments will be accepted until April 7, 2017.
Box 1: Recent Improvements/Upgrades to dbGaP
- Development of standard data use limitations to promote consistent implementation of the consent group categories.
- Development of fillable Institutional Certification forms to standardize and expedite the Institutional Certification process.
- Implementation of user-friendly, electronic study registration, submission, data access request (DAR), project renewal, and project close-out forms.
- Development of the dbGaP Data Browser to enable dbGaP-approved users to view controlled-access summary statistics and individual-level genotype and sequence data associated with phenotypic features, without the need to download datasets.
- In collaboration with the Global Alliance for Genomics and Health Beacon project, implementation of a simple web interface that allows users to query dbGaP for genomic variants of interest and their presence in the database.
- Issuance of a Position on the Use of Cloud Computing Services for Storage and Analysis of Controlled-Access Data Subject to the NIH Genomic Data Sharing Policy to allow investigators to request permission to transfer controlled-access genomic data and other associated data obtained from dbGaP to public or private cloud systems for storage and analysis.
- Creation of search filters for dbGaP datasets (e.g., data use limitations, disease area, data type).
- Assembly of two data collections that allow investigators to submit a single DAR to gain access either to most of the individual-level datasets in dbGaP approved for general research use (currently 96 datasets) or to only the aggregated data from these datasets.
- Addition of a "Facts & Figures" section to the NIH GDS website to promote transparency by highlighting current dbGaP data submission and access statistics, including DAR processing times and data management incidents.
- Development of a mechanism to establish structured partnerships with external organizations or "trusted partners".