ELSI DMS Policy Webinar FAQ

The National Human Genome Research Institute hosted an informational webinar, ELSI Research and the New NIH Data Management and Sharing (DMS) Policy, on Tuesday, May 23, 2023.  

The following questions were answered during the webinar:

Are there de-identification standards available for qualitative interview data?

NHGRI recognizes that methods for de-identification of qualitative data are developing, and existing tools may have limitations. NIH guidance refers researchers to the following tools for de-identification:
 

In some cases, the repository to which you're submitting data may also provide tips or tools for de-identification.

Our IRB designated our project “not research” because we are planning only quality assurance/quality improvement surveys. Is program evaluation data considered “scientific data”?

We are awaiting input from NIH regarding this issue. If you need a more immediate response, please contact an ELSI program officer.

Is it necessary to share coded transcripts or is sharing transcripts enough for qualitative data?

There is no specific requirement for sharing coded transcripts. However, since sharing coded transcripts would likely increase scientific value and facilitate study validation and replication, choosing not to do so would require explanation. Investigators are expected to justify their proposed plan for sharing or not sharing their data, metadata (e.g., codebooks), and other associated documentation.

What are some examples of metadata? Do I need to share metadata?

Metadata and other documentation associated with a dataset allow users to understand how the data were collected and how to interpret the data. Importantly, this ensures that others can use the dataset and prevents misuse, misinterpretation, and confusion. The exact metadata or other associated documentation will vary by scientific area, study design, the type of data collected, and characteristics of the dataset.  Here are examples of metadata or other information that may be included with research data:
 

  • Methodology and procedures used to collect the data
  • Data labels
  • Definitions of variables
  • Any other information necessary to reproduce and understand the data
     

Some specific examples of metadata to consider sharing include but are not limited to:
 

  • Survey instruments with proprietary measures
  • Data collection protocols including sample and subject selection methods
  • Copies of blank, dated, stamped consent forms and IRB approvals
  • Survey codebooks including question number, question text, variable name, variable label, value labels, codes for missing, non-applicable, “don’t know,” and refusal values
  • Methods used to code open-text survey responses
  • Codebook for analyses of interviews, including a list and definition of all codes used, and coding examples
  • Steps taken to remove direct and indirect identifiers in the data
  • Description of software and analytical methods used in survey and interview data analyses
  • R code used in survey data analyses
  • A standard citation and unique identifier to facilitate attribution of data use
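
As one illustration of the survey codebook item in the list above, a codebook can be kept in machine-readable form alongside the data. The sketch below is a minimal, hypothetical example in Python. The class name, field names, sample question, and the missing-value codes (-7, -8, -9) are assumptions chosen for illustration; they are not an NIH or NHGRI standard, and a repository may expect a different layout.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    """One survey question as it might appear in a shared codebook (hypothetical layout)."""
    question_number: str
    question_text: str
    variable_name: str
    variable_label: str
    value_labels: dict[int, str]
    # Codes for non-substantive responses; these specific values are assumptions.
    missing_codes: dict[int, str] = field(default_factory=lambda: {
        -7: "Not applicable",
        -8: "Don't know",
        -9: "Refused",
    })

    def label_for(self, code: int) -> str:
        """Return the human-readable label for a stored response code."""
        if code in self.value_labels:
            return self.value_labels[code]
        if code in self.missing_codes:
            return self.missing_codes[code]
        raise KeyError(f"Code {code} is not defined for {self.variable_name}")


def codebook_to_csv(entries: list[CodebookEntry]) -> str:
    """Write one row per variable/code pair so the codebook can be shared as plain text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question_number", "question_text", "variable_name",
                     "variable_label", "code", "label"])
    for entry in entries:
        # Substantive value labels first, then the missing/refusal codes.
        for code, label in {**entry.value_labels, **entry.missing_codes}.items():
            writer.writerow([entry.question_number, entry.question_text,
                             entry.variable_name, entry.variable_label, code, label])
    return buffer.getvalue()


# Hypothetical example question, not drawn from any real instrument.
example = CodebookEntry(
    question_number="Q1",
    question_text="How comfortable are you with sharing your de-identified data?",
    variable_name="share_comfort",
    variable_label="Comfort with data sharing",
    value_labels={1: "Not at all", 2: "Somewhat", 3: "Very"},
)
```

Calling `codebook_to_csv([example])` returns CSV text that could be saved next to the dataset, and `example.label_for(-9)` looks up the label for a refusal code. Keeping value labels and missing-value codes in one place is one way to help later users interpret the data consistently.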

Is raw genomic sequence data considered identifiable or de-identifiable? What are the nuances?

Genomic data is unique to an individual. In general, the ability to discern someone’s identity from genomic data depends on the amount and region of the sequence that is available, and whether one has other information associated with the sequence data. Genomic data can also be used to identify biological relatives and populations connected by genetic ancestry. At this time, re-identifying an individual using their genomic data is time-intensive and requires special technical expertise. In the future, advanced computational methods may make it easier to leverage genomic data for re-identification. To our knowledge, attempts to re-identify individuals using genomic data in the research context have been purely for academic purposes (i.e., to illustrate theoretical risk), rather than malicious.

U.S. regulations that govern biomedical research (i.e., HIPAA and the Common Rule) do not explicitly define genomic data as an identifier. However, NIH has policy protections in place to mitigate the risk of re-identification using genomic data, including the Genomic Data Sharing Policy and the NIH Policy for Issuing Certificates of Confidentiality, and actively monitors the risks to participants.

What metadata should be included with interview data and survey data?

For interview data, you would ideally share your interview guide and your codebook, with examples of coding, so people can see what you’ve done. Surveys are similar: provide the survey instruments or say what they are, and include a codebook for your data. In both cases, include anything that would help people read your data, analyze it, and use it well.

I appreciate there are a range of repository options for sharing qualitative data from empirical ELSI research, both generalist repositories and those geared towards social science. It would be great to have the ELSI community coalesce around (perhaps a few) of these options, to enhance discoverability. This may not come from NHGRI, but are there thoughts on how this could move forward? Perhaps a way to share within the ELSI community as we start to write and enact these plans?

As we move forward, it may be helpful to list a few repositories or places for sharing that work well for different types of ELSI data sharing. NHGRI is not mandating any repository at this point, because there is no perfect fit or plan for all ELSI studies, which vary a great deal in terms of the data created and how and whether they can be shared. Moving forward, NHGRI program staff can consider whether providing sample sharing plans from grantees would be feasible and helpful. However, NHGRI does not share anyone’s grant materials without PI permission and generally waits until the project is complete, so this could take some time. Researchers can also voluntarily share with each other and come together on recommendations, which might be helpful.

The Center for ELSI Resources and Analysis (CERA) or ELSIhub may be another means of sharing ELSI research tools and products. The concept for the renewal of CERA, approved in February 2023, includes having CERA collect and organize information on where and how to access ELSI research data, metadata, and the corresponding ELSI research products, and present this information in a searchable and easily accessible format. The idea is for CERA to help ELSI researchers find and utilize relevant ELSI research data and research products. This expanded role for CERA will be in the renewal period, and it may take time to build out this functionality.

I’m thinking through consent for data sharing of qualitative data and am struggling with most of my research being exempt from documentation of consent versus the value of documenting consent for or against data sharing. Any thoughts?

When considering this question, we assume the following:
 

  • “Consent” refers to legally effective informed consent as required by the Common Rule (45 CFR 46, Subpart A) for which general requirements are outlined in section 46.116. See also NHGRI’s summary of Required Elements of the Consent Form
     
  • The research in question is human subjects research where the IRB has granted a waiver of signed consent based on regulatory criteria under Common Rule section 46.117(c)(1) (i.e., research participant signature not required) and required informed consent prior to collecting qualitative data
     
  • “Documentation of consent” refers to obtaining written or electronic signature on an IRB-approved written consent form
     

Regardless of whether documentation of consent is required, maintaining records of the information shared and the activities undertaken during an informed consent process is generally a best practice. Even studies that are exempt from IRB review may include plans for obtaining consent. Written records of information shared can help ensure a research team is consistent in their discussions with participants and can be shared with participants when feasible/appropriate to ensure they have written information about the study to refer to in the future. Maintaining a record of participant informed consent (versus documentation of consent) may be important when consent for data sharing or preferences regarding data use limitations may vary across research participants. Tracking consent at the participant level might be avoided if the only record linking the participant and the research study would be the tracking form and a breach of confidentiality could result in harm. Finally, your institution may have some requirements related to data sharing or other work that would require written records of informed consent.

Is AnVIL for genomic data only? Is AnVIL the primary repository ELSI researchers should use? If a different repository is used, should that choice be justified?

No, AnVIL is not for genomic data only. AnVIL supports submission of a variety of data types (including but not limited to genomic data) and supports controlled-access when such a model is needed.

To date, AnVIL is not the primary place where ELSI-related data have been shared; however, NHGRI encourages you to consider using AnVIL for sharing any type of data. If you believe your data is better shared in a different repository, please briefly describe why in your grant’s data management and sharing plan. As outlined in NIH's Supplemental Policy Information: Selecting a Repository for Data Resulting from NIH-Supported Research, using a quality data repository generally improves the FAIRness (Findable, Accessible, Interoperable, and Re-usable) of the data. When selecting a repository, investigators should consider factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated.

Last updated: August 8, 2023