
Data Sharing Resources

Principles of Data Sharing Checklist

National Institutes of Health (NIH) data sharing guidance and the NIH funding opportunity for the Rare Diseases Clinical Research Network (RDCRN) (PAR-24-206) emphasize the need for data sharing that is not limited to members of a consortium but is broadly available to the community. This checklist outlines the basic elements that need to be incorporated into the informed consent language, the contractual language established between the consortium administrative core and its sites, and the consortium policies that govern data sharing, use, and publication within the consortium and with external users. The principles stated in this checklist are incorporated and further elaborated in the RDCRN Guidance for the Development of Data Sharing Policies by Individual Consortia section below.


  1. Informed Consent for every protocol should include language that allows broad sharing of the data while protecting the confidentiality of and minimizing the risk for re-identification of the participant. Example language can be found in the RDCRN Guidance for the Development of Data Sharing Policies by Individual Consortia section below. The Informed Consent should also describe:
    1. Sharing data with the Administrative Core of the Rare Diseases Clinical Research Consortium (RDCRC) as a limited data set.
    2. Sharing data with the Data Management and Coordinating Center (DMCC).
    3. Sharing data with other researchers for future studies.
    4. Allowing the transfer of data, without identifiers, to a federal data repository maintained by the NIH.
    5. Options for restricting data sharing that the investigators deem necessary to offer the participant (e.g., as mandated by the IRB).
  2. A subcontract between an Administrative Core (AC) and clinical sites must include language that:
    1. Makes explicit that each site agrees to enter data in a data capture system maintained by the DMCC unless otherwise specified by the sponsoring NIH institute, acknowledging that the data entered into the DMCC may meet the definition of a limited data set.
    2. Allows the AC to collate participant data in the form of a limited data set from all protocols that the site participates in.
    3. Gives the AC the authority to distribute and share data with the DMCC and the NIH.
    4. Gives the AC the authority to share the data with other third parties as governed by specific data use agreements.
    5. Gives the AC the authority to transfer the data, without direct identifiers, to a Federal data repository maintained by the NIH.
    6. References the Data Management & Sharing Plan as written in the grant application and any Data Sharing Policy between consortium sites and the administrative core.
    7. Suggested language: Pursuant to this agreement, as outlined in the Data Management & Sharing Plan of award [NIH award number], [Site] will enter [RDCRC] data in a data capture system hosted in the NCATS cloud and maintained by the DMCC on behalf of the [RDCRC] unless otherwise specified by the sponsoring NIH institute. The Administrative Core of [RDCRC] has the authority to collect information from [Site] in the form of a limited data set for the purpose of combining the data across the sites of each protocol. The Administrative Core has the authority to manage the data with the support of the RDCRN DMCC and further share the data with third parties as governed by data use agreements that the Administrative Core will prepare and execute according to the policies of the [RDCRC]. Further, the Administrative Core has the authority to transfer [RDCRC] data to Federal data repositories to fulfill its obligations to the sponsor(s). In doing so, the Administrative Core will utilize the services of the RDCRN DMCC.
  3. A Data Sharing Policy between all participating sites of an RDCRC should be consistent with the RDCRN Guidance for the Development of Data Sharing Policies by Individual Consortia and policies published by the RDCRN and the NIH. This policy should facilitate data sharing and outline the mechanisms of data sharing within the RDCRC and with external partners. The policy should clearly state the intention to transfer patient-level information, stripped of identifiers, to a federal data repository.
  4. A Data Use Policy, consistent with the guidelines published by the RDCRN, that includes a publication policy outlining how data collected in RDCRC studies is to be used, disseminated, and attributed.
  5. To the extent that the DMCC is going to support the RDCRC data sharing efforts and prepare RDCRC datasets for submission to a Federal data repository(ies), ensure that appropriate legal agreements are in place to allow the DMCC to use and further share the data (execution of Data Use Agreement).
  6. Each research participant’s informed consent selections pertaining to data sharing and future data use should be tracked in the research database in order to facilitate the designation of an appropriate consent group when the data is submitted into a Federal data repository.
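Item 6 can be illustrated with a small, hypothetical data model. The Python sketch below assumes a study database that stores two yes/no data sharing selections per participant plus the disease under study, and groups coded participant IDs by consent group before a repository submission. The field names and the mapping to labels such as "GRU" (General Research Use) or a disease-specific "DS-" label are illustrative assumptions modeled on common repository conventions, not an RDCRN or NIH specification; the actual consent groups should come from the consent form and the receiving repository.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConsentSelections:
    """Hypothetical consent choices recorded for one participant."""
    participant_code: str          # coded study ID, never a direct identifier
    share_with_repository: bool    # agreed to transfer to a federal data repository
    unrestricted_future_use: bool  # agreed to future research on unrelated conditions
    study_disease: str             # disease under study, used for disease-specific use


def consent_group(c: ConsentSelections) -> Optional[str]:
    """Map one participant's selections to an illustrative consent-group label.

    Returns None if the participant declined repository sharing, so the
    record is left out of the submission entirely.
    """
    if not c.share_with_repository:
        return None
    if c.unrestricted_future_use:
        return "GRU"                # General Research Use (assumed label)
    return f"DS-{c.study_disease}"  # disease-specific use only (assumed label)


def group_for_submission(records: List[ConsentSelections]) -> Dict[str, List[str]]:
    """Collect participant codes under each consent group for a submission."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        label = consent_group(rec)
        if label is not None:
            groups[label].append(rec.participant_code)
    return dict(groups)


records = [
    ConsentSelections("P001", True, True, "XYZ"),
    ConsentSelections("P002", True, False, "XYZ"),
    ConsentSelections("P003", False, True, "XYZ"),
]
print(group_for_submission(records))  # {'GRU': ['P001'], 'DS-XYZ': ['P002']}
```

Keeping the selections as structured fields, rather than free text, is what makes it possible to assign each record to a consent group mechanically at submission time.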

RDCRN Suggestions for Establishing Data Sharing and Data Management Guidance for Individual Consortia

Data management and data sharing are key aspects of the Rare Diseases Clinical Research Network (RDCRN) and are important considerations across the life span of a consortium. The terms data management and data sharing cover a broad array of agreements governed by a host of regulations. These include agreements between investigators within a consortium, describing how data will be managed and shared internally; agreements between research participants and researchers, describing how participants want their data to be managed and shared; and agreements for how the data will be managed and shared with a repository, making it available to the broader research community and the public. Applicants will also outline in their Data Management and Sharing (DMS) Plan how data will be managed and shared for each award, and compliance with the DMS Plan is a Term and Condition of award.

Currently, some RDCRN consortia have advanced internal, consortium-specific data sharing practices in place that describe how consortium collaborators work and share information with each other, while other consortia are in the early stages of developing such practices.

The intent of this document, drafted by RDCRN recipients, is to promote consistency across the RDCRN by offering participating consortia non-binding suggested approaches to good data sharing practices, along with key factors to consider when establishing internal data sharing plans.


Because the RDCRN is an NIH-supported network, its data sharing practices must first and foremost be consistent with the Final NIH Policy for Data Management and Sharing, effective January 25, 2023. This policy should serve as the basis for individual consortium data sharing policies, along with other NIH data-specific policies as applicable (e.g., NDAR for autism-related data).

In establishing consortium-specific data sharing policies, consortia should also be consistent with the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles and any other data sharing policies indicated in the notice of award.

  • The consortium-specific internal data sharing policy should establish how data will be shared among the consortium sites and with the Administrative Core of the consortium, as well as, in some situations, with parties external to the consortium.
  • If there is existing RDCRN legacy data, the consortium should specify any differences in policy and practice for new versus previously collected data. In writing the consortium-specific data sharing policy, the consortium should consider developing the following components:
    • A data stewardship statement, recognizing who is authorized to make final decisions about the sharing of data, and making reference to any subcontracts or other source documents establishing data stewardship.
    • A publication agreement and authorship policy defining the responsibility to publish, citation of funding, acknowledgement of consortium researchers, and priority of authorship.
    • A management solution for resolving potential conflicts of interest.
    • The recognition of different types of users requesting access to the data (e.g., internal to the consortium, collaborator external to the consortium, potential industry partner) before the data are deposited in the repository, and the potential need for a Data Use/Transfer Agreement.
    • For any rights to intellectual property generated by the consortium, consortium members should establish appropriate intellectual property agreements detailing the allocation and management of intellectual property rights and rights to carry out follow-on research, development, or commercialization activities, consistent with achieving the goals of the program and with applicable federal laws, regulations, and policies (see relevant FAQ).
    • A data access request procedure and related application form outlining how users will obtain access to the data.
      • The procedure should specify any considerations for different types of data releases, for example, coded limited data sets, coded de-identified data sets, anonymized data sets, and aggregate data sets.
      • Plans for how to document released data should be outlined; such documentation should include descriptions of who the data were shared with and any necessary authorizations that were completed.
      • The procedure should describe how the data can be used upon release (e.g., consistent with applicable federal laws, regulations, policies, and appropriate data use limitations and informed consents).
      • A process for sharing back the resultant data of ancillary studies with the consortium and whether/how to incorporate those results into the database may be outlined.
      • The procedure should define any financial obligations for data sharing and data use (e.g., download costs from a cloud environment, data services costs to prepare/aggregate the data). It is important to clarify whether there are any restrictions or limitations for data release to third parties (particularly in light of the use of federal funds to support the research), whether sponsor approval is needed to release the data, and whether administrative fees may be applied.
    • A plan including any necessary actions to comply with EU GDPR, if the consortium gathers data from participants covered by the EU GDPR.
    • A continuity plan that details a continued strategy for sharing data with relevant stakeholders after the cooperative agreement end date.
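The documentation called for in the data access request procedure (who received the data, which type of release, and which authorizations were completed) can be kept as a simple release log. The Python sketch below is hypothetical: the four release types come from the list above, while the authorization labels and the rule set (data access committee approval for every release, plus an executed data use agreement for a coded limited data set, as HIPAA requires for limited data sets) are assumptions that a consortium would replace with its own policy.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List


class ReleaseType(Enum):
    """Release types listed in the data access request procedure."""
    CODED_LIMITED = "coded limited data set"
    CODED_DEIDENTIFIED = "coded de-identified data set"
    ANONYMIZED = "anonymized data set"
    AGGREGATE = "aggregate data set"


@dataclass
class ReleaseRecord:
    """One entry in a hypothetical log of data released under an access request."""
    request_id: str
    recipient: str                   # who the data were shared with
    release_type: ReleaseType
    release_date: date
    authorizations: List[str] = field(default_factory=list)  # completed approvals

    def missing_authorizations(self) -> List[str]:
        """Return required authorizations not yet recorded (assumed rules)."""
        required = ["DAC approval"]  # assumed: every release is reviewed by the data access committee
        if self.release_type is ReleaseType.CODED_LIMITED:
            required.append("DUA executed")  # limited data sets require a data use agreement
        return [item for item in required if item not in self.authorizations]


log = [
    ReleaseRecord("REQ-001", "External collaborator", ReleaseType.CODED_LIMITED,
                  date(2024, 3, 1), ["DAC approval"]),
    ReleaseRecord("REQ-002", "Consortium site", ReleaseType.AGGREGATE,
                  date(2024, 3, 5), ["DAC approval"]),
]
for entry in log:
    print(entry.request_id, entry.missing_authorizations())
# REQ-001 ['DUA executed']
# REQ-002 []
```

Checking for missing authorizations before a release, rather than after, turns the log into a gate as well as a record.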
Read more about FAIR data principles

At the heart of data sharing is the language of the informed consent. Consistent with applicable federal laws, regulations, and policies, researchers should use it to outline how data will be shared, how data may be used in future research studies, and the potentially broad audience for the data. See Appendix 1 for additional guidance.

  • Consistent with applicable federal laws, regulations, and policies, informed consent language should outline plans for the sharing of patient data, ideally in the form of what the Health Insurance Portability and Accountability Act (HIPAA) calls a limited data set. Limited data sets are coded, have direct personal identifiers removed, and may contain dates; they are particularly useful for downstream analysis. Alternatively, de-identified data sets can be shared. Because participants with rare diseases may be easier to identify, special care should be taken to minimize the risk of re-identification. The informed consent language should also outline sharing of data with the NIH, the RDCRN DMCC (Data Management and Coordinating Center), and other researchers, in accordance with an approved IRB protocol, institutional policies, and applicable federal, state, local, and Tribal laws, regulations, and policies.
  • Informed consent language should also describe how the research team will protect the privacy, rights, and confidentiality of human research participants, which organizations or institutions will store and share the data, who will have access to the data, and how the data will be used now and in the future.
  • Consortia should consider adopting Global Unique Identifiers (GUIDs); if they do, the GUID process needs to be outlined in the informed consent. GUIDs are recommended because they allow data pertaining to the same individual to be linked without using direct identifiers. The RDCRN DMCC recommends using the NINDS Centralized GUID solution, part of the Biomedical Research Informatics Computing System (BRICS) platform, but it is understood that some studies must use other GUID generators. In such instances, the DMCC recommends that, if at all possible, the study team generate the NINDS Centralized GUID in addition to the GUID it is obligated to generate, so that a common GUID will be available for as many RDCRN protocols as possible.
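The property that makes GUIDs useful, namely that the same personal information always yields the same identifier while the identifier itself reveals nothing about the person, can be illustrated with a keyed one-way hash. The Python sketch below is NOT the NINDS/BRICS GUID algorithm or API; the class name, normalization rules, secret key, and "GUID-" format are hypothetical. It mirrors the behavior described to participants: the registry keeps only hashes of the PII (never the PII itself), uses them to detect whether a GUID already exists, and issues a random GUID that cannot be reversed to recover the PII.

```python
import hashlib
import hmac
import secrets
import unicodedata
from typing import Dict


def normalize(value: str) -> str:
    """Case-fold and drop accents and whitespace so trivial typing differences still match."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(no_accents.casefold().split())


class ToyGuidRegistry:
    """Hypothetical GUID service that stores keyed hashes of PII, never the PII itself."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._hash_to_guid: Dict[str, str] = {}

    def _pii_hash(self, pii: Dict[str, str]) -> str:
        # Sort fields so the hash does not depend on the order they were supplied in.
        canonical = "|".join(f"{name}={normalize(val)}" for name, val in sorted(pii.items()))
        return hmac.new(self._key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()

    def get_or_create(self, pii: Dict[str, str]) -> str:
        """Return the existing GUID for this PII, or issue a new random one."""
        digest = self._pii_hash(pii)
        if digest not in self._hash_to_guid:
            # Random, so the GUID itself encodes nothing about the participant.
            self._hash_to_guid[digest] = "GUID-" + secrets.token_hex(8).upper()
        return self._hash_to_guid[digest]


registry = ToyGuidRegistry(key=b"hypothetical-secret-key")
pii = {"first_name_at_birth": "Ana", "last_name_at_birth": "Pérez",
       "date_of_birth": "2001-02-03", "city_of_birth": "Lima"}
first = registry.get_or_create(pii)
again = registry.get_or_create({**pii, "last_name_at_birth": " perez "})
print(first == again)  # True: same PII, same GUID, as when enrolling in a second study
```

Issuing a random GUID and storing only a hash for lookup is one design choice; it keeps the identifier free of any encoded PII, at the cost of requiring a central registry to resolve repeat enrollments.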

Any study that collects personal data from participants covered by data privacy laws, regulations or policies, such as the European Union General Data Protection Regulation (EU GDPR), should consider consulting with their institution’s legal advisor.

Guidance from NIH/NCATS: The NCATS RDCRN Data Repository (RDCRN-DR) is an NCATS-funded data sharing resource containing clinical research data from individuals with rare diseases who are enrolled in RDCRN-sponsored protocols. Data in the RDCRN-DR will reside on an NCATS Federal Government server and will be harmonized to published data standards where feasible, to facilitate meta-analyses and merging with external rare disease data sets. The RDCRN-DR is a highly interoperable, secure clinical research data environment. RDCRN-DR data use and transfer agreements developed by NCATS will contain terms and conditions consistent with the NIH DMS policy, other NIH data sharing policies, and federal, state, local, and Tribal laws, regulations, and policies. Where RDCRN consortium-specific data sharing policies and practices cannot be aligned with the terms and conditions of NCATS and NIH data use and transfer agreements, the NCATS agreements shall prevail.

Starting in RDCRN cycle four, under RFA-TR-18-020, RDCRN consortia are required to share data with the RDCRN DMCC for the purpose of establishing a federal data repository. NCATS is building the RDCRN-DR as a data sharing resource containing clinical research data from individuals who are enrolled in RDCRN research. Consortia may have additional data sharing obligations indicated in their notice of award; this guidance does not interfere with NIH Institute, Center, or Office (ICO) requirements. In such circumstances, data availability will be coordinated between the RDCRN-DR and the ICO-specific repository. Consortium internal policies for data sharing should be consistent with other applicable NIH sharing policies.

  • The consortium cooperative agreement and associated Data Management and Sharing Plan will guide the release of data to the RDCRN-DR; the extent and timeliness of data transfer should be incorporated into the work plan and milestones of each consortium and negotiated with NIH Program Officials.
  • Once the data is deposited by the consortia into the RDCRN-DR, governance of data sharing is the responsibility of the NIH. The RDCRN DMCC will manage the RDCRN-DR under NIH guidance. The NIH will solicit and consider input from all stakeholders in designing policies for the RDCRN-DR.
  • The administrative core of a consortium should have the authority to manage data on behalf of the sites and for the consortium as a whole and to negotiate and execute a data sharing agreement with the RDCRN DMCC, allowing the DMCC to receive and manage the data, for the purpose of populating the RDCRN-DR. This may be accomplished through data sharing language in a Materials Transfer Agreement, Subcontract Award, Data Use Agreement, etc.

Data collected under the Rare Diseases Clinical Research Consortia U54 mechanism should adhere to data standards developed by the RDCRN Data Standards Committee and approved by the RDCRN Steering Committee to ensure best data management practices and to support reuse of the data by approved researchers gaining access to the data through the RDCRN-DR. This information on standards should also be reflected in an award’s Data Management and Sharing Plan. Standards should be incorporated into the design of the study and associated database(s).

  • To the extent feasible, the RDCRN data standards should be applied to the data directly shared by the consortium according to the Consortium-Specific Data Sharing Policies.
  • The RDCRN DMCC will provide guidance on the preparation of datasets that each consortium will transfer to the RDCRN-DR.
  • Quality control of datasets will be a joint effort between the research team, consortium leadership, and the RDCRN DMCC. For datasets deposited into the RDCRN-DR, quality control will be incorporated into the submission and approval process.

Definitions:

  • Aggregate Data Set: Summary statistics compiled from multiple sources of individual-level data.
  • Coded Data Set: A data set in which identifying information (such as name or social security number) that would enable the investigator to readily ascertain the identity of the individual to whom the private information pertains has been replaced with a number, letter, symbol, or combination thereof (i.e., the code), and for which a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.
  • Anonymized Data Set: Data recorded in such a way that subjects cannot be identified or re-identified.
  • De-identified Data Set: Under HIPAA, a data set can be designated as de-identified by one of two methods. In the “expert determination method,” an expert determines that the data set has a very small probability of leading to the identification of an individual. In the “safe harbor method,” the data set must be stripped of 18 types of identifiers (name, social security number, etc.): it may contain no elements of dates except the year, and no geographic subdivision smaller than a state, except the first three digits of a ZIP code when the area formed by all ZIP codes sharing those digits contains more than 20,000 people.
  • Limited Data Set: A limited data set is protected health information from which certain specified direct identifiers of individuals and their relatives, household members, and employers have been removed. A limited data set may be used and disclosed for research, health care operations, and public health purposes, provided the recipient enters into a data use agreement promising specified safeguards for the protected health information within the limited data set.
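The difference between a limited data set and a safe-harbor de-identified data set can be sketched in a few lines of Python. This is a partial, illustrative transform under stated assumptions: the record layout and the small set of direct identifiers are hypothetical, the set of restricted three-digit ZIP prefixes must be supplied from current Census data, and the function does not cover all 18 HIPAA identifier categories (for example, it does not handle the separate rule for ages over 89).

```python
from datetime import date
from typing import Any, Dict, Set

# Hypothetical field names; a real study defines its own layout.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "medical_record_number"}


def to_limited_data_set(record: Dict[str, Any]) -> Dict[str, Any]:
    """Remove direct identifiers; dates, city, state, and ZIP code may remain."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def to_safe_harbor(record: Dict[str, Any], restricted_zip3: Set[str]) -> Dict[str, Any]:
    """Partial safe-harbor transform that also generalizes dates and geography.

    restricted_zip3 holds three-digit ZIP prefixes whose combined area has
    20,000 or fewer people; these are replaced with "000".
    """
    out = to_limited_data_set(record)
    out.pop("city", None)                # geography below the state level
    for key, value in list(out.items()):
        if isinstance(value, date):
            out[key] = value.year        # keep only the year of each date
    if "zip" in out:
        zip3 = str(out["zip"])[:3]
        out["zip"] = "000" if zip3 in restricted_zip3 else zip3
    return out


record = {"name": "Jane Doe", "birth_date": date(1980, 5, 17), "visit_date": date(2023, 1, 9),
          "city": "Springfield", "state": "IL", "zip": "62704", "diagnosis": "XYZ"}
print(to_safe_harbor(record, restricted_zip3={"036"}))
# {'birth_date': 1980, 'visit_date': 2023, 'state': 'IL', 'zip': '627', 'diagnosis': 'XYZ'}
```

The limited data set keeps full dates and the five-digit ZIP code, which is why a data use agreement is required before it is disclosed.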

The informed consent language outlined below is recommended by the RDCRN but NOT required; users are welcome to modify the text to fit their specific needs. An alternative abbreviated paragraph is also included. RDCRN advises the research team to seek input from patients when designing consent/assent language.

Sharing Data with the Rare Diseases Clinical Research Network (RDCRN)

The Rare Diseases Clinical Research Network (RDCRN) is an initiative supported by the National Institutes of Health (NIH) to advance medical research on rare diseases. A long-term goal of the network is to improve diagnosis and treatment of rare disease conditions. Knowledge and data sharing is an integral part of the RDCRN because it helps scientists understand commonalities among different rare diseases and facilitates rapid advancement of research.

Your clinical information, including clinical exam results and other data (referred to as “your data”) [add other types of data such as genomic, medical images, histopathology, etc.] collected for this study may be stored in multiple locations. Your data will be stored within the National Center for Advancing Translational Sciences (NCATS) RDCRN Operational Cloud Environment at NIH and managed by the NIH-funded RDCRN Data Management and Coordinating Center (DMCC) and also will be part of a Federal data repository hosted by the NIH. [Add other locations as necessary] The NIH will be responsible for your data kept in the federal data repository. They will care for your data and make decisions about how they are used. Your clinical data may be stored in perpetuity in these locations.

The RDCRN DMCC uses several layers of protection, including password-protected access, data encryption, and constant network monitoring, to ensure the security of the stored clinical data. The DMCC systems comply with all applicable guidelines to ensure confidentiality, data integrity, and reliability. We will protect the confidentiality of your information to the extent possible. Your name and other identifying information will be kept locally at the clinical site you attend, so that the study team can contact you, and will not appear in the data stored in the clinical research database or in the federal data repository. In those locations, your data will be labeled with a code that links to your identifying information. The code key will be kept in a locked location separate from your health and research information. The code key can only be accessed by people on the research team who have permission from the site investigator. [If a study utilizes e-Consent, the location of identifying information should be outlined].

You may be assigned a code number called a Global Unique Identifier (GUID) using an NIH GUID system. The GUID is a unique code made up of letters and numbers that allows researchers to share data from other studies in which you have participated without letting others know who you are. A GUID does not contain direct identifiers, and you cannot be identified using only the GUID. To generate the GUID, we will ask you for your full date of birth (day, month, year), first name at birth, last name at birth, middle name at birth (if applicable), gender at birth, city/municipality of birth, country of birth. This personally identifiable information (PII) will be processed using the NIH Centralized GUID generator software program. Once the GUID is produced, there is no way to get back to your PII. The software will not keep your PII, but will have enough information to determine if you already have a GUID assigned in the system. If you participate in another project and provide the same PII, you will be assigned the same GUID.  Your GUID will be part of our research records.

We would like to make your data, without direct personal identifiers, available for other research studies that may be done in the future. Our goal is to make more research possible to learn about health and disease. Future research may be about similar diseases or conditions to this study. However, research could also be about unrelated diseases, conditions, or other aspects of health. These studies may be done by researchers at other institutions, including commercial entities, and they may be from anywhere in the world. They may work at universities or hospitals. They may work for a government. They may work for companies to make new medicines or products, which may generate profit. You may not benefit directly from allowing your information to be shared. You will not be paid for the future sharing or future use of your data. There will be an approval process for researchers who want to work with study records that might identify you. They will have to tell the NIH, through a data access request and application process, about the research they want to do. They will have to do ethics training and their study needs to have IRB approval.

A central purpose of this study is to share data, so when you agree to participate in this study, we will share your data as described above. You can change your mind later, but researchers may still use your data that has already been shared.

Investigators who collect data as part of the RDCRN are encouraged to consider the items below when developing informed consent language.

Special Considerations:

  • Certificates of Confidentiality requirements. All applicable NIH-funded research is automatically issued a Certificate of Confidentiality, which binds not only the originating investigator/institution but also anyone else who receives the data/specimens, protecting them from unauthorized disclosure. This requirement still applies when data/specimens are de-identified.
  • The NIH encourages sharing of federally funded research data. Investigators should consider including consent language that describes the benefits of data sharing and is consistent with NIH policies.
  • Certain data types, such as genomic and imaging data, may require special consent language. For example, some genomic studies may need to include language that describes whether genetic results will be returned to participants. At a minimum, consent language should state what data types will be shared (e.g., genomic, imaging, etc.) and for what purposes (e.g., General Research Use).
  • Any study that collects personal data from participants covered by data privacy laws, regulations, or policies, such as the European Union General Data Protection Regulation (EU GDPR), should consider consulting with their institution's legal advisor.
  • Research teams should consider whether the data will be shared in the future as a coded data set, which can be linked back to identifiable information, or a fully anonymized data set, which cannot be linked back to personal identifiable information. This will help in the design of the appropriate consent language.

Alternate Abbreviated Paragraph

This research is part of the Rare Diseases Clinical Research Network (RDCRN), an initiative of the National Institutes of Health (NIH) to advance medical research on rare diseases. A long-term goal of the network is to improve diagnosis and treatment of rare disease conditions. The clinical information collected for this study will be stored within the National Center for Advancing Translational Sciences (NCATS) RDCRN Operational Cloud Environment at NIH and managed by the NIH-funded RDCRN Data Management and Coordinating Center (DMCC). It will also be part of a Federal data repository hosted by the NIH and shared broadly beyond the consortium. The NIH and the DMCC use several layers of protection for the clinical data stored there, and their systems meet all local and federal security requirements for research data centers. The NIH may make your data, without direct personal identifiers, available for other research studies in the future. Future research may be about similar diseases or conditions to this study, but could also be about unrelated diseases, conditions, or other aspects of health. Researchers who want access to your data will have to tell the NIH, through a data access request and application process, about the research they want to do. They will have to do ethics training and have IRB approval to do the research. They will have to sign a legal agreement stating they will not try to find out who you are.

Fundamentals of Data Sharing Relationships

Parties

The possible parties involved in data sharing activities within the network are: a clinical research site, the institution that houses an administrative core of a consortium, the institution that houses the data management infrastructure of a consortium, the institution that functions as the RDCRN DMCC, NIH institutes that may host and/or share data generated by a consortium, partner patient advocate groups (PAGs), and other parties external to the network.

Relationships Within a Consortium

The administrative core of a consortium holds the responsibility to make data generated under the U54 mechanism available as outlined in the data sharing plan of the grant and to ensure alignment with participant consent. The administrative core should have the delegated authority from all participating clinical sites to share data with third parties and to deposit the data in an NIH-sanctioned data repository. This concept is further elaborated in the Principles of Data Sharing Checklist and the RDCRN Data Repository sections above.

When a Consortium Shares Data Internally

Within a consortium, data sharing terms should be specified within subaward language between clinical sites and the administrative core and further operationalized in a consortium data sharing policy and/or a publication and analysis plan. When partner PAGs are involved in research and contribute to data collection, they should agree to the same data sharing terms consistent with other clinical sites. Consortia should outline a data access request process and establish a data access committee for the review of requests. This concept is further elaborated in the Principles of Data Sharing Checklist and the Consortium-Specific Data Sharing Policies sections above.

When a Consortium Shares Data Externally

A consortium data sharing policy should establish a data access request procedure for external parties, which may include PAGs, that consists of a review by a data access committee prior to release of data. In most circumstances, a data use agreement between the administrative core and the external party will be necessary to specify the terms and use of the data. This concept is further elaborated in the Principles of Data Sharing Checklist and the Consortium-Specific Data Sharing Policies sections above.

When a Consortium Submits Data to an NIH-Designated Data Repository

The RDCRN is establishing a data repository to facilitate data sharing and is developing policies and procedures for data ingestion. Depending on the cooperative agreement between the funding institute(s) and the institution that houses the administrative core, consortia may be required to share data with other repositories and adhere to their policies. Requirements for accepting data into a repository should be considered when designing a study and when writing a consent form. See the recommended consent language in Appendix 1 – Example of Recommended Consent Language. Any data use limitations imposed by research participants’ informed consent should be tracked as part of the study and expressly delineated when sharing participant data. Read more about the planned RDCRN Data Repository above.

NIH Resources on Data Management and Data Sharing

  • RDCRN RFA/NOFO: The Rare Diseases Clinical Research Network specifies data sharing requirements in the funding opportunity.
  • Scientific Data Sharing: Investigators funded under the Rare Diseases Clinical Research Consortia U54 mechanism are expected to share the resulting data in accordance with the NIH Policy for Data Management and Sharing (NOT-OD-21-013), through a broadly accessible NIH-sanctioned repository built for the Division of Rare Diseases Research Innovation (DRDRI)/National Center for Advancing Translational Sciences (NCATS) and operated by the RDCRN Data Management and Coordinating Center (DMCC).
  • Writing a Data Management & Sharing Plan: The NIH has provided an optional Data Management and Sharing Plan Format page as well as numerous sample plans from different institutes.
  • Webinars from the NIH: Take a deeper dive to understand the 2023 Final NIH Policy for Data Management & Sharing by viewing webinars hosted by the NIH.

This content is evolving and may change over time as new information becomes available.