Data Submission Guideline for GCAD Harmonization

To submit data, please fill out the Dataset Registration Template. A member of the NIAGADS team will work with you on data transfer. If you have questions about the template forms or run into upload issues, please contact NIAGADS@pennmedicine.upenn.edu.

NOTE: All documents related to the application should be provided in English. For institutions where English is not the primary language, please provide translations along with the original documents. Translated documents should be signed by the institutional signing official.

Required Policy Documents

The following documents are required in order to deposit and share your data through NIAGADS:

Institutional Certification for ADRD Studies that covers all subjects in your study. Multiple certifications may be required.
Signed copy of the NIA AD Genomics Sharing Plan.
Completed Dataset Registration Template: 01_DSS_Dataset_Registration_Template.docx

Subject and Sample ID Re-mapping

All sequenced subjects and any non-sequenced connecting family members submitted to be harmonized with ADSP data will be renamed to fit the ADSP UID schema. NIAGADS will remap all data you submit to the ADSP UID. A separate protocol and template sheet are provided; please see 02_ADSPID_Assignment_Instructions.docx for further instructions and use 02_SampleID_forADSPassign_DS.xlsx to enter sample info.

Phenotypes

Phenotypes should be provided according to the ADSP format. Use the data dictionary, 03_ADSP_Phenotypes_Augmentation_DD.docx, as a reference to reformat all phenotypes to the ADSP format. A template is provided for use with the data dictionary: 03_ADSP_Phenotypes_Augmentation_DS.xlsx.

GWAS

If available, please provide GWAS or exome chip data for all sequenced subjects and any connecting family members. Genotype data should be in PLINK binary format, preferably on build hg38, and aligned to the forward strand. This will help with calling structural variants. Please make sure that the IDs in the GWAS data can be mapped to the IDs used in the sequencing data.
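
The ID check described above can be run locally before submission. The following is a minimal Python sketch, not a NIAGADS tool: the function names and the optional id_map argument are hypothetical, and it assumes that the individual ID in the second column of the PLINK .fam file is the ID that should match the sequencing sample IDs. If your study links samples by family ID plus individual ID, adapt it accordingly.

```python
def read_fam_ids(fam_path):
    """Return the set of individual IDs (second column) from a PLINK .fam file."""
    ids = set()
    with open(fam_path) as handle:
        for line in handle:
            fields = line.split()  # .fam files are whitespace-delimited
            if len(fields) >= 2:
                ids.add(fields[1])
    return ids


def unmatched_ids(gwas_ids, sequencing_ids, id_map=None):
    """Return the sorted GWAS IDs that cannot be linked to a sequencing ID.

    id_map is an optional (hypothetical) dict translating a GWAS ID to the
    ID used in the sequencing data; IDs missing from it are compared as-is.
    """
    id_map = id_map or {}
    known = set(sequencing_ids)
    return sorted(g for g in gwas_ids if id_map.get(g, g) not in known)
```

An empty list from unmatched_ids means every GWAS ID can be linked to a sequencing ID; any IDs it returns should be resolved or explained before transfer.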

Sequencing Data

CRAM, BAM or FASTQ

Sequencing read data can be submitted in any of the following formats:

FASTQ: please save all reads, including those that could not be mapped to the reference genome.
BAM: please save all reads, including those that could not be mapped to the reference genome.
CRAM: please save all reads, including those that could not be mapped to the reference genome.

Regardless of your input file format, please include information about how the sequencing was performed (see “Sequencing Protocol/Pipeline, Kit Info, WES target regions” section below for more details). Please also send us the list of samples that will be transferred (sample manifest), as well as any sequencing quality control metrics that have been collected.

Sequencing Protocol/Pipeline, Kit Info, WES target regions

Provide any relevant sequencing information, including the following:

Sequencing Center
Sequencer Machine
Read Length
PCR Free or PCR Amplified?
Kit Name/Version
Copy of the WES target regions if applicable
Sequencing Quality Control Metrics

NOTE: Please provide an MD5 checksum for every submitted data file so that the completeness and integrity of the transfer can be verified.
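
One way to generate the checksums is sketched below in Python using only the standard library. It is an illustration, not a required procedure: the function names are hypothetical, and the output layout (digest, two spaces, relative path) is an assumption that mirrors what the common md5sum command prints. Please confirm the preferred checksum file format with NIAGADS.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file.

    The file is read in chunks so that large CRAM/BAM/FASTQ files
    do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(data_dir, out_path):
    """Write one '<digest>  <relative path>' line per file under data_dir.

    The output layout is an assumption, not a NIAGADS specification.
    Returns the list of lines written.
    """
    data_dir = Path(data_dir)
    out_path = Path(out_path)
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        if path.resolve() == out_path.resolve():
            continue  # do not checksum the checksum file itself
        rel = path.relative_to(data_dir).as_posix()
        lines.append(f"{md5_of_file(path)}  {rel}")
    out_path.write_text("\n".join(lines) + "\n")
    return lines
```

For example, write_checksums("submission", "submission/checksums.md5") would list every file under a hypothetical submission directory; recomputing the digests after upload and comparing them against this file shows whether any file was truncated or altered in transit.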

Download the instructions and all template files in a .zip package, DSS_GCAD_SeqDataSubmission.zip.