During onboarding, you may submit past samples and previously classified variants to the AION DB so that information already generated at your institution carries over.
To do this, provide three resources to the support team. Data is submitted through direct communication with the support team, and sample files are provided to facilitate the process.
Required data
The required resources, and the information each must contain, are:
- VCF data: Single-sample VCF files, in the format and with the contents supported by AION, are required to build the variant statistics.
- VCF metadata: A TSV file listing the submitted VCF files and, for each one, the reference genome used, the sample ID (the same as the one in the VCF file) and the affected status.
- Classified variants: A TSV file listing the previously classified variants, with the fields described in the detailed specifications.
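As an illustration of how the VCF metadata file could be assembled, the following minimal Python sketch scans a folder of single-sample VCF files, reads each sample ID from the `#CHROM` header line, and writes one TSV row per file. The function names, the header row, the column order, and the choice to apply one assembly version and one affected status to every file are assumptions made for this example; the column names come from the detailed specifications, and the sample files provided by the support team remain the reference for the exact layout.

```python
import csv
import gzip
from pathlib import Path

# Column names from the detailed specifications; the order is an assumption.
COLUMNS = ["vcf_filename", "sample_id", "affected_status", "assembly_version"]


def read_sample_id(vcf_path):
    """Return the single sample name found on the #CHROM header line."""
    opener = gzip.open if vcf_path.name.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                fields = line.rstrip("\n").split("\t")
                samples = fields[9:]  # sample columns follow the 9 fixed VCF columns
                if len(samples) != 1:
                    raise ValueError(
                        f"{vcf_path.name}: expected 1 sample, found {len(samples)}"
                    )
                return samples[0]
            if not line.startswith("#"):
                break  # reached data lines without seeing a #CHROM header
    raise ValueError(f"{vcf_path.name}: no #CHROM header line found")


def write_metadata(vcf_dir, out_path, assembly_version, affected_status="unknown"):
    """Write one metadata row per VCF file in vcf_dir; return the row count."""
    vcf_dir = Path(vcf_dir)
    paths = sorted(
        p for p in vcf_dir.iterdir() if p.name.endswith((".vcf", ".vcf.gz"))
    )
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(COLUMNS)
        for path in paths:
            # path.name is the bare filename: extension kept, no directory part.
            writer.writerow(
                [path.name, read_sample_id(path), affected_status, assembly_version]
            )
    return len(paths)
```

A call such as `write_metadata("vcfs", "vcf_metadata.tsv", "hg38/GRCh38")` (hypothetical paths) would produce a starting file in which the affected status of each individual can then be corrected by hand.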
Detailed specifications
The following is a detailed description of allowed values for each field of the TSV files:
VCF metadata file:
- vcf_filename: the name of the VCF file, including the extension but without any path information
- sample_id [optional]: the sample ID contained within the VCF file
- affected_status [optional]: the affected status of the individual. Possible values:
  - affected
  - unaffected
  - unknown
- assembly_version: the assembly version used during secondary analysis. Possible values:
  - hg19/GRCh37
  - hg38/GRCh38
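Before sending the VCF metadata file, it may help to check it against the field rules above. The sketch below is one possible way to do that in Python. It assumes the TSV has a header row containing the field names, that values must match the listed strings exactly (including case and the combined `hg19/GRCh37` form), and that an empty optional field is acceptable. The function name `validate_metadata` is hypothetical, and none of this reflects how AION itself validates submissions.

```python
import csv
import io

# Fields not marked [optional] in the specification.
REQUIRED = {"vcf_filename", "assembly_version"}

# Accepted values as listed in the specification (exact match is an assumption).
ALLOWED = {
    "affected_status": {"affected", "unaffected", "unknown"},
    "assembly_version": {"hg19/GRCh37", "hg38/GRCh38"},
}


def validate_metadata(handle):
    """Return a list of human-readable problems found in a metadata TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED - set(reader.fieldnames or [])
    if missing:
        return [f"missing required column(s): {', '.join(sorted(missing))}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        name = row["vcf_filename"] or ""
        if not name or "/" in name or "\\" in name:
            problems.append(
                f"row {row_number}: vcf_filename must be a bare filename"
            )
        for field, allowed in ALLOWED.items():
            value = row.get(field) or ""
            if not value and field not in REQUIRED:
                continue  # optional field left empty or column absent
            if value not in allowed:
                problems.append(
                    f"row {row_number}: {field}={value!r} is not an accepted value"
                )
    return problems
```

For a file on disk, `validate_metadata(open("vcf_metadata.tsv", newline=""))` (hypothetical filename) returns an empty list when no problems are found.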
Classified variants file:
- chr: the chromosome where the variant is located
- position: the position of the variant
- ref: the reference sequence at the variant position
- alt: the alternate (variant) sequence at the variant position
- zygosity: the zygosity of the variant. Accepted values:
  - heterozygous
  - homozygous_alt
  - hemizygous_alt
- past_interpretation: the classification given to each variant. Accepted values:
  - Causative
  - Pathogenic
  - Likely pathogenic
  - VUS
  - Likely benign
  - Benign
  - Artifact
- sample_id [optional]: the sample ID for each variant. This is mainly useful when a previously classified variant is also present in the submitted VCF files; in that case, use the same sample ID in both files.
- affected_status [optional]: the affected status of the individual. Accepted values:
  - affected
  - unaffected
  - unknown
- assembly_version: the assembly version used during secondary analysis. Accepted values:
  - hg19/GRCh37
  - hg38/GRCh38
- submitted_on [optional]: the date on which the variant was originally classified, in YYYY-MM-DD format.
- submitted_by [optional]: the name of the scientist who classified the variant. If they are a current AION user, providing the email address linked to their AION account makes the classified variants easier to trace.
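The rules for the classified variants file can be checked row by row in a similar spirit. The following Python sketch takes one row as a dictionary keyed by field name and reports any required field that is missing, any value outside the accepted lists, and any `submitted_on` value that is not a valid YYYY-MM-DD date. Treating every field not marked [optional] as required, and matching accepted values exactly as written, are assumptions of this example; `check_variant_row` and `is_iso_date` are hypothetical helper names.

```python
import re
from datetime import date

# Fields not marked [optional] in the specification.
REQUIRED = ("chr", "position", "ref", "alt", "zygosity",
            "past_interpretation", "assembly_version")

# Accepted values as listed in the specification (exact match is an assumption).
ACCEPTED = {
    "zygosity": {"heterozygous", "homozygous_alt", "hemizygous_alt"},
    "past_interpretation": {"Causative", "Pathogenic", "Likely pathogenic",
                            "VUS", "Likely benign", "Benign", "Artifact"},
    "affected_status": {"affected", "unaffected", "unknown"},
    "assembly_version": {"hg19/GRCh37", "hg38/GRCh38"},
}


def is_iso_date(value):
    """True only for a real calendar date written as zero-padded YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_variant_row(row):
    """Return a list of problems for one classified-variant row (a dict)."""
    problems = []
    for field in REQUIRED:
        if not row.get(field):
            problems.append(f"{field} is required")
    for field, accepted in ACCEPTED.items():
        value = row.get(field)
        if value and value not in accepted:
            problems.append(f"{field}={value!r} is not an accepted value")
    submitted_on = row.get("submitted_on")
    if submitted_on and not is_iso_date(submitted_on):
        problems.append("submitted_on must be a valid YYYY-MM-DD date")
    return problems
```

Rows read with `csv.DictReader(handle, delimiter="\t")` can be passed to `check_variant_row` directly, provided the file has a header row with the field names, which is itself an assumption here.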