AION Database - Onboarding

During onboarding, you may submit past samples and previously classified variants to the AION DB in order to carry over information already generated at your institution.

To do so, provide the three resources described below to the support team. Data is submitted through direct communication with the support team, and sample files are provided to help you prepare each resource.

Required data

The required resources, and the information each must contain, are:

  • VCF data: single-sample VCF files, in a format and with contents supported by AION. These are used to build the variant statistics.
  • VCF metadata: a TSV file listing the submitted VCF files and, for each one, the reference genome used, the sample ID (the same as the one in the VCF file) and the affected status.
  • Classified variants: a TSV file of previously classified variants. It should contain the chromosome, position, ref, alt, zygosity, past interpretation, sample ID (the same as in the VCF data, if the classified variant is also present in one of the submitted VCF files), affected status, reference genome, original classification/submission date and the person who originally classified/submitted the variant.
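If you are submitting many VCF files, the VCF metadata file can be generated rather than typed by hand. The following Python sketch reads the sample ID from the #CHROM header line of each single-sample VCF (the sample name is the column after FORMAT) and writes one metadata row per file. It assumes that the metadata TSV starts with a header row using the field names listed under Detailed specifications, that compressed files end in .gz, and that affected status is supplied as a mapping from filename to value. The names read_sample_id and write_metadata are hypothetical helpers, not part of AION; please check the expected layout against the sample files or with the support team.

```python
import csv
import gzip
import os
from typing import Dict, Iterable, Optional

# Hypothetical helper. Column names follow the field list under
# "Detailed specifications"; writing them as a header row is an assumption.
METADATA_COLUMNS = ["vcf_filename", "sample_id", "affected_status", "assembly_version"]


def read_sample_id(vcf_path: str) -> str:
    """Return the single sample name from the #CHROM header line of a VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Columns 0-8 are fixed (#CHROM ... FORMAT); sample names follow.
                samples = line.rstrip("\r\n").split("\t")[9:]
                if len(samples) != 1:
                    raise ValueError(
                        f"{vcf_path}: expected 1 sample, found {len(samples)}")
                return samples[0]
            if not line.startswith("#"):
                break  # reached data lines without seeing the header
    raise ValueError(f"{vcf_path}: no #CHROM header line found")


def write_metadata(vcf_paths: Iterable[str], out_path: str, assembly_version: str,
                   affected_status: Optional[Dict[str, str]] = None) -> None:
    """Write one metadata row per VCF; affected_status maps filename -> value."""
    statuses = affected_status or {}
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(METADATA_COLUMNS)
        for path in vcf_paths:
            filename = os.path.basename(path)  # keep the extension, drop the path
            writer.writerow([filename, read_sample_id(path),
                             statuses.get(filename, ""),  # optional field
                             assembly_version])
```

For example, write_metadata(["/data/run1/S1.vcf.gz"], "vcf_metadata.tsv", "hg38/GRCh38", {"S1.vcf.gz": "affected"}) would write a row for S1.vcf.gz using the sample name found in its header. Failing loudly on multi-sample or header-less files keeps problems visible before anything is sent to the support team.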

Detailed specifications

The following is a detailed description of the allowed values for each field of the TSV files:

VCF metadata file:

  • vcf_filename: the filename of the VCF file, including its extension but without any path information

  • sample_id [optional]: the sample ID contained within the VCF file

  • affected_status [optional]: the affected status of the individual. Accepted values:

    • affected

    • unaffected

    • unknown

  • assembly_version: the assembly version used during secondary analysis. Accepted values:

    • hg19/GRCh37

    • hg38/GRCh38
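Before sending the VCF metadata file, a quick local check can catch values outside the lists above. The following Python sketch validates the text of a VCF metadata TSV and returns a list of problems. It assumes a tab-separated file with a header row using the field names above. This section does not state whether the assembly version should be written as the full "hg19/GRCh37" string or as either name alone, so the sketch accepts both forms; treat that as an assumption to confirm with the support team. validate_metadata is a hypothetical helper, not an AION tool.

```python
import csv
import io
from typing import List

# Accepted values copied from the specification above.
AFFECTED_STATUS = {"affected", "unaffected", "unknown"}
# Assumption: either half of "hg19/GRCh37" / "hg38/GRCh38" is also accepted.
ASSEMBLY_VERSIONS = {"hg19/GRCh37", "hg19", "GRCh37", "hg38/GRCh38", "hg38", "GRCh38"}
METADATA_REQUIRED = ("vcf_filename", "assembly_version")


def validate_metadata(tsv_text: str) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in METADATA_REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing required column(s): {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = (row.get("vcf_filename") or "").strip()
        if not name or "/" in name or "\\" in name:
            problems.append(f"line {line_no}: vcf_filename must be a bare filename")
        elif "." not in name:
            problems.append(f"line {line_no}: vcf_filename should include its extension")
        status = (row.get("affected_status") or "").strip()
        if status and status not in AFFECTED_STATUS:  # optional, but must be valid
            problems.append(f"line {line_no}: unknown affected_status {status!r}")
        assembly = (row.get("assembly_version") or "").strip()
        if assembly not in ASSEMBLY_VERSIONS:
            problems.append(f"line {line_no}: unknown assembly_version {assembly!r}")
    return problems
```

Returning a list of messages rather than stopping at the first error lets you fix every row of the file in one pass.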

Classified variants file:

  • chr: the chromosome where the variant is located

  • position: the genomic position of the variant on the chromosome

  • ref: the reference allele at the variant position

  • alt: the alternate allele

  • zygosity: the zygosity of the variant. Accepted values:

    • heterozygous

    • homozygous_alt

    • hemizygous_alt

  • past_interpretation: the classification given to each variant. Accepted values:

    • Causative

    • Pathogenic

    • Likely pathogenic

    • VUS

    • Likely benign

    • Benign

    • Artifact

  • sample_id [optional]: the sample ID associated with each variant. This is mainly useful when a previously classified variant is also present in the submitted VCF files; in that case, use the same sample ID in both.

  • affected_status [optional]: the affected status of the individual. Accepted values:

    • affected

    • unaffected

    • unknown

  • assembly_version: the assembly version used during secondary analysis. Accepted values:

    • hg19/GRCh37

    • hg38/GRCh38

  • submitted_on [optional]: the date when the variant was originally classified, in YYYY-MM-DD format.

  • submitted_by [optional]: the name of the scientist who classified the variant. If that person is a current AION user, providing the email address linked to their AION account improves the traceability of the classified variants.