VCF Format

Although VCF (Variant Call Format) is a well-recognised standard file type in bioinformatics, different variant calling pipelines can produce VCF files that differ in content and structure. We strongly recommend performing variant calling following best practices, including variant normalisation and joint variant calling. Follow this link for further information.

Uncompressed and compressed VCF files are supported in AION. The file extensions must be .vcf or .vcf.gz.

To check that your VCF files adhere to this standard, you can simply open them in a plain text editor (Notepad, TextEdit…). Some platforms initially recognise .vcf files as Virtual Contact Files, but they can still be opened in a text editor. Compressed files, where the extension is .vcf.gz instead of .vcf, must be decompressed before opening.
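
For large files, a text editor may be impractical. The following is a minimal Python sketch that applies the extension rule above and prints the first lines of either file type without decompressing to disk. The function names `open_vcf` and `head` are illustrative and are not part of AION.

```python
import gzip
from pathlib import Path


def open_vcf(path):
    """Open a .vcf or .vcf.gz file as text; reject any other extension."""
    name = Path(path).name.lower()
    if name.endswith(".vcf.gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if name.endswith(".vcf"):
        return open(path, "r", encoding="utf-8")
    raise ValueError(f"unsupported extension: {name}")


def head(path, n=5):
    """Return the first n lines of the file, without trailing newlines."""
    with open_vcf(path) as handle:
        # zip stops after n items, so the rest of the file is never read.
        return [line.rstrip("\n") for _, line in zip(range(n), handle)]
```

Calling `head("sample.vcf.gz")` on a well-formed file should return a list whose first element starts with `##fileformat=`.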

All sequenced chromosomes from a sample need to be in the same VCF file, i.e. splitting VCF files into chromosomes is not supported.
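
One quick way to spot a file that was split by chromosome is to list the distinct CHROM values in its records. The sketch below is only a heuristic, and `contigs_in_records` is an illustrative name: a file covering a single contig is not necessarily wrong, but it is worth checking against what was sequenced.

```python
def contigs_in_records(lines):
    """Return the distinct CHROM values, in order of first appearance."""
    seen = []
    for line in lines:
        # Skip blank lines and all header lines (## metadata and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        if chrom not in seen:
            seen.append(chrom)
    return seen
```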

ℹ️ Combined VCF files (including both small variants and CNV/SV) are supported for the proband. Parents' VCF files for small variants must be submitted as single-sample VCFs.
ℹ️ Genomic VCF files (gVCF) are not supported.

Refer to Troubleshooting to see a list of general recommendations and error codes.

Small variants VCFs

The following is the minimum VCF file required by AION for small variants:

##fileformat=VCFv4.1
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=X,length=ZZZ,assembly=YYYY>
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SampleID
chr3    239313    rs35603824    C    CG    433899    PASS  . GT:AD:DP 0/1:25,12:37
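
To illustrate how the FORMAT and sample columns of the record above relate to each other, here is a small Python sketch that pairs each FORMAT key with its value. It assumes the record is tab-separated, as a valid VCF record is; `sample_fields` is an illustrative helper name.

```python
def sample_fields(record_line):
    """Map each FORMAT key to the matching value in the sample column."""
    cols = record_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")     # FORMAT column, e.g. GT:AD:DP
    values = cols[9].split(":")   # sample column, e.g. 0/1:25,12:37
    return dict(zip(keys, values))


# The example record from above, written with explicit tabs.
record = "chr3\t239313\trs35603824\tC\tCG\t433899\tPASS\t.\tGT:AD:DP\t0/1:25,12:37"
fields = sample_fields(record)
ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))
```

In this record the two AD values (25 reference reads and 12 alternate reads) add up to the DP value of 37.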

The following is a checklist to make sure your VCF file is compatible with AION:

  • Does your file have a first line specifying the file format? (required)

    • An example of this could be ##fileformat=VCFv4.1

  • Does the file contain contig lines referring to the reference genome used?

    • Lines referring to the contigs used should be included following the format ##contig=<ID=X,length=ZZZ,assembly=YYYY>.

  • Does the file contain the following columns, separated by tabs, after the header (## lines)?

    • CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, Sample ID

    • Only one “#” symbol should precede these column names, as shown above

    • All columns except the Sample ID, which changes from one case to the next, should be written exactly as shown here

    • The Sample ID column name should contain only numbers and upper- and lowercase letters. The symbols - and _ are supported, but other symbols and special characters should be avoided

  • Does your file have the following FORMAT fields present: AD, DP, GT?

    • Currently, AION supports the following alternative formats for the required fields:

      • For DP: AFDP

      • For AD: CLCAD2, VR, AO

    • Allelic depth is required by AION. Most variant callers store it in the AD field, but some do not. AION automatically converts AO (alt observations) and RO (ref observations) to AD. To avoid errors, please ensure your VCF contains either AD or AO+RO, both declared in the header and present in the INFO or FORMAT fields.

  • Are all additional data in the columns FILTER, INFO, and FORMAT described in the header?

    • If these data are not detailed properly in the header, the file might cause an error.

    • Format of FILTER, INFO, FORMAT specifications in the header:

      ##FILTER=<ID=ID,Description="description">
      ##INFO=<ID=ID,Number=number,Type=type,Description="description",Source="source",Version="version">
      ##FORMAT=<ID=ID,Number=number,Type=type,Description="description">
  • Is your VCF file single or multi-sample?

    • At present, AION does not support multi-sample files. When submitting trio or duo cases (patient and parents), the samples must first be called jointly and then submitted independently as three or two separate files. If you have a jointly called multi-sample VCF file, you should split it into individual samples first.
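
Several items of the checklist above can be pre-screened before upload. The following Python sketch is one possible way to do so and is not AION's own validation: the function name `check_header`, the problem messages and the sample ID pattern are assumptions derived from the checklist. It only inspects the header and the column line, not the variant records.

```python
import re

FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
DEPTH_IDS = {"DP", "AFDP"}                        # DP or its listed alternative
ALLELIC_DEPTH_IDS = {"AD", "CLCAD2", "VR", "AO"}  # AD or its listed alternatives
SAMPLE_ID = re.compile(r"[A-Za-z0-9_-]+")         # assumed reading of the Sample ID rule
DECLARED_ID = re.compile(r"##(?:FORMAT|INFO)=<ID=([^,>]+)")


def check_header(lines):
    """Return a list of human-readable problems found in the VCF header."""
    lines = [line.rstrip("\n") for line in lines]
    problems = []
    if not lines or not lines[0].startswith("##fileformat="):
        problems.append("first line must be ##fileformat=...")

    # Collect every ID declared in a ##FORMAT or ##INFO header line.
    declared = set()
    for line in lines:
        match = DECLARED_ID.match(line)
        if match:
            declared.add(match.group(1))
    if "GT" not in declared:
        problems.append("GT is not declared in the header")
    if not declared & DEPTH_IDS:
        problems.append("no read depth field (DP or AFDP) declared")
    depth_like = declared & ALLELIC_DEPTH_IDS
    if not depth_like:
        problems.append("no allelic depth field (AD, CLCAD2, VR or AO) declared")
    elif depth_like == {"AO"} and "RO" not in declared:
        problems.append("AO is declared without RO")

    column_lines = [line for line in lines if line.startswith("#CHROM")]
    if len(column_lines) != 1:
        problems.append("expected exactly one #CHROM line")
        return problems
    columns = column_lines[0].split("\t")
    if columns[:9] != FIXED_COLUMNS:
        problems.append("fixed columns are missing, misnamed or not tab-separated")
    samples = columns[9:]
    if len(samples) != 1:
        problems.append(f"expected 1 sample column, found {len(samples)}")
    elif not SAMPLE_ID.fullmatch(samples[0]):
        problems.append("sample ID contains unsupported characters")
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the file will be accepted.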

CNV / SV variants VCFs

AION supports CNV and SV data in VCF file format. Only CNVs or SVs of type DUP or DEL are supported. Only variants whose VCF FILTER column value is PASS (or cnvLength in the case of Canvas and DRAGEN CNV) are annotated.
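
To estimate which records meet these two conditions, the following Python sketch may help. It rests on assumptions that the text above does not confirm: that the variant type is given either by an `SVTYPE` entry in INFO or by a symbolic ALT allele such as `<DEL>` or `<DUP>`, and that the caller is known to the user, who sets the illustrative `allow_cnv_length` flag for Canvas or DRAGEN CNV files.

```python
def is_annotation_candidate(record_line, allow_cnv_length=False):
    """Check that a record is a DEL/DUP with an accepted FILTER value."""
    cols = record_line.rstrip("\n").split("\t")
    alt, filt, info = cols[4], cols[6], cols[7]
    # Parse key=value INFO entries; flag-only entries are ignored.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    is_del_dup = info_map.get("SVTYPE") in {"DEL", "DUP"} or alt in {"<DEL>", "<DUP>"}
    filter_ok = filt == "PASS" or (allow_cnv_length and filt == "cnvLength")
    return is_del_dup and filter_ok
```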

There are many supported pipelines and variant callers. The following is a shortlist of supported pipelines. Your pipeline may be supported even if it is not in this list or if it is a combination of these callers:

  • DRAGEN SV & CNV

  • Canvas

  • Manta

  • ExomeDepth

  • CNVkit

  • Codex

Additionally, AION considers a subset of all variants as relevant, based on variant quality score, length and gnomAD SV frequency data. AION implements a default filtering for all callers, and a specific length-based filtering for DRAGEN, Canvas and Manta.

Relevant Variants Inclusion Criteria:

  • P (Pathogenic) and LP (Likely Pathogenic) variants by ACMG classification
  • VUS (Variant of Uncertain Significance) variants by ACMG classification associated with a known disease gene
  • Variants with gnomAD frequency ≤ 1%, if gnomAD data are available

For VCFs generated by Illumina tools, we additionally apply length-based criteria and consider as relevant:

  • LENGTH > 10 kb for Canvas or DRAGEN CNV variants
  • 2 kb < LENGTH ≤ 10 kb for Manta or DRAGEN SV variants
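
The two length thresholds can be expressed as a small Python sketch. It assumes that 1 kb equals 1,000 bp and that the caller is identified by a label supplied by the user; the labels and the function name `meets_length_criterion` are illustrative. For callers without a stated length rule the function returns `None` rather than guessing.

```python
def meets_length_criterion(caller, length_bp):
    """Apply the caller-specific length rule; None if no rule is stated."""
    caller = caller.lower()
    if caller in {"canvas", "dragen cnv"}:
        return length_bp > 10_000
    if caller in {"manta", "dragen sv"}:
        return 2_000 < length_bp <= 10_000
    return None  # only the default filtering applies to other callers
```

Note that the boundaries are asymmetric: a variant of exactly 10 kb satisfies the Manta or DRAGEN SV rule but not the Canvas or DRAGEN CNV rule.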

Additional resources: