To streamline your workflow, note that exomes (WES) and panels have additional requirements on top of the detailed information and files needed to launch a case (see Launching cases from pre-uploaded files). Before you start running cases, you will need to provide the following:
- A BED file describing the targeted regions of your data, matching the kit used during library preparation of the samples.
- FASTQ files for 5-20 samples of both sexes for generating the CNV reference cohort. See the sections below for more details and instructions on how to select your reference samples.
To provide both the BED file(s) and the files for generating your reference cohort(s) for CNV analysis, please contact support@nostos-genomics.com so that we can guide and support you in establishing the settings for your pipeline.
Why are reference samples important for CNV calling in non-WGS samples?
CNV detection methods based on read depth in whole-genome sequencing (WGS) generally rely on the assumption that reads are evenly distributed across the genome. This uniformity allows variations in read depth to signal the presence of CNVs. However, this assumption does not hold true for whole-exome sequencing (WES) and targeted sequencing. The reason for this is that the capture probes used to target specific genomic regions have variable efficiencies depending on the region. This variation introduces significant biases in the number of reads mapped to each region, complicating CNV detection. ExomeDepth requires 5-10 reference samples to help correct for these biases, which arise due to the inconsistent capture efficiency across exons or targeted regions.
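As a rough illustration of why a reference cohort helps, the sketch below compares a test sample's share of reads in each targeted region against the median share seen in the reference samples. This is a deliberately simplified example and not ExomeDepth's actual statistical model; the function name `depth_ratios` and all read counts are hypothetical.

```python
from statistics import median


def depth_ratios(test_counts, reference_counts):
    """Return, per region, the test sample's read fraction divided by
    the median read fraction across the reference samples."""
    # Convert raw counts to fractions so that differences in total
    # sequencing depth between samples cancel out.
    test_total = sum(test_counts)
    test_fractions = [c / test_total for c in test_counts]

    reference_fractions = []
    for counts in reference_counts:
        total = sum(counts)
        reference_fractions.append([c / total for c in counts])

    ratios = []
    for i, test_fraction in enumerate(test_fractions):
        # The reference median captures the region's typical capture efficiency.
        expected = median(sample[i] for sample in reference_fractions)
        ratios.append(test_fraction / expected)
    return ratios


# Hypothetical read counts for four targeted regions.
# Region 2 always captures far more reads than region 3 (probe efficiency),
# so raw depth alone says nothing about copy number.
reference_counts = [
    [100, 400, 50, 200],
    [110, 390, 55, 195],
    [95, 410, 45, 205],
]
test_counts = [100, 400, 25, 200]  # region 3 has about half the expected reads

ratios = depth_ratios(test_counts, reference_counts)
for region, ratio in enumerate(ratios, start=1):
    print(f"region {region}: ratio {ratio:.2f}")
```

In this made-up data, only the third region deviates clearly from a ratio of about 1, which is the kind of signal that suggests a possible deletion. The large raw difference between regions 2 and 3 is absorbed by the reference cohort because every reference sample shows the same capture bias.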
How do I select my reference files?
To ensure accurate CNV detection, the reference samples should share the following characteristics with the test sample (the sample being analysed):
- Most importantly: The samples must be prepared using the same library preparation method and sequenced using the same sequencing platform.
- Ideally: The reference samples should come from unrelated individuals and should not include members of the same family as the samples being analysed, as related individuals can introduce unwanted similarities. If you do not have an independent cohort, the Nostos secondary analysis pipeline dynamically excludes the sample currently being analysed from the cohort to avoid biasing the CNV calls.
- For CNV detection on sex chromosomes: All samples (both reference and test) must be from individuals of the same sex, either all male or all female. Using samples of different sexes would result in unreliable CNV calls on the sex chromosomes. Nostos handles this matching automatically, but the cohort files you transfer must include samples of both sexes.
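If you keep sample metadata in a spreadsheet or LIMS export, the criteria above can be checked before you send files to support. The following is a minimal sketch under that assumption; it is not part of the Nostos pipeline, and the `Sample` fields, kit names and platform names are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    library_prep: str  # kit / library preparation method
    platform: str      # sequencing platform
    family_id: str
    sex: str           # "F" or "M"


def eligible_references(test, candidates):
    """Keep candidates that share the test sample's library prep and
    platform and are not from the same family as the test sample."""
    return [
        c for c in candidates
        if c.sample_id != test.sample_id
        and c.library_prep == test.library_prep
        and c.platform == test.platform
        and c.family_id != test.family_id
    ]


def both_sexes_present(samples):
    """True if the cohort contains at least one female and one male sample."""
    return {"F", "M"} <= {s.sex for s in samples}


# Hypothetical metadata for illustration only.
test_sample = Sample("T1", "KitA", "PlatformX", "fam1", "F")
candidates = [
    Sample("R1", "KitA", "PlatformX", "fam2", "F"),
    Sample("R2", "KitA", "PlatformX", "fam3", "M"),
    Sample("R3", "KitB", "PlatformX", "fam4", "M"),  # different kit
    Sample("R4", "KitA", "PlatformX", "fam1", "M"),  # same family as T1
]

references = eligible_references(test_sample, candidates)
print([s.sample_id for s in references])
print("both sexes represented:", both_sexes_present(references))
```

The sex-matched subset used for sex-chromosome calling is not modelled here, since that step is handled automatically on the Nostos side.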
The process for cohort generation
Cohort generation is fully handled by the Nostos Genomics Support team as part of onboarding new customers or primary analysis workflows into the scope of secondary analysis. However, the reference samples and BED file must be made available to the Nostos team before any samples can be processed. If a user tries to launch an analysis using a BED file for which the cohort setup is incomplete, the following message is shown: