Episode Description
Chaldebas M et al., The American Journal of Human Genetics - Chaldebas et al. present 5ULTRA, a computational pipeline that integrates uORF databases, Kozak-motif features, splicing prediction, and a random-forest score to detect and prioritize 5′ UTR variants predicted to alter protein translation. The score correlates with proteomic and MPRA measures and is applied to population, somatic, GWAS, and rare-disease datasets to nominate candidate functional variants. Key terms: 5′ UTR, uORF, Kozak motif, translation regulation, machine learning.
Study Highlights:
The authors developed 5ULTRA to annotate SNVs, indels, and splicing variants that create or disrupt uORFs or alter Kozak strength, integrating comprehensive uORF databases and SpliceAI. A random-forest 5ULTRA score trained on HGMD and gnomAD variants distinguishes likely translation-impacting variants; it reached a 5-fold cross-validation AUC of about 0.981 and an AUC of 0.82 on an independent ClinVar test set. The score correlates with cis-pQTL effect sizes (Spearman rho = 0.57) and with MPRA ribosome-load measurements (rho = 0.78). Genome-wide screening found thousands of candidate variants, highlighted rare/conserved signals in disease genes, and nominated examples in cancer, GWAS loci, and rare infections.
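To make the annotation step above more concrete, here is a minimal Python sketch of how one might check whether an SNV creates an upstream ATG, rate its Kozak context, and decide whether the resulting reading frame is a contained uORF, an overlapping uORF, or an N-terminal extension. This is not the 5ULTRA code: the function names and toy sequences are hypothetical, and the Kozak rule used (purine at position -3, G at position +4) is a common convention that may differ from the definitions in the article. The sketch does not cover uORF databases, SpliceAI, indels, or the random-forest score.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_snv(seq, pos, ref, alt):
    """Return seq with the base at 0-based index pos changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]} != {ref}")
    return seq[:pos] + alt + seq[pos + 1:]


def find_atgs(seq):
    """Return the 0-based start index of every ATG in seq."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def kozak_strength(seq, atg_pos):
    """Rate Kozak context with an assumed rule: purine at -3 and G at +4."""
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else None
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else None
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "moderate", "strong")[hits]


def classify_uatg(utr, cds, atg_pos):
    """Classify the reading frame that starts at an upstream ATG."""
    transcript = utr + cds
    main_start = len(utr)  # index of the main ATG in the transcript
    in_frame = (main_start - atg_pos) % 3 == 0
    # Walk codon by codon from the uATG until the first in-frame stop.
    for i in range(atg_pos + 3, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOP_CODONS:
            if i + 3 <= main_start:
                return "uORF"  # stop codon lies entirely within the 5' UTR
            return "N-terminal extension" if in_frame else "overlapping uORF"
    return "N-terminal extension" if in_frame else "no in-frame stop"


def created_uatgs(utr, pos, ref, alt):
    """Return the mutated UTR and the uATG positions the SNV introduces."""
    mutated = apply_snv(utr, pos, ref, alt)
    new_sites = sorted(set(find_atgs(mutated)) - set(find_atgs(utr)))
    return mutated, new_sites


# Toy example (invented sequences): a C>T change turns ACG into ATG.
toy_utr = "GGCACCACGGGCTGAGCAGCC"
toy_cds = "ATGGCCTAA"
mutated_utr, new_sites = created_uatgs(toy_utr, 7, "C", "T")
for site in new_sites:
    print(site, kozak_strength(mutated_utr, site),
          classify_uatg(mutated_utr, toy_cds, site))
```

In this toy case the variant introduces a single upstream ATG whose frame ends at a stop codon inside the 5′ UTR. A real pipeline would work on transcript coordinates rather than plain strings and would also handle variants that remove existing start or stop codons.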
Conclusion:
5ULTRA provides a validated, transcript-aware framework to detect and prioritize 5′ UTR variants that modulate translation, offering mechanistic hypotheses for noncoding variant interpretation in rare disease, cancer, and complex-trait genetics; the tool and data are publicly available under a CC BY license.
QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-03-30.
QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Substantive auditing focused on the scientific content described in the transcript and its alignment with the AJHG article: 5ULTRA architecture, features, SpliceAI integration, validation metrics, somatic/GWAS/infectious disease applications, limitations, and open-source availability.
- transcript topics: 5′ UTR regulatory elements (Kozak motif, uORFs) and translation initiation; 5ULTRA methodology and data integration (MANE transcripts, uORFdb, Ribo-uORF, SpliceAI); Machine-learning scoring (17 features; PhyloP conservation as key predictor; uORF/Kozak annotations); Model validation (ClinVar, cross-validation AUC, accuracy); Correlation with proteomics and MPRA data (cis-pQTL, ΔMRL); Somatic cancer applications (NRAS and ABI1 examples; splicing effects; N-terminal extensions)
QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 7
- claims flagged for review: 0
- metadata checks passed: 7
- metadata issues found: 0
Metadata Audited:
- article_doi
- article_title
- article_journal
- license
- episode_title
- episode_number
- season
- reference
Factual Items Audited:
- 5ULTRA identifies and prioritizes 5′ UTR variants that affect translation via uORFs and Kozak motifs
- 17 features used by the 5ULTRA random forest model; PhyloP conservation of uORF start codon as the strongest predictor
- Genome-wide analysis: ~28 million 5′ UTR variants; ~137k predicted to affect translation via uORFs or Kozak changes
- ClinVar independent test AUC ≈ 0.82 and ClinVar threshold-based accuracy ≈ 80.8%
- 5-fold cross-validation AUC ≈ 0.981; MPRA and pQTL data show concordant translation effects (5ULTRA score vs MPRA ΔMRL, Spearman ρ ≈ 0.78; 5ULTRA score vs cis-pQTL effect sizes, ρ ≈ 0.57)
QC result: Pass.
Chapters
- (00:00:08) - Genome-Wide Detection of Human 5′ UTR Variants
- (00:06:41) - How a Deep Learning Algorithm Can Identify Dangerous Human Variants
- (00:12:35) - 5ULTRA: The computational genetics of cancer
- (00:18:46) - How to decode the secrets of the human genome