Base by Base

S2 E395

395: Extended sequence context shapes mutational bias in Escherichia coli

June 18
22 mins

Episode Description

Green R et al., PNAS - Collating more than 100,000 base-pair substitutions from 32 mutation-accumulation experiments, this study shows that sequence context well beyond the adjacent bases (up to ±6 bp, and in broader windows hundreds of bp wide) shapes mutational biases in E. coli and interacts with DNA repair. Key terms: mutational bias, sequence context, Escherichia coli, mismatch repair, mononucleotide runs.

Study Highlights:
The authors analyzed 117,807 base-pair substitutions from 32 mutation-accumulation (MA) experiments and quantified nucleotide frequencies up to ±6 bp around mutation sites, as well as in sliding windows of up to 1,000 bp. Extended context effects vary by substitution type, DNA repair background (proofreading and mismatch repair, MMR), and replication strand. Mononucleotide runs (notably AC3+ and GC3+) are strong hotspots consistent with transient misalignment; GC3+ can increase G:C→C:G transversions by orders of magnitude. Broader GC% biases persist over hundreds of base pairs and are modulated by MMR activity.
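
To make the idea of extended context concrete, here is a minimal Python sketch of tallying bases at each offset within ±6 bp of a set of mutation sites and computing GC fraction in a window around a site. It is an illustration only, not the authors' pipeline: the function names, the toy sequence, and the site positions are all made up, and the sequence is treated as linear (sites near the ends are skipped) even though the E. coli chromosome is circular.

```python
from collections import Counter


def context_counts(genome, sites, flank=6):
    """Count bases at each offset from -flank to +flank around mutation sites.

    Hypothetical helper: offset 0 (the mutated base itself) is excluded.
    """
    counts = {off: Counter() for off in range(-flank, flank + 1) if off != 0}
    for pos in sites:
        # Assumption: linear sequence, so sites without a full flank are skipped.
        if pos - flank < 0 or pos + flank >= len(genome):
            continue
        for off in counts:
            counts[off][genome[pos + off]] += 1
    return counts


def gc_fraction(genome, pos, window):
    """GC fraction in a window centred on pos, clipped at the sequence ends."""
    start = max(0, pos - window // 2)
    end = min(len(genome), pos + window // 2 + 1)
    segment = genome[start:end]
    return sum(base in "GC" for base in segment) / len(segment)


toy_genome = "ATGCCCGATTACGGGCATTA"  # made-up sequence, not E. coli data
toy_sites = [7, 10]                  # made-up 0-based mutation positions

counts = context_counts(toy_genome, toy_sites, flank=6)
for off in sorted(counts):
    print(off, dict(counts[off]))
print(gc_fraction(toy_genome, 10, 5))
```

Comparing such per-offset counts against the genome-wide base composition, separately for each substitution type and repair background, is one way the positional biases described above could be expressed as enrichment ratios.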

Conclusion:
Extended sequence context and its interaction with proofreading, mismatch repair, and replication strand identity create complex mutational signatures in E. coli that are specific to each base-pair substitution (BPS) type. Accounting for them improves the resolution of mutation-rate predictions and highlights long-range and motif-specific hotspots.

Music:
Enjoy the music based on this article at the end of the episode.

Article title:
Extended sequence context shapes mutational bias in Escherichia coli

First author:
Green R

Journal:
PNAS

DOI:
10.1073/pnas.2601345123

Reference:
Green R., Jago M.J., Knight C.G., Czernuszka M.R., Denisova S., Krašovec R., Lagator M. Extended sequence context shapes mutational bias in Escherichia coli. PNAS. 2026;123(23):e2601345123. doi:10.1073/pnas.2601345123.

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/

Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00

Official website: https://basebybase.com

On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.com/episodes/extended-sequence-context-mutational-bias-e-coli

QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-06-18.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited sections covering extended sequence context (±6 bp), mononucleotide run hotspots (AC3+, GC3+), GC3+ and G:C→C:G transversions, 5′ preceding nucleotide effects, leading vs lagging strand replication, and GC-content effects up to 1,000 bp.
- transcript topics: Extended sequence context (±6 bp); Mononucleotide runs and transient misalignment; GC3+ hotspot and other motifs; DNA proofreading and mismatch repair effects; Leading vs lagging strand replication and context biases; Regional GC-content effects up to 1000 bp

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 6
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- Extended context up to ±6 bp influences mutational bias beyond trinucleotide context
- Mononucleotide runs AC3+ and GC3+ are mutational hotspots; GC3+ increases G:C→C:G transversions up to ~10^4-fold
- A strong GC3+ hotspot near GG sequences can reach extremely large fold increases; in some backgrounds ~50,000-fold for G:C→C:G transversions at GG C7
- 5′ preceding nucleotide biases modulate GC3+ hotspot strength
- Leading vs lagging strand replication affects context biases; biases stronger when purine templates the leading strand
- Regional GC-content up to 1,000 bp away contributes to mutational bias; AT-rich regions show differential effects depending on repair status (MMR/proofreading)

QC result: Pass.
