What is Linkage Disequilibrium?
Linkage disequilibrium (LD) occurs when alleles at different loci are associated non-randomly. If knowing the allele at one locus tells you something about the allele at another locus, those loci are in LD.
Gamete Frequencies
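Consider two biallelic loci, A/a and B/b. There are four possible gamete types: AB, Ab, aB, and ab.
At linkage equilibrium, each gamete frequency equals the product of the frequencies of the alleles it carries (for example, the frequency of AB equals the frequency of A times the frequency of B). Any deviation from this is disequilibrium.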
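As a small illustration of the equilibrium expectation, the sketch below computes the four expected gamete frequencies from two allele frequencies. The function name `equilibrium_gametes` and the example frequencies (0.6 for A, 0.3 for B) are made up for this example.

```python
def equilibrium_gametes(p_A, p_B):
    """Expected gamete frequencies when alleles at the two loci associate at random."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    return {
        "AB": p_A * p_B,
        "Ab": p_A * p_b,
        "aB": p_a * p_B,
        "ab": p_a * p_b,
    }


# Hypothetical allele frequencies, chosen only for illustration.
expected = equilibrium_gametes(p_A=0.6, p_B=0.3)
for gamete, freq in expected.items():
    print(f"{gamete}: {freq:.2f}")
```

Observed gamete frequencies that differ from these products indicate that the loci are in LD.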
Measuring LD: The D Statistic
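The coefficient of linkage disequilibrium, D, measures the departure from equilibrium:
D = p_AB - p_A p_B
where p_AB is the frequency of AB gametes and p_A and p_B are the frequencies of alleles A and B.
Equivalently, in terms of the four gamete frequencies:
D = p_AB p_ab - p_Ab p_aB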
- D = 0: Linkage equilibrium (random association)
- D > 0: Coupling phase excess (AB and ab more common)
- D < 0: Repulsion phase excess (Ab and aB more common)
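The sketch below computes D from four gamete frequencies in two ways: as the AB frequency minus the product of the allele frequencies, and as the coupling product minus the repulsion product. It checks that the two agree. The function name `ld_coefficient` and the example frequencies are illustrative assumptions.

```python
def ld_coefficient(p_AB, p_Ab, p_aB, p_ab):
    """Return D for two biallelic loci, given gamete frequencies that sum to 1."""
    p_A = p_AB + p_Ab  # frequency of allele A
    p_B = p_AB + p_aB  # frequency of allele B
    d_from_alleles = p_AB - p_A * p_B
    d_from_gametes = p_AB * p_ab - p_Ab * p_aB
    # The two forms are algebraically identical when the frequencies sum to 1.
    assert abs(d_from_alleles - d_from_gametes) < 1e-12
    return d_from_alleles


# Hypothetical gamete frequencies with an excess of AB and ab (coupling).
d_coupling = ld_coefficient(0.4, 0.1, 0.1, 0.4)
# The same allele frequencies with an excess of Ab and aB (repulsion).
d_repulsion = ld_coefficient(0.1, 0.4, 0.4, 0.1)
print(f"coupling excess:  D = {d_coupling:+.2f}")
print(f"repulsion excess: D = {d_repulsion:+.2f}")
```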
Decay of LD
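Recombination breaks down LD over time. With recombination rate r between loci, D shrinks by a factor of (1 - r) each generation, so after t generations:
D_t = D_0 (1 - r)^t
The half-life of LD, the number of generations needed for D to fall to half its starting value, is:
t_half = ln(2) / (-ln(1 - r)), which is approximately ln(2) / r when r is small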
Tightly linked loci (low r) maintain LD much longer than loosely linked or unlinked loci.
LD decays exponentially toward zero. Free recombination (r = 0.5) gives the fastest decay, halving D every generation.
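To make the effect of r concrete, here is a minimal sketch assuming the standard deterministic model in which D shrinks by a factor of (1 - r) per generation in a large, randomly mating population. The function names and the starting value D_0 = 0.25 are illustrative choices.

```python
import math


def ld_after(d0, r, t):
    """D after t generations, assuming D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** t


def ld_half_life(r):
    """Generations until D falls to half its starting value under the same model."""
    return math.log(2) / -math.log(1.0 - r)


# Compare tightly linked, loosely linked, and unlinked loci.
for r in (0.001, 0.01, 0.1, 0.5):
    print(
        f"r = {r}: half-life ~ {ld_half_life(r):.1f} generations, "
        f"D after 10 generations = {ld_after(0.25, r, 10):.4f}"
    )
```

For small r the half-life is close to ln(2) / r, so a tenfold drop in the recombination rate makes LD persist roughly ten times longer.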
Normalized Measures
Because D depends on allele frequencies, normalized measures are often used:
D' (D-prime)
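D' = D / Dmax
where Dmax is the largest magnitude D can reach given the allele frequencies: min(p_A p_b, p_a p_B) when D > 0, and min(p_A p_B, p_a p_b) when D < 0. |D'| = 1 means at least one of the four gamete types is absent, which is consistent with no recombination having occurred between the loci since the alleles arose.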
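A minimal sketch of the D' calculation follows. It uses the standard bounds on D: min(p_A p_b, p_a p_B) when D is positive, and min(p_A p_B, p_a p_b) when D is negative. The function name `d_prime` and the example frequencies are assumptions made for illustration.

```python
def d_prime(p_AB, p_A, p_B):
    """Return D' = D / Dmax, keeping the sign of D."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    d = p_AB - p_A * p_B
    if d == 0:
        return 0.0  # also covers a monomorphic locus, where Dmax would be 0
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)
    return d / d_max


# Hypothetical case: every A allele sits on a B background, so Ab gametes are absent.
complete = d_prime(p_AB=0.3, p_A=0.3, p_B=0.5)
# Hypothetical case: a partial association at the same allele frequencies.
partial = d_prime(p_AB=0.2, p_A=0.3, p_B=0.5)
print(f"Ab absent: D' = {complete:.2f}")
print(f"partial:   D' = {partial:.2f}")
```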
r² (Correlation)
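r² = D² / (p_A p_a p_B p_b)
This is the squared correlation between alleles at the two loci. It ranges from 0 to 1 and is used extensively in GWAS to measure how well one SNP tags another.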
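The sketch below computes r² as D² divided by the product of the four allele frequencies. The function name `r_squared` and the example values are illustrative. The first example shows that r² can be well below 1 even when one gamete type is entirely absent, because r² reaches 1 only when the allele frequencies at the two loci also match.

```python
def r_squared(p_AB, p_A, p_B):
    """Squared correlation between alleles at two biallelic, polymorphic loci."""
    d = p_AB - p_A * p_B
    return d * d / (p_A * (1.0 - p_A) * p_B * (1.0 - p_B))


# Hypothetical case: Ab gametes are absent, but allele frequencies differ (0.3 vs 0.5).
unequal = r_squared(p_AB=0.3, p_A=0.3, p_B=0.5)
# Hypothetical case: only AB and ab gametes exist, so each allele predicts the other.
perfect = r_squared(p_AB=0.5, p_A=0.5, p_B=0.5)
print(f"unequal allele frequencies: r^2 = {unequal:.3f}")
print(f"perfect tagging:            r^2 = {perfect:.3f}")
```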
Sources of LD
Population Processes
- Genetic drift: Random associations accumulate in finite populations
- Population admixture: Mixing populations with different allele frequencies
- Population bottlenecks: Founder effects create LD from sampling
- Selection: Hitchhiking of linked variants along with a selected allele
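The admixture case in the list above can be checked numerically. The sketch assumes two source populations that are each in linkage equilibrium but differ in allele frequencies, and pools their gametes in proportions m and 1 - m. Under that assumption the mixture has D = m(1 - m) times the allele-frequency difference at each locus. All names and numbers are hypothetical.

```python
def equilibrium_gametes(p_A, p_B):
    """Gamete frequencies for a population in linkage equilibrium."""
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    return {"AB": p_A * p_B, "Ab": p_A * p_b, "aB": p_a * p_B, "ab": p_a * p_b}


def admixture_d(pop1, pop2, m):
    """D in a pool taking fraction m of gametes from pop1 and 1 - m from pop2.

    pop1 and pop2 are (p_A, p_B) tuples; each source is assumed to have D = 0.
    """
    g1 = equilibrium_gametes(*pop1)
    g2 = equilibrium_gametes(*pop2)
    mixed = {k: m * g1[k] + (1.0 - m) * g2[k] for k in g1}
    p_A = mixed["AB"] + mixed["Ab"]
    p_B = mixed["AB"] + mixed["aB"]
    return mixed["AB"] - p_A * p_B


# Two hypothetical populations with very different allele frequencies, mixed 50:50.
d_mixed = admixture_d(pop1=(0.9, 0.8), pop2=(0.1, 0.2), m=0.5)
shortcut = 0.5 * (1 - 0.5) * (0.9 - 0.1) * (0.8 - 0.2)
print(f"D after mixing = {d_mixed:.3f}, m(1-m) * dA * dB = {shortcut:.3f}")
```

Neither source population has any LD, yet the mixture does, purely because the allele frequencies differ between the sources.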
Genomic Factors
- Physical linkage: Nearby loci have low recombination rates
- Chromosomal inversions: Suppress recombination in heterozygotes
- Epistatic selection: Favorable allele combinations maintained
LD in Population Genetics
Genetic Mapping
LD is the basis of association mapping. Disease-causing variants are identified through their LD with nearby marker loci — we don't need to genotype the causal variant itself if we can detect a linked marker.
Haplotype Structure
Regions of high LD form haplotype blocks — sets of alleles that are inherited together. Block boundaries occur where recombination is common.
Selective Sweeps
When positive selection rapidly increases an allele's frequency, linked variants "hitchhike" along, creating extended regions of high LD. Detecting such patterns helps identify recent selection.
LD and Effective Population Size
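The expected LD in a population reflects its history. At equilibrium between drift (creating LD) and recombination (breaking it down), the expected r² between two loci is approximately:
E[r²] ≈ 1 / (1 + 4 N_e c)
where N_e is the effective population size and c is the recombination rate between the loci (written r in the decay section, renamed here to avoid confusion with r²).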
This relationship allows estimation of effective population size from LD patterns, particularly useful for inferring historical population sizes.
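As a rough sketch of how such an estimate works, the code below assumes the classic approximation E[r²] ≈ 1 / (1 + 4 N_e c), with c the recombination rate between loci, and inverts it to recover N_e. It ignores the sample-size and mutation corrections that practical estimators apply. The function names and input values are hypothetical.

```python
def expected_r2(n_e, c):
    """Approximate expected r^2 at drift-recombination equilibrium (assumed model)."""
    return 1.0 / (1.0 + 4.0 * n_e * c)


def estimate_ne(mean_r2, c):
    """Invert the approximation: N_e ~ (1 / r^2 - 1) / (4 c)."""
    return (1.0 / mean_r2 - 1.0) / (4.0 * c)


# Round trip with hypothetical values: N_e = 1000 and c = 0.01 between marker pairs.
r2 = expected_r2(n_e=1000, c=0.01)
ne_hat = estimate_ne(r2, c=0.01)
print(f"expected r^2 = {r2:.4f}, recovered N_e = {ne_hat:.0f}")
```

Because LD between loosely linked loci decays quickly, marker pairs with large c mostly reflect recent N_e, while tightly linked pairs carry information about older population sizes.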