Linkage Disequilibrium

Non-random association of alleles at different loci

What is Linkage Disequilibrium?

Linkage disequilibrium (LD) occurs when alleles at different loci are associated non-randomly. If knowing the allele at one locus tells you something about the allele at another locus, those loci are in LD.

Key insight: Despite the name, linkage disequilibrium doesn't require physical linkage. Any non-random association creates LD, though it decays faster for unlinked loci.

Gamete Frequencies

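Consider two biallelic loci, A/a and B/b, with allele frequencies p_A and q_A = 1 - p_A at the first locus and p_B and q_B = 1 - p_B at the second. There are four possible gamete types: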

Gamete   Frequency   Expected at equilibrium
AB       x₁₁         p_A × p_B
Ab       x₁₂         p_A × q_B
aB       x₂₁         q_A × p_B
ab       x₂₂         q_A × q_B

At linkage equilibrium, gamete frequencies equal the products of allele frequencies. Any deviation from this is disequilibrium.
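
As a small illustration of the equilibrium expectations in the table, the sketch below builds the four expected gamete frequencies from two allele frequencies. The function name and the example values (p_A = 0.6, p_B = 0.3) are assumptions chosen for illustration only.

```python
def expected_gamete_frequencies(p_A, p_B):
    """Gamete frequencies expected at linkage equilibrium.

    p_A and p_B are the frequencies of alleles A and B;
    the frequencies of a and b are taken as 1 - p_A and 1 - p_B.
    """
    q_A = 1.0 - p_A
    q_B = 1.0 - p_B
    return {
        "AB": p_A * p_B,
        "Ab": p_A * q_B,
        "aB": q_A * p_B,
        "ab": q_A * q_B,
    }


# Hypothetical allele frequencies, for illustration only.
expected = expected_gamete_frequencies(p_A=0.6, p_B=0.3)
for gamete, freq in expected.items():
    print(f"{gamete}: {freq:.2f}")
```

The four products always sum to 1, because (p_A + q_A) × (p_B + q_B) = 1.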

Measuring LD: The D Statistic

The coefficient of linkage disequilibrium, D, measures the departure from equilibrium:

D = x₁₁ × x₂₂ - x₁₂ × x₂₁    (coefficient of linkage disequilibrium)

Equivalently:

D = x₁₁ - p_A × p_B    (deviation of the AB gamete frequency from its expectation)

The sign of D shows which gametes are in excess:
  • D = 0: Linkage equilibrium (random association)
  • D > 0: Coupling phase excess (AB and ab more common)
  • D < 0: Repulsion phase excess (Ab and aB more common)
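
The two expressions for D can be checked against each other numerically. The following is a minimal sketch, assuming the four gamete frequencies are supplied directly; the function name and the example frequencies are hypothetical.

```python
def ld_coefficient(x_AB, x_Ab, x_aB, x_ab):
    """Return D computed two ways from the four gamete frequencies.

    The first value uses the cross-product form x11*x22 - x12*x21,
    the second uses the deviation form x11 - p_A*p_B.
    """
    total = x_AB + x_Ab + x_aB + x_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("gamete frequencies must sum to 1")

    # Allele frequencies are the marginal sums of the gamete frequencies.
    p_A = x_AB + x_Ab
    p_B = x_AB + x_aB

    d_cross = x_AB * x_ab - x_Ab * x_aB
    d_dev = x_AB - p_A * p_B
    return d_cross, d_dev


# Hypothetical population with an excess of AB and ab gametes.
d_cross, d_dev = ld_coefficient(0.4, 0.1, 0.1, 0.4)
print(d_cross, d_dev)  # both are about 0.15, so D > 0
```

Swapping the example to (0.1, 0.4, 0.4, 0.1) gives a D of about -0.15, the repulsion case.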

Decay of LD

Recombination breaks down LD over time. With recombination rate r between loci:

D(t) = D₀ × (1-r)^t    (exponential decay of LD)

Each generation D shrinks by a factor of (1-r), so the half-life of LD is:

t₁/₂ = ln(2) / ln(1/(1-r)) ≈ 0.693/r    (half-life of linkage disequilibrium; the approximation holds for small r)
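
The decay and half-life formulas translate directly into code. This is a sketch under the deterministic model above (no drift, constant r); the function names are illustrative assumptions.

```python
import math


def ld_after(d0, r, t):
    """LD remaining after t generations: D(t) = D0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** t


def ld_half_life(r):
    """Generations until D falls to half its starting value."""
    if not 0.0 < r <= 0.5:
        raise ValueError("r must satisfy 0 < r <= 0.5")
    return math.log(2) / -math.log(1.0 - r)


for r in (0.01, 0.1, 0.5):
    exact = ld_half_life(r)
    approx = math.log(2) / r  # small-r approximation, 0.693 / r
    print(f"r={r}: exact={exact:.2f}, approx={approx:.2f}")
```

At r = 0.01 the exact and approximate half-lives agree closely (about 69 generations), while at r = 0.5 the exact value is 1 generation and the approximation overshoots.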

Interactive LD Decay

Tightly linked loci (low r) maintain LD much longer than loosely linked or unlinked loci.

[Interactive plot: LD decay over generations for an adjustable recombination rate. Example reading: r = 0.10 gives a half-life of about 6.6 generations.]

LD decays exponentially toward zero. Free recombination (r = 0.5) gives the fastest decay: D halves every generation.

Normalized Measures

Because the range of values D can take depends on the allele frequencies, raw D values are hard to compare between pairs of loci. Normalized measures are therefore often used:

D' (D-prime)

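D' = D / D_max    (ranges from -1 to +1)

Here D_max is the largest magnitude D can take given the allele frequencies: min(p_A × q_B, q_A × p_B) when D > 0, and min(p_A × p_B, q_A × q_B) when D < 0. |D'| = 1 means that at least one of the four gamete types is absent, which is the pattern expected if no recombination has occurred between these alleles.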
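
A minimal sketch of the D' calculation follows. It assumes the standard Lewontin convention for D_max, which depends on the sign of D: min(p_A q_B, q_A p_B) for positive D and min(p_A p_B, q_A q_B) for negative D. The function name and example values are hypothetical.

```python
def d_prime(D, p_A, p_B):
    """Normalize D by its maximum possible magnitude for these allele frequencies."""
    q_A = 1.0 - p_A
    q_B = 1.0 - p_B
    if D >= 0:
        d_max = min(p_A * q_B, q_A * p_B)
    else:
        d_max = min(p_A * p_B, q_A * q_B)
    if d_max == 0:
        raise ValueError("D' is undefined when a locus is monomorphic")
    return D / d_max


# Hypothetical case 1: all four gametes present (0.4, 0.1, 0.1, 0.4).
print(d_prime(0.15, p_A=0.5, p_B=0.5))  # about 0.6

# Hypothetical case 2: the Ab gamete is absent (0.3, 0.0, 0.2, 0.5),
# so p_A = 0.3, p_B = 0.5 and D = 0.3 - 0.3 * 0.5 = 0.15.
print(d_prime(0.15, p_A=0.3, p_B=0.5))  # about 1.0
```

The second case shows why |D'| = 1 is read as a sign of a missing gamete type rather than of any particular value of D.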

r² (Correlation)

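r² = D² / (p_A × q_A × p_B × q_B)    (squared correlation coefficient)

r² ranges from 0 to 1. Here r is the correlation between alleles at the two loci, not the recombination rate used in the decay formula. It is used extensively in GWAS to quantify how well one SNP tags another.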
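
The r² formula can be sketched in a few lines. The function name and the example values are assumptions for illustration; the second example reuses a case where one gamete type is absent, to show that r² can stay well below 1 even then.

```python
def r_squared(D, p_A, p_B):
    """Squared correlation between alleles at two biallelic loci."""
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return D * D / denom


# Hypothetical gametes (0.4, 0.1, 0.1, 0.4): D = 0.15, p_A = p_B = 0.5.
print(r_squared(0.15, p_A=0.5, p_B=0.5))  # about 0.36

# Hypothetical gametes (0.3, 0.0, 0.2, 0.5): D = 0.15, p_A = 0.3, p_B = 0.5.
print(r_squared(0.15, p_A=0.3, p_B=0.5))  # about 0.43
```

r² reaches 1 only when two of the four gamete types are absent, which requires the allele frequencies at the two loci to match.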

Sources of LD

Population Processes

  • Genetic drift: Random associations accumulate in finite populations
  • Population admixture: Mixing populations with different allele frequencies
  • Population bottlenecks: Founder effects create LD from sampling
  • Selection: Hitchhiking of linked variants with selected allele
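
Admixture is the easiest of these processes to demonstrate numerically. The sketch below assumes two source populations that are each at linkage equilibrium and mixes them in proportions m and 1 - m; the function name and example frequencies are hypothetical. Under that assumption the mixture has D = m(1-m)(p_A1 - p_A2)(p_B1 - p_B2), which the code checks against a direct calculation.

```python
def admixture_ld(m, p_A1, p_B1, p_A2, p_B2):
    """D in a mixture of two populations, each at linkage equilibrium.

    m is the fraction of the mixture drawn from population 1.
    """
    # AB gamete frequency in the mixture (equilibrium within each source).
    x_AB = m * p_A1 * p_B1 + (1.0 - m) * p_A2 * p_B2
    # Allele frequencies in the mixture.
    p_A = m * p_A1 + (1.0 - m) * p_A2
    p_B = m * p_B1 + (1.0 - m) * p_B2
    return x_AB - p_A * p_B


# Hypothetical populations with strongly differing allele frequencies.
m = 0.5
direct = admixture_ld(m, p_A1=0.9, p_B1=0.9, p_A2=0.1, p_B2=0.1)
closed_form = m * (1.0 - m) * (0.9 - 0.1) * (0.9 - 0.1)
print(direct, closed_form)  # both about 0.16
```

If the two populations share the same frequency at either locus, the product is zero and mixing creates no LD.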

Genomic Factors

  • Physical linkage: Nearby loci have low recombination rates
  • Chromosomal inversions: Suppress recombination in heterozygotes
  • Epistatic selection: Favorable allele combinations maintained

New mutations arise in complete LD with all other variants on the same chromosome. This LD erodes over time through recombination, creating a "molecular clock" for estimating mutation age.
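
The clock idea can be sketched by inverting the decay formula D(t) = D₀ × (1-r)^t. This is a deliberately crude illustration, not an established estimator: it assumes deterministic decay, a known recombination rate, and that the fraction of the original LD still remaining can be measured. The function name and the numbers are hypothetical.

```python
import math


def generations_since_origin(ld_fraction_remaining, r):
    """Solve (1 - r)**t = fraction for t, the number of generations elapsed."""
    if not 0.0 < ld_fraction_remaining <= 1.0:
        raise ValueError("fraction remaining must be in (0, 1]")
    if not 0.0 < r <= 0.5:
        raise ValueError("r must satisfy 0 < r <= 0.5")
    return math.log(ld_fraction_remaining) / math.log(1.0 - r)


# Hypothetical: half of the original LD remains with a marker at r = 0.01.
print(round(generations_since_origin(0.5, 0.01), 1))
```

In real data, drift, selection, and changing allele frequencies all add noise, so an estimate like this is at best an order-of-magnitude guide.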

LD in Population Genetics

Genetic Mapping

LD is the basis of association mapping. Disease-causing variants are identified through their LD with nearby marker loci — we don't need to genotype the causal variant itself if we can detect a linked marker.

Haplotype Structure

Regions of high LD form haplotype blocks — sets of alleles that are inherited together. Block boundaries occur where recombination is common.

Selective Sweeps

When positive selection rapidly increases an allele's frequency, linked variants "hitchhike" along, creating extended regions of high LD. Detecting such patterns helps identify recent selection.

LD and Effective Population Size

The expected LD in a population reflects its history. At equilibrium between drift (creating LD) and recombination (breaking it down):

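E[r²] ≈ 1 / (4N_e × r + 1)    (expected LD under drift-recombination balance)

Note the two uses of r: r² on the left is the squared correlation between loci, while r on the right is the recombination rate between them.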

This relationship allows estimation of effective population size from LD patterns, particularly useful for inferring historical population sizes.
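
Rearranging the expectation gives N_e ≈ (1/E[r²] - 1) / (4r). The sketch below applies that inversion directly. It is an illustration under the stated approximation only: it ignores sampling-error corrections that practical estimators apply, and it uses the name c for the recombination rate to avoid clashing with the correlation r. The function names and values are hypothetical.

```python
def expected_r2(n_e, c):
    """Expected r^2 under drift-recombination balance: 1 / (4 * N_e * c + 1)."""
    return 1.0 / (4.0 * n_e * c + 1.0)


def estimate_ne(mean_r2, c):
    """Invert the expectation: N_e = (1 / r^2 - 1) / (4 * c)."""
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean r^2 must be in (0, 1]")
    if c <= 0.0:
        raise ValueError("recombination rate must be positive")
    return (1.0 / mean_r2 - 1.0) / (4.0 * c)


# Hypothetical: mean r^2 of 0.2 between loci separated by c = 0.01.
print(round(estimate_ne(0.2, 0.01)))

# Round trip: the estimate recovers the N_e used to generate the expectation.
print(round(estimate_ne(expected_r2(500, 0.01), 0.01)))
```

Because the inversion divides by c, small errors in r² measured between loosely linked loci translate into large errors in N_e, so such estimates are usually averaged over many locus pairs.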