Hypothesis Testing

crabML provides publication-ready hypothesis tests for detecting natural selection using likelihood ratio tests (LRT).

Quick Testing for Positive Selection

The simplest way to test for positive selection:

from crabml import positive_selection

results = positive_selection(
    alignment='alignment.fasta',
    tree='tree.nwk',
    test='both'  # Runs M1a vs M2a and M7 vs M8
)

# Check M1a vs M2a
if results['M1a_vs_M2a'].significant(0.05):
    print("Positive selection detected!")
    omega = results['M1a_vs_M2a'].omega_positive
    print(f"ω for positively selected sites: {omega:.2f}")

# Check M7 vs M8 (more conservative)
if results['M7_vs_M8'].significant(0.05):
    print("Positive selection confirmed by M7 vs M8")

Site-Class Model Tests

M1a vs M2a: Nearly Neutral vs Positive Selection

The standard test for positive selection.

Null model (M1a): Two site classes

ω₀ < 1 (purifying selection)
ω₁ = 1 (neutral evolution)

Alternative model (M2a): Three site classes

ω₀ < 1 (purifying)
ω₁ = 1 (neutral)
ω₂ > 1 (positive selection)

from crabml import m1a_vs_m2a

result = m1a_vs_m2a('alignment.fasta', 'tree.nwk')

# View formatted summary
print(result.summary())

# Access specific values
print(f"LRT statistic: {result.LRT:.2f}")
print(f"P-value: {result.pvalue:.6f}")
print(f"Degrees of freedom: {result.df}")

if result.significant(0.05):
    print(f"Proportion of sites under selection: {result.p_positive:.1%}")
    print(f"ω for positively selected sites: {result.omega_positive:.2f}")

Output example:

================================================================================
Likelihood Ratio Test for Positive Selection
================================================================================

Test: M1a vs M2a

NULL MODEL (M1a):
  Log-likelihood: -902.503872
  Parameters:
    p0 = 0.4923 (proportion ω < 1)
    ω0 = 0.0538
    κ  = 2.2945

ALTERNATIVE MODEL (M2a):
  Log-likelihood: -899.998568
  Parameters:
    p2 = 0.2075 (proportion ω > 1)
    ω2 = 3.4472

LIKELIHOOD RATIO TEST:
  LRT statistic: 5.0106
  Degrees of freedom: 2
  P-value: 0.0817

CONCLUSION:
  No significant evidence for positive selection (α = 0.05)
================================================================================

M7 vs M8: Beta vs Beta + Omega

A more conservative test using beta distributions.

Null model (M7): Beta distribution constrained to 0 < ω < 1

Alternative model (M8): Beta distribution + additional class with ω > 1

from crabml import m7_vs_m8

result = m7_vs_m8('alignment.fasta', 'tree.nwk')

if result.significant(0.01):  # More stringent threshold
    print("Strong evidence for positive selection")

Interpretation:

More conservative than M1a vs M2a
Better for datasets with complex omega distributions
Less prone to false positives

M8a vs M8: Null Test with 50:50 Mixture

An alternative null model fixing the selection class at ω = 1.

Null model (M8a): Beta + ω = 1 class

Alternative model (M8): Beta + ω > 1 class

from crabml import m8a_vs_m8

result = m8a_vs_m8('alignment.fasta', 'tree.nwk')

# This test uses 50:50 mixture of chi-square distributions
print(f"P-value (50:50 mixture): {result.pvalue:.6f}")

Note: This test accounts for the boundary constraint (ω = 1) using a 50:50 mixture of χ² distributions with df=0 and df=1.

Branch-Site Tests

Branch-Site Model A Test

Tests for positive selection on specific phylogenetic lineages.

from crabml import branch_site_test

# Tree with branch labels: #0 = background, #1 = foreground
tree_str = "((human,chimp) #1, (mouse,rat) #0);"

result = branch_site_test(
    alignment='alignment.fasta',
    tree=tree_str
)

print(result.summary())

if result.significant(0.05):
    omega2 = result.alt_params['omega2']
    p2 = result.foreground_positive_proportion
    print(f"Positive selection on foreground branches!")
    print(f"ω₂ = {omega2:.2f} (dN/dS for selected sites)")
    print(f"{p2:.1%} of sites under selection")

Use cases:

Detecting adaptive evolution after gene duplication
Testing for selection on primate-specific lineages
Identifying sites under selection in specific clades

Interpreting results:

The test compares:

Null: ω₂ = 1 on foreground branches
Alternative: ω₂ free to vary on foreground branches

Significant result means some sites experience positive selection (ω > 1) specifically on the foreground lineage.

Branch Model Tests

Multi-Ratio vs One-Ratio

Tests whether different lineages have different average selection pressures.

from crabml import branch_model_test

tree_str = "((human,chimp) #1, (mouse,rat) #0);"

result = branch_model_test(
    alignment='alignment.fasta',
    tree=tree_str
)

if result.significant(0.05):
    omega_fg = result.alt_params['omega1']
    omega_bg = result.alt_params['omega0']
    print(f"Different selection on foreground (ω={omega_fg:.3f}) "
          f"vs background (ω={omega_bg:.3f})")

Free-Ratio Test

Tests whether each branch has a different omega (exploratory).

from crabml import free_ratio_test

result = free_ratio_test('alignment.fasta', 'tree.nwk')

if result.significant(0.05):
    print("Significant omega variation across branches")
    # View branch-specific omegas
    print(result.alt_result.omega_dict)

Warning: This test is highly parameter-rich and prone to overfitting. Use with caution and interpret conservatively.

Understanding the Results

LRTResult Object

All hypothesis tests return an LRTResult object with:

result = m1a_vs_m2a(align, tree)

# Test statistics
result.LRT          # Likelihood ratio test statistic
result.df           # Degrees of freedom
result.pvalue       # P-value from chi-square distribution
result.significant(alpha)  # Boolean, is p-value < alpha?

# Model results
result.null_result  # Full null model result
result.alt_result   # Full alternative model result

# Convenience properties (model-dependent)
result.omega_positive      # ω for positive selection class
result.p_positive          # Proportion under positive selection

# Export
result.to_dict()    # Dictionary representation
result.to_json()    # JSON export
result.summary()    # Formatted text summary

Statistical Considerations

Significance Thresholds

Standard threshold: α = 0.05

More conservative (for multiple testing): α = 0.01 or Bonferroni correction

# Multiple testing correction
n_tests = 5
alpha_bonferroni = 0.05 / n_tests

results = [
    m1a_vs_m2a(align1, tree),
    m1a_vs_m2a(align2, tree),
    m1a_vs_m2a(align3, tree),
    m1a_vs_m2a(align4, tree),
    m1a_vs_m2a(align5, tree),
]

significant = [r for r in results if r.significant(alpha_bonferroni)]
print(f"{len(significant)}/{n_tests} genes with positive selection")

Degrees of Freedom

Different tests have different degrees of freedom:

M1a vs M2a: df = 2 (adds p2 and ω₂)
M7 vs M8: df = 2 (adds p0 and ω_s)
M8a vs M8: df = 1 (only ω changes from 1 to free)
Branch-site: df = 1 (only ω₂ changes from 1 to free)
Multi-ratio vs M0: df = k-1 (k = number of branch labels)
Free-ratio vs M0: df = n-2 (n = number of branches)

Multiple Testing

When testing multiple genes or multiple hypotheses:

Bonferroni correction: Divide α by number of tests
FDR control: Use Benjamini-Hochberg procedure
Permutation tests: For small sample sizes

from scipy.stats import false_discovery_control

pvalues = [result.pvalue for result in results]
significant_fdr = false_discovery_control(pvalues, alpha=0.05)

Power Considerations

Factors affecting power:

Alignment length (more sites = more power)
Number of sequences (more phylogenetic signal)
Strength of selection (larger ω easier to detect)
Proportion of sites under selection

Recommendations:

Minimum ~100 codons for reliable detection
At least 8-10 sequences for good phylogenetic coverage
Branch-site tests need sufficient foreground branch length

Best Practices

Run multiple tests: Both M1a vs M2a AND M7 vs M8
Check convergence: Ensure optimization converged
Visualize results: Plot omega distributions, site-specific posteriors
Biological validation: Check if selected sites make biological sense
Multiple datasets: Test across related genes/species

Publication Checklist

When reporting hypothesis test results:

☐ Report both test names (e.g., “M1a vs M2a”)

☐ Report LRT statistic, df, and p-value

☐ Report parameter estimates for both models

☐ State significance threshold used (α = ?)

☐ Report proportion of sites under selection

☐ Report ω estimate for selection class

☐ Describe biological interpretation

☐ Provide alignment statistics (n sequences, n codons)

Example for Methods Section

We tested for positive selection using crabML v0.2.0 (Kern, 2025),
which implements the models of Yang et al. (2000). We performed
likelihood ratio tests comparing M1a (nearly neutral) vs M2a
(positive selection) and M7 (beta) vs M8 (beta + ω) models.
Significance was assessed using a chi-square distribution with
α = 0.05. For genes showing significant evidence of positive
selection, we report the proportion of sites under selection and
the estimated ω for the positively selected class.

Common Pitfalls

1. Overfitting with small datasets

Don’t use complex models (M8, branch-site) with < 50 codons

2. Ignoring convergence warnings

Always check that optimization converged properly

3. P-hacking

Don’t test many models and only report significant ones

4. Ignoring biological context

Statistical significance ≠ biological significance

5. Wrong null model

Ensure you’re using the appropriate null (M8a for M8, not M7)