Model Guide
crabML implements 15 codon substitution models for detecting natural selection. All models are fully validated against PAML reference outputs.
Model Categories
Models are organized into three categories based on how omega (dN/dS) varies:
Site-class models: omega varies across sites but not branches
Branch models: omega varies across branches but not sites
Branch-site models: omega varies across both sites and branches
Site-Class Models
These models allow the ratio of non-synonymous to synonymous substitutions (ω = dN/dS) to vary across codon sites.
M0: One-Ratio Model
The simplest model with a single omega for all sites.
from crabml import optimize_model
result = optimize_model("M0", "alignment.fasta", "tree.nwk")
print(f"omega = {result.omega:.4f}")
Use case: Baseline model, average selection across gene
Parameters: - kappa (transition/transversion ratio) - omega (dN/dS)
M1a: Nearly Neutral
Two site classes: purifying selection (ω₀ < 1) and neutral evolution (ω₁ = 1).
result = optimize_model("M1a", "alignment.fasta", "tree.nwk")
print(f"p0 = {result.params['p0']:.3f} (proportion ω < 1)")
print(f"omega0 = {result.params['omega0']:.3f}")
Use case: Null model for M1a vs M2a test
Parameters: - kappa - p0 (proportion of sites with ω₀) - omega0 (ω for purifying selection, 0 < ω₀ < 1)
M2a: Positive Selection
Three site classes: purifying (ω₀ < 1), neutral (ω₁ = 1), and positive selection (ω₂ > 1).
result = optimize_model("M2a", "alignment.fasta", "tree.nwk")
print(f"Omegas: {result.omegas}") # [omega0, 1.0, omega2]
print(f"Proportions: {result.proportions}") # [p0, p1, p2]
if result.params['omega2'] > 1:
print(f"Positive selection detected: ω₂ = {result.params['omega2']:.2f}")
Use case: Detect positive selection (alternative to M1a)
Parameters: - kappa - p0, p1 (proportions, p2 = 1 - p0 - p1) - omega0, omega2
M3: Discrete Model
K discrete omega categories with estimated proportions.
result = optimize_model("M3", "alignment.fasta", "tree.nwk", K=3)
print(f"Omegas: {result.omegas}")
print(f"Proportions: {result.proportions}")
Use case: Exploratory analysis of omega variation
Parameters: - kappa - K omega values - K-1 proportion parameters
M7: Beta Distribution
Beta distribution for omega constrained to (0, 1).
result = optimize_model("M7", "alignment.fasta", "tree.nwk")
print(f"Beta parameters: p={result.params['p']:.2f}, q={result.params['q']:.2f}")
Use case: Null model for M7 vs M8 test (no positive selection)
Parameters: - kappa - p, q (beta distribution shape parameters)
M8: Beta + Omega
Beta distribution (0 < ω < 1) plus an additional class allowing ω > 1.
result = optimize_model("M8", "alignment.fasta", "tree.nwk")
if result.params['omega_s'] > 1:
p_sel = result.params['p1']
omega_sel = result.params['omega_s']
print(f"{p_sel:.1%} of sites with ω = {omega_sel:.2f}")
Use case: Detect positive selection (alternative to M7)
Parameters: - kappa - p, q (beta distribution) - p0 (proportion in beta) - omega_s (omega for selection class)
M8a: Beta + Omega = 1
Like M8 but with the additional class fixed at ω = 1.
result = optimize_model("M8a", "alignment.fasta", "tree.nwk")
Use case: Null model for M8a vs M8 test
Parameters: - kappa - p, q (beta distribution) - p0 (proportion in beta)
Additional Site Models
M4 (Frequencies): Five fixed omegas with variable proportions
M5 (Gamma): Gamma distribution for omega
M6 (2Gamma): Mixture of two gamma distributions
M9 (Beta & Gamma): Mixture of beta and gamma distributions
m4 = optimize_model("M4", align, tree)
m5 = optimize_model("M5", align, tree)
m6 = optimize_model("M6", align, tree)
m9 = optimize_model("M9", align, tree)
Branch Models
Branch models allow omega to vary across phylogenetic lineages.
Free-Ratio Model
Estimates independent omega for each branch in the tree.
from crabml import optimize_branch_model
result = optimize_branch_model("free-ratio", "alignment.fasta", "tree.nwk")
# View all branch-specific omegas
for branch, omega in result.omega_dict.items():
print(f"{branch}: ω = {omega:.3f}")
Use case: Exploratory analysis of lineage-specific selection
Warning: Highly parameter-rich, prone to overfitting with small datasets
Parameters: One omega per branch (n-1 for n species)
Multi-Ratio Model
Different omega for labeled branch groups.
# Label branches in tree: #0 = background, #1 = foreground
tree_str = "((human,chimp) #1, (mouse,rat) #0);"
result = optimize_branch_model("multi-ratio", align, tree_str)
print(f"Foreground ω (primates): {result.foreground_omega:.3f}")
print(f"Background ω (rodents): {result.background_omega:.3f}")
Use case: Test for lineage-specific selection (recommended over free-ratio)
Parameters: One omega per unique branch label
Branch-Site Models
Branch-site models combine site variation with lineage-specific effects.
Branch-Site Model A
Four site classes with different omega on foreground vs background branches:
Class 0: Conserved (ω₀ < 1) on all branches
Class 1: Neutral (ω = 1) on all branches
Class 2a: Conserved on background, positive selection (ω₂ > 1) on foreground
Class 2b: Neutral on background, positive selection on foreground
from crabml import optimize_branch_site_model
tree_str = "((human,chimp) #1, (mouse,rat) #0);"
# Alternative model (ω₂ free)
alt = optimize_branch_site_model("model-a", align, tree_str)
print(f"ω₀ (conserved): {alt.omega0:.3f}")
print(f"ω₂ (positive selection): {alt.omega2:.3f}")
print(f"Sites under selection: {alt.foreground_positive_proportion:.1%}")
print(f"Site class proportions: {alt.proportions}")
Use case: Detect positive selection on specific lineages
Parameters: - kappa - omega0, omega2 - p0, p1 (proportions)
Branch-Site Model A Null
Same as Model A but with ω₂ fixed to 1 for hypothesis testing.
null = optimize_branch_site_model("model-a", align, tree_str, fix_omega=True)
assert null.omega2 == 1.0 # Fixed at neutral
Use case: Null model for branch-site test
Model Selection
Choosing the Right Model
For detecting positive selection:
Start with standard tests: - M1a vs M2a (site-class test) - M7 vs M8 (more conservative)
If you suspect lineage-specific selection: - Branch-site Model A
For exploratory analysis: - M3 (discrete omegas) - Free-ratio (branch-specific, use with caution)
For estimating average selection:
M0 provides overall omega across gene
For detailed omega distribution:
M7/M8 model beta distribution
M3 estimates discrete categories
Computational Considerations
Speed (fastest to slowest):
M0 (one class)
M1a, M7 (two classes / beta)
M2a, M8, M8a (three classes)
M3, M9 (K classes)
Branch models (many omegas)
Branch-site models (complex structure)
Sample size requirements:
M0, M1a, M2a: Work with small datasets (< 10 sequences)
M7, M8: Benefit from moderate datasets (10-50 sequences)
Branch models: Need sufficient phylogenetic diversity
Branch-site models: Need both sequence diversity and alignment length
Model Comparison
Comparing Results
from crabml import compare_results
m0 = optimize_model("M0", align, tree)
m1a = optimize_model("M1a", align, tree)
m2a = optimize_model("M2a", align, tree)
# Compare models
comparison = compare_results([m0, m1a, m2a])
print(comparison)
This shows log-likelihoods, parameters, and omega estimates side-by-side.
PAML Compatibility
All crabML models are validated against PAML:
Log-likelihoods match within 0.01 units
Parameter estimates match within 1% relative error
Same model parameterizations as PAML
Tested on multiple diverse datasets
You can directly compare crabML results with PAML output files.