Quick Start Guide

This guide walks through a first crabML analysis, from fitting a single model to testing for positive selection.

crabML can be used either via the command-line interface or as a Python library.

Command-Line Interface

The fastest way to get started is with the crabml command:

# Site-class model tests (positive selection)
crabml site-model -s alignment.fasta -t tree.nwk --test both

# Branch model tests (lineage-specific selection)
crabml branch-model -s alignment.fasta -t labeled_tree.nwk --test multi-ratio

# Branch-site model test (site + lineage selection)
crabml branch-site -s alignment.fasta -t labeled_tree.nwk

# Fit a single model
crabml fit -m M0 -s alignment.fasta -t tree.nwk

For complete CLI documentation, see the Command-Line Interface reference.

Python Library

Your First Analysis

Let’s fit a simple M0 (one-ratio) model to an alignment:

from crabml import optimize_model

# Fit M0 model
result = optimize_model("M0", "alignment.fasta", "tree.nwk")

# View results
print(result.summary())

# Access specific parameters
print(f"omega (dN/dS) = {result.omega:.4f}")
print(f"kappa (ts/tv) = {result.kappa:.4f}")
print(f"log-likelihood = {result.lnL:.2f}")

The summary printed by result.summary() will look like:

======================================================================
MODEL: M0
======================================================================

Log-likelihood:       -906.017441
Number of parameters: 13

PARAMETERS:
  kappa (ts/tv) = 4.5402
  omega (dN/dS) = 0.8066

TREE:
  7 sequences
  11 branches (optimized)

======================================================================
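The parameter count in this summary is consistent with one length per branch plus kappa and omega. The sketch below reproduces that arithmetic and uses it to compute AIC, which is handy when comparing non-nested models. The helper names are hypothetical and not part of crabML, and the breakdown of the 13 parameters is an assumption inferred from the output above.

```python
def n_branches_unrooted(n_taxa):
    """Number of branches in a fully resolved unrooted tree with n_taxa tips."""
    return 2 * n_taxa - 3


def aic(lnL, n_params):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * lnL


# Assumed breakdown: 11 branch lengths + kappa + omega
n_params = n_branches_unrooted(7) + 2
m0_aic = aic(-906.017441, n_params)

print(f"parameters = {n_params}")
print(f"AIC = {m0_aic:.2f}")
```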

Testing for Positive Selection

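A common analysis is testing for positive selection by comparing nested site-class models with likelihood ratio tests. The positive_selection function runs the standard comparisons in one call: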

from crabml import positive_selection

# Run both standard tests
results = positive_selection(
    alignment='alignment.fasta',
    tree='tree.nwk',
    test='both'  # Runs M1a vs M2a and M7 vs M8
)

# Check M1a vs M2a test
m1a_m2a = results['M1a_vs_M2a']
print(m1a_m2a.summary())

if m1a_m2a.significant(0.05):
    print("Positive selection detected!")
    print(f"ω for positively selected sites: {m1a_m2a.omega_positive:.2f}")

Individual Tests

You can also run individual tests:

from crabml import m1a_vs_m2a, m7_vs_m8

# M1a (nearly neutral) vs M2a (positive selection)
result = m1a_vs_m2a('alignment.fasta', 'tree.nwk')
print(f"P-value: {result.pvalue:.6f}")

# M7 (beta distribution) vs M8 (beta + omega > 1)
result = m7_vs_m8('alignment.fasta', 'tree.nwk')
print(f"LRT statistic: {result.LRT:.2f}")
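Both comparisons are likelihood ratio tests: the statistic is twice the difference in log-likelihood between the alternative and null models, compared against a chi-square distribution. M2a adds two free parameters to M1a, as M8 does to M7, so two degrees of freedom are conventional. The sketch below shows that calculation with the standard library only. The function name and the log-likelihood values are illustrative placeholders, not crabML output, and crabML may compute its p-values differently.

```python
import math


def lrt_pvalue_df2(lnL_null, lnL_alt):
    """Return (LRT statistic, p-value) for a chi-square test with 2 df."""
    # Clamp at zero: optimizer noise can make the difference slightly negative
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    # With 2 degrees of freedom the chi-square survival function is exp(-x / 2)
    return stat, math.exp(-stat / 2.0)


# Placeholder log-likelihoods for a null (e.g. M1a) and alternative (e.g. M2a) fit
stat, pvalue = lrt_pvalue_df2(lnL_null=-1000.0, lnL_alt=-995.0)
print(f"LRT statistic: {stat:.2f}")
print(f"P-value: {pvalue:.6f}")
```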

Working with Different Model Types

Site-Class Models

Site-class models allow omega (dN/dS) to vary across sites:

from crabml import optimize_model

# Paths to your input files
align = "alignment.fasta"
tree = "tree.nwk"

# Simple models
m0 = optimize_model("M0", align, tree)  # One omega for all sites

# Models for testing positive selection
m1a = optimize_model("M1a", align, tree)  # Nearly neutral
m2a = optimize_model("M2a", align, tree)  # Positive selection

# Beta distribution models
m7 = optimize_model("M7", align, tree)   # Beta (omega < 1)
m8 = optimize_model("M8", align, tree)   # Beta + omega > 1

# Access site class information
print(f"Site classes: {m2a.n_site_classes}")
print(f"Omega values: {m2a.omegas}")
print(f"Proportions: {m2a.proportions}")
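A quick way to summarise a site-class fit is the proportion-weighted mean omega across classes. The sketch below assumes that the omegas and proportions attributes are equal-length sequences of floats, which the source does not confirm. The helper name is hypothetical and the numbers are placeholders, not fitted values.

```python
def mean_omega(omegas, proportions):
    """Proportion-weighted average of the per-class omega values."""
    if len(omegas) != len(proportions):
        raise ValueError("omegas and proportions must have the same length")
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("proportions must sum to 1")
    return sum(w * p for w, p in zip(omegas, proportions))


# Placeholder values standing in for m2a.omegas and m2a.proportions
omegas = [0.1, 1.0, 3.0]
proportions = [0.7, 0.2, 0.1]
avg = mean_omega(omegas, proportions)
print(f"Mean omega across sites: {avg:.3f}")
```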

Branch Models

Branch models allow omega to vary across lineages:

from crabml import optimize_branch_model

# Paths to your input files
align = "alignment.fasta"
tree = "tree.nwk"

# Tree with branch labels: #0 = background, #1 = foreground
tree_str = "((human,chimp) #1, (mouse,rat) #0);"

# Multi-ratio model (recommended)
result = optimize_branch_model("multi-ratio", align, tree_str)
print(f"Primate omega: {result.foreground_omega:.3f}")
print(f"Rodent omega: {result.background_omega:.3f}")

# Free-ratio model (exploratory)
result = optimize_branch_model("free-ratio", align, tree)
print(result.omega_dict)  # All branch-specific omegas

Branch-Site Models

Branch-site models detect positive selection on specific sites and lineages:

from crabml import optimize_branch_site_model

align = "alignment.fasta"  # Path to your codon alignment
tree_str = "((human,chimp) #1, (mouse,rat) #0);"

# Alternative model (omega2 free)
alt = optimize_branch_site_model("model-a", align, tree_str)
print(f"Positive selection omega: {alt.omega2:.3f}")
print(f"Sites under selection: {alt.foreground_positive_proportion:.1%}")

# Null model (omega2 = 1) for hypothesis testing
null = optimize_branch_site_model("model-a", align, tree_str, fix_omega=True)
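The two fits above are compared with a likelihood ratio test. Because the null model fixes omega2 at 1, the models differ by one parameter, and a chi-square distribution with one degree of freedom is a common (conservative) reference. The sketch below uses only the standard library. It assumes that branch-site results expose an lnL attribute like the M0 result shown earlier. The function name is hypothetical and the log-likelihoods are placeholders.

```python
import math


def branch_site_lrt(lnL_null, lnL_alt):
    """Return (LRT statistic, p-value) for a chi-square test with 1 df."""
    # Clamp at zero: optimizer noise can make the difference slightly negative
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue


# Placeholders standing in for null.lnL and alt.lnL (assumed attribute)
stat, pvalue = branch_site_lrt(lnL_null=-1000.0, lnL_alt=-998.0)
print(f"LRT statistic: {stat:.2f}")
print(f"P-value: {pvalue:.4f}")
```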

File Formats

crabML automatically detects file formats:

Alignments:

  • FASTA format (.fa, .fasta)

  • PHYLIP format (.phy)

Trees:

  • Newick format in file (.nwk, .tree)

  • Newick string directly in code

Example:

# All of these work:
result = optimize_model("M0", "data.fasta", "tree.nwk")
result = optimize_model("M0", "data.phy", "tree.tree")
result = optimize_model("M0", "data.fa", "((A,B),(C,D));")
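Since the tree argument can be either a file path or a Newick string, a small pre-flight check can catch typos before a long optimization starts. The sketch below is one way to tell the cases apart from the extensions listed above. Both helpers are hypothetical and do not describe how crabML itself detects formats, which may inspect file contents rather than names.

```python
from pathlib import Path


def looks_like_newick(tree_arg):
    """Heuristic: a Newick string starts with '(' and ends with ';'."""
    text = tree_arg.strip()
    return text.startswith("(") and text.endswith(";")


def guess_alignment_format(path):
    """Map the extensions listed in this guide to a format name."""
    suffix = Path(path).suffix.lower()
    if suffix in {".fa", ".fasta"}:
        return "fasta"
    if suffix == ".phy":
        return "phylip"
    raise ValueError(f"Unrecognised alignment extension: {suffix!r}")


print(looks_like_newick("((A,B),(C,D));"))
print(looks_like_newick("tree.nwk"))
print(guess_alignment_format("data.phy"))
```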

Exporting Results

Results can be exported as a dictionary or a JSON file, or printed as a summary:

result = optimize_model("M2a", "alignment.fasta", "tree.nwk")

# Dictionary
data = result.to_dict()

# JSON file
result.to_json("results.json")

# Print summary
print(result.summary())
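An exported JSON file can be read back with the standard json module for downstream processing. The sketch below round-trips a stand-in dictionary through a temporary file. The keys and values are hypothetical placeholders, since the actual structure returned by to_dict() is not shown in this guide.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for result.to_dict(); the real keys and values may differ
data = {"model": "M2a", "lnL": -900.0, "kappa": 4.5}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    # Plays the role of result.to_json("results.json")
    path.write_text(json.dumps(data, indent=2))
    # Read the file back as a plain dictionary
    loaded = json.loads(path.read_text())

print(loaded["model"], loaded["lnL"])
```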

What’s Next?