Command-Line Interface
crabML provides a streamlined command-line interface for common analyses. The crabml command is automatically installed when you install the package.
Overview
The CLI provides five main commands:
crabml site-model: Run site-class model tests (M1a vs M2a, M7 vs M8)crabml branch-model: Run branch model tests (multi-ratio, free-ratio)crabml branch-site: Run branch-site model testscrabml fit: Fit single codon substitution modelscrabml simulate: Generate synthetic sequences under evolutionary models
All commands support multiple output formats (text, JSON, TSV) and can write results to files.
Getting Help
Use --help to see available options for any command:
crabml --help
crabml site-model --help
crabml branch-model --help
crabml branch-site --help
crabml fit --help
crabml simulate --help
crabml site-model
Run standard likelihood ratio tests for detecting positive selection.
Basic Usage
# Run M7 vs M8 test
crabml site-model -s lysozyme.fasta -t lysozyme.nwk --test m7m8
# Run M1a vs M2a test
crabml site-model -s lysozyme.fasta -t lysozyme.nwk --test m1m2
# Run both tests
crabml site-model -s lysozyme.fasta -t lysozyme.nwk --test both
Output Formats
# Human-readable text (default)
crabml site-model -s alignment.fasta -t tree.nwk --test both
# JSON output (for parsing/pipelines)
crabml site-model -s alignment.fasta -t tree.nwk --test both --format json -o results.json
# TSV output (for Excel/R)
crabml site-model -s alignment.fasta -t tree.nwk --test both --format tsv -o results.tsv
Options
-s, --alignment PATH: Path to alignment file (FASTA or PHYLIP) [required]-t, --tree PATH: Path to tree file (Newick format) [required]--test TYPE: Which test to run:m1m2,m7m8,both, orall[default: both]--format FORMAT: Output format:text,json, ortsv[default: text]--output, -o PATH: Write output to file instead of stdout--maxiter INT: Maximum optimization iterations [default: 500]--alpha FLOAT: Significance threshold for tests [default: 0.05]--no-m0-init: Skip M0 initialization (not recommended)--quiet: Suppress progress output--verbose: Show detailed optimization progress
Example Output (Text Format)
Test 2: M7 (Beta) vs M8 (Beta + positive selection)
--------------------------------------------------------------------------------
Null (M7): lnL = -902.510 parameters = {...}
Alternative (M8): lnL = -899.999 parameters = {...}
Likelihood Ratio Test:
2ΔlnL = 5.02 df = 2 p-value = 0.0812
Result: No significant evidence for positive selection (p > 0.05)
Example Output (JSON Format)
{
"M7_vs_M8": {
"test_name": "M7 vs M8",
"lnL_null": -902.510,
"lnL_alt": -899.999,
"LRT": 5.022,
"pvalue": 0.0812,
"significant": false
}
}
crabml fit
Fit a specific codon substitution model to your data.
Basic Usage
# Fit M0 model
crabml fit -m M0 -s alignment.fasta -t tree.nwk
# Fit M8 with custom settings
crabml fit -m M8 -s alignment.fasta -t tree.nwk --maxiter 1000 --verbose
# Output as JSON
crabml fit -m M2a -s alignment.fasta -t tree.nwk --format json -o m2a_result.json
Supported Models
M0: One-ratio model (single ω for all sites)
M1a: Nearly neutral model (purifying and neutral)
M2a: Positive selection model (purifying, neutral, and positive)
M3: Discrete model (K=3 discrete ω classes)
M7: Beta distribution model (ω constrained to 0-1)
M8: Beta + ω>1 model (positive selection)
M8a: Beta + ω=1 model (null for M8)
Options
-m, --model NAME: Model name [required]-s, --alignment PATH: Path to alignment file (FASTA or PHYLIP) [required]-t, --tree PATH: Path to tree file (Newick format) [required]--format FORMAT: Output format:textorjson[default: text]--output, -o PATH: Write output to file instead of stdout--maxiter INT: Maximum optimization iterations [default: 500]--no-m0-init: Skip M0 initialization (not recommended for complex models)--quiet: Suppress progress output--verbose: Show detailed optimization progress
Example Output
======================================================================
MODEL: M0
======================================================================
Log-likelihood: -906.017441
Number of parameters: 13
PARAMETERS:
kappa (ts/tv) = 4.5402
omega (dN/dS) = 0.8066
TREE:
7 sequences
11 branches (optimized)
======================================================================
crabml branch-model
Test for lineage-specific selection using branch models.
Basic Usage
# Multi-ratio test (different omega for labeled branches)
crabml branch-model -s alignment.fasta -t labeled_tree.nwk --test multi-ratio
# Free-ratio test (independent omega for each branch)
crabml branch-model -s alignment.fasta -t tree.nwk --test free-ratio
Supported Tests
multi-ratio: Different ω for labeled branch groups (recommended) - Tests whether different phylogenetic lineages experience different selection pressures - Tree must have branch labels (#0, #1, etc.) to specify foreground/background - More statistically powerful than free-ratio with fewer parameters
free-ratio: Independent ω for each branch (exploratory) - Estimates one ω per branch in the tree - Highly parameter-rich (n-1 omega parameters for n species) - Prone to overfitting with small datasets - Use with caution
Options
-s, --alignment PATH: Path to alignment file (FASTA or PHYLIP) [required]-t, --tree PATH: Path to tree file (Newick format, with branch labels for multi-ratio) [required]--test TYPE: Which test to run:multi-ratioorfree-ratio[default: multi-ratio]--format FORMAT: Output format:text,json, ortsv[default: text]--output, -o PATH: Write output to file instead of stdout--maxiter INT: Maximum optimization iterations [default: 1000]--alpha FLOAT: Significance threshold for test [default: 0.05]--quiet: Suppress progress output--verbose: Show detailed optimization progress
Tree Format with Branch Labels
For multi-ratio tests, the tree must have branch labels:
((human,chimp) #1, (mouse,rat) #0);
#0: Background branches#1: Foreground branches
Example Output
================================================================================
Branch Model Test Results
================================================================================
Test: Multi-ratio vs M0
--------------------------------------------------------------------------------
Null (M0): lnL = -906.017 parameters = {'omega': 0.807}
Alternative (Multi): lnL = -903.245 parameters = {'omega0': 0.654, 'omega1': 1.234}
Likelihood Ratio Test:
2ΔlnL = 5.54 df = 1 p-value = 0.0186
Result: LINEAGE-SPECIFIC SELECTION DETECTED (p < 0.05)
Background ω = 0.654
Foreground ω = 1.234
Foreground is 1.9x faster evolving
crabml branch-site
Test for positive selection on specific lineages using branch-site Model A.
Basic Usage
# Tree must have branch labels: #0 (background), #1 (foreground)
crabml branch-site -s alignment.fasta -t labeled_tree.nwk
# With custom settings
crabml branch-site -s alignment.fasta -t labeled_tree.nwk --maxiter 1000 --alpha 0.01
# Output as JSON
crabml branch-site -s alignment.fasta -t labeled_tree.nwk --format json -o results.json
Tree Format with Branch Labels
The tree must have branch labels to specify foreground and background branches:
((human,chimp) #1, (mouse,rat) #0);
#0: Background branches (standard selection)#1: Foreground branches (test for positive selection)
Options
-s, --alignment PATH: Path to alignment file (FASTA or PHYLIP) [required]-t, --tree PATH: Path to tree file with branch labels (Newick format) [required]--format FORMAT: Output format:text,json, ortsv[default: text]--output, -o PATH: Write output to file instead of stdout--maxiter INT: Maximum optimization iterations [default: 500]--alpha FLOAT: Significance threshold for test [default: 0.05]--quiet: Suppress progress output--verbose: Show detailed optimization progress
crabml simulate
Generate synthetic codon sequences under various evolutionary models. Useful for validation, power analysis, and benchmarking.
Basic Usage
# M0: Single omega model
crabml simulate m0 -t tree.nwk -o sim.fasta -l 1000 --omega 0.3
# M2a: Positive selection model
crabml simulate m2a -t tree.nwk -o sim.fasta -l 1000 \
--p0 0.5 --p1 0.3 --omega0 0.1 --omega2 2.5
# M7: Beta distribution
crabml simulate m7 -t tree.nwk -o sim.fasta -l 1000 --p 2 --q 5
# M8: Beta + positive selection
crabml simulate m8 -t tree.nwk -o sim.fasta -l 1000 \
--p0 0.8 --p 2 --q 5 --omega-s 2.5
Available Models
m0: Single omega across all sitesm1a: Nearly neutral (purifying + neutral)m2a: Positive selection (purifying + neutral + positive)m7: Beta distribution for omega in (0,1)m8: Beta distribution + positive selection class
Common Options
-t, --tree PATH: Input tree file (Newick format with branch lengths) [required]-o, --output PATH: Output FASTA file [required]-l, --length INT: Sequence length in codons [required]--kappa FLOAT: Transition/transversion ratio [default: 2.0]-r, --replicates INT: Number of replicates to simulate [default: 1]--seed INT: Random seed for reproducibility-q, --quiet: Suppress progress messages
Multiple Replicates
# Simulate 10 replicates
crabml simulate m2a -t tree.nwk -o sim.fasta -l 500 \
--p0 0.5 --p1 0.3 --omega0 0.1 --omega2 2.5 -r 10
# Creates: sim_rep1.fasta, sim_rep2.fasta, ..., sim_rep10.fasta
Outputs
For all models:
FASTA file with simulated sequences
<output>.params.json: Parameters used for simulation
For M2a and M8 (positive selection models):
<output>.site_classes.txt: Site class assignments<output>.positive_sites.txt: Sites under positive selection (omega > 1)
For detailed documentation on simulation parameters and workflows, see Sequence Simulation.
Integration with Pipelines
The CLI is designed to work well in pipelines and scripts:
JSON Output for Parsing
# Run test and parse with jq
crabml site-model -s alignment.fasta -t tree.nwk --format json | jq '.M7_vs_M8.pvalue'
# Save JSON for later analysis
crabml fit -m M0 -s alignment.fasta -t tree.nwk --format json -o results.json
TSV Output for Spreadsheets
# Generate TSV for multiple genes
for gene in gene1 gene2 gene3; do
crabml site-model -s ${gene}.fasta -t ${gene}.nwk --format tsv --quiet
done > all_results.tsv
Exit Codes
0: Success1: Analysis error (e.g., optimization failed, invalid model)2: Argument error (e.g., missing file, invalid options)
Batch Processing
#!/bin/bash
# Process multiple alignments
for alignment in *.fasta; do
gene=$(basename $alignment .fasta)
echo "Processing $gene..."
crabml site-model \
-s $alignment \
-t ${gene}.nwk \
--test both \
--format json \
-o ${gene}_results.json \
--quiet
if [ $? -eq 0 ]; then
echo " Success!"
else
echo " Failed!"
fi
done
Tips and Best Practices
Use JSON for pipelines: The JSON output format is ideal for parsing and integrating with other tools.
Always specify output files: Use
-oto write results to files rather than relying on stdout redirection, especially in complex pipelines.Start with default settings: The default settings (M0 initialization, 500 iterations) work well for most datasets.
Use quiet mode for batch jobs: Add
--quietwhen processing many files to reduce log output.Check exit codes: In scripts, always check the exit code to detect failures.
Increase maxiter for complex models: Models like M8 on large datasets may need more iterations. Try
--maxiter 1000if optimization doesn’t converge.