Command-Line Interface
======================

crabML provides a command-line interface for common analyses. The ``crabml`` command is installed along with the package.

Overview
--------

The CLI provides five main commands:

* ``crabml site-model``: Run site-class model tests (M1a vs M2a, M7 vs M8)
* ``crabml branch-model``: Run branch model tests (multi-ratio, free-ratio)
* ``crabml branch-site``: Run branch-site model tests
* ``crabml fit``: Fit single codon substitution models
* ``crabml simulate``: Generate synthetic sequences under evolutionary models

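The analysis commands print human-readable text by default and can also emit JSON; the test commands (``site-model``, ``branch-model``, ``branch-site``) additionally support TSV. Results can be written to a file with ``-o``. ``crabml simulate`` writes FASTA files instead.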

Getting Help
------------

Use ``--help`` to see available options for any command:

.. code-block:: bash

   crabml --help
   crabml site-model --help
   crabml branch-model --help
   crabml branch-site --help
   crabml fit --help
   crabml simulate --help

crabml site-model
-----------------

Run standard likelihood ratio tests for detecting positive selection.

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

   # Run M7 vs M8 test
   crabml site-model -s lysozyme.fasta -t lysozyme.nwk --test m7m8

   # Run M1a vs M2a test
   crabml site-model -s lysozyme.fasta -t lysozyme.nwk --test m1m2

   # Run both tests
   crabml site-model -s lysozyme.fasta -t lysozyme.nwk --test both

Output Formats
^^^^^^^^^^^^^^

.. code-block:: bash

   # Human-readable text (default)
   crabml site-model -s alignment.fasta -t tree.nwk --test both

   # JSON output (for parsing/pipelines)
   crabml site-model -s alignment.fasta -t tree.nwk --test both --format json -o results.json

   # TSV output (for Excel/R)
   crabml site-model -s alignment.fasta -t tree.nwk --test both --format tsv -o results.tsv

Options
^^^^^^^

* ``-s, --alignment PATH``: Path to alignment file (FASTA or PHYLIP) [required]
* ``-t, --tree PATH``: Path to tree file (Newick format) [required]
* ``--test TYPE``: Which test to run: ``m1m2``, ``m7m8``, ``both``, or ``all`` [default: both]
* ``--format FORMAT``: Output format: ``text``, ``json``, or ``tsv`` [default: text]
* ``-o, --output PATH``: Write output to file instead of stdout
* ``--maxiter INT``: Maximum optimization iterations [default: 500]
* ``--alpha FLOAT``: Significance threshold for tests [default: 0.05]
* ``--no-m0-init``: Skip M0 initialization (not recommended)
* ``--quiet``: Suppress progress output
* ``--verbose``: Show detailed optimization progress

Example Output (Text Format)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   Test 2: M7 (Beta) vs M8 (Beta + positive selection)
   --------------------------------------------------------------------------------
   Null (M7):           lnL = -902.510    parameters = {...}
   Alternative (M8):    lnL = -899.999    parameters = {...}

   Likelihood Ratio Test:
     2ΔlnL = 5.02    df = 2    p-value = 0.0812

   Result: No significant evidence for positive selection (p > 0.05)

Example Output (JSON Format)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: json

   {
     "M7_vs_M8": {
       "test_name": "M7 vs M8",
       "lnL_null": -902.510,
       "lnL_alt": -899.999,
       "LRT": 5.022,
       "pvalue": 0.0812,
       "significant": false
     }
   }
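
The reported statistic and p-value can be reproduced by hand from the two log-likelihoods. The sketch below is not part of crabML; it assumes only the JSON keys shown in the example output and uses the closed form of the chi-square tail probability for ``df = 2``, which is ``exp(-x / 2)``.

.. code-block:: python

   import json
   import math

   # JSON text copied from the example output above
   raw = """
   {
     "M7_vs_M8": {
       "test_name": "M7 vs M8",
       "lnL_null": -902.510,
       "lnL_alt": -899.999,
       "LRT": 5.022,
       "pvalue": 0.0812,
       "significant": false
     }
   }
   """

   result = json.loads(raw)["M7_vs_M8"]

   # Likelihood ratio statistic: twice the log-likelihood difference
   lrt = 2.0 * (result["lnL_alt"] - result["lnL_null"])

   # Chi-square survival function for df = 2 reduces to exp(-x / 2)
   pvalue = math.exp(-lrt / 2.0)

   print(f"2*delta lnL = {lrt:.3f}, p = {pvalue:.4f}")
   # prints: 2*delta lnL = 5.022, p = 0.0812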

crabml fit
----------

Fit a specific codon substitution model to your data.

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

   # Fit M0 model
   crabml fit -m M0 -s alignment.fasta -t tree.nwk

   # Fit M8 with custom settings
   crabml fit -m M8 -s alignment.fasta -t tree.nwk --maxiter 1000 --verbose

   # Output as JSON
   crabml fit -m M2a -s alignment.fasta -t tree.nwk --format json -o m2a_result.json

Supported Models
^^^^^^^^^^^^^^^^

* **M0**: One-ratio model (single ω for all sites)
* **M1a**: Nearly neutral model (purifying and neutral)
* **M2a**: Positive selection model (purifying, neutral, and positive)
* **M3**: Discrete model (K=3 discrete ω classes)
* **M7**: Beta distribution model (ω constrained to the interval (0, 1))
* **M8**: Beta + ω>1 model (positive selection)
* **M8a**: Beta + ω=1 model (null for M8)

Options
^^^^^^^

* ``-m, --model NAME``: Model name [required]
* ``-s, --alignment PATH``: Path to alignment file (FASTA or PHYLIP) [required]
* ``-t, --tree PATH``: Path to tree file (Newick format) [required]
* ``--format FORMAT``: Output format: ``text`` or ``json`` [default: text]
* ``-o, --output PATH``: Write output to file instead of stdout
* ``--maxiter INT``: Maximum optimization iterations [default: 500]
* ``--no-m0-init``: Skip M0 initialization (not recommended for complex models)
* ``--quiet``: Suppress progress output
* ``--verbose``: Show detailed optimization progress

Example Output
^^^^^^^^^^^^^^

.. code-block:: text

   ======================================================================
   MODEL: M0
   ======================================================================

   Log-likelihood:       -906.017441
   Number of parameters: 13

   PARAMETERS:
     kappa (ts/tv) = 4.5402
     omega (dN/dS) = 0.8066

   TREE:
     7 sequences
     11 branches (optimized)

   ======================================================================

crabml branch-model
-------------------

Test for lineage-specific selection using branch models.

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

   # Multi-ratio test (different omega for labeled branches)
   crabml branch-model -s alignment.fasta -t labeled_tree.nwk --test multi-ratio

   # Free-ratio test (independent omega for each branch)
   crabml branch-model -s alignment.fasta -t tree.nwk --test free-ratio

Supported Tests
^^^^^^^^^^^^^^^

* **multi-ratio**: Different ω for labeled branch groups (recommended)

  - Tests whether different phylogenetic lineages experience different selection pressures
  - Tree must have branch labels (``#0``, ``#1``, etc.) to specify foreground/background
  - More statistically powerful than free-ratio because it estimates fewer parameters

* **free-ratio**: Independent ω for each branch (exploratory)

  - Estimates one ω per branch in the tree
  - Highly parameter-rich (2n-3 ω parameters for an unrooted binary tree of n species)
  - Prone to overfitting with small datasets
  - Use with caution
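
To get a feel for how quickly the free-ratio model grows, the number of ω parameters can be counted from the number of tips. This is a minimal sketch, assuming a fully resolved (binary) unrooted tree, which has ``2n - 3`` branches for ``n`` tips; the function name is illustrative and not part of crabML.

.. code-block:: python

   def free_ratio_omega_count(n_tips: int) -> int:
       """Number of branches (one omega each) in an unrooted binary tree."""
       if n_tips < 3:
           raise ValueError("an unrooted binary tree needs at least 3 tips")
       return 2 * n_tips - 3

   for n in (4, 7, 20):
       print(n, free_ratio_omega_count(n))
   # prints:
   # 4 5
   # 7 11
   # 20 37

By comparison, a two-group multi-ratio model estimates two ω values regardless of tree size.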

Options
^^^^^^^

* ``-s, --alignment PATH``: Path to alignment file (FASTA or PHYLIP) [required]
* ``-t, --tree PATH``: Path to tree file (Newick format, with branch labels for multi-ratio) [required]
* ``--test TYPE``: Which test to run: ``multi-ratio`` or ``free-ratio`` [default: multi-ratio]
* ``--format FORMAT``: Output format: ``text``, ``json``, or ``tsv`` [default: text]
* ``-o, --output PATH``: Write output to file instead of stdout
* ``--maxiter INT``: Maximum optimization iterations [default: 1000]
* ``--alpha FLOAT``: Significance threshold for test [default: 0.05]
* ``--quiet``: Suppress progress output
* ``--verbose``: Show detailed optimization progress

Tree Format with Branch Labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For multi-ratio tests, the tree must have branch labels:

.. code-block:: text

   ((human,chimp) #1, (mouse,rat) #0);

* ``#0``: Background branches
* ``#1``: Foreground branches
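
Before starting a long run it can help to confirm that the tree actually carries the expected labels. The following is a rough pre-flight check, not crabML's own parser: it simply counts ``#<integer>`` tokens in the Newick text, so a taxon name containing such a token would be miscounted. The function name is hypothetical.

.. code-block:: python

   import re
   from collections import Counter

   def count_branch_labels(newick: str) -> Counter:
       """Count '#<int>' branch labels appearing in a Newick string."""
       return Counter(re.findall(r"#\d+", newick))

   tree = "((human,chimp) #1, (mouse,rat) #0);"
   labels = count_branch_labels(tree)
   print(dict(labels))
   # prints: {'#1': 1, '#0': 1}

   # Stop early if no foreground branch has been marked
   if "#1" not in labels:
       raise SystemExit("no foreground (#1) label found in tree")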

Example Output
^^^^^^^^^^^^^^

.. code-block:: text

   ================================================================================
   Branch Model Test Results
   ================================================================================

   Test: Multi-ratio vs M0
   --------------------------------------------------------------------------------
   Null (M0):              lnL = -906.017    parameters = {'omega': 0.807}
   Alternative (Multi):    lnL = -903.245    parameters = {'omega0': 0.654, 'omega1': 1.234}

   Likelihood Ratio Test:
     2ΔlnL = 5.54    df = 1    p-value = 0.0186

   Result: LINEAGE-SPECIFIC SELECTION DETECTED (p < 0.05)
     Background ω = 0.654
     Foreground ω = 1.234
     Foreground is 1.9x faster evolving
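
The numbers in this report can be checked by hand. The sketch below uses only the values printed in the example; for one degree of freedom the chi-square tail probability is ``erfc(sqrt(x / 2))``. Because the displayed log-likelihoods are rounded to three decimals, the recomputed p-value may differ from the report in the last digit.

.. code-block:: python

   import math

   lnl_null = -906.017   # M0, from the example output
   lnl_alt = -903.245    # multi-ratio, from the example output

   # Likelihood ratio statistic: twice the log-likelihood difference
   lrt = 2.0 * (lnl_alt - lnl_null)

   # Chi-square survival function for df = 1
   pvalue = math.erfc(math.sqrt(lrt / 2.0))

   # Ratio of foreground to background omega from the example output
   fold = 1.234 / 0.654

   print(f"2*delta lnL = {lrt:.2f}, p = {pvalue:.4f}, fold = {fold:.1f}")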

crabml branch-site
------------------

Test for positive selection on specific lineages using branch-site Model A.

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

   # Tree must have branch labels: #0 (background), #1 (foreground)
   crabml branch-site -s alignment.fasta -t labeled_tree.nwk

   # With custom settings
   crabml branch-site -s alignment.fasta -t labeled_tree.nwk --maxiter 1000 --alpha 0.01

   # Output as JSON
   crabml branch-site -s alignment.fasta -t labeled_tree.nwk --format json -o results.json

Tree Format with Branch Labels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The tree must have branch labels to specify foreground and background branches:

.. code-block:: text

   ((human,chimp) #1, (mouse,rat) #0);

* ``#0``: Background branches (standard selection)
* ``#1``: Foreground branches (test for positive selection)

Options
^^^^^^^

* ``-s, --alignment PATH``: Path to alignment file (FASTA or PHYLIP) [required]
* ``-t, --tree PATH``: Path to tree file with branch labels (Newick format) [required]
* ``--format FORMAT``: Output format: ``text``, ``json``, or ``tsv`` [default: text]
* ``-o, --output PATH``: Write output to file instead of stdout
* ``--maxiter INT``: Maximum optimization iterations [default: 500]
* ``--alpha FLOAT``: Significance threshold for test [default: 0.05]
* ``--quiet``: Suppress progress output
* ``--verbose``: Show detailed optimization progress

crabml simulate
---------------

Generate synthetic codon sequences under various evolutionary models. Useful for validation, power analysis, and benchmarking.

Basic Usage
^^^^^^^^^^^

.. code-block:: bash

   # M0: Single omega model
   crabml simulate m0 -t tree.nwk -o sim.fasta -l 1000 --omega 0.3

   # M2a: Positive selection model
   crabml simulate m2a -t tree.nwk -o sim.fasta -l 1000 \
       --p0 0.5 --p1 0.3 --omega0 0.1 --omega2 2.5

   # M7: Beta distribution
   crabml simulate m7 -t tree.nwk -o sim.fasta -l 1000 --p 2 --q 5

   # M8: Beta + positive selection
   crabml simulate m8 -t tree.nwk -o sim.fasta -l 1000 \
       --p0 0.8 --p 2 --q 5 --omega-s 2.5

Available Models
^^^^^^^^^^^^^^^^

* ``m0``: Single omega across all sites
* ``m1a``: Nearly neutral (purifying + neutral)
* ``m2a``: Positive selection (purifying + neutral + positive)
* ``m7``: Beta distribution for omega in (0,1)
* ``m8``: Beta distribution + positive selection class
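
The M2a example in Basic Usage passes only ``--p0`` and ``--p1``. As a worked illustration, the sketch below assumes the standard M2a parameterization, in which the neutral class is fixed at ω = 1 and the third proportion is the remainder ``1 - p0 - p1``, and computes the implied share of positively selected sites and the average ω across sites.

.. code-block:: python

   # Values from the M2a example command
   p0, p1 = 0.5, 0.3
   omega0, omega2 = 0.1, 2.5

   # Assumption: neutral class fixed at omega = 1; p2 is the remainder
   omega1 = 1.0
   p2 = 1.0 - p0 - p1
   if p2 < 0:
       raise ValueError("p0 + p1 must not exceed 1")

   # Proportion-weighted average omega over the three site classes
   mean_omega = p0 * omega0 + p1 * omega1 + p2 * omega2
   print(f"p2 = {p2:.2f}, mean omega = {mean_omega:.2f}")
   # prints: p2 = 0.20, mean omega = 0.85

With these settings roughly 200 of 1000 simulated codons are expected to fall in the positive selection class.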

Common Options
^^^^^^^^^^^^^^

* ``-t, --tree PATH``: Input tree file (Newick format with branch lengths) [required]
* ``-o, --output PATH``: Output FASTA file [required]
* ``-l, --length INT``: Sequence length in codons [required]
* ``--kappa FLOAT``: Transition/transversion ratio [default: 2.0]
* ``-r, --replicates INT``: Number of replicates to simulate [default: 1]
* ``--seed INT``: Random seed for reproducibility
* ``-q, --quiet``: Suppress progress messages

Multiple Replicates
^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # Simulate 10 replicates
   crabml simulate m2a -t tree.nwk -o sim.fasta -l 500 \
       --p0 0.5 --p1 0.3 --omega0 0.1 --omega2 2.5 -r 10

   # Creates: sim_rep1.fasta, sim_rep2.fasta, ..., sim_rep10.fasta
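
When post-processing replicates in a script, the expected file names can be generated rather than typed out. This is a small sketch that assumes the ``<stem>_rep<i><suffix>`` naming shown in the comment above; the helper name is hypothetical.

.. code-block:: python

   from pathlib import Path

   def replicate_paths(output: str, replicates: int) -> list[Path]:
       """Expected per-replicate files, following the naming shown above."""
       out = Path(output)
       return [
           out.with_name(f"{out.stem}_rep{i}{out.suffix}")
           for i in range(1, replicates + 1)
       ]

   paths = replicate_paths("sim.fasta", 10)
   print(paths[0], paths[-1])
   # prints: sim_rep1.fasta sim_rep10.fasta

   # Report any replicate that was not written
   missing = [p for p in paths if not p.exists()]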

Outputs
^^^^^^^

For all models:

* FASTA file with simulated sequences
* ``<output>.params.json``: Parameters used for simulation

For M2a and M8 (positive selection models):

* ``<output>.site_classes.txt``: Site class assignments
* ``<output>.positive_sites.txt``: Sites under positive selection (omega > 1)

For detailed documentation on simulation parameters and workflows, see :doc:`simulation`.

Integration with Pipelines
---------------------------

The CLI is designed for use in pipelines and scripts. The subsections below cover machine-readable output, exit codes, and batch processing.

JSON Output for Parsing
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # Run test and parse with jq
   crabml site-model -s alignment.fasta -t tree.nwk --format json | jq '.M7_vs_M8.pvalue'

   # Save JSON for later analysis
   crabml fit -m M0 -s alignment.fasta -t tree.nwk --format json -o results.json

TSV Output for Spreadsheets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   # Generate TSV for multiple genes (paths quoted to survive spaces)
   for gene in gene1 gene2 gene3; do
     crabml site-model -s "${gene}.fasta" -t "${gene}.nwk" --format tsv --quiet
   done > all_results.tsv

Exit Codes
^^^^^^^^^^

* ``0``: Success
* ``1``: Analysis error (e.g., optimization failed, invalid model)
* ``2``: Argument error (e.g., missing file, invalid options)
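
From Python, the same exit codes can be inspected through ``subprocess``. The sketch below is one possible wrapper, not an API provided by crabML: the function names are hypothetical, the flags are those documented for ``crabml site-model``, and the messages restate the list above.

.. code-block:: python

   import subprocess

   # Meanings taken from the exit-code list above
   EXIT_MEANINGS = {
       0: "success",
       1: "analysis error",
       2: "argument error",
   }

   def describe_exit(code: int) -> str:
       """Translate a crabml exit code into a short message."""
       return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")

   def run_site_model(alignment: str, tree: str, output: str) -> int:
       """Run crabml site-model and return its exit code."""
       cmd = [
           "crabml", "site-model",
           "-s", alignment, "-t", tree,
           "--test", "both", "--format", "json",
           "-o", output, "--quiet",
       ]
       # An argument list (no shell) avoids quoting problems with file names
       return subprocess.run(cmd).returncode

A call such as ``describe_exit(run_site_model("alignment.fasta", "tree.nwk", "results.json"))`` then yields a message suitable for a log line.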

Batch Processing
^^^^^^^^^^^^^^^^

.. code-block:: bash

   #!/bin/bash
   # Process multiple alignments

   for alignment in *.fasta; do
     # Quote expansions so file names with spaces are handled correctly
     gene=$(basename "$alignment" .fasta)
     echo "Processing $gene..."

     # Branch directly on the exit status of crabml
     if crabml site-model \
       -s "$alignment" \
       -t "${gene}.nwk" \
       --test both \
       --format json \
       -o "${gene}_results.json" \
       --quiet; then
       echo "  Success!"
     else
       echo "  Failed!"
     fi
   done

Tips and Best Practices
------------------------

1. **Use JSON for pipelines**: The JSON output format is ideal for parsing and integrating with other tools.

2. **Always specify output files**: Use ``-o`` to write results to files rather than relying on stdout redirection, especially in complex pipelines.

3. **Start with default settings**: The default settings (M0 initialization, 500 iterations, or 1000 for ``crabml branch-model``) work well for most datasets.

4. **Use quiet mode for batch jobs**: Add ``--quiet`` when processing many files to reduce log output.

5. **Check exit codes**: In scripts, always check the exit code to detect failures.

6. **Increase maxiter for complex models**: Models like M8 on large datasets may need more iterations. Try ``--maxiter 1000`` if optimization doesn't converge.