Input/Output

Classes for reading and writing sequence alignments and phylogenetic trees.

Sequence Alignment

Alignment

class crabml.io.sequences.Alignment(names, sequences, n_species, n_sites, seqtype)

Multiple sequence alignment.

Attributes

names : list[str]

Sequence names/labels

sequences : ndarray, shape (n_species, n_sites)

Encoded sequences as integer arrays

n_species : int

Number of sequences

n_sites : int

Number of sites (alignment length)

seqtype : str

Sequence type (‘codon’, ‘aa’, ‘dna’)

names: list[str]
sequences: ndarray
n_species: int
n_sites: int
seqtype: str
classmethod from_phylip(filepath, seqtype='codon')

Parse PHYLIP format alignment file.

Custom parser for the PAML-style sequential PHYLIP format. The first line gives the number of sequences and the sequence length. Each record then starts with a name line, followed by that sequence's data.

Return type:

Alignment

Parameters
filepath : Path or str

Path to PHYLIP format file

seqtype : str

Sequence type: ‘codon’, ‘aa’, or ‘dna’

Returns
Alignment

Parsed alignment

Examples
>>> aln = Alignment.from_phylip("lysozyme.txt", seqtype='codon')
>>> aln.n_species
7
>>> aln.n_sites
130
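
The sequential layout that from_phylip reads (a header line, then a name line followed by sequence data for each record) can be illustrated with a small standalone reader. This is a minimal sketch, not crabml's parser: `read_sequential_phylip` is a hypothetical helper, and it assumes that sequence data may wrap over several lines until the declared length is reached.

```python
def read_sequential_phylip(text):
    """Parse PAML-style sequential PHYLIP text into (names, sequences)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Header: number of sequences, then sequence length.
    n_seqs, seq_len = (int(tok) for tok in lines[0].split()[:2])

    names, seqs = [], []
    i = 1
    for _ in range(n_seqs):
        names.append(lines[i])  # each record starts with a name line
        i += 1
        chunks = []
        # Assumption: data may wrap; read until the declared length is reached.
        while sum(len(c) for c in chunks) < seq_len:
            chunks.append(lines[i].replace(" ", ""))
            i += 1
        seqs.append("".join(chunks))
    return names, seqs


example = """\
2 6
human
ATGAAA
mouse
ATG
AAG
"""
names, seqs = read_sequential_phylip(example)
print(names)  # ['human', 'mouse']
print(seqs)   # ['ATGAAA', 'ATGAAG']
```

The sketch returns plain strings; how crabml encodes them into the integer `sequences` array is not shown here.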
classmethod from_fasta(filepath, seqtype='codon')

Parse FASTA format alignment file.

Return type:

Alignment

Parameters
filepath : Path or str

Path to FASTA format file

seqtype : str

Sequence type: ‘codon’, ‘aa’, or ‘dna’

Returns
Alignment

Parsed alignment

Examples
>>> aln = Alignment.from_fasta("alignment.fasta", seqtype='codon')
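
For readers unfamiliar with the format, the following minimal sketch shows how FASTA records map onto parallel lists of names and sequences. The helpers `read_fasta` and `check_aligned` are hypothetical and are not part of crabml; the equal-length check is an assumption about what an alignment reader would typically enforce, not a documented behavior of from_fasta.

```python
def read_fasta(text):
    """Parse FASTA text into parallel lists of names and sequences."""
    names, seqs = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            names.append(line[1:].strip())  # a header line starts a new record
            seqs.append("")
        elif names:
            seqs[-1] += line  # sequence data may span several lines
    return names, seqs


def check_aligned(seqs):
    """Raise ValueError unless all sequences have the same length."""
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")


fasta_names, fasta_seqs = read_fasta(">a\nATG\nAAA\n>b\nATGAAG\n")
check_aligned(fasta_seqs)
print(fasta_names)  # ['a', 'b']
print(fasta_seqs)   # ['ATGAAA', 'ATGAAG']
```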
to_phylip(filepath)

Write alignment to PHYLIP format file.

Return type:

None

Parameters
filepath : Path or str

Output file path

to_fasta(filepath)

Write alignment to FASTA format file.

Return type:

None

Parameters
filepath : Path or str

Output file path

__init__(names, sequences, n_species, n_sites, seqtype)

Phylogenetic Trees

Tree

class crabml.io.trees.Tree(root, n_nodes, n_leaves, leaf_names)

Phylogenetic tree.

Attributes

root : TreeNode

Root node of the tree

n_nodes : int

Total number of nodes

n_leaves : int

Number of leaf nodes

leaf_names : list[str]

Names of leaf nodes

root: TreeNode
n_nodes: int
n_leaves: int
leaf_names: list[str]
classmethod from_newick(newick_string)

Parse Newick format tree string.

Return type:

Tree

Parameters
newick_string : str

Newick format tree

Returns
Tree

Parsed tree
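
To make the Newick structure concrete, here is a minimal recursive-descent sketch. It is not crabml's parser: `Node` and `parse_newick` are hypothetical stand-ins, and the sketch only handles names and branch lengths (no quoted names, comments, or `#` branch labels).

```python
class Node:
    """Hypothetical stand-in for a tree node (not crabml's TreeNode)."""

    def __init__(self, name="", length=None):
        self.name = name
        self.length = length
        self.children = []


def parse_newick(s):
    """Parse a Newick string with optional names and branch lengths."""
    s = s.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if pos < len(s) and s[pos] == "(":
            pos += 1  # consume '('
            while True:
                node.children.append(parse_node())
                if s[pos] == ",":
                    pos += 1  # another sibling follows
                    continue
                pos += 1  # consume ')'
                break
        # Read "name:length" up to the next structural character.
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        name, _, length = s[start:pos].strip().partition(":")
        node.name = name.strip()
        node.length = float(length) if length else None
        return node

    return parse_node()


root = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
print([child.name for child in root.children])  # ['', 'C']
print(root.children[0].children[1].length)      # 0.2
```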

postorder()

Return nodes in post-order traversal (leaves to root).

Return type:

list[TreeNode]

Returns
list[TreeNode]

Nodes in post-order
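
Post-order means every child appears before its parent, which is the order needed when values are accumulated from the leaves toward the root. A minimal sketch, assuming a hypothetical `Node` class with a `children` list (crabml's TreeNode may be structured differently):

```python
class Node:
    """Hypothetical stand-in for a tree node (not crabml's TreeNode)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def postorder_nodes(root):
    """Return nodes so that every child precedes its parent."""
    order = []

    def visit(node):
        for child in node.children:
            visit(child)
        order.append(node)  # appended only after all descendants

    visit(root)
    return order


# Tree ((A,B)AB,C)root
tree = Node("root", [Node("AB", [Node("A"), Node("B")]), Node("C")])
visited = [n.name for n in postorder_nodes(tree)]
print(visited)  # ['A', 'B', 'AB', 'C', 'root']
```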

get_branches()

Get all branches as (parent, child) pairs.

Return type:

list[tuple[TreeNode, TreeNode]]

Returns
list[tuple[TreeNode, TreeNode]]

List of (parent, child) tuples for each branch
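
Each non-root node contributes one branch, so a tree with n nodes has n - 1 (parent, child) pairs. The sketch below illustrates this with a hypothetical `Node` class and `branch_pairs` helper; the order in which crabml returns branches is not documented here, so the ordering shown is only one possibility.

```python
class Node:
    """Hypothetical stand-in for a tree node (not crabml's TreeNode)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def branch_pairs(root):
    """Collect one (parent, child) tuple per branch."""
    branches = []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            branches.append((parent, child))
            stack.append(child)
    return branches


# Tree ((A,B)AB,C)root has 5 nodes and therefore 4 branches.
tree = Node("root", [Node("AB", [Node("A"), Node("B")]), Node("C")])
pairs = [(p.name, c.name) for p, c in branch_pairs(tree)]
print(pairs)
# [('root', 'AB'), ('root', 'C'), ('AB', 'A'), ('AB', 'B')]
```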

get_branch_labels()

Get integer branch labels for branch-site models.

Converts string labels like ‘#0’, ‘#1’ to integers. Branches without labels are assigned 0 (background).

Return type:

list[int]

Returns
list[int]

Branch labels as integers (0=background, 1=foreground, etc.)
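
The label conversion can be sketched as follows. `label_to_int` is a hypothetical helper, and representing a missing label as `None` or an empty string is an assumption made for this example.

```python
def label_to_int(label):
    """Convert a '#k' label to k; a missing label maps to 0 (background)."""
    if not label:
        return 0
    return int(label.lstrip("#"))


labels = [label_to_int(x) for x in ["#1", None, "#0", ""]]
print(labels)  # [1, 0, 0, 0]
```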

validate_branch_site_labels()

Validate branch labels for branch-site models.

Branch-site models (Model A, A1) require exactly 2 label types:

- 0 (background)
- 1 (foreground)

Return type:

None

Raises
ValueError

If labels are not valid for branch-site models
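
One way to read the rule is that the set of labels must be exactly {0, 1}. The sketch below encodes that reading with a hypothetical `check_branch_site_labels` function; whether crabml rejects a tree with no foreground branch in the same way, and what its error message says, are assumptions.

```python
def check_branch_site_labels(labels):
    """Raise ValueError unless the labels used are exactly 0 and 1."""
    found = set(labels)
    if found != {0, 1}:
        raise ValueError(
            f"branch-site models need labels 0 and 1, found {sorted(found)}"
        )


check_branch_site_labels([0, 0, 1, 0])  # passes silently

try:
    check_branch_site_labels([0, 0, 0])  # no foreground branch
except ValueError as err:
    message = str(err)
    print(message)
```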

to_newick()

Convert tree to Newick format string.

Return type:

str

Returns
str

Tree in Newick format
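
Serialization is the reverse of parsing: children are written inside parentheses, followed by the node's own name and branch length. A minimal sketch with a hypothetical `Node` class and `to_newick_string` helper (crabml's output formatting, for example of branch lengths or labels, may differ):

```python
class Node:
    """Hypothetical stand-in for a tree node (not crabml's TreeNode)."""

    def __init__(self, name="", length=None, children=()):
        self.name = name
        self.length = length
        self.children = list(children)


def to_newick_string(node):
    """Serialize a subtree to Newick text without the trailing ';'."""
    text = node.name
    if node.children:
        inner = ",".join(to_newick_string(c) for c in node.children)
        text = f"({inner}){node.name}"
    if node.length is not None:
        text += f":{node.length}"
    return text


tree = Node(children=[
    Node(length=0.05, children=[Node("A", 0.1), Node("B", 0.2)]),
    Node("C", 0.3),
])
newick = to_newick_string(tree) + ";"
print(newick)  # ((A:0.1,B:0.2):0.05,C:0.3);
```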

__init__(root, n_nodes, n_leaves, leaf_names)