picasso.CloneTree module
CloneTree: Phylogenetic tree analysis and visualization for PICASSO results.
This module provides the CloneTree class for integrating phylogenetic trees with clone assignments and CNA data. It supports analysis and visualization of phylogenetic reconstruction results, with specific support for the noise patterns of scRNA-seq-inferred CNA data.
Classes
- CloneTree
Integrates phylogenetic trees, clone assignments, and CNA profiles for comprehensive analysis and visualization of tumor evolution patterns.
Examples
Basic usage with PICASSO results:
>>> from picasso import Picasso, CloneTree, load_data
>>>
>>> # Load example data and run PICASSO phylogenetic inference
>>> cna_data = load_data()
>>> picasso = Picasso(cna_data)
>>> picasso.fit()
>>>
>>> # Create CloneTree for analysis and visualization
>>> phylogeny = picasso.get_phylogeny()
>>> assignments = picasso.get_clone_assignments()
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>>
>>> # Generate visualizations
>>> clone_tree.plot_alterations(save_as='heatmap.pdf')
>>> clone_tree.plot_clone_sizes(save_as='sizes.pdf')
Notes
The CloneTree class is designed to handle:
- Integration of phylogenetic trees with cellular data
- Aggregation of noisy CNA profiles by clone
- Visualization of clonal evolution patterns
- Export to publication-ready formats
See also
Picasso – Main phylogenetic inference algorithm
itol_utils – Functions for iTOL visualization export
utils – Data preprocessing utilities
- class picasso.CloneTree.CloneTree(phylogeny, clone_assignments, character_matrix, clone_aggregation='mode', metadata=None)[source]
Bases: object
- __init__(phylogeny, clone_assignments, character_matrix, clone_aggregation='mode', metadata=None)[source]
Initialize a CloneTree for analysis and visualization of phylogenetic reconstruction results.
CloneTree integrates phylogenetic trees from PICASSO with clone assignments and CNA data to provide comprehensive analysis and visualization capabilities. It handles the aggregation of noisy scRNA-seq-inferred CNA profiles by clone and supports various downstream analyses.
- Parameters:
phylogeny (ete3.Tree) – The phylogenetic tree with terminal clones as leaves, typically obtained from the PICASSO model via get_phylogeny(). Internal nodes represent ancestral clones and splitting events.
clone_assignments (pd.DataFrame) – DataFrame with cell/sample identifiers as index and a ‘clone_id’ column containing clone assignments. Should correspond to the leaves of the phylogeny. Typically obtained from PICASSO via get_clone_assignments().
character_matrix (pd.DataFrame) – The CNA character matrix where rows are cells/samples and columns are genomic features (genes, chromosome arms, bins). Values represent inferred copy number states. Should contain the same samples as in clone_assignments.
clone_aggregation ({'mode', 'mean'}, default='mode') – Method for aggregating CNA profiles within each clone:
- ‘mode’: Use the most frequent copy number state (recommended for noisy data)
- ‘mean’: Use the average copy number (not yet implemented)
metadata (pd.DataFrame, optional) – Additional sample metadata for visualization and analysis. Index should match character_matrix. Common examples include cell type annotations, sample origin, experimental conditions.
- clone_profiles
Aggregated CNA profiles for each clone (rows=clones, columns=genomic features).
- Type:
pd.DataFrame
- clone_profiles_certainty
Confidence/certainty scores for each aggregated profile value.
- Type:
pd.DataFrame
- clone_assignments
DataFrame with cell/sample identifiers as index and clone assignments.
- Type:
pd.DataFrame
- character_matrix
The CNA character matrix with cells as rows and genomic features as columns.
- Type:
pd.DataFrame
- metadata
Additional sample metadata for visualization and analysis.
- Type:
Optional[pd.DataFrame]
- Raises:
AssertionError – If clone_assignments lacks ‘clone_id’ column, if phylogeny leaves don’t match clone assignments, if sample indices don’t match between DataFrames, or if clone_aggregation method is invalid.
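The conditions listed under Raises can be pictured roughly as follows. This is a minimal sketch, not the library's code: `check_clone_tree_inputs` is a hypothetical helper, and the exact comparisons (set equality between leaves and clone IDs, identical sample sets) are assumptions about what "match" means here.

```python
import pandas as pd

def check_clone_tree_inputs(leaf_names, clone_assignments, character_matrix,
                            clone_aggregation="mode"):
    """Hypothetical stand-in for the constructor's input validation."""
    assert "clone_id" in clone_assignments.columns, "missing 'clone_id' column"
    # Assumption: every clone ID must be a leaf of the phylogeny and vice versa.
    assert set(leaf_names) == set(clone_assignments["clone_id"]), \
        "phylogeny leaves do not match clone assignments"
    # Assumption: both DataFrames must describe the same set of samples.
    assert set(clone_assignments.index) == set(character_matrix.index), \
        "sample indices differ between DataFrames"
    assert clone_aggregation in {"mode", "mean"}, "invalid aggregation method"

cells = ["c1", "c2", "c3"]
assignments = pd.DataFrame({"clone_id": ["A", "A", "B"]}, index=cells)
cna = pd.DataFrame({"chr1p": [0, 0, 1]}, index=cells)
check_clone_tree_inputs(["A", "B"], assignments, cna)  # passes silently
```

Running a check like this before constructing a CloneTree can make a mismatch easier to trace back to the offending input.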
Examples
Basic usage with PICASSO results:
>>> from picasso import Picasso, CloneTree, load_data
>>>
>>> # Load example data and run PICASSO
>>> character_matrix = load_data()
>>> picasso = Picasso(character_matrix)
>>> picasso.fit()
>>>
>>> # Create CloneTree for analysis
>>> phylogeny = picasso.get_phylogeny()
>>> assignments = picasso.get_clone_assignments()
>>> clone_tree = CloneTree(phylogeny, assignments, character_matrix)
>>>
>>> # Analyze results
>>> print(f"Number of clones: {len(clone_tree.clone_profiles)}")
>>> clone_tree.plot_alterations(save_as='cna_heatmap.pdf')
>>> clone_tree.plot_clone_sizes(save_as='clone_sizes.pdf')
With metadata for enhanced visualization:
>>> import pandas as pd
>>> # Add cell type metadata (example)
>>> metadata = pd.DataFrame({'cell_type': ['TypeA'] * 50 + ['TypeB'] * 50},
...                         index=character_matrix.index)
>>> clone_tree = CloneTree(phylogeny, assignments, character_matrix,
...                        metadata=metadata)
>>> clone_tree.plot_alterations(metadata=metadata[['cell_type']])
Notes
Design Considerations for Noisy Data:
- Modal aggregation reduces the impact of outlier cells within clones
- Confidence scores help identify uncertain clone profiles
- Visualization functions highlight clone-specific patterns
Clone Profile Aggregation:
- Mode aggregation finds the most common copy number state per feature per clone
- Handles missing data and ties in noisy scRNA-seq data
- Certainty scores indicate the reliability of aggregated values
Visualization Capabilities:
- Heatmaps show clone-specific CNA patterns
- Clone size distributions reveal clonal architecture
- Integration with iTOL for publication-quality figures
See also
Picasso – Main class for phylogenetic inference from CNA data
plot_alterations – Create heatmap visualization of CNA profiles
plot_clone_sizes – Visualize clone size distribution
get_sample_phylogeny – Generate sample-level phylogenetic tree
- aggregate_clones(aggregation_method)[source]
Aggregate CNA profiles within each clone to create representative clone profiles.
Combines individual cell CNA profiles within each clone into single representative profiles using statistical aggregation. This reduces noise and creates clean clone-level CNA signatures for downstream analysis and visualization.
- Parameters:
aggregation_method (str) – Method for aggregating CNA values within clones:
- ‘mode’: Use the most frequent copy number state (recommended for noisy data)
- ‘mean’: Use the average copy number (not yet implemented)
- Returns:
- First DataFrame: Aggregated clone profiles with clones as rows and genomic features as columns. Values represent the aggregated copy number states.
- Second DataFrame: Certainty/confidence scores for each aggregated value, indicating the reliability of the aggregation.
- Return type:
tuple of (pd.DataFrame, pd.DataFrame)
Examples
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>> profiles, certainty = clone_tree.aggregate_clones('mode')
>>> print(f"Clone profiles shape: {profiles.shape}")
>>> print(f"Average certainty: {certainty.mean().mean():.2f}")
Notes
Modal Aggregation:
- Finds the most common copy number state for each feature within each clone
- Handles ties by selecting the first modal value
- Provides certainty scores based on the frequency of the modal state
- Robust to outlier cells within clones
- Facilitates visualization of CNA patterns across clones
Design for Noisy Data:
- Modal aggregation reduces the impact of noise and technical artifacts
- Certainty scores help identify unreliable aggregated values
- Particularly effective for scRNA-seq-inferred CNA data
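One way to act on the certainty scores is to blank out aggregated values that fall below a chosen cutoff. The sketch below is illustrative only: `mask_uncertain` and the 0.5 threshold are assumptions, not part of CloneTree, and the toy DataFrames stand in for the two objects returned by aggregate_clones.

```python
import pandas as pd

def mask_uncertain(profiles, certainty, threshold=0.5):
    """Replace aggregated values whose certainty is below `threshold` with NaN."""
    # DataFrame.where keeps entries where the condition holds and sets the rest to NaN.
    return profiles.where(certainty >= threshold)

# Toy stand-ins for the (profiles, certainty) pair, with clones as rows.
profiles = pd.DataFrame({"chr1p": [1, 0], "chr2q": [0, -1]}, index=["A", "B"])
certainty = pd.DataFrame({"chr1p": [0.9, 0.4], "chr2q": [1.0, 0.7]}, index=["A", "B"])

filtered = mask_uncertain(profiles, certainty)
print(filtered)
```

Masked entries become NaN, so downstream summaries can skip them instead of treating a weakly supported state as real.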
- Raises:
NotImplementedError – If aggregation_method is ‘mean’ (not yet implemented).
ValueError – If aggregation_method is not ‘mode’ or ‘mean’.
See also
get_modal_clone_profiles – Internal method implementing modal aggregation
- get_most_ancestral_clone()[source]
Identify the most ancestral clone based on CNA profile complexity.
Determines which clone represents the most ancestral state by counting the number of copy number alterations (deviations from neutral state). This is useful for rooting phylogenetic trees and understanding evolutionary relationships.
- Returns:
Clone identifier of the most ancestral clone (fewest alterations).
- Return type:
Examples
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>> ancestral = clone_tree.get_most_ancestral_clone()
>>> print(f"Most ancestral clone: {ancestral}")
>>>
>>> # Use for tree rooting
>>> clone_tree.root_tree(ancestral)
Notes
Ancestral State Assumptions:
- Copy number state 0 is considered the ancestral/neutral state
- Clones with more alterations are considered more derived
- Useful for establishing evolutionary directionality
Algorithm:
1. Count non-zero states for each clone in the aggregated profiles
2. Select the clone with the minimum alteration count
3. Return the clone identifier
Use Cases:
- Rooting phylogenetic trees for visualization
- Identifying putative normal/founder cell populations
- Understanding tumor evolution trajectories
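The three algorithm steps above can be sketched in a few lines of pandas. This is an illustration under stated assumptions rather than the method's actual source: `most_ancestral_clone` is a hypothetical function, state 0 is taken as neutral as the notes describe, and ties are resolved by whichever clone comes first in the index.

```python
import pandas as pd

def most_ancestral_clone(clone_profiles):
    """Return the clone (row label) with the fewest non-zero copy number states."""
    # Step 1: count deviations from the neutral state 0 for each clone.
    # Note: NaN != 0 evaluates to True, so missing values count as alterations here.
    alteration_counts = (clone_profiles != 0).sum(axis=1)
    # Steps 2-3: pick the clone with the minimum count and return its identifier.
    return alteration_counts.idxmin()

clone_profiles = pd.DataFrame(
    {"chr1p": [0, 1, 1], "chr2q": [0, 0, -1], "chr3p": [1, 1, 1]},
    index=["A", "B", "C"],
)
ancestral = most_ancestral_clone(clone_profiles)
print(ancestral)  # clone "A" has a single alteration
```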
See also
root_tree – Method to root the phylogeny using an outgroup clone
clone_profiles – Aggregated CNA profiles used for ancestral inference
- root_tree(outgroup)[source]
Root the phylogenetic tree using a specified outgroup clone.
Establishes evolutionary directionality by setting a designated clone as the outgroup, which becomes the root of the tree. This is essential for proper interpretation of evolutionary relationships and visualization.
- Parameters:
outgroup (str) – Identifier of the clone to use as outgroup. Must be present in the phylogenetic tree leaves. Often the most ancestral clone identified by get_most_ancestral_clone().
- Return type:
Examples
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>>
>>> # Root with most ancestral clone
>>> ancestral = clone_tree.get_most_ancestral_clone()
>>> clone_tree.root_tree(ancestral)
>>>
>>> # Or root with specific clone
>>> clone_tree.root_tree('1-0-STOP')
Notes
Effects of Rooting:
- Changes tree topology and evolutionary interpretation
- Affects all subsequent tree-based analyses
- Resets the sample phylogeny (if previously generated)
- Essential for proper tree visualization
Outgroup Selection Guidelines:
- Use the most ancestral clone (fewest alterations) when possible
- Consider biological knowledge about cell populations
- Avoid clones with many unique alterations
Implementation Details:
- Uses ete3’s set_outgroup() method
- Invalidates the cached sample phylogeny
- Tree structure is modified in-place
- Raises:
AssertionError – If outgroup is not found among the tree leaves.
See also
get_most_ancestral_clone – Identify suitable outgroup candidates
get_clone_phylogeny – Access the rooted phylogenetic tree
get_sample_phylogeny – Generate sample-level tree from rooted clone tree
- get_clone_phylogeny()[source]
Access the clone-level phylogenetic tree.
Returns the phylogenetic tree where leaves represent clones (terminal cell populations) and internal nodes represent ancestral populations. This is the primary tree structure used for evolutionary analysis.
- Returns:
Phylogenetic tree with clones as leaves. Tree may be rooted or unrooted depending on whether root_tree() has been called.
- Return type:
ete3.Tree
Examples
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>> tree = clone_tree.get_clone_phylogeny()
>>> print(f"Tree has {len(tree.get_leaves())} clones")
>>> print("Clone names:", tree.get_leaf_names())
>>>
>>> # Check rooting: a top-level node with two children is conventionally read as rooted
>>> if len(tree.children) == 2:
...     print("Tree is rooted")
>>>
>>> # Export to Newick format
>>> newick_str = tree.write()
Notes
Tree Structure:
- Leaves represent terminal clones from PICASSO analysis
- Internal nodes represent inferred ancestral states
- Branch structure reflects evolutionary relationships
- Node names correspond to clone identifiers
Tree States:
- May be rooted (after root_tree()) or unrooted
- Tree topology reflects the PICASSO splitting hierarchy
- Compatible with standard phylogenetic analysis tools
Use Cases:
- Phylogenetic visualization and analysis
- Export to external tools (iTOL, FigTree, etc.)
- Evolutionary distance calculations
- Tree-based clustering validation
See also
get_sample_phylogeny – Get expanded tree with individual cells
root_tree – Root the tree for proper evolutionary interpretation
- get_sample_phylogeny()[source]
Generate expanded phylogenetic tree with individual cells as leaves.
Creates a detailed tree where each cell/sample appears as a separate leaf, while maintaining the clone-based evolutionary structure. Cells within the same clone are attached as children of their respective clone nodes.
- Returns:
Expanded phylogenetic tree where leaves represent individual cells/samples rather than clones. Clone nodes become internal nodes with cells as children.
- Return type:
ete3.Tree
Examples
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>> sample_tree = clone_tree.get_sample_phylogeny()
>>> print(f"Tree has {len(sample_tree.get_leaves())} cells")
>>>
>>> # Access cell-specific information
>>> for leaf in sample_tree.get_leaves():
...     print(f"Cell {leaf.name}")
...     if clone_tree.metadata is not None:
...         print(f"  Metadata: {leaf.features}")
Notes
Tree Construction:
- Starts with the clone phylogeny as backbone
- Adds individual cells as children of clone nodes
- Preserves evolutionary relationships at clone level
- Enables cell-level analysis within phylogenetic context
Metadata Integration:
- If metadata is provided, adds features to cell nodes
- Features accessible via leaf.features or leaf.get_feature()
- Enables metadata-aware tree visualization
Performance Considerations:
- Tree generated on first call, then cached
- Cache invalidated when the tree is re-rooted
- Large datasets may produce complex trees
Use Cases:
- Cell-level phylogenetic visualization
- Metadata mapping onto evolutionary structure
- Detailed iTOL annotations
- Single-cell evolutionary analysis
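The tree construction described above (clone phylogeny as backbone, cells attached beneath their clone) can be illustrated without ete3 by rewriting a Newick string. This is a sketch of the idea only: `expand_clone_newick` is a hypothetical function, it assumes a Newick string without whitespace or quoted names, and the real method operates on ete3 tree objects rather than strings.

```python
import re
from collections import defaultdict

def expand_clone_newick(clone_newick, cell_to_clone):
    """Attach each cell as a child of its clone's leaf in a Newick string."""
    cells_by_clone = defaultdict(list)
    for cell, clone in cell_to_clone.items():
        cells_by_clone[clone].append(cell)

    def attach_cells(match):
        clone = match.group(0)
        cells = cells_by_clone.get(clone)
        if not cells:
            return clone  # leaf without assigned cells stays unchanged
        # The former leaf becomes a named internal node with its cells as children.
        return "(" + ",".join(cells) + ")" + clone

    # A leaf name follows "(" or "," and ends before ",", ")", ":" or ";".
    # Internal node labels follow ")" and are therefore not matched.
    leaf_pattern = r"(?<=[(,])[^(),:;]+(?=[,):;])"
    return re.sub(leaf_pattern, attach_cells, clone_newick)

assignments = {"c1": "A", "c2": "A", "c3": "B", "c4": "C"}
print(expand_clone_newick("((A,B),C);", assignments))
```

The single regular-expression pass matters: substituting clone by clone could rematch a cell name that happens to equal a clone name.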
See also
get_clone_phylogeny – Access the underlying clone tree structure
metadata – Cell-level metadata integrated into tree nodes
- infer_evolutionary_changes()[source]
Infer evolutionary changes along phylogenetic tree branches.
Reconstructs the specific copy number alterations that occurred at each internal node of the phylogenetic tree by analyzing transitions between ancestral and derived clone profiles. This method is planned for future implementation.
- Raises:
NotImplementedError – This method is not yet implemented. Future versions will support ancestral state reconstruction and evolutionary change mapping.
Notes
Planned Functionality:
- Ancestral state reconstruction for internal tree nodes
- Identification of specific CNA events along branches
Potential Applications:
- Understanding CNA acquisition patterns
- Identifying driver vs passenger alterations
- Validating phylogenetic relationships
See also
clone_profiles – Aggregated clone CNA profiles used for inference
get_clone_phylogeny – Phylogenetic tree structure for change mapping
- Return type:
- plot_alterations(metadata=None, cmap='coolwarm', show=True, save_as=None, center=None)[source]
Create clustered heatmap visualization of CNA profiles with clone annotations.
Generates a comprehensive heatmap showing copy number alterations across all cells, with cells grouped by clone assignment and colored sidebars indicating clone membership and optional metadata categories.
- Parameters:
metadata (pd.DataFrame, optional) – Additional metadata for enhanced visualization. Index should match character_matrix. Each column represents a metadata category (e.g., cell_type, treatment, tissue). Will be displayed as colored sidebars.
cmap (str, default='coolwarm') – Matplotlib colormap for the main heatmap. Common choices:
- ‘coolwarm’: Blue-white-red for CNAs (deletions-neutral-amplifications)
- ‘RdBu_r’: Red-blue reversed
- ‘viridis’: Perceptually uniform colormap
show (bool, default=True) – Whether to display the plot interactively.
save_as (str, optional) – File path to save the plot. Supports common formats (.pdf, .png, .svg). Recommended: use .pdf for publication quality.
center (float, optional) – Value at which to center the colormap. If None, uses default centering. For CNA data, typically 0 (neutral copy number) or 2 (diploid).
- Return type:
Examples
Basic heatmap with clone annotations:
>>> from picasso import Picasso, CloneTree, load_data
>>>
>>> # Create CloneTree
>>> cna_data = load_data()
>>> picasso = Picasso(cna_data)
>>> picasso.fit()
>>> clone_tree = CloneTree(picasso.get_phylogeny(),
...                        picasso.get_clone_assignments(),
...                        cna_data)
>>>
>>> # Basic visualization
>>> clone_tree.plot_alterations(save_as='cna_heatmap.pdf')
Enhanced visualization with metadata:
>>> import pandas as pd
>>>
>>> # Add cell type metadata
>>> metadata = pd.DataFrame({
...     'cell_type': ['Malignant'] * 80 + ['Normal'] * 20,
...     'tissue': ['Primary'] * 60 + ['Metastasis'] * 40
... }, index=cna_data.index)
>>>
>>> # Create enhanced heatmap
>>> clone_tree.plot_alterations(metadata=metadata,
...                             cmap='RdBu_r',
...                             center=0,
...                             save_as='enhanced_heatmap.pdf')
Notes
Visualization Features:
- Cells automatically grouped by clone assignment
- Clone-specific color sidebar for easy identification
- Optional metadata sidebars for additional context
- Configurable color schemes for different data types
Layout Organization:
- Rows: Individual cells/samples
- Columns: Genomic features (chromosome arms, genes, etc.)
- Left sidebars: Clone assignments + optional metadata
- Main heatmap: Copy number alteration values
Color Interpretation:
- Clone sidebar: Each clone gets a distinct color
- Metadata sidebars: Categorical values get distinct colors
- Main heatmap: Continuous colormap for CNA values
Best Practices:
- Use the ‘coolwarm’ colormap for copy number data
- Center the colormap at the neutral copy number (typically 0 or 2)
- Save as PDF for publication-quality figures
- Include relevant metadata for biological context
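The layout above (cells ordered by clone, one color sidebar per annotation) could be prepared along these lines before handing the result to the `row_colors` argument of seaborn.clustermap. This is a hedged sketch: `build_row_colors` and `PALETTE` are hypothetical names, the hex colors are arbitrary, and the actual color assignment inside plot_alterations is not documented here.

```python
import pandas as pd

# Arbitrary illustrative palette; colors repeat if there are more categories than entries.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]

def build_row_colors(clone_assignments, metadata=None):
    """Order cells by clone and map each categorical column to sidebar colors."""
    # A stable sort keeps the original cell order within each clone.
    annotations = clone_assignments.sort_values("clone_id", kind="stable")[["clone_id"]]
    if metadata is not None:
        annotations = annotations.join(metadata)  # aligned on the cell index
    row_colors = pd.DataFrame(index=annotations.index)
    for column in annotations.columns:
        # Assumption: categories are sortable and contain no missing values.
        categories = sorted(annotations[column].unique())
        lookup = {cat: PALETTE[i % len(PALETTE)] for i, cat in enumerate(categories)}
        row_colors[column] = annotations[column].map(lookup)
    return row_colors

cells = ["c1", "c2", "c3", "c4"]
assignments = pd.DataFrame({"clone_id": ["B", "A", "B", "A"]}, index=cells)
metadata = pd.DataFrame(
    {"cell_type": ["Malignant", "Normal", "Malignant", "Normal"]}, index=cells
)
row_colors = build_row_colors(assignments, metadata)
print(row_colors)
```

Reindexing the character matrix with `row_colors.index` would then give the clone-grouped row order for the heatmap.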
See also
plot_clone_sizes – Visualize clone size distribution
clone_profiles – Access aggregated clone CNA profiles
seaborn.clustermap – Underlying plotting function used
- plot_clone_sizes(show=True, save_as=None)[source]
Visualize the distribution of clone sizes in the phylogenetic tree.
Creates a histogram showing how many cells belong to each clone, providing insights into clonal architecture, diversity, and potential dominant/rare clones within the analyzed population.
- Parameters:
- Return type:
Examples
Basic clone size visualization:
>>> from picasso import Picasso, CloneTree, load_data
>>>
>>> # Create CloneTree and visualize clone sizes
>>> cna_data = load_data()
>>> picasso = Picasso(cna_data)
>>> picasso.fit()
>>> clone_tree = CloneTree(picasso.get_phylogeny(),
...                        picasso.get_clone_assignments(),
...                        cna_data)
>>>
>>> # Display clone size distribution
>>> clone_tree.plot_clone_sizes()
Save without displaying:
>>> # Save to file without showing
>>> clone_tree.plot_clone_sizes(show=False, save_as='clone_sizes.pdf')
Analyze clone architecture:
>>> # Get clone sizes for analysis
>>> assignments = picasso.get_clone_assignments()
>>> clone_sizes = assignments['clone_id'].value_counts()
>>> print(f"Largest clone: {clone_sizes.max()} cells")
>>> print(f"Smallest clone: {clone_sizes.min()} cells")
>>> print(f"Mean clone size: {clone_sizes.mean():.1f} cells")
>>>
>>> # Visualize
>>> clone_tree.plot_clone_sizes(save_as='clone_architecture.pdf')
Notes
Plot Features:
- Histogram showing the distribution of clone sizes
- X-axis: Clone size (number of cells per clone)
- Y-axis: Number of clones with that size
- Kernel density estimate (KDE) overlay for a smooth distribution
- Automatic binning based on data range
Interpretation:
- Right-skewed distribution: Many small clones, with a few large clones in the tail that hold most cells
- Uniform distribution: Balanced clonal architecture
- Left-skewed distribution: Mostly large clones, with a few small ones in the tail
Technical Considerations:
- Clone sizes depend on PICASSO parameters (min_clone_size, etc.)
- Very small clones may indicate noise or over-splitting
- Very large clones may indicate under-splitting or homogeneity
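The interpretation and technical notes above can be checked numerically before plotting. The following sketch is an assumption-laden illustration: `summarize_clone_sizes` and its `small_clone_cutoff` of 5 cells are hypothetical, and the skewness is pandas' sample skewness, which needs at least three clones.

```python
import pandas as pd

def summarize_clone_sizes(clone_assignments, small_clone_cutoff=5):
    """Summarize clone sizes: counts, skewness, and clones below a size cutoff."""
    sizes = clone_assignments["clone_id"].value_counts()
    return {
        "sizes": sizes,
        # Positive skewness means a long right tail: few large clones, many small ones.
        "skewness": sizes.skew(),
        # Hypothetical cutoff; very small clones may reflect noise or over-splitting.
        "small_clones": sorted(sizes[sizes < small_clone_cutoff].index),
    }

clone_ids = ["A"] * 40 + ["B"] * 6 + ["C"] * 3 + ["D"] * 2
assignments = pd.DataFrame(
    {"clone_id": clone_ids},
    index=[f"cell{i}" for i in range(len(clone_ids))],
)
summary = summarize_clone_sizes(assignments)
print(summary["sizes"])
print(summary["small_clones"])
```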
See also
plot_alterations – Visualize CNA profiles with clone annotations
clone_assignments – Access raw clone assignment data
get_clone_assignments – Get clone assignments from PICASSO analysis
- static calc_mode(series)[source]
Calculate the statistical mode (most frequent value) of a pandas Series.
Computes the most common value in a series, handling edge cases where no mode exists or multiple modes are present. Used for aggregating copy number states within clones.
- Parameters:
series (pd.Series) – Input data series containing numeric values (typically copy number states).
- Returns:
The most frequent value in the series. Returns None if series is empty or all values are NaN. If multiple modes exist, returns the first one.
- Return type:
Examples
>>> import pandas as pd
>>> data = pd.Series([1, 1, 2, 2, 2, 3])
>>> CloneTree.calc_mode(data)
2
>>>
>>> # Handle ties: returns the first mode
>>> tie_data = pd.Series([1, 1, 2, 2])
>>> CloneTree.calc_mode(tie_data)
1
Notes
Uses pandas Series.mode() method internally
Handles empty series gracefully by returning None
For ties, returns the first modal value (arbitrary but consistent)
Designed for integer copy number data but works with any numeric type
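The behavior described in these notes maps closely onto pandas' own Series.mode(), which drops NaN by default and returns the modal values in sorted order. The function below is a sketch of that behavior, not the library's source; `modal_value` is a hypothetical name.

```python
import pandas as pd

def modal_value(series):
    """Return the most frequent value, or None if there is nothing to count."""
    modes = series.mode()  # NaN is dropped by default; ties come back sorted
    if modes.empty:
        return None
    return modes.iloc[0]  # with ties, the first (smallest) modal value

print(modal_value(pd.Series([1, 1, 2, 2, 2, 3])))
print(modal_value(pd.Series([1, 1, 2, 2])))
print(modal_value(pd.Series([], dtype="float64")))
```

Because Series.mode() sorts its output, "first" in a tie means the smallest value, which is a consistent if arbitrary choice.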
See also
calc_mode_freq – Calculate frequency of the modal value
get_modal_clone_profiles – Main method using this utility
- static calc_mode_freq(series)[source]
Calculate the frequency (proportion) of the modal value in a pandas Series.
Computes what fraction of values in the series match the most frequent value. This provides a confidence measure for modal aggregation - higher frequencies indicate more reliable consensus within the data.
- Parameters:
series (pd.Series) – Input data series containing numeric values (typically copy number states).
- Returns:
Proportion of values matching the modal value, between 0.0 and 1.0. Returns 0.0 if series is empty or contains only NaN values.
- Return type:
Examples
>>> import pandas as pd
>>> # High consensus: 4 out of 5 values are modal
>>> data = pd.Series([2, 2, 2, 2, 1])
>>> CloneTree.calc_mode_freq(data)
0.8
>>>
>>> # Perfect consensus
>>> uniform = pd.Series([1, 1, 1, 1])
>>> CloneTree.calc_mode_freq(uniform)
1.0
>>>
>>> # Low consensus (tie): each value appears once
>>> mixed = pd.Series([1, 2, 3, 4])
>>> CloneTree.calc_mode_freq(mixed)
0.25
Notes
Interpretation Guide:
- 1.0: Perfect consensus, all values identical
- 0.8-0.9: Strong consensus with few outliers
- 0.5-0.7: Moderate consensus, some heterogeneity
- <0.5: Weak consensus, high heterogeneity
Use in Clone Analysis:
- Quality metric for clone coherence
- Confidence score for aggregated profiles
- Filter for reliable clone assignments
- Identifies noisy or heterogeneous clones
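A modal frequency of this kind can be computed from value counts. The sketch below makes one assumption worth flagging: NaN values are excluded from the denominator, which the documentation above does not state either way. `modal_frequency` is a hypothetical name.

```python
import pandas as pd

def modal_frequency(series):
    """Fraction of non-missing values equal to the most frequent value."""
    values = series.dropna()  # assumption: missing values do not count toward the total
    if values.empty:
        return 0.0
    # value_counts sorts counts in descending order, so the first entry is the modal count.
    return float(values.value_counts().iloc[0] / len(values))

print(modal_frequency(pd.Series([2, 2, 2, 2, 1])))
print(modal_frequency(pd.Series([1, 2, 3, 4])))
```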
See also
calc_mode – Calculate the actual modal value
get_modal_clone_profiles – Main method using this utility for confidence scores
- get_modal_clone_profiles()[source]
Compute modal (most frequent) copy number states for each clone.
Aggregates CNA profiles within each clone by finding the most common copy number state for each genomic feature. Also computes confidence scores based on the frequency of the modal state.
- Returns:
- modal_profiles (pd.DataFrame) – Clone profiles with modal copy number states. Rows are clones, columns are genomic features. Values are the most frequent copy number state within each clone.
- modal_frequencies (pd.DataFrame) – Confidence scores for modal states. Same structure as modal_profiles, but values represent the proportion of cells with the modal state (0.0 to 1.0, where 1.0 indicates all cells have the same state).
- Return type:
tuple of (pd.DataFrame, pd.DataFrame)
Examples
>>> clone_tree = CloneTree(phylogeny, assignments, cna_data)
>>> profiles, frequencies = clone_tree.get_modal_clone_profiles()
>>>
>>> # Examine profile quality
>>> avg_confidence = frequencies.mean().mean()
>>> print(f"Average modal confidence: {avg_confidence:.2f}")
>>>
>>> # Find highly confident features
>>> confident_features = frequencies.columns[frequencies.mean() > 0.8]
>>> print(f"High confidence features: {len(confident_features)}")
Notes
Modal Aggregation Process:
1. Group cells by clone assignment
2. For each clone-feature combination, find the most frequent copy number state
3. Calculate the frequency of the modal state as a confidence measure
4. Handle ties by selecting the first modal value
Confidence Interpretation:
- 1.0: All cells in the clone have an identical copy number state
- 0.5-0.9: Majority consensus with some variation
- <0.5: High heterogeneity, unreliable modal state
Noise Handling:
- Modal aggregation naturally filters outlier cells
- Confidence scores identify unreliable aggregations
- Particularly effective for noisy scRNA-seq-inferred CNAs
Applications:
- Generate clean clone signatures for visualization
- Quality control for clone assignments
- Feature selection based on clone coherence
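The four-step process above can be sketched with a pandas groupby. This is an illustrative reimplementation under stated assumptions, not the method's source: `modal_profiles` and its two helpers are hypothetical names, `clone_ids` is assumed to be a Series indexed by cell, and NaN values are ignored when counting.

```python
import pandas as pd

def _mode_or_none(series):
    modes = series.mode()  # sorted, NaN dropped; first entry breaks ties
    return modes.iloc[0] if not modes.empty else None

def _mode_fraction(series):
    values = series.dropna()
    if values.empty:
        return 0.0
    return values.value_counts().iloc[0] / len(values)

def modal_profiles(character_matrix, clone_ids):
    """Return (modal states, modal frequencies), each with clones as rows."""
    grouped = character_matrix.groupby(clone_ids)   # step 1: group cells by clone
    profiles = grouped.agg(_mode_or_none)           # steps 2 and 4: mode per clone and feature
    frequencies = grouped.agg(_mode_fraction)       # step 3: share of cells in the modal state
    return profiles, frequencies

cells = ["c1", "c2", "c3", "c4", "c5"]
cna = pd.DataFrame({"chr1p": [1, 1, 0, 0, 0], "chr2q": [0, 0, -1, -1, 0]}, index=cells)
clone_ids = pd.Series(["A", "A", "B", "B", "B"], index=cells, name="clone_id")

profiles, frequencies = modal_profiles(cna, clone_ids)
print(profiles)
print(frequencies)
```

In the toy data, clone B has one cell that disagrees on chr2q, so its modal state is still -1 but its frequency drops to two thirds.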
See also
calc_mode – Static method for computing modal values
calc_mode_freq – Static method for computing modal frequencies
aggregate_clones – Public interface using this method