Quick Start Guide

This guide walks through a complete PICASSO analysis in a few minutes. PICASSO reconstructs phylogenetic trees from noisy copy number alteration (CNA) data inferred from single-cell RNA sequencing (scRNA-seq).

Note

Before starting, make sure you have installed PICASSO following the Installation guide.

5-Minute Tutorial

Step 1: Load Your Data

PICASSO works with CNA data as a pandas DataFrame where rows are cells and columns are genomic features:

import picasso
import pandas as pd

# Load example data (or use your own)
cna_data = picasso.load_data()
print(f"Dataset: {cna_data.shape[0]} cells × {cna_data.shape[1]} features")

# Examine the data structure
print(cna_data.head())

The data should contain integer copy number states (e.g., -1=deletion, 0=neutral, 1=amplification).

Step 2: Basic Phylogeny Reconstruction

For most datasets, the default parameters work well:

# Initialize PICASSO with default parameters
model = picasso.Picasso(cna_data)

# Reconstruct the phylogeny
model.fit()

# Extract results
phylogeny = model.get_phylogeny()
clone_assignments = model.get_clone_assignments()

print(f"Reconstructed {len(phylogeny.get_leaves())} clones")

Step 3: Analyze Results

Use CloneTree for enhanced analysis and visualization:

# Create tree analyzer
tree = picasso.CloneTree(phylogeny, clone_assignments, cna_data)

# Root the tree at the most ancestral clone
outgroup = tree.get_most_ancestral_clone()
tree.root_tree(outgroup)

# Visualize clone sizes and alterations
tree.plot_clone_sizes()
tree.plot_alterations()

Step 4: Export Results

Export for further analysis or publication:

# Get clone phylogeny as Newick string
clone_tree = tree.get_clone_phylogeny()
newick_string = clone_tree.write()

# Save results
clone_assignments.to_csv('clone_assignments.csv')

# Export for iTOL visualization
heatmap_annotation = picasso.itol_utils.dataframe_to_itol_heatmap(cna_data)
with open('itol_heatmap.txt', 'w') as f:
    f.write(heatmap_annotation)

That’s it! You’ve successfully reconstructed a phylogeny from CNA data.

Understanding Your Data

Input Format

PICASSO expects a pandas DataFrame with:

- Rows: Individual cells/samples
- Columns: Genomic features (chromosome arms, genes, bins)
- Values: Integer copy number states

Common encodings:

- -2, -1: Deletions (homozygous, heterozygous)
- 0: Neutral copy number
- 1, 2, 3+: Amplifications (single, double, triple+)
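A quick sanity check before fitting can catch format problems early. The following is a minimal sketch using only pandas and NumPy. The helper validate_cna_matrix is hypothetical (it is not part of the PICASSO API), and the toy matrix stands in for real data:

```python
import numpy as np
import pandas as pd


def validate_cna_matrix(df):
    """Return a list of problems found in a cells x features CNA matrix.

    Hypothetical helper for illustration; not part of PICASSO.
    """
    problems = []
    if df.isnull().values.any():
        problems.append("contains missing values")
    # Every column should hold whole-number copy number states.
    non_integer = [c for c in df.columns if not pd.api.types.is_integer_dtype(df[c])]
    if non_integer:
        problems.append(f"non-integer columns: {non_integer}")
    if not df.index.is_unique:
        problems.append("duplicate cell identifiers in the index")
    return problems


# Toy matrix: 4 cells x 3 features, encoded as -1/0/1.
toy = pd.DataFrame(
    [[0, 1, -1], [0, 1, 0], [0, 0, -1], [1, 1, -1]],
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
    columns=["chr1p", "chr1q", "chr2p"],
)
print(validate_cna_matrix(toy))  # an empty list means no problems were found
```

Note that a column containing any missing value is stored as float by pandas, so it will be reported both as missing and as non-integer.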

Data Quality Considerations

PICASSO is designed for noisy scRNA-seq-inferred CNAs, but data quality affects results:

# Check data characteristics
print(f"Copy number range: {cna_data.min().min()} to {cna_data.max().max()}")
print(f"Missing values: {cna_data.isnull().sum().sum()}")
print(f"Feature variance: {cna_data.var().describe()}")
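If the check above reports missing values, they need to be handled before the matrix meets the integer-valued input format. One possible approach is sketched below, with the caveat that both the 25% cutoff and the mode-based imputation are illustrative assumptions rather than PICASSO recommendations, and clean_missing is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def clean_missing(df, max_missing_fraction=0.25):
    """Drop sparse features, then fill remaining gaps with each feature's mode.

    Hypothetical helper; the threshold and the imputation rule are assumptions.
    """
    # Keep features whose fraction of missing cells is within the limit.
    keep = df.isnull().mean(axis=0) <= max_missing_fraction
    kept = df.loc[:, keep]
    # mode() may return several rows on ties; the first row holds the smallest modal value.
    modes = kept.mode(axis=0).iloc[0]
    return kept.fillna(modes).astype(int)


# Toy matrix with gaps: chr1q is missing in 3 of 5 cells and gets dropped.
toy = pd.DataFrame(
    {
        "chr1p": [0, 0, np.nan, 0, 1],
        "chr1q": [np.nan, np.nan, np.nan, 1, 1],
        "chr2p": [-1, -1, 0, -1, -1],
    },
    index=[f"cell_{i}" for i in range(1, 6)],
)
cleaned = clean_missing(toy)
print(cleaned)
```

Imputing with the mode biases cells toward the majority profile, so it is worth checking how many values were filled before interpreting small clones.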

Feature Filtering (Optional)

Remove uninformative features to speed up analysis:

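# Remove features where at least 99% of cells share the modal value (optional)
n_features_before = cna_data.shape[1]
modal_threshold = 0.99

# mode() can return several rows when values tie; use the first row only
feature_modality = (cna_data.values == cna_data.mode(axis=0).iloc[0].values).mean(axis=0)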
informative_features = feature_modality < modal_threshold

cna_filtered = cna_data.loc[:, informative_features]
print(f"Filtered from {n_features_before} to {cna_filtered.shape[1]} features")
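To make the filter concrete, the sketch below computes the modal fraction (the share of cells carrying each feature's most common value) on a toy matrix. The helper modal_fraction and the toy data are illustrative only and use plain pandas:

```python
import pandas as pd


def modal_fraction(df):
    """Fraction of cells carrying each feature's most common value."""
    # value_counts sorts by frequency, so the first entry is the modal share.
    return df.apply(lambda col: col.value_counts(normalize=True).iloc[0])


toy = pd.DataFrame(
    {
        "chr1p": [0, 0, 0, 0],    # constant feature
        "chr1q": [0, 1, 1, 1],    # three of four cells share the mode
        "chr2p": [-1, 0, -1, 0],  # two values tie
    }
)
fractions = modal_fraction(toy)
informative = toy.loc[:, fractions < 0.99]
print(fractions)
print(list(informative.columns))
```

Here the constant feature chr1p has a modal fraction of 1.0 and is removed, while chr1q (0.75) and chr2p (0.5) are kept.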

Parameter Selection

For Clean Data

If your CNA data has low noise (e.g., from scDNA-seq or well-validated inference):

model = picasso.Picasso(
    cna_data,
    min_clone_size=5,              # Smaller clones OK
    assignment_confidence_threshold=0.7,  # Lower confidence OK
    terminate_by='BIC'             # BIC-based termination
)

For Noisy Data

If your data is very noisy (typical for scRNA-seq-inferred CNAs):

model = picasso.Picasso(
    cna_data,
    min_clone_size=20,             # Larger clones for robustness
    max_depth=10,                  # Limit depth to avoid over-fitting
    assignment_confidence_threshold=0.85,  # Higher confidence required
    assignment_confidence_proportion=0.9,  # Most cells must be confident
    terminate_by='probability'     # Confidence-based termination
)

Parameter Guidelines

min_clone_size: 5-10 for clean data, 10-50+ for noisy data
assignment_confidence_threshold: 0.7 for clean data, 0.8-0.9 for noisy data
max_depth: Unlimited for clean data, 8-15 for noisy data to prevent over-fitting
terminate_by: 'BIC' for clean data, 'probability' for noisy data
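The guidelines above can be captured as reusable presets. This is a minimal sketch: suggest_params is a hypothetical convenience function, not part of PICASSO, and the values are simply the ones used in the clean-data and noisy-data examples in this guide:

```python
def suggest_params(noise_level):
    """Return keyword arguments for picasso.Picasso for a given noise level.

    Hypothetical helper; values mirror the examples in this guide.
    """
    presets = {
        "clean": {
            "min_clone_size": 5,
            "assignment_confidence_threshold": 0.7,
            "terminate_by": "BIC",
        },
        "noisy": {
            "min_clone_size": 20,
            "max_depth": 10,
            "assignment_confidence_threshold": 0.85,
            "assignment_confidence_proportion": 0.9,
            "terminate_by": "probability",
        },
    }
    if noise_level not in presets:
        raise ValueError(f"noise_level must be one of {sorted(presets)}")
    # Return a copy so callers can adjust values without changing the presets.
    return dict(presets[noise_level])


params = suggest_params("noisy")
print(params)
# With PICASSO installed and cna_data loaded, the presets could be passed as:
# model = picasso.Picasso(cna_data, **params)
```

Treat the presets as starting points and adjust them after inspecting the resulting clones.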

Common Workflows

Workflow 1: Standard Analysis

# 1. Load and examine data
data = picasso.load_data()

# 2. Reconstruct phylogeny
model = picasso.Picasso(data, min_clone_size=10)
model.fit()

# 3. Analyze results
tree = picasso.CloneTree(model.get_phylogeny(),
                        model.get_clone_assignments(),
                        data)

# 4. Generate visualizations
tree.plot_clone_sizes()
tree.plot_alterations()

Workflow 2: Parameter Exploration

# Try different minimum clone sizes
clone_sizes = [5, 10, 20, 50]
results = {}

for size in clone_sizes:
    model = picasso.Picasso(data, min_clone_size=size)
    model.fit()
    n_clones = len(model.get_phylogeny().get_leaves())
    results[size] = n_clones
    print(f"Min clone size {size}: {n_clones} clones")

Workflow 3: Noisy Data Pipeline

# 1. Filter features
modality = (data.values == data.mode(axis=0).iloc[0].values).mean(axis=0)
filtered_data = data.loc[:, modality < 0.95]

# 2. Use conservative parameters
model = picasso.Picasso(
    filtered_data,
    min_clone_size=25,
    max_depth=12,
    assignment_confidence_threshold=0.85,
    assignment_confidence_proportion=0.9
)

# 3. Fit and analyze
model.fit()
tree = picasso.CloneTree(model.get_phylogeny(),
                        model.get_clone_assignments(),
                        filtered_data)

Output Interpretation

Clone Assignments

The clone assignments DataFrame shows which clone each cell belongs to:

assignments = model.get_clone_assignments()
print(assignments.head())

# Clone size distribution
clone_sizes = assignments['clone_id'].value_counts()
print("Clone sizes:", clone_sizes.head())
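Absolute clone sizes are easier to interpret alongside their share of all cells. The sketch below uses a toy assignments table with a clone_id column, as in the snippet above. Indexing by cell identifier and the minimum size of 2 are assumptions made for illustration:

```python
import pandas as pd

# Toy stand-in for model.get_clone_assignments(); real output may differ in layout.
assignments = pd.DataFrame(
    {"clone_id": ["A", "A", "A", "B", "B", "C"]},
    index=[f"cell_{i}" for i in range(1, 7)],
)

sizes = assignments["clone_id"].value_counts()
summary = pd.DataFrame({"n_cells": sizes, "fraction": sizes / sizes.sum()})
print(summary)

# Flag clones below an illustrative minimum size for closer inspection.
min_cells = 2
small = summary[summary["n_cells"] < min_cells]
print("Small clones:", list(small.index))
```

Clones that fall below the size you would trust are candidates for rerunning with a larger min_clone_size.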

Phylogenetic Tree

The phylogeny represents evolutionary relationships:

phylogeny = model.get_phylogeny()

# Tree statistics
print(f"Number of leaves (clones): {len(phylogeny.get_leaves())}")
print(f"Tree depth: {phylogeny.get_farthest_leaf()[1]}")

# Leaf names correspond to clone IDs
leaf_names = phylogeny.get_leaf_names()
print("Clone IDs:", leaf_names[:5])

Tree Analysis

CloneTree provides additional insights:

# Get modal CNA profiles for each clone
modal_profiles = tree.get_modal_clone_profiles()

# Infer evolutionary changes along branches
changes = tree.infer_evolutionary_changes()
print(f"Detected {len(changes)} evolutionary events")
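To illustrate what a modal clone profile is, the sketch below computes the most common copy number state per feature within each clone using plain pandas. It is not CloneTree's implementation, and the function modal_profiles and the toy data are assumptions made for the example:

```python
import pandas as pd

# Toy CNA matrix (cells x features) and matching clone labels.
cna = pd.DataFrame(
    [[0, 1, -1], [0, 1, -1], [0, 1, 0], [1, 0, 0], [1, 0, 0]],
    index=["c1", "c2", "c3", "c4", "c5"],
    columns=["chr1p", "chr1q", "chr2p"],
)
clone_ids = pd.Series(["A", "A", "A", "B", "B"], index=cna.index, name="clone_id")


def modal_profiles(cna, clone_ids):
    """Most common copy number state per feature within each clone."""
    # On ties, mode() returns sorted values, so iloc[0] picks the smallest.
    return cna.groupby(clone_ids).agg(lambda col: col.mode().iloc[0])


profiles = modal_profiles(cna, clone_ids)
print(profiles)

# Features whose modal state differs between the two clones.
differing = profiles.columns[profiles.loc["A"] != profiles.loc["B"]]
print(list(differing))
```

Comparing modal profiles between neighbouring clones gives a rough picture of which features change along a branch.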

Next Steps

Now that you understand the basics:

Explore Examples: See Detailed Examples for in-depth tutorials on specific use cases
Parameter Tuning: Learn how to optimize parameters for your specific data type
Advanced Analysis: Discover CloneTree’s advanced features for phylogenetic analysis
Visualization: Create publication-ready figures with iTOL integration
API Reference: Consult API Reference for complete function documentation

Need Help?

Check the Detailed Examples for similar use cases
Consult the API Reference for detailed parameter descriptions
Open a GitHub issue if you have questions
Review the original publication for algorithmic details