Quick Start Guide

This guide will get you up and running with PICASSO in just a few minutes. PICASSO reconstructs phylogenetic trees from noisy copy number alteration (CNA) data derived from single-cell RNA sequencing.

Note

Before starting, make sure you have installed PICASSO following the Installation guide.

5-Minute Tutorial

Step 1: Load Your Data

PICASSO works with CNA data as a pandas DataFrame where rows are cells and columns are genomic features:

import picasso
import pandas as pd

# Load example data (or use your own)
cna_data = picasso.load_data()
print(f"Dataset: {cna_data.shape[0]} cells × {cna_data.shape[1]} features")

# Examine the data structure
print(cna_data.head())

The data should contain integer copy number states expressed relative to the neutral state (e.g., -1=deletion, 0=neutral, 1=amplification).

Step 2: Basic Phylogeny Reconstruction

For most datasets, the default parameters work well:

# Initialize PICASSO with default parameters
model = picasso.Picasso(cna_data)

# Reconstruct the phylogeny
model.fit()

# Extract results
phylogeny = model.get_phylogeny()
clone_assignments = model.get_clone_assignments()

print(f"Reconstructed {len(phylogeny.get_leaves())} clones")

Step 3: Analyze Results

Use CloneTree for enhanced analysis and visualization:

# Create tree analyzer
tree = picasso.CloneTree(phylogeny, clone_assignments, cna_data)

# Root the tree at the most ancestral clone
outgroup = tree.get_most_ancestral_clone()
tree.root_tree(outgroup)

# Visualize clone sizes and alterations
tree.plot_clone_sizes()
tree.plot_alterations()

Step 4: Export Results

Export for further analysis or publication:

# Get clone phylogeny as Newick string
clone_tree = tree.get_clone_phylogeny()
newick_string = clone_tree.write()

# Save results
clone_assignments.to_csv('clone_assignments.csv')

# Export for iTOL visualization
heatmap_annotation = picasso.itol_utils.dataframe_to_itol_heatmap(cna_data)
with open('itol_heatmap.txt', 'w') as f:
    f.write(heatmap_annotation)

That’s it! You’ve successfully reconstructed a phylogeny from CNA data.

Understanding Your Data

Input Format

PICASSO expects a pandas DataFrame with:

  • Rows: Individual cells/samples

  • Columns: Genomic features (chromosome arms, genes, bins)

  • Values: Integer copy number states

Common encodings:

  • -2, -1: Deletions (homozygous, heterozygous)

  • 0: Neutral copy number

  • 1, 2, 3+: Amplifications (single, double, triple+)
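
As a quick sanity check before fitting, the input format described above can be verified with plain pandas. The sketch below is only an illustration: validate_cna_matrix and toy_cna are hypothetical names, not part of PICASSO, and the checks reflect the format stated in this guide rather than anything the library is known to enforce.

```python
import pandas as pd


def validate_cna_matrix(cna: pd.DataFrame) -> None:
    """Raise ValueError unless `cna` is a complete table of integer states.

    Hypothetical helper for illustration; not a PICASSO function.
    """
    if cna.isnull().any().any():
        raise ValueError("CNA matrix contains missing values")
    non_integer = [
        col for col in cna.columns
        if not pd.api.types.is_integer_dtype(cna[col])
    ]
    if non_integer:
        raise ValueError(f"Non-integer columns: {non_integer}")


# Toy matrix: 4 cells (rows) x 3 chromosome-arm features (columns)
toy_cna = pd.DataFrame(
    [[0, 1, -1],
     [0, 1, -1],
     [0, 0, 0],
     [0, 0, 2]],
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
    columns=["chr1p", "chr1q", "chr2p"],
)

validate_cna_matrix(toy_cna)  # passes silently for a well-formed matrix
```

A matrix stored as floats (common after reading a CSV that contains missing values) would fail the integer check and needs cleaning and casting first.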

Data Quality Considerations

PICASSO is designed for noisy scRNA-seq-inferred CNAs, but data quality affects results:

# Check data characteristics
print(f"Copy number range: {cna_data.min().min()} to {cna_data.max().max()}")
print(f"Missing values: {cna_data.isnull().sum().sum()}")
print(f"Feature variance: {cna_data.var().describe()}")
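
If the missing-value count above is non-zero, one possible way to handle it is to fill each gap with the feature's most common state and cast back to integers. This is a minimal sketch using plain pandas; fill_missing_with_mode is a hypothetical helper, and whether mode imputation suits your data is a judgment call that this guide does not prescribe.

```python
import numpy as np
import pandas as pd


def fill_missing_with_mode(cna: pd.DataFrame) -> pd.DataFrame:
    """Replace NaNs with each feature's most common state, then cast to int.

    Hypothetical helper for illustration; not a PICASSO function.
    Columns that are entirely NaN have no mode and make the cast fail.
    """
    # mode() may return several rows on ties; the first row is the smallest mode
    feature_modes = cna.mode(axis=0).iloc[0]
    return cna.fillna(feature_modes).astype(int)


# Toy data with two missing entries
noisy = pd.DataFrame({
    "chr1p": [0, 0, np.nan, 1],
    "chr1q": [1, np.nan, 1, 1],
})

cleaned = fill_missing_with_mode(noisy)
print(cleaned)
```

Dropping the affected cells or features instead is equally reasonable when only a few are incomplete.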

Feature Filtering (Optional)

Remove uninformative features to speed up analysis:

# Remove features with >99% modal values (optional)
n_features_before = cna_data.shape[1]
modal_threshold = 0.99

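# Fraction of cells sharing each feature's most common state.
# mode() can return several rows when states tie, so keep only the first.
modal_state = cna_data.mode(axis=0).iloc[0]
feature_modality = (cna_data.values == modal_state.values).mean(axis=0)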
informative_features = feature_modality < modal_threshold

cna_filtered = cna_data.loc[:, informative_features]
print(f"Filtered from {n_features_before} to {cna_filtered.shape[1]} features")
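
The same filter can be wrapped in a reusable function and tried on a toy matrix to see which features survive. This is a sketch in plain pandas; filter_uninformative_features is a hypothetical name, not a PICASSO function.

```python
import pandas as pd


def filter_uninformative_features(cna: pd.DataFrame,
                                  modal_threshold: float = 0.99) -> pd.DataFrame:
    """Keep features whose most common state covers < modal_threshold of cells.

    Hypothetical helper for illustration; not a PICASSO function.
    """
    # First mode per feature (ties resolved to the smallest state)
    modal_state = cna.mode(axis=0).iloc[0]
    # Fraction of cells that carry the modal state, per feature
    modal_fraction = cna.eq(modal_state, axis=1).mean(axis=0)
    return cna.loc[:, modal_fraction < modal_threshold]


toy = pd.DataFrame({
    "flat":  [0, 0, 0, 0],   # identical in every cell: uninformative
    "split": [0, 0, 1, 1],   # modal fraction 0.50
    "rare":  [0, 0, 0, 1],   # modal fraction 0.75
})

kept = filter_uninformative_features(toy, modal_threshold=0.99)
print(list(kept.columns))
```

Lowering modal_threshold removes more features; with 0.7, only the "split" column would remain in this toy example.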

Parameter Selection

For Clean Data

If your CNA data has low noise (e.g., from scDNA-seq or well-validated inference):

model = picasso.Picasso(
    cna_data,
    min_clone_size=5,              # Smaller clones OK
    assignment_confidence_threshold=0.7,  # Lower confidence OK
    terminate_by='BIC'             # BIC-based termination
)

For Noisy Data

If your data is very noisy (typical for scRNA-seq-inferred CNAs):

model = picasso.Picasso(
    cna_data,
    min_clone_size=20,             # Larger clones for robustness
    max_depth=10,                  # Limit depth to avoid over-fitting
    assignment_confidence_threshold=0.85,  # Higher confidence required
    assignment_confidence_proportion=0.9,  # Most cells must be confident
    terminate_by='probability'     # Confidence-based termination
)

Parameter Guidelines

  • min_clone_size: 5-10 for clean data, 10-50+ for noisy data

  • assignment_confidence_threshold: 0.7 for clean data, 0.8-0.9 for noisy data

  • max_depth: Unlimited for clean data, 8-15 for noisy data to prevent over-fitting

  • terminate_by: 'BIC' for clean data, 'probability' for noisy data
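
One way to keep these guidelines in a single place is a small helper that returns keyword arguments for the constructor. The sketch below only packages the values from the two examples above; suggest_parameters is a hypothetical helper, not part of PICASSO, and the values are starting points rather than tuned settings.

```python
def suggest_parameters(noisy: bool) -> dict:
    """Return starting keyword arguments taken from the examples in this guide.

    Hypothetical helper for illustration; not a PICASSO function.
    """
    if noisy:
        return {
            "min_clone_size": 20,
            "max_depth": 10,
            "assignment_confidence_threshold": 0.85,
            "assignment_confidence_proportion": 0.9,
            "terminate_by": "probability",
        }
    # Clean data: max_depth is left unset, i.e. unlimited per the guidelines
    return {
        "min_clone_size": 5,
        "assignment_confidence_threshold": 0.7,
        "terminate_by": "BIC",
    }


params = suggest_parameters(noisy=True)
print(params)

# Intended use (requires PICASSO and a loaded cna_data):
# model = picasso.Picasso(cna_data, **params)
```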

Common Workflows

Workflow 1: Standard Analysis

# 1. Load and examine data
data = picasso.load_data()

# 2. Reconstruct phylogeny
model = picasso.Picasso(data, min_clone_size=10)
model.fit()

# 3. Analyze results
tree = picasso.CloneTree(model.get_phylogeny(),
                        model.get_clone_assignments(),
                        data)

# 4. Generate visualizations
tree.plot_clone_sizes()
tree.plot_alterations()

Workflow 2: Parameter Exploration

# Try different minimum clone sizes (uses `data` as loaded in Workflow 1)
clone_sizes = [5, 10, 20, 50]
results = {}

for size in clone_sizes:
    model = picasso.Picasso(data, min_clone_size=size)
    model.fit()
    n_clones = len(model.get_phylogeny().get_leaves())
    results[size] = n_clones
    print(f"Min clone size {size}: {n_clones} clones")

Workflow 3: Noisy Data Pipeline

# 1. Filter features
# mode() may return several rows on ties, so keep only the first
modality = (data.values == data.mode(axis=0).iloc[0].values).mean(axis=0)
filtered_data = data.loc[:, modality < 0.95]

# 2. Use conservative parameters
model = picasso.Picasso(
    filtered_data,
    min_clone_size=25,
    max_depth=12,
    assignment_confidence_threshold=0.85,
    assignment_confidence_proportion=0.9
)

# 3. Fit and analyze
model.fit()
tree = picasso.CloneTree(model.get_phylogeny(),
                        model.get_clone_assignments(),
                        filtered_data)

Output Interpretation

Clone Assignments

The clone assignments DataFrame shows which clone each cell belongs to:

assignments = model.get_clone_assignments()
print(assignments.head())

# Clone size distribution
clone_sizes = assignments['clone_id'].value_counts()
print("Clone sizes:", clone_sizes.head())
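
To put the size distribution in context, it can help to add each clone's share of all cells and flag clones below a chosen size. The sketch below uses a toy table with a clone_id column, matching the column name used above; the data and the cutoff of 2 cells are made up purely for illustration.

```python
import pandas as pd

# Toy assignments: one row per cell, with a 'clone_id' column
toy_assignments = pd.DataFrame(
    {"clone_id": ["A", "A", "A", "B", "B", "C"]},
    index=[f"cell_{i}" for i in range(6)],
)

clone_sizes = toy_assignments["clone_id"].value_counts()

# Combine absolute sizes with each clone's fraction of all cells
summary = pd.DataFrame({
    "n_cells": clone_sizes,
    "fraction": clone_sizes / clone_sizes.sum(),
})
print(summary)

# Clones smaller than an illustrative cutoff deserve a closer look
small_clones = summary[summary["n_cells"] < 2]
print("Small clones:", list(small_clones.index))
```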

Phylogenetic Tree

The phylogeny represents evolutionary relationships:

phylogeny = model.get_phylogeny()

# Tree statistics
print(f"Number of leaves (clones): {len(phylogeny.get_leaves())}")
print(f"Tree depth: {phylogeny.get_farthest_leaf()[1]}")

# Leaf names correspond to clone IDs
leaf_names = phylogeny.get_leaf_names()
print("Clone IDs:", leaf_names[:5])

Tree Analysis

CloneTree provides additional insights:

# Get modal CNA profiles for each clone
modal_profiles = tree.get_modal_clone_profiles()

# Infer evolutionary changes along branches
changes = tree.infer_evolutionary_changes()
print(f"Detected {len(changes)} evolutionary events")
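
To build intuition for what a modal clone profile is, the idea can be reproduced with a pandas groupby: for each clone, take the most common state of every feature across its cells. This is an illustrative sketch on toy data, not PICASSO's implementation of get_modal_clone_profiles; the function and variable names are hypothetical.

```python
import pandas as pd

# Toy CNA matrix (cells x features) and a clone label per cell
cna = pd.DataFrame(
    {"chr1p": [1, 1, 0, 0, 0], "chr1q": [0, 0, -1, -1, 0]},
    index=["c1", "c2", "c3", "c4", "c5"],
)
clone_of_cell = pd.Series(["A", "A", "B", "B", "B"],
                          index=cna.index, name="clone_id")


def modal_profiles(cna: pd.DataFrame, clone_of_cell: pd.Series) -> pd.DataFrame:
    """Most common state per feature within each clone (ties: smallest state).

    Illustrative only; not PICASSO's implementation.
    """
    return cna.groupby(clone_of_cell).agg(lambda s: s.mode().iloc[0])


profiles = modal_profiles(cna, clone_of_cell)
print(profiles)

# Per-feature difference between two clone profiles
difference = profiles.loc["B"] - profiles.loc["A"]
print(difference)
```

Taking the mode rather than the mean keeps the profile on the integer state scale and makes it less sensitive to a few noisy cells.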

Next Steps

Now that you understand the basics:

  1. Explore Examples: See Detailed Examples for tutorials on specific use cases

  2. Parameter Tuning: Learn how to optimize parameters for your specific data type

  3. Advanced Analysis: Discover CloneTree’s advanced features for phylogenetic analysis

  4. Visualization: Create publication-ready figures with iTOL integration

  5. API Reference: Consult API Reference for complete function documentation

Need Help?