# scsplice
Single-cell alternative-splicing analysis for the scverse ecosystem.
scsplice is the Python port of the R package splikit. It analyses splice-junction count data from single-cell RNA-seq, treating each event as a pair of inclusion (M1) and exclusion (M2) counts derived from local junction variants (LJVs). The package is AnnData-native: junctions live on the `var` axis, M1 and M2 sit in `layers`, and downstream analysis composes naturally with scanpy.
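To make the M1/M2 pairing concrete, here is a minimal pure-Python sketch of how an exclusion matrix can be derived from inclusion counts. It assumes that M2 for a junction is the total M1 count of its LJV group in that cell minus the junction's own M1 count. The function name `toy_make_m2` and the dense list-of-lists layout are hypothetical illustrations; this is not scsplice's C++ `make_m2` kernel or its storage format.

```python
from collections import defaultdict


def toy_make_m2(m1, group_id):
    """Hypothetical sketch: derive exclusion counts M2 from inclusion counts M1.

    m1       -- cells x junctions matrix (list of rows) of junction counts
    group_id -- one LJV group label per junction (column)

    Assumption: M2[c][j] is the summed M1 count of every junction in j's
    LJV group for cell c, minus M1[c][j] itself.
    """
    if any(len(row) != len(group_id) for row in m1):
        raise ValueError("each row of m1 needs one count per junction")
    m2 = []
    for row in m1:
        # Total inclusion counts per LJV group for this cell.
        group_totals = defaultdict(int)
        for count, g in zip(row, group_id):
            group_totals[g] += count
        # Exclusion count = reads on the other junctions of the same group.
        m2.append([group_totals[g] - count for count, g in zip(row, group_id)])
    return m2


# Two cells, three junctions; junctions 0 and 1 share LJV group "a".
m1 = [[5, 3, 2],
      [0, 4, 7]]
print(toy_make_m2(m1, ["a", "a", "b"]))  # [[3, 5, 0], [4, 0, 0]]
```

Under this assumption a junction that is alone in its group always gets M2 = 0, because no competing junction can contribute exclusion reads.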
## Public API
| Function | Module | Purpose |
|---|---|---|
| `read_starsolo` | `scsplice.io` | Ingest STARsolo `Solo.out/SJ/` for one or more samples into a splicing AnnData. Supports `tissue_positions=` for Visium / spatial samples. |
| `read_starsolo_gene` | `scsplice.io` | Ingest `Solo.out/Gene/` into a cell × gene AnnData with raw counts in `X`. Drop-in for `scanpy.pp.normalize_total` and scvi-tools. Supports `tissue_positions=` and squidpy `obsm["spatial"]`. |
| `read_starsolo_velocyto` | `scsplice.io` | Ingest `Solo.out/Velocyto/` into an AnnData with `layers["spliced"]`, `layers["unspliced"]`, `layers["ambiguous"]`. Handles both modern (split-file) and legacy (stacked `matrix.mtx`) STARsolo wire formats. Drop-in for scvelo. |
| `make_m2` | `scsplice.tl` | Build the exclusion matrix M2 from M1 and LJV `group_id` (C++ kernel, OpenMP-parallel) |
| `highly_variable_events` | `scsplice.pp` | Select highly variable splicing events via per-library binomial deviance |
| `pseudo_correlation` | `scsplice.tl` | Per-event signed pseudo-R² (Cox-Snell / Nagelkerke) against an external predictor matrix |
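For reference, the two pseudo-R² measures named for `pseudo_correlation` can be computed from a null and a fitted log-likelihood. The sketch below is not scsplice's implementation: it takes the log-likelihoods as given rather than fitting a model, treats `n` as the number of observations, and assumes the sign comes from the predictor's fitted slope. The function name `signed_pseudo_r2` is hypothetical.

```python
import math


def signed_pseudo_r2(ll_null, ll_model, n, slope):
    """Hypothetical sketch: signed Cox-Snell and Nagelkerke pseudo-R².

    ll_null  -- log-likelihood of the intercept-only model
    ll_model -- log-likelihood of the model that includes the predictor
    n        -- number of observations in both fits
    slope    -- fitted coefficient of the predictor; only its sign is used
    """
    # Cox-Snell: 1 - (L0 / L1)^(2/n), evaluated on the log scale.
    cox_snell = 1.0 - math.exp(-2.0 * (ll_model - ll_null) / n)
    # Nagelkerke divides by the largest value Cox-Snell can reach, 1 - L0^(2/n).
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell if max_cox_snell > 0 else 0.0
    # Attach the direction of the association (assumed: sign of the slope).
    sign = math.copysign(1.0, slope) if slope != 0 else 0.0
    return sign * cox_snell, sign * nagelkerke


cs, nk = signed_pseudo_r2(ll_null=-10.0, ll_model=-5.0, n=10, slope=-0.5)
print(round(cs, 4), round(nk, 4))  # -0.6321 -0.7311
```

Nagelkerke's rescaling lets the measure reach 1 for a perfect fit, which Cox-Snell alone cannot do for discrete outcomes.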
All three readers share the same call shape (a list of per-sample STARsolo directories plus a matching list of `sample_ids`) and build `obs_names` the same way, so combining modalities reduces to a set intersection of `obs_names`.
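The alignment step can be sketched with plain Python lists standing in for `obs_names`. The barcode strings and their format below are made up for illustration. With real AnnData objects, the equivalent would typically be `adata_sj.obs_names.intersection(adata_gene.obs_names)` (since `obs_names` is a pandas `Index`), followed by subsetting each object with the result.

```python
def shared_cells(*name_lists):
    """Return cell names present in every modality, in the order of the first."""
    common = set(name_lists[0]).intersection(*name_lists[1:])
    return [name for name in name_lists[0] if name in common]


# Hypothetical obs_names from the SJ and Gene readers.
sj_cells = ["s1_AAAC", "s1_AAAG", "s2_TTTC"]
gene_cells = ["s1_AAAG", "s2_TTTC", "s2_GGGA"]
print(shared_cells(sj_cells, gene_cells))  # ['s1_AAAG', 's2_TTTC']
```

Keeping the order of the first list means the splicing AnnData does not need to be re-sorted after subsetting.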
## Quick start
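```python
import scsplice as scs

# Ingest STARsolo splice-junction output for two samples into one AnnData
adata = scs.io.read_starsolo(
    sj_dirs=["sample1/Solo.out/SJ", "sample2/Solo.out/SJ"],
    sample_ids=["s1", "s2"],
)

# Build the exclusion matrix M2 from M1 and the LJV group_id
scs.tl.make_m2(adata, n_threads=8)
# Select highly variable splicing events by per-library binomial deviance
scs.pp.highly_variable_events(adata, min_row_sum=50, n_threads=8)
```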
See Getting started for the full install and walkthrough.
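The `highly_variable_events` call in the quick start scores events by per-library binomial deviance. As an illustration only, the sketch below assumes each event's score is the binomial deviance of per-cell M1 out of M1 + M2 around its library's pooled inclusion rate, summed over all cells. The function names are hypothetical, and the real C++ kernel (including how `min_row_sum` filters events) may differ.

```python
import math
from collections import defaultdict


def _x_log_ratio(x, expected):
    # x * log(x / expected), using the convention 0 * log(0) = 0.
    return 0.0 if x == 0 else x * math.log(x / expected)


def toy_binomial_deviance(m1, m2, library):
    """Hypothetical sketch: binomial deviance of one event, pooled per library.

    m1, m2  -- per-cell inclusion and exclusion counts for a single event
    library -- per-cell library (sample) label
    """
    # Pool counts within each library to get its overall inclusion rate.
    inclusion = defaultdict(int)
    total = defaultdict(int)
    for y, z, lib in zip(m1, m2, library):
        inclusion[lib] += y
        total[lib] += y + z

    deviance = 0.0
    for y, z, lib in zip(m1, m2, library):
        n = y + z
        if n == 0:
            continue  # a cell with no reads for this event adds nothing
        p = inclusion[lib] / total[lib]
        # Compare observed inclusion/exclusion with the library-level expectation.
        deviance += 2.0 * (_x_log_ratio(y, n * p) + _x_log_ratio(z, n * (1.0 - p)))
    return deviance


print(round(toy_binomial_deviance([10, 0], [0, 10], ["a", "a"]), 2))  # 27.73
print(toy_binomial_deviance([10, 0], [0, 10], ["a", "b"]))  # 0.0
```

In the second call the two cells differ only by library, so under this per-library pooling the event is not scored as variable; cell-to-cell variation within a library is what raises the deviance.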
## Acknowledgements
scsplice is a Python port of the R package splikit, developed by the Computational and Statistical Genomics (CSG) Laboratory at McGill University. We gratefully acknowledge the original R splikit authors and the CSG Lab for the foundational algorithm and reference implementation.
## Design principles
- **AnnData-native.** Junctions are `var`, cells are `obs`, M1 and M2 are `layers`. No custom objects: `scanpy`, `scvi-tools`, and the rest of the ecosystem work on the output without adapters.
- **Numerical parity with R splikit.** M2 is bit-identical; HVE deviance agrees to `rtol=1e-10`; pseudo-correlation agrees to `rtol=1e-7`. The cross-language regression suite lives on the `validation` branch.
- **Two C++ kernels, one thin Python layer.** `make_m2` and `highly_variable_events` delegate to pybind11-wrapped Eigen kernels; the Python layer handles only validation, AnnData conventions, and dispatch.
- **Intentionally narrow scope.** HVG, plotting, and silhouette utilities are not included; `scanpy`, `pyranges`, and `sklearn` already cover those.
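The parity tolerances for R splikit are relative: two values agree to `rtol` when their difference is small compared with their magnitude. The sketch below shows how such a cross-language check might look with the standard library; the deviance values are invented, and `agrees` is a hypothetical helper, not part of the regression suite. Note that `math.isclose` is symmetric in its arguments, whereas NumPy's `allclose` measures relative error against the second argument and adds an absolute tolerance. A bit-identical result such as M2 can simply be compared with `==`.

```python
import math


def agrees(python_vals, r_vals, rtol):
    """Return True if every Python value matches its R reference within rtol."""
    return len(python_vals) == len(r_vals) and all(
        math.isclose(p, r, rel_tol=rtol, abs_tol=0.0)
        for p, r in zip(python_vals, r_vals)
    )


# Made-up deviance scores from the two implementations.
py_dev = [12.5, 0.75, 3.0]
r_dev = [12.5 + 1e-12, 0.75, 3.0]
print(agrees(py_dev, r_dev, rtol=1e-10))  # True
print(agrees(py_dev, r_dev, rtol=1e-14))  # False
```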
## Status
v1.0. The six functions in the table above are the complete v1.0 API.
R package: splikit (CRAN). Same algorithm, with an R-native design that does not depend on AnnData.
Brand note: the hexagonal seagull logo is the original R splikit artwork, shared between both packages to signal that the Python and R implementations are the same algorithm with a unified brand. In dark mode the logo is rendered on a white pill background so the black-on-white illustration remains legible against the dark site background.