The Cell Is Not a Photograph

January 2026 · single-cell genomics, measurement

There is a kind of figure you see all the time in developmental biology papers. It is a UMAP embedding, that kidney-bean shaped cloud of dots that every tool spits out, and there is a color gradient swept across it from blue to yellow. The gradient is labeled "pseudotime." There is probably an arrow drawn by the authors, indicating which way they think the cells are flowing. The caption says something like "Cells progress through a continuous trajectory of differentiation from state A to state B." It looks convincing. It appears in maybe every fourth paper. And the whole thing rests, quietly, on an assumption that almost nobody has stopped to examine, which is that you can figure out what a cell was doing by killing it and reading its contents.

I want to be careful here. I am not trying to trash single-cell RNA sequencing, which has been genuinely transformative for biology. It gave us the first comprehensive atlases of human tissues. It revealed cell type compositions that bulk methods would never have resolved. It turned up rare populations that changed our understanding of development and disease. This is not a post about how the technology is bad. This is a post about the specific ways the field has drifted from using the technology to describe what it can describe (what cells look like at a given moment) into using it to make claims it cannot support (how cells change, what direction they move, what caused what). The gap between those two things is where most of the confusion in the field lives, and it is worth being careful about.

The problem is fundamentally structural. The most informative molecular measurements we have, meaning sequencing and mass spectrometry and immunofluorescence, all require you to destroy the cell to make the measurement. You lyse it. You fix it. You grind it up. You get one measurement per cell, ever, and then that particular cell is gone. Any trajectory you draw through a single-cell dataset is an inference across a population of different cells measured at one moment, not a recording of any individual cell moving through time. This sounds like a pedantic point, and for some applications it is. But for questions about dynamics it turns out to be devastating. A continuum of states in a UMAP embedding could be a real developmental progression, where the cells at the "early" end are actually progressing toward the cells at the "late" end. Or it could be a stable mixture of three distinct cell types with some doublets that happen to express features of both. You cannot distinguish these two situations from the sequencing data alone. The geometry of the embedding is the same either way.

There is also the averaging problem, which gets discussed less but is doing a lot of damage. If a gene switches on briefly in 5% of cells, bulk RNA-seq will see a gentle 5% signal, which looks like weak constitutive expression, not a sharp pulse. If a drug treatment uniformly upregulates a gene in every cell in your sample, but the drug also shifts the composition of the sample by killing off a subpopulation that was expressing the gene at high baseline, the bulk measurement will register a decrease in expression. This is Simpson's paradox, and it shows up in biology constantly. The actual biology lives in the distribution of behaviors across cells. The mean discards the distribution and just gives you a number, which is often an actively misleading number.
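To make the composition effect concrete, here is the arithmetic in a few lines of Python. The numbers are invented purely for illustration: a hypothetical drug doubles the gene's expression in every surviving cell but kills most of a high-expressing subpopulation, and the bulk average goes down anyway.

```python
import numpy as np

# Hypothetical numbers, chosen only to illustrate the arithmetic.
# Two subpopulations: A expresses the gene at a low baseline, B at a high baseline.
baseline = {"A": 2.0, "B": 50.0}          # mean counts per cell
fraction = {"A": 0.7, "B": 0.3}           # composition before treatment

bulk_before = sum(fraction[p] * baseline[p] for p in baseline)

# The (hypothetical) drug doubles expression in every surviving cell,
# but kills most of the high-expressing B population.
treated = {p: 2.0 * baseline[p] for p in baseline}
fraction_after = {"A": 0.95, "B": 0.05}   # composition after treatment

bulk_after = sum(fraction_after[p] * treated[p] for p in treated)

print(f"bulk mean before: {bulk_before:.1f}")   # 16.4
print(f"bulk mean after:  {bulk_after:.1f}")    # 8.8
# Every cell doubled its expression, yet the bulk average fell:
# the composition shift dominates the per-cell effect (Simpson's paradox).
```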

And then there is the timescale problem, which I think is the most important of the three, and which almost nobody talks about. Ion channels gate in microseconds. Calcium transients last milliseconds. Kinase signaling cascades propagate over minutes. Transcriptional responses take tens of minutes to hours. Epigenetic remodeling unfolds across hours to days. Developmental fate decisions play out over days to weeks. This is seven orders of magnitude of biologically meaningful timescales, and no single measurement technology spans more than two or three of them. Every tool is implicitly a bandpass filter, and you are always choosing which slice of the temporal landscape to see. Sequencing, because it takes time and destroys the sample, sees something like the minute-to-hour band, which happens to be the transcriptional response band, which is a fine band to see if you care about transcriptional responses. It is exactly the wrong band if you care about signaling dynamics or long-term fate commitment.

[Figure: timescales of cellular processes, from ion channel gating (µs) and Ca²⁺ spikes (ms) through kinase cascades (minutes), transcription (hours), and epigenetic remodeling (days) to fate decisions (weeks), with the band scRNA-seq sees marked in the middle; seven decades of biology, every tool a bandpass filter.]
No single measurement technology spans the relevant timescales. Sequencing captures the middle. It misses both ends.

The field has responded to these problems with an impressive amount of computational ingenuity. You cannot watch a cell over time with sequencing, so we built tools that try to infer time from static snapshots. The story of each of these tools follows basically the same arc, which goes: a clever idea that seems to work on the datasets it was designed for, a flurry of follow-up papers extending it to other systems, a slow accumulation of failure modes documented in methods-focused critical reviews, and eventually a kind of wary understanding in the community that the tool is useful in a narrow range of conditions and overused everywhere else. I want to walk through the three big ones (pseudotime, RNA velocity, and optimal transport) because understanding what each one actually does and where it breaks is the clearest way to see why the whole approach has a ceiling.


Pseudotime: an ordering that is not time

The basic idea of pseudotime is simple enough. You have a UMAP or a diffusion map of your cells. You pick one cell as the "root," meaning the cell you think is at the beginning of the process. Then you measure how far every other cell is from the root, along the structure of the embedding, and you use that distance as a stand-in for developmental time. Monocle, which was the first widely used implementation, fits a principal graph through the cells in reduced-dimensional space and measures pseudotime as geodesic distance from the root along this graph (Trapnell et al., Nature Biotechnology, 2014). Diffusion pseudotime is a bit more sophisticated: it constructs a transition probability matrix between cells based on their expression similarity, then computes pseudotime from the diffusion distance relative to the root (Haghverdi et al., Nature Methods, 2016). PAGA takes a more cautious approach and refuses to impose a continuous ordering at all, instead just drawing a graph of which clusters are connected to which other clusters and letting you look at the topology (Wolf et al., Genome Biology, 2019).
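To strip the idea down to its skeleton, here is a minimal sketch of graph-based pseudotime in Python. It is not any particular published implementation, just the shared core: build a nearest-neighbor graph on an embedding, pick a root, and call each cell's shortest-path distance from the root its pseudotime.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import dijkstra

rng = np.random.default_rng(0)
# Stand-in for a low-dimensional embedding (e.g. 50 PCs) of 1000 cells.
embedding = rng.normal(size=(1000, 50))

# k-nearest-neighbor graph with Euclidean edge weights.
knn = kneighbors_graph(embedding, n_neighbors=15, mode="distance")

root = 0  # chosen by the analyst; the algorithm has no opinion about which end is "early"
pseudotime = dijkstra(knn, directed=False, indices=root)

# Scale to [0, 1]: the values are a unitless ordering, not time.
pseudotime = pseudotime / np.nanmax(pseudotime[np.isfinite(pseudotime)])
```

Everything downstream inherits the two arbitrary choices in those few lines: the embedding and the root.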

These methods work well in specific situations, and I want to give them credit for that. When you have a system with strong unidirectional transcriptional change (hematopoiesis, intestinal crypt differentiation, spermatogenesis), well-separated cell states, dense sampling of the intermediate states, and enough prior biological knowledge to pick the root correctly, pseudotime recovers something that correlates with real developmental time. In those situations, it is genuinely useful, and I do not want to pretend otherwise.

The problem is that the failure modes are not edge cases; they are the default for anything more complicated than those textbook examples, and they are routinely ignored. Here is what I mean. Pseudotime methods assume the underlying process is a tree, meaning cells start somewhere and progress along branches toward one or more endpoints. This is a strong assumption. It breaks completely for any process with cycles in it, and the cell cycle is an obvious example, but also for convergent trajectories (where two different starting points produce the same end state) and for systems running multiple parallel processes simultaneously. If you run pseudotime on data from a tissue where cells are cycling and differentiating at the same time, and you do not explicitly regress out the cell cycle signal, the algorithm will happily weave the cycle through your developmental trajectory and produce spurious branches.

The second issue is that pseudotime has no units. A large difference in pseudotime value might correspond to minutes of real time or to months. There is no way to know from the pseudotime value alone. This is not a small problem. When a paper plots a gene's expression on the y-axis against pseudotime on the x-axis and shows a sharp transition, you cannot tell whether you are looking at a rapid switch that happens in hours or a slow drift that happens over days. You can make a plot that looks like either, depending on which cells get assigned which pseudotime values. The y-axis is a real measurement. The x-axis is a model prediction dressed up to look like a measurement.

The third issue is that root selection is arbitrary without outside knowledge. If you flip the root to the other end of the trajectory, the algorithm will produce an equally self-consistent ordering, just reversed. Pseudotime does not know which end is the beginning. It only knows how far things are from whatever you told it was the beginning. If you pick the wrong cell as the root, you get a perfectly valid but completely backward story about development.

The fourth issue is technical but severe: the distance calculations underlying all these methods get badly distorted by dropout. Dropout is when a gene is actually expressed in a cell but not captured by the sequencing, which happens a lot in droplet-based single-cell methods. Modern 10x Chromium count matrices are typically 65 to 90 percent zeros. Some of those zeros are real zeros, meaning the gene is actually off in that cell. Some are dropout, meaning the gene is on but the mRNA was not captured. At the single-cell level, you cannot reliably tell these apart. Distance measures treat them the same, which means a cell with high dropout looks artificially "different" from its neighbors in ways that are purely technical noise and not biology.
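A quick way to see the distortion: take two cells with identical underlying expression, apply independent random dropout to each, normalize the way everyone normalizes, and measure the distance between them. A toy simulation (the dropout model here is a crude binomial capture process, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes = 2000
true_expression = rng.gamma(shape=2.0, scale=5.0, size=n_genes).astype(int)  # shared ground truth

def observe(expr, capture_rate, rng):
    # Crude dropout model: each transcript is captured independently with
    # probability `capture_rate` (a simplification, for illustration only).
    counts = rng.binomial(expr, capture_rate)
    # Standard library-size normalization and log transform.
    return np.log1p(counts / counts.sum() * 1e4)

for rate in (0.9, 0.5, 0.2, 0.1):
    cell_a = observe(true_expression, rate, rng)
    cell_b = observe(true_expression, rate, rng)
    print(f"capture rate {rate:.1f}: distance between identical cells = "
          f"{np.linalg.norm(cell_a - cell_b):.1f}")
# The true distance is zero in every case. The measured distance grows as
# capture efficiency drops, and all of it is technical noise, not biology.
```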

The point I want to make about pseudotime, and I think this generalizes to the other methods, is that the output of the algorithm is not data. It is a model prediction, computed from actual data using a specific set of assumptions, and you have to know the assumptions to know whether the prediction means anything. Papers that plot pseudotime on the x-axis and simply label it "time" are making a quiet category error, and the field has mostly let this become normalized. Every time I see this figure I want to write in the margin: that is not time, that is the output of a graph traversal algorithm on an embedding that was itself the output of dimensionality reduction on a sparse count matrix with a lot of dropout. Calling it time does not make it time.


RNA velocity, or arrows that may be pointing the wrong way

RNA velocity was a genuinely clever idea when it appeared. The observation at its core is this: when mRNA gets transcribed from DNA, it initially contains intronic sequence, which then gets spliced out to produce the mature messenger RNA. If a gene is being actively upregulated, there is more intronic RNA in the cell than you would expect at steady state, because the transcription has outrun the splicing. If a gene is being downregulated, there is less intronic RNA than expected. So if you can separately measure spliced and unspliced reads for every gene in a single-cell dataset, you can estimate, for each gene in each cell, whether the gene is currently on its way up or on its way down. Aggregate that across all the genes in the cell and you get a vector: an arrow in gene expression space pointing in the direction the cell is transcriptionally moving. Project these arrows onto a UMAP and you have streamlines across your embedding, showing the flow of differentiation (La Manno et al., Nature, 2018).
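The original steady-state version of the idea is compact enough to write out. This is a sketch of the logic, not the velocyto or scVelo code: assume the cells at extreme expression levels are at steady state, fit the steady-state ratio gamma from them, and call each cell's deviation from that line the velocity.

```python
import numpy as np

def steady_state_velocity(spliced, unspliced, quantile=0.95):
    """Per-gene velocity under the simplest steady-state assumption.

    spliced, unspliced: arrays of shape (n_cells,) for one gene.
    A sketch of the idea only, not the velocyto/scVelo implementation.
    """
    # Assume cells in the extreme expression quantiles are at steady state,
    # where unspliced ≈ gamma * spliced.
    high = spliced >= np.quantile(spliced, quantile)
    low = spliced <= np.quantile(spliced, 1 - quantile)
    extreme = high | low
    # Fit gamma by regression through the origin on the extreme cells.
    gamma = (unspliced[extreme] @ spliced[extreme]) / (spliced[extreme] @ spliced[extreme])
    # Velocity: deviation from the steady-state line. Positive means the gene
    # is being induced (excess unspliced), negative means it is being repressed.
    return unspliced - gamma * spliced
```

Every failure mode in the paragraphs below is a violation of an assumption baked into those few lines: one gamma per gene, a population that actually contains steady-state cells, and enough unspliced signal to fit the line at all.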

This was published in Nature in 2018 and the community went wild. Here, finally, was a way to get direction out of a snapshot. You did not have to pick a root cell anymore. The cells themselves would tell you where they were going. scVelo extended the original method substantially by fitting gene-specific kinetic parameters rather than assuming a single steady-state model (Bergen et al., Nature Biotechnology, 2020), and the improvement on simple, well-behaved datasets was real.

Then people started using it on harder datasets, and things got uncomfortable. The Bergen group themselves published a critical review in 2021 documenting failure modes (Bergen et al., Molecular Systems Biology, 2021). There are a lot, and they are not edge cases. Transcriptional boosts, meaning coordinated step-changes in transcription rate during processes like erythroid maturation, produce negative velocity estimates exactly during the phase when expression is increasing. The model breaks at the thing it most wants to see. Multiple kinetic regimes in different subpopulations produce conflicting slopes in the phase-space plots that the algorithm relies on, and single-parameter models cannot untangle them. In droplet-based sequencing, fewer than 5 percent of reads actually map to introns, so the raw signal underlying velocity is extremely sparse and biased toward certain gene classes. And then at the end of all this, the high-dimensional velocity vectors get projected down onto a 2D UMAP for visualization, which introduces additional smoothing that can produce streamlines pointing in completely arbitrary directions.

Gorin, Svensson, and Pachter wrote a methods paper a couple of years later that was, I think, the harshest honest assessment of the field (Gorin et al., PLoS Computational Biology, 2022). They concluded that RNA velocity was "not yet reliable enough to be actionable" and documented fundamental incompatibilities between the underlying biophysics (a real stochastic process with known structure) and the signal processing framework the method applies to it, plus a strong dependence on arbitrary hyperparameter choices. You can get different streamlines on the same data by tuning the smoothing differently.

The field has not given up. veloVI wraps velocity inference in a variational autoencoder and quantifies uncertainty, so at least you know when the model does not believe its own predictions (Gayoso et al., Nature Methods, 2024). UniTVelo drops the ODE framework entirely and models spliced dynamics with radial basis functions, handling sparse unspliced signal more gracefully (Gao et al., Nature Communications, 2022). A recent benchmarking study evaluated 14 velocity methods across 17 datasets. No single method dominated. All of them failed on at least one dataset. The honest summary is that RNA velocity is a reasonable heuristic for simple well-separated bifurcations with strong transcriptional gradients, and in those settings it often produces streamlines that look roughly like the known biology. For complex tissues, slow processes, or anything with circular topology, the arrows are model outputs with substantial unquantified uncertainty, and taking them literally is a mistake.


Optimal transport, which at least uses real time

Optimal transport methods represent a genuine conceptual step up, because unlike pseudotime and velocity, they actually require you to collect samples at multiple real experimental time points. You are no longer pretending time emerges from the embedding. You measured at day 2 and day 4 and day 6, and the algorithm tries to work out the most likely flow of mass between the distributions you observed.

The canonical version of this is Waddington-OT (Schiebinger et al., Cell, 2019), which treats cell populations at each time point as probability distributions in gene expression space and computes the entropically regularized optimal transport coupling between consecutive distributions. The optimization finds the transport plan that minimizes total displacement cost, which is a mathematically principled way of saying "each cell at the later time point most likely came from the closest cell at the earlier time point." It also accounts for cells being born and dying between time points, using something called unbalanced optimal transport. The Schiebinger group applied this to iPSC reprogramming across 315,000 cells sampled at 39 time points with 12-hour intervals, and they found branching fate structures that pseudotime had missed completely. It is a real methodological advance.
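The core computation is small enough to sketch. Given cells sampled at two consecutive time points, entropically regularized OT finds a coupling matrix by Sinkhorn iteration. This is a bare-bones balanced version in plain numpy; Waddington-OT layers unbalanced transport, estimated growth rates, and a great deal of engineering on top of it.

```python
import numpy as np
from scipy.spatial.distance import cdist

def sinkhorn_coupling(X_early, X_late, reg=0.1, n_iter=500):
    """Entropically regularized OT coupling between two point clouds.

    X_early, X_late: (n, d) and (m, d) arrays of cells in expression space.
    Returns an (n, m) coupling; row i, normalized, says where early cell i's
    mass most plausibly goes. Balanced Sinkhorn only, for illustration.
    """
    cost = cdist(X_early, X_late, metric="sqeuclidean")
    cost = cost / cost.max()                      # rescale for numerical stability
    K = np.exp(-cost / reg)                       # Gibbs kernel
    a = np.full(X_early.shape[0], 1.0 / X_early.shape[0])   # uniform marginals
    b = np.full(X_late.shape[0], 1.0 / X_late.shape[0])
    u = np.ones_like(a)
    for _ in range(n_iter):                       # alternate marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]            # coupling with the right marginals
```

Each row of the coupling, normalized, is a probability distribution over plausible descendants, and summing rows within clusters gives the cluster-to-cluster flows that end up drawn as arrows.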

There have been improvements. LineageOT integrates CRISPR lineage barcodes into the cost function so that you can use actual family relationships between cells, rather than just transcriptomic similarity, to infer the transport plan (Forrow and Schiebinger, Nature Communications, 2021). This matters a lot for cases where cells from different lineages end up with similar expression profiles, because a naive OT based on expression alone would happily pair them up as ancestor and descendant. moscot scales the entire framework to millions of cells and tens of time points using GPU-accelerated low-rank matrix approximations (Klein et al., Nature, 2025), which is what you need for modern atlas-scale experiments.

CellRank 2 takes a more pragmatic approach to the whole problem (Weiler et al., Nature Methods, 2024). Rather than committing to a single source of directional information, it composes multiple weak signals into a unified transition matrix. The framework is modular: you can plug in kernels based on RNA velocity, pseudotime, developmental potential scoring, or, most interestingly, a real-time kernel that combines within-time-point dynamics with OT couplings between experimental time points. Downstream, it uses Generalized Perron Cluster Cluster Analysis, a method borrowed from molecular dynamics, to identify macrostates and compute fate probabilities as absorption probabilities on a Markov chain. The results are often substantially better than any single-signal method. And the benchmarks are telling. In hematopoiesis datasets, the velocity-based kernel predicted erythroblast-to-monocyte transitions that simply do not occur in real hematopoiesis. The real-time and metabolic-labeling kernels got the hierarchy right. In gut organoid differentiation, the velocity-based analysis failed to identify any of the lineages. The metabolic labeling kernel identified all of them. This is not a subtle improvement. It is the difference between wrong biology and right biology.
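The fate-probability step at the end of that pipeline is ordinary absorbing-Markov-chain arithmetic, which is worth seeing because it shows how little machinery is needed once you trust the transition matrix. A toy sketch with made-up numbers (CellRank identifies the terminal macrostates with GPCCA first; here they are simply declared):

```python
import numpy as np

# Toy transition matrix over 5 cell states. States 3 and 4 are terminal
# (absorbing) fates; states 0-2 are transient. Entirely made-up numbers.
P = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0],
    [0.1, 0.5, 0.1, 0.3, 0.0],
    [0.1, 0.1, 0.5, 0.0, 0.3],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])

transient, absorbing = [0, 1, 2], [3, 4]
Q = P[np.ix_(transient, transient)]   # transitions among transient states
R = P[np.ix_(transient, absorbing)]   # transitions from transient to terminal states

# Absorption probabilities: B = (I - Q)^-1 R. Row i gives the probability
# that a cell starting in transient state i ends up in each terminal fate.
B = np.linalg.solve(np.eye(len(transient)) - Q, R)
print(B)   # each row sums to 1: a fate probability per starting state
```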

OT-based methods are, I think, substantially more principled than pseudotime or velocity, and for the right experimental designs they give you something close to an honest picture of population dynamics. But they still do not track individual cells. The output is a population-level mapping, meaning you learn the flow of the distribution but you cannot follow any particular cell through time. And the experimental design requirements are demanding in ways that most labs cannot meet. The Schiebinger study had 39 time points. Most labs collect 3 to 6. Three to six time points is basically before-and-after with a few intermediate sanity checks, and it is not enough to resolve the kinds of dynamical features these methods were designed to recover.


A brief detour on foundation models

There has been a lot of excitement over the last couple of years about single-cell foundation models (scGPT, Geneformer, scFoundation) as tools for understanding cellular dynamics. I want to push back on this, because I think it is mostly misplaced.

The training data for these models contains zero temporal information. They are pretrained on static expression snapshots using masked gene prediction, which is a co-expression task. You mask out some genes, the model predicts their values from the other genes. There is no notion of before and after anywhere in the training objective. When you take one of these trained models and use it for "trajectory analysis," the temporal ordering is not coming from the model. It is coming from an external tool (usually pseudotime or velocity) that the foundation model had nothing to do with. The model contributes a learned embedding space. The dynamics come from elsewhere, with all the same problems we just discussed.
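If you want to feel how little temporal information the objective contains, write the task down. Masked gene prediction is: hide some genes in each cell, predict them from the rest. Any co-expression model can play that game. Here it is expressed with ridge regression instead of a transformer, purely to illustrate the objective, not the architecture:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(5000, 200)).astype(float)    # stand-in for a cell-by-gene matrix

masked = rng.choice(X.shape[1], size=20, replace=False)  # genes to hide
visible = np.setdiff1d(np.arange(X.shape[1]), masked)

# "Masked gene prediction": regress each hidden gene on the visible genes.
# The objective rewards capturing co-expression structure across cells;
# nothing in it refers to before or after.
model = Ridge(alpha=1.0).fit(X[:, visible], X[:, masked])
reconstruction = model.predict(X[:, visible])
```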

The models also do not obviously outperform simpler methods on the tasks they were actually designed for. Kedzierska and colleagues benchmarked scGPT, Geneformer, and scFoundation on cell type classification, which is arguably the task most aligned with their training objective (Kedzierska et al., Genome Biology, 2025). The foundation models were matched or beaten by logistic regression on PCA embeddings. Logistic regression. On PCA embeddings. A separate sparse autoencoder analysis found that of 48 tested transcription factors, only 3 produced regulatory-target-specific feature responses, which is a polite way of saying that the models had learned co-expression patterns but not regulatory relationships. This is not a failure of architecture. It is a consequence of the data. You cannot learn causation from observational snapshots, no matter how many parameters you have, because the causation is not in the snapshots. The Bitter Lesson applies to architectures. It does not apply to data that is missing the signal you need.
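The baseline that matched the foundation models is, for what it is worth, a few lines of scikit-learn, which is part of what makes the result so striking. Something in the spirit of this (placeholder data, not the benchmark's actual code):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: log-normalized cell-by-gene matrix, y: cell type labels (placeholders here).
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 2000))
y = rng.integers(0, 8, size=3000)

baseline = make_pipeline(
    StandardScaler(),
    PCA(n_components=50),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(baseline, X, y, cv=5)
print(scores.mean())   # the bar a foundation model has to clear
```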


Do perturbations fix this?

The natural counterargument to everything I have said so far is that Perturb-seq fixes the problem. Observational data cannot recover causal structure, fine. But genome-scale CRISPR screens with single-cell readout are doing real interventions. You knock out a gene and measure what happens. That is the standard definition of an experimental manipulation, and it is the strongest evidence for causal claims. Dixit and colleagues showed the approach works at scale back in 2016 (Dixit et al., Cell, 2016), and Replogle and colleagues later demonstrated genome-wide Perturb-seq across millions of cells, functionally annotating tens of thousands of genes (Replogle et al., Cell, 2022). The experiments are real interventions. Interventions are what causal inference needs.

This argument is half right, and the other half is where I want to spend some time.

The half that is right is that perturbations genuinely do give you stronger causal footing than observation. If you delete a gene and something downstream changes, you have much better grounds for calling the first thing a cause of the second thing than you would have by just watching them co-vary. Perturb-seq has produced real insights into gene function, and it has mapped large portions of the regulatory network at a resolution that pure observation could not have achieved.

The half that is wrong is the assumption that a perturbation experiment, by itself, recovers the causal graph you actually want. Here is the problem. Standard Perturb-seq reads out a single endpoint snapshot after the perturbation has had time to propagate through the network. What you measure is not the direct effect of the perturbation. It is the sum of the direct effect plus all the indirect downstream effects plus all the feedback loops that have equilibrated in the time between the edit and the readout. If you knock out a transcription factor, its direct targets change. Their targets change. The targets of those targets change. By the time you sequence, the whole cascade has worked itself out, and the distribution of cell states you observe is the final result of all of it. Looking at that endpoint, you cannot cleanly separate the direct effects from the indirect ones. Multiple different causal graphs can produce the same endpoint distribution after equilibration, which is exactly the same identifiability problem we had with observational data, just dressed differently.
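One way to make the identifiability problem concrete is to model the network, very crudely, as a linear system dx/dt = Ax + u, where A holds the direct regulatory effects and u is the perturbation. The endpoint snapshot measures the new steady state, x* = -A⁻¹u, and -A⁻¹ expands into a sum over every path through the network, so the measured response mixes direct and indirect effects inseparably. A toy cascade, with invented numbers:

```python
import numpy as np

# Direct-effect matrix for a 4-gene cascade: gene 0 -> 1 -> 2 -> 3,
# with self-degradation on the diagonal. Toy numbers only.
A = np.array([
    [-1.0,  0.0,  0.0,  0.0],
    [ 0.8, -1.0,  0.0,  0.0],
    [ 0.0,  0.8, -1.0,  0.0],
    [ 0.0,  0.0,  0.8, -1.0],
])

# Perturb gene 0 (knockdown modeled, crudely, as a constant negative input).
u = np.array([-1.0, 0.0, 0.0, 0.0])

# Endpoint snapshot after the cascade equilibrates: x* = -A^{-1} u.
x_star = -np.linalg.solve(A, u)
print(x_star)          # every gene in the cascade has shifted, not just the direct target

# -A^{-1} is a sum over all paths of products of direct effects, so the
# measured response conflates direct edges with everything downstream of them.
print(-np.linalg.inv(A))
```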

The people working on this know it. A recent review of the field concluded, with a kind of tired honesty, that there is currently no accepted method for inferring gene regulatory networks from perturbation response data, despite the field having spent years trying. The data the experiments produce fundamentally underdetermines the graph. The reason is the same reason that has been appearing throughout this whole post: you cannot recover dynamics from a snapshot, and a perturbation followed by an endpoint measurement is just a snapshot of a post-perturbation state.

The fix, when it comes, is going to look like combining perturbations with temporal resolution. There are methods that do this now, though they are still early. PerturbSci-Kinetics pairs pooled CRISPR screens with metabolic labeling, which lets you measure the nascent (newly transcribed) RNA alongside the steady-state population (Xu et al., Nature Biotechnology, 2024). Instead of an endpoint snapshot, you get a short-horizon dynamic response to each perturbation. Sci-FATE2 does something similar at substantially improved data quality (Cao et al., 2020). The Scribe framework is useful for thinking about this quantitatively: it compares causal network recovery from genuine time-series data against the same analysis applied to pseudotime-ordered data, and the difference is enormous. True temporal coupling gives you an AUC around 0.86 for causal recovery. Pseudotime gives you near-chance (Qiu et al., Cell Systems, 2020). Intervention alone does not close the gap. Intervention combined with real time does.

So Perturb-seq is not useless. I want to be clear about that. It is a powerful tool for mapping perturbation-response signatures and for generating hypotheses about gene function at scale. But a parts list with response signatures attached is not a mechanistic model of the circuit. For the questions I actually care about (what genes enforce setpoints, how the system returns to equilibrium when pushed, which control laws are active at which developmental moment), you need the time course. Not just the response. And the technology for doing that at genome scale in developing tissue does not quite exist yet.


What actual measurement of dynamics looks like

I have spent most of this post being negative, arguing that various tools do not do what people often claim they do. I want to spend some time on the positive side, because there are tools that do measure cellular dynamics honestly, and I think they point toward what the integrated measurement architecture of the next decade should look like. These tools are generally less famous than single-cell sequencing, partly because they are harder to use and partly because they do not produce the atlas-scale datasets that currently drive most of the hype. But they are measuring something real, which puts them in a different category.

Live-cell biosensors

The best current approach for high-throughput signaling dynamics is a class of reporter called a kinase translocation reporter, or KTR, introduced by Regot and colleagues in 2014 (Regot et al., Cell, 2014). The design is elegant, and it took me a while to appreciate how clever it is. You build a fusion protein consisting of a substrate peptide (which gets phosphorylated by the kinase you want to measure), a nuclear localization signal, and a nuclear export signal. In the unphosphorylated state, the nuclear import dominates, so the reporter sits in the nucleus. When the kinase is active and phosphorylates the substrate, the phosphorylation tips the import-export balance toward the export signal, and the reporter relocates to the cytoplasm. You measure the cytoplasm-to-nucleus ratio of fluorescence by imaging single cells over time. The ratio is a direct, dynamic readout of kinase activity, with the advantage that it does not require bleaching or fixation and can run for hours or days.

KTRs come in multiple colors, which means you can multiplex. Three, four, even five simultaneously in the same cell. ERK, JNK, p38, PKA, and Akt reporters all exist. Regot demonstrated three-kinase readout in single cells, and subsequent work has combined KTRs with calcium reporters like GCaMP8s to get simultaneous kinase and calcium signaling in the same cell. The dynamic range is typically 3 to 10-fold change in cytoplasm-to-nucleus ratio, which is substantially better than most FRET sensors, and the imaging requirements are standard widefield microscopy with conventional segmentation. A reasonably equipped lab with Cellpose and a widefield scope can do four-channel signaling dynamics in a thousand cells simultaneously, which is remarkable when you think about what it would have taken even a decade ago.
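The readout itself is simple arithmetic once the segmentation is done. Given a nuclear mask and a whole-cell mask sharing the same labels (from Cellpose or anything else), the activity proxy is the ratio of mean cytoplasmic to mean nuclear fluorescence, computed per cell per frame. A sketch, with hypothetical variable names:

```python
import numpy as np

def ktr_ratio(image, cell_mask, nuclear_mask):
    """Cytoplasm-to-nucleus intensity ratio for each labeled cell.

    image: 2D fluorescence frame for one KTR channel.
    cell_mask, nuclear_mask: integer label images sharing the same cell IDs
    (e.g. from Cellpose). Higher ratio = more kinase activity.
    """
    ratios = {}
    for cell_id in np.unique(cell_mask):
        if cell_id == 0:              # 0 is background by convention
            continue
        nuc = nuclear_mask == cell_id
        cyto = (cell_mask == cell_id) & ~nuc
        if nuc.sum() == 0 or cyto.sum() == 0:
            continue
        ratios[cell_id] = image[cyto].mean() / image[nuc].mean()
    return ratios

# Apply per frame of a time-lapse to get one kinase-activity trace per cell.
```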

For mRNA-level dynamics, the MS2/PP7 coat protein system gives you genuinely ground-truth measurements at the cost of throughput. The idea is to insert a cassette of RNA stem-loops (24 copies is typical) into your gene of interest, and then co-express the coat protein of a bacteriophage that binds specifically to those stem-loops, fused to GFP. Each mRNA containing the cassette becomes decorated with GFP and visible as a diffraction-limited spot. An actively transcribing gene locus looks like a bright nuclear punctum that turns on and off as individual transcription events happen. You can measure transcriptional bursting directly, with sub-second temporal resolution (Bertrand et al., Molecular Cell, 1998; Larson et al., Science, 2011). If you put MS2 loops at the 5' end and PP7 loops at the 3' end of the same gene, you can measure RNA polymerase elongation rate from the time delay between the two signals, which works out to around 1.5 kilobases per minute in Drosophila embryos (Garcia et al., Current Biology, 2013). This is measurement in the strict sense of the word. You are watching the molecule you are claiming to watch, and the measurement corresponds to the thing you are claiming to measure.
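The elongation-rate measurement is a good example of how direct this readout is. The 5' signal rises when polymerase transcribes the MS2 cassette, the 3' signal when it reaches the PP7 cassette, and the genomic distance between the cassettes divided by the lag between the two traces is the elongation rate. A sketch that estimates the lag by cross-correlation (illustrative only, not the analysis code from those papers):

```python
import numpy as np

def elongation_rate_kb_per_min(ms2_trace, pp7_trace, frame_interval_s, cassette_separation_kb):
    """Estimate polymerase elongation rate from the 5'-to-3' signal delay.

    ms2_trace, pp7_trace: fluorescence intensity over time at one transcription spot.
    frame_interval_s: seconds between frames.
    cassette_separation_kb: genomic distance between the MS2 and PP7 cassettes.
    """
    a = ms2_trace - ms2_trace.mean()
    b = pp7_trace - pp7_trace.mean()
    xcorr = np.correlate(b, a, mode="full")          # correlate 3' against 5'
    lag_frames = np.argmax(xcorr) - (len(a) - 1)     # positive lag: 3' follows 5'
    lag_min = lag_frames * frame_interval_s / 60.0
    return cassette_separation_kb / lag_min
```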

Lineage recording

Molecular recording systems use CRISPR-induced mutations as heritable barcodes that accumulate during cell division, encoding lineage history in the cell's own genome for later readout. The early systems, GESTALT (McKenna et al., Science, 2016) and CARLIN (Bowling et al., Cell, 2020), established that you could reconstruct cell division histories from the pattern of indels that accumulate in an engineered target array. These have real limitations though. The target sites get fully edited early in the experiment, creating a ceiling on how long you can record. And identical indels can arise independently in unrelated cells, producing false signatures of shared ancestry. The signal-to-noise ratio gets worse as you go deeper into the lineage.

The current generation of recorders has addressed most of this. DNA Typewriter, which I think is the most architecturally interesting of the recent systems, uses prime editing instead of double-strand breaks to write sequential insertions into an engineered recording tape (Choi et al., Nature, 2022). Because the insertions are unidirectional and each one targets the next unused position on the tape, temporal order gets encoded by physical position on the DNA. You can read off the history in order by sequencing the tape. In the original paper, they reconstructed monophyletic lineages of 3,257 cells across more than 20 generations and 25 days of culture. Newer base-editing systems push this substantially further, using Cas12a-mediated adenine base editing across dozens of synthetic target sites to store thousands of bits of information per cell, reaching 40+ cell divisions deep without saturating. Because there are no double-strand breaks, there are no inter-site deletions, which eliminates one of the major failure modes of the earlier systems.
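The property that makes this readable is that the tape behaves like an append-only log: each cell carries an ordered list of insertions, and shared ancestry shows up as a shared prefix. A toy sketch with hypothetical barcodes (not the paper's analysis pipeline):

```python
# Each tape is the ordered list of insertions read off one cell's recording array.
# Hypothetical 3-letter barcodes; position on the tape encodes temporal order.
tapes = {
    "cell_1": ["AAT", "GCC", "TTA"],
    "cell_2": ["AAT", "GCC", "CGG"],
    "cell_3": ["AAT", "TAC"],
}

def shared_history(tape_a, tape_b):
    """Length of the common prefix: how much recorded history two cells share."""
    n = 0
    for x, y in zip(tape_a, tape_b):
        if x != y:
            break
        n += 1
    return n

# cell_1 and cell_2 share two recorded edits before diverging; cell_3 split earlier.
print(shared_history(tapes["cell_1"], tapes["cell_2"]))  # 2
print(shared_history(tapes["cell_1"], tapes["cell_3"]))  # 1
```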

The most significant recent development is that some of these recorders now work together with spatial readout. PEtracer couples prime editing lineage marks with MERFISH spatial transcriptomics, and they have used it to reconstruct three-dimensional tumor growth in vivo across hundreds of thousands of cells (Koblan et al., Science, 2025). The combination of lineage history, transcriptional state, and spatial position, measured in the same cells, is now technically possible for the first time. This is the kind of measurement that could actually answer questions about how tissues self-organize, because it gives you the three pieces you need (what the cell was doing, where it came from, where it is now) in a single integrated experiment.

Label-free approaches

Every fluorescent label perturbs the cell to some degree. Fluorescent proteins fold slowly, oligomerize, and often alter the behavior of their fusion partners. Organic dyes generate reactive oxygen species during imaging. Genetic modification changes regulation in ways that are hard to predict. For long-term dynamics, where phototoxicity and photobleaching accumulate over hours of imaging, label-free techniques start to look genuinely attractive rather than merely theoretically attractive.

Stimulated Raman scattering (SRS) microscopy is the most developed of these, and it is one of the most underappreciated techniques in the field (Freudiger et al., Science, 2008). SRS probes molecular bonds directly via inelastic photon scattering, which means you get chemical specificity without any labels at all. Two synchronized pulsed laser beams, with a frequency difference matched to a specific Raman-active vibration in the molecules you care about, coherently stimulate the transition. The signal amplification over spontaneous Raman scattering is around 10⁸, which gets you to video-rate imaging. The killer application for dynamics is something called the cell-silent spectral window. If you feed cells deuterated metabolites (deuterated glucose, amino acids, fatty acids), the C-D bond produces a strong Raman peak around 2100 to 2200 cm⁻¹, which is a region where endogenous cellular molecules have essentially no peaks. So you can do pulse-chase experiments with deuterium-labeled precursors, imaging the incorporation and turnover of specific metabolites in live cells over time, without any genetic modification and without perturbing the cells with fluorescent labels. The MARS dye platform extends this to 24-channel simultaneous imaging using engineered probes with spectral linewidths about 100-fold sharper than typical fluorescence spectra (Wei et al., Nature, 2017).

SRS is not plug-and-play. You need a pulsed laser system with optical parametric oscillators, which runs about $200K to $500K for the optical setup alone, and you need someone in the lab who actually understands nonlinear optics. It has stayed confined to a handful of specialist labs for this reason. But for measuring what cells are doing metabolically, in real time, without touching their genome, it is one of the most principled techniques available, and I expect the cost-and-complexity barrier to slowly come down over the next decade.


Where the gap actually sits

I want to close by being as clear as I can about what I think is wrong and what is right in the current landscape. The computational approaches I spent most of this post criticizing (pseudotime, velocity, optimal transport) are not wrong to exist. They extract real signal from data that is available today, and for tasks like cell type classification, tissue atlasing, and hypothesis generation, they have been transformative. My criticism is specific. The problem is when these methods get used to make kinetic or mechanistic claims that the underlying data cannot support. Claims about transition rates, causal ordering, the direction of fate decisions in individual cells. These are all outside the reach of snapshot data, and no algorithm run on snapshots will bring them within reach.

The correct posture, which parts of the field are starting to adopt, is to use the ground-truth technologies (metabolic labeling, live biosensors, lineage tracing, spatial multi-omics) to validate the claims that computational methods make from sequencing data. The CellRank 2 benchmarks I mentioned earlier are the clearest example. A velocity-based analysis of hematopoiesis predicted cell transitions that do not occur. A real-time or metabolic-labeling analysis of the same data got it right. The difference was not a small correction. It was a reversal. If you take velocity-based predictions seriously without ground-truth validation, you can easily end up with a story about blood cell development that is confidently wrong in directions you would never have suspected from the analysis alone.

The most reliable experiments I have seen combine at least two orthogonal approaches. Velocity validated by metabolic labeling. Pseudotime anchored by real time points. Spatial inference constrained by lineage barcodes. Single-modality conclusions should attract more skepticism than they currently do, in proportion to how much of the claim was inferred versus directly observed. What the field is actually moving toward, with some momentum now, is the integration of these modalities in the same cells at the same time. Lineage recording with spatial transcriptomics. Metabolic labeling with single-cell sequencing. Live biosensors with computational frameworks that account for real experimental time. This is the work. The distance between inference and measurement, for the first time, is actually closing. It is not closed yet. But the trajectory is now pointing in the right direction, which is more than I could have said five years ago.