The Recorder Zoo

February 2026 · molecular recording

Every cell you have ever had carried a story it could not tell you. It divided, it migrated, it received a signal, it committed to a fate, and every bit of that history evaporated the moment you lysed it for sequencing. The previous post was about how the field has been handling this computationally, with pseudotime and RNA velocity and optimal transport, applying increasingly elaborate math to snapshots and hoping dynamics would emerge from the statistics. This post is about the other response, which is different in kind. Stop inferring. Write the history down. Build a molecular recorder that converts transient biological events into permanent, readable marks inside the cell itself, and then read them out later alongside the cell's molecular identity.

The field has been trying to do this for about a decade now, and what it has produced is a genuinely bizarre menagerie. Thirteen or so distinct recorder families spanning DNA, RNA, and protein as substrates. Systems range from elegant bacterial designs that will probably never work in a mouse to protein structures that grow like tree rings inside living neurons. Some of these will be remembered as the CRISPR of the 2030s. Some are beautiful ideas trapped permanently in a test tube because mammalian cells have different problems than E. coli. Most are somewhere in between, with some fraction of what they need already working and some fraction still broken in ways nobody has solved yet.

Once you look at all of them together, a natural taxonomy emerges. There are three fundamental classes, distinguished by what substrate is actually storing the information. There are systems that write into DNA, where you end up reading out the recording by sequencing. There are systems that assemble protein structures inside the cell, where you read the recording by microscopy. And there are systems that sequester RNA, preserving a transcriptional snapshot by protecting existing molecules from degradation rather than writing anything new. Within each class there are sub-strategies, and I think understanding why each sub-strategy exists is the point of this tour.

(Interactive taxonomy diagram: the root splits into the three classes, and each class's sub-branches hang directly below it. Every DNA sub-branch edits the genome; what varies between them is how precisely the writing instrument works.)

Class 1: DNA Writers

Every system in this class operates on the same core logic. Take a transient biological signal, convert it into a permanent change in the cell's DNA, let it sit there while the cell goes about its business, and read it back out later by sequencing. The genome is the tape. The cell does the recording on its own, without you having to watch it. What varies from system to system is the precision of the writing instrument, which runs the full range from "we are essentially just scratching random scars into a barcode and hoping the pattern is informative" to "we are writing specific symbols at specific positions in a specific order like a typewriter." I want to walk through the sub-strategies in roughly the order they were invented, because the story of each new generation is basically a response to what the previous generation got wrong.

Indel writers: let CRISPR break things randomly

The first insight here was almost embarrassingly simple. CRISPR already generates random mutations every time Cas9 cuts DNA, because non-homologous end joining (NHEJ) is sloppy and tends to leave small insertions or deletions (indels) at the cut site. If you put a synthetic barcode array into a cell's genome, with many Cas9 target sites in a row, and let Cas9 cut it repeatedly over the course of cell divisions, different cells accumulate different patterns of scars. Daughter cells inherit their parent's scars plus any new ones they pick up. Read the pattern at the end, reconstruct who descended from whom. You get a tree.

GESTALT (McKenna et al., Science, 2016) was the first serious demonstration of this. The authors put an array of 10 Cas9 target sites into zebrafish embryos, let development run, and at the end sequenced the scar patterns in tens of thousands of adult cells. They recovered roughly 4,200 unique alleles per fish, which was enough to reconstruct organ-level lineage relationships. The finding that really landed was that more than 98 percent of adult zebrafish blood derived from just 5 progenitor clones, which is the kind of quantitative lineage claim that you simply cannot make without recording data. CARLIN (Bowling et al., Cell, 2020) took the idea and made it inducible and breedable in mice, generating up to 44,000 transcribed barcodes, which brought the whole approach into mammalian genetics.

The ceiling on this whole class of systems is fundamental, and it is worth being specific about what it is. Cas9 makes double-strand breaks, and NHEJ strongly favors deletions over insertions when repairing them. This means many recording events are deletions that span multiple target sites, collapsing several independent recording channels into a single unreadable scar. It gets worse over time. Once a site has been deleted, it is gone, which means the recorder progressively destroys its own tape as it records. A formal analysis puts the effective recording capacity of indel-based systems at around 10 to 20 bits per cell. That is not nothing, and for lineage reconstruction it is often enough to get the tree right. But if you want to record biological state over long developmental windows, 10 to 20 bits stops being enough very quickly. These systems proved the concept that a cell can write its own history into its own DNA. Proving the concept does not mean the implementation is the one you want to run with.
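The self-destruction dynamic is easy to see in a toy simulation. The parameters here (multi-site deletion probability, deletion span, event count) are invented for illustration, not taken from GESTALT or CARLIN; the point is just that deletion-biased repair collapses recording channels faster than it fills them:

```python
import random

def simulate_tape(n_sites=10, n_events=8, p_multi_del=0.4, seed=0):
    """Toy indel recorder: each Cas9 event either scars one intact site
    (a readable edit) or deletes a span of adjacent sites, collapsing
    their channels. Parameters are illustrative, not from the papers."""
    rng = random.Random(seed)
    tape = ["intact"] * n_sites
    for _ in range(n_events):
        live = [i for i, s in enumerate(tape) if s == "intact"]
        if not live:
            break  # the recorder has destroyed its own tape
        i = rng.choice(live)
        if rng.random() < p_multi_del:
            span = rng.randint(2, 4)  # deletion spanning several sites
            for j in range(i, min(i + span, n_sites)):
                tape[j] = "deleted"   # overwrites even prior scars
        else:
            tape[i] = "scar"          # a readable single-site indel
    return tape

tape = simulate_tape()
readable = sum(s == "scar" for s in tape)
lost = sum(s == "deleted" for s in tape)
print(tape)
print(f"readable channels: {readable}, collapsed by deletion: {lost}")
```

Run it with a higher `p_multi_del` and the tape dies in a handful of events, which is the qualitative failure mode described above.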

Base editors: stop breaking things

The obvious response to "double-strand breaks are wrecking our recording tape" is to not make double-strand breaks. Base editors deliver this. A cytidine base editor (CBE) is a fusion of a Cas9 nickase with a cytidine deaminase enzyme that converts C to T at a specific targeted position, without cutting the DNA all the way through. Adenine base editors (ABEs) do the same thing for A to G using an engineered version of an enzyme called TadA. The result is a single, precise, predictable nucleotide conversion. No double-strand break. No deletion bias. No tape destruction.

CAMERA (Tang and Liu, Science, 2018) was the first explicit demonstration that base editing could function as a biological recorder, responding to signals like doxycycline, IPTG, and Wnt pathway activity. DOMINO (Farzadfard et al., Molecular Cell, 2019) pushed the concept further in a way I find genuinely clever. They designed guide RNAs that only bind their target after a prior mutation has occurred at a specific upstream site, so one editing event unlocks the next. This lets you implement AND gates and OR gates and temporal sequence detection, all in DNA, inside a living cell. You can ask questions like "did event A occur before event B" and the cell will tell you by the pattern of edits it accumulates.
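The unlocking logic is worth sketching, because underneath the molecular biology it is just a cascade. Here is a minimal toy model of the "did A occur before B" detector; the two-site circuit and the signal names are illustrative stand-ins, not the published DOMINO constructs:

```python
def run_recorder(events):
    """Toy DOMINO-style cascade: the guide RNA for site 2 only matches
    its target after site 1 has been edited, so signal B can only be
    recorded if signal A already happened. Illustrative sketch."""
    site1_edited = False
    site2_edited = False
    for signal in events:
        if signal == "A":
            site1_edited = True            # A's editor targets site 1
        elif signal == "B" and site1_edited:
            site2_edited = True            # B's editor is unlocked by the A edit
    return site2_edited  # True means some B arrived after A

print(run_recorder(["A", "B"]))  # True: A preceded B
print(run_recorder(["B", "A"]))  # False: site 2 was never unlocked
```

The same pattern, with sites gated on each other in different combinations, gives you the AND gates and OR gates described above.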

The state of the art as of early 2026 is BASELINE (Winter et al., bioRxiv, 2025), which uses a nuclease-dead Cas12a fused to ABE8e, targeting 50 synthetic sites integrated at multiple copies per cell via piggyBac transposon. The information capacity is more than 4,300 bits, which is a roughly 50-fold increase over anything that came before, with lineage reconstruction spanning an estimated 40 cell divisions. Forty divisions is within the range of mammalian development from zygote to adult, which is exactly the window you want to be recording in if you care about how tissues actually build themselves.

Symbol writers: deterministic tapes

Indel and base-editing approaches both write stochastic information. You make a cut or a conversion, and you see what you get. The output is randomized by the biology. Prime editing changes this. Prime editing is a technology developed in the Liu lab that can write specific, arbitrary sequences to specific locations without making double-strand breaks, using a reverse transcriptase fused to a Cas9 nickase and a template-encoding guide RNA. What this means for recording is that you can write not just random scars but actual programmable symbols.

DNA Typewriter (Choi et al., Nature, 2022, Shendure lab) built the logical structure. The design is a tandem array of partial CRISPR target sites, arranged in order. At any moment, only one site is active, meaning only one site has a complete and functional spacer sequence. When prime editing happens at the active site, the insertion it writes includes a small key that completes the spacer of the next site downstream, deactivating the current site and activating the next one. Recording proceeds strictly left-to-right along the array. Temporal order is encoded by physical position on the tape. You can read it out in order. To demonstrate the system, the Shendure group encoded the message "WHAT HATH GOD WROUGHT?" into the genomes of living cells, which is the message Samuel Morse sent on the first telegraph transmission in 1844, and which is an understated flex.
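As a data structure, the tape is exactly a write-once array with a single moving head. A minimal sketch of the logic (the site count and symbol names are invented for illustration, and the real system writes short DNA insertions, not strings):

```python
class Typewriter:
    """Toy model of the DNA Typewriter tape: only one site is active at
    a time; each write deposits a symbol there and the insertion carries
    the key that activates the next site downstream."""

    def __init__(self, n_sites=10):
        self.tape = [None] * n_sites
        self.head = 0  # index of the single active site

    def write(self, symbol):
        if self.head >= len(self.tape):
            return False               # tape full; further events are lost
        self.tape[self.head] = symbol  # edit the active site...
        self.head += 1                 # ...which activates the next one
        return True

    def read(self):
        # Readout recovers symbols in the order they were written,
        # because temporal order is encoded by physical position.
        return [s for s in self.tape if s is not None]

tw = Typewriter(n_sites=5)
for sym in ["NF-kB", "Wnt", "NF-kB"]:
    tw.write(sym)
print(tw.read())  # ['NF-kB', 'Wnt', 'NF-kB']
```

The `write` returning False on a full tape is the real failure mode too: once the array is exhausted, recording simply stops.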

ENGRAM (Chen et al., Nature, 2024, Shendure lab) extended the architecture to record biological signals rather than arbitrary text. Instead of a controlled insertion sequence, cis-regulatory element-responsive promoters drive the expression of pegRNAs (the guide RNAs that prime editing needs), each encoding a 4-basepair signal-specific barcode. When a given transcription factor is active, its responsive promoter produces its pegRNA, which gets written into the tape. Applied during mouse embryonic stem cell differentiation into gastruloids, ENGRAM captured the activities of around 100 transcription factor consensus motifs across daily recording windows. PEtracer (Koblan et al., Science, 2025) solved the spatial readout problem for this class of system, coupling prime editing lineage marks with MERFISH spatial transcriptomics, and mapping recording plus transcriptional state plus spatial position in more than 260,000 cells simultaneously. That combination (history, identity, and location measured in the same cell) is the kind of data that would genuinely let you reconstruct how a tissue built itself.

Recombinases: state machines

All the systems I have described so far work by accumulating edits to the DNA. Recombinases take a different approach. They are enzymes that catalyze permanent rearrangements at specific recognition sequences: they excise, invert, or integrate whole DNA segments. Instead of accumulating mutations, a recombinase-based recorder advances through a set of discrete states. Each recombination event is a state transition. The cell is, in a literal sense, a little state machine, and the current state is whatever configuration the DNA has ended up in.

Brainbow (Livet et al., Nature, 2007) is the canonical example of this, and it is mostly famous as a neuroscience tool rather than a recorder. It uses stochastic Cre-mediated inversions among an array of fluorescent protein genes to give individual neurons distinguishable color combinations, roughly 100 distinguishable colors in practice. It is beautiful, it is qualitative, it is used extensively for clonal analysis in the brain, and it has fundamentally low information content. Polylox (Pei et al., Nature, 2017) scales the principle substantially, using 10 loxP sites in alternating orientations at the Rosa26 locus to generate up to 1.87 million theoretical barcodes. intMEMOIR (Chow et al., Science, 2021) uses the Bxb1 serine integrase instead of Cre, and it is specifically designed for sequential single-molecule FISH imaging in fixed tissue. The appeal there is that the readout preserves spatial context, because you never have to lyse the cells to sequence them. The information ceiling of recombinase-based systems is around 15 to 21 bits per cell, which puts them in the same rough range as indel writers but with a better spatial readout story.
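The capacity figures for this class follow directly from taking log2 of the number of distinguishable end states, using the counts quoted above:

```python
from math import log2

# Capacity in bits = log2(number of distinguishable end states),
# using the state counts quoted in the text.
for name, states in [("Brainbow", 100), ("Polylox", 1_870_000)]:
    print(f"{name}: {log2(states):.1f} bits per cell")
# Brainbow: 6.6 bits per cell
# Polylox: 20.8 bits per cell
```

Polylox's 1.87 million theoretical barcodes come out at about 20.8 bits, which is where the top of the 15-to-21-bit ceiling for this class comes from.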

Spacer acquisition and transposons: the bacterial branch

Running in parallel to all of this is a whole evolutionary lineage of recording systems that went into bacteria and largely never came out. The core idea is to exploit the fact that bacterial CRISPR adaptive immune systems already naturally record molecular encounters, by integrating new spacer sequences into CRISPR arrays in the order they were acquired. Temporal order encoded in physical position. Exactly the same logic as DNA Typewriter, except evolved by bacteria about two billion years earlier. Record-seq (Schmidt and Platt, Nature, 2018) hijacks this by introducing an RT-Cas1/Cas2 complex that captures intracellular RNA as new CRISPR spacers. The in vivo follow-up (Schmidt et al., Science, 2022) is honestly the most successful deployment of any molecular recorder in a complex biological environment I know of. Engineered E. coli sentinel cells were fed to mice, traversed the mouse gastrointestinal tract recording transcriptional histories from their surroundings, and were then recovered from fecal samples. The whole recording could be read out after the fact. TIGER (Jiao et al., Nature Biotechnology, 2023) records specified transcripts quantitatively, which is the only system with that particular capability. It is elegant. It is beautifully demonstrated. It is, so far, stuck in bacteria.

As of early 2026, no validated mammalian spacer acquisition system exists. The mammalian chromatin environment, the nuclear compartmentalization, the different sequence requirements for the relevant bacterial enzymes, all of these have resisted years of effort. Getting this to work in mammalian cells would be a real breakthrough. It also might never work.


Class 2: Protein Assembly

The two most recent additions to the recorder zoo are, by some distance, the most alien to everything that came before them. They abandon DNA as the substrate entirely. They abandon sequencing as the readout. They build something closer to dendrochronology than to genomics. The idea is that time gets encoded into the physical structure of a growing protein inside the cell, and you read the recording by confocal microscopy, looking at where in the protein different tags ended up.

The first of these is GEMINI (Yan et al., Nature, 2026, from the Dingchang Lin lab at Johns Hopkins). The acronym expands to Granularly Expanding Memory for Intracellular Narrative Integration, which is a mouthful, but the underlying design is elegant. The researchers computationally designed protein subunits that self-assemble into hierarchical structures: first into cages, then into isotropic spherical particles that grow concentrically over time, like nested shells. Three subunit types serve distinct functions. There are blank subunits that drive steady predictable growth and give you a reliable expanding substrate. There are reporter subunits that transduce specific cellular events (in the demonstration, NF-kB activity) into fluorescent signals that get incorporated when the event happens. And there are timestamp subunits that create temporal landmarks so you can anchor events in time. Radial distance from the center of the sphere corresponds to elapsed time, so the older stuff is in the middle and the newer stuff is near the surface. They resolved NF-kB transcriptional dynamics at 15-minute resolution. They demonstrated it in cultured mammalian cells, in xenograft tumors, and in mouse brain neurons in vivo. The animals maintained normal motor and cognitive behavior, meaning the proteins did not obviously perturb the biology they were recording. No sequencing was involved at any point. You just image the cells at the end.

The second system is CytoTape (Linghu et al., Nature, 2026, from the Linghu lab at the University of Michigan), which uses a complementary geometry. Instead of growing spheres, CytoTape grows linear fibers. An engineered protein polymerizes into a long fiber inside the cell, and activity-dependent promoters drive the incorporation of epitope tags into specific positions along the fiber as it elongates. Position along the fiber encodes time. Tag identity encodes which transcription factor was active at that time. They recorded five transcription factor activities simultaneously (pCREB, Fos, Arc, Egr1, and NPAS4) as distinct epitope bands along single fibers, readable by immunofluorescence. Continuous recording for up to three weeks at minutes-scale resolution. Demonstrated in more than 14,000 neurons in vivo. The key biological finding from this work is that CREB activation and FOS expression can temporally decouple under MEK-ERK pathway inhibition, which is a real biological result that would simply be invisible to any snapshot-based approach because the decoupling is a temporal feature.

GEMINI and CytoTape answer complementary questions. GEMINI records when things happened. CytoTape records which transcription factors were active. Neither requires sequencing. Both preserve spatial context by virtue of being read in situ. Both work in living animals. If you had asked me five years ago whether the next generation of recorders would be protein-based rather than DNA-based, I would have given that about a 15 percent probability, because all the infrastructure of molecular biology is oriented around DNA. I would have been wrong.


Class 3: RNA Sequestration

The third class is the smallest and the most conceptually distinct. Rather than writing new information into DNA or building a growing protein, these systems preserve existing mRNA at a defined moment by protecting the transcripts from degradation. The information was already there in the cell's transcriptome. The challenge is keeping it long enough to sequence later. This sounds simple, and in a way it is, but it requires solving the problem of how to take a snapshot of a transcriptome and keep it around while the cell carries on doing whatever it was going to do next.

The two sub-strategies here differ in where the protected RNA goes. Internal sequestration uses compartments that already exist inside the cell. External sequestration packages the RNA into engineered containers that can be harvested from outside the cell without destroying it. Both are conceptually interesting, because they preserve rather than write, but they have different practical profiles.

TimeVault (Friedman et al., Nature Methods, 2025, Church lab) is the most developed internal-sequestration system. It uses vault particles, which are hollow ribonucleoprotein compartments that already exist in most eukaryotic cells and whose natural function is still somewhat mysterious. At a user-defined time point, transcription is chemically arrested and mRNA is loaded into vaults through an engineered interaction. A second loading event at a later time point uses a different channel, so you can get two snapshots from the same cells. Both are read simultaneously by sequencing at the end of the experiment. The clever thing about this is that you get paired time points from the same cells without ever having to lyse them in between. Current limitations: it is bulk RNA-seq only, meaning no single-cell resolution yet. The recording windows are limited to roughly one week. And no in vivo demonstration has been published. But the conceptual move is genuinely different from everything else in this essay. Everything else writes new information. TimeVault just protects existing information from being degraded.

COURIER (Horns et al., Cell, 2023) and related virus-like particle (VLP) approaches are the external-sequestration version of this idea. Engineered virus-like particles package and export cellular RNA into the surrounding medium without killing the cell. The cell survives, and you collect the RNA from secreted VLPs by just sampling the media. You can do this repeatedly from the same living culture over the course of an experiment, which gives you longitudinal transcriptome sampling without the destructive readout that normal scRNA-seq requires.

Live-seq (Chen et al., Nature, 2022, from the Deplancke and Vorholt labs) sits adjacent to this class and deserves its own mention because the approach is so strange. Rather than getting the cell to package its own RNA for export, Live-seq uses a hollow atomic force microscope cantilever with a 400 to 600 nanometer aperture at its tip. The cantilever physically penetrates the cell membrane and aspirates roughly 1 picoliter of cytoplasm under negative pressure. That is about 1 picogram of RNA, which is enough to sequence with modern protocols. The cell survives the procedure at 85 to 89 percent viability, which is remarkable given that you literally stuck a microscopic needle into it. In the demonstration, the Deplancke group found that basal Nfkbia expression was the strongest negative predictor of which macrophages would respond to LPS, a finding that is fundamentally invisible to standard single-cell sequencing because the cell has to be destroyed before you can observe its response to anything. The downside, which is severe, is that throughput is about one cell per hour, because operating an AFM cantilever is slow and involves a skilled human being at a microscope. Somebody needs to parallelize the cantilevers.


The gap

The point of walking through all of these is partly that the systems themselves are genuinely interesting, and partly to establish, fairly concretely, how much of the problem remains unsolved. The gap between what we can record today and what we would need to record to actually study development as a dynamical process is not small, and I want to be specific about it.

full trajectory needed · ~10⁶–10⁸ bits
BASELINE · DNA writer · 4,300 bits
TimeVault · RNA sequestration · ~5,000 bits*
GEMINI / CytoTape · protein assembly · ~600 bits†
GESTALT · gen-1 indel · ~15 bits

* per snapshot window, not accumulated longitudinally · † estimated from 5 TF channels × ~20 time bins × ~6 bits/bin

Consider the numbers. A full mammalian developmental trajectory runs roughly 40 cell divisions across around 20 days. A single-cell transcriptome snapshot contains on the order of 1,000 to 5,000 bits of meaningful information (the exact number depends on what you count as meaningful, but it is somewhere in that range). If you wanted to record transcriptome snapshots hourly across those 20 days, you would need on the order of 960,000 bits per cell just for the transcriptome component. Add signaling pathway states, chromatin dynamics, and mechanical history, and the requirement climbs toward 10⁶ to 10⁸ bits per cell per trajectory. The best DNA-based recorder available today stores 4,300 bits. The gap between what we have and what we need is two to four orders of magnitude.
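The back-of-envelope arithmetic, using the numbers above (I take 2,000 bits per snapshot as a round midpoint of the quoted range):

```python
from math import log10

# Back-of-envelope for the recording gap, using the numbers in the text.
days = 20
snapshots_per_day = 24      # hourly sampling
bits_per_snapshot = 2_000   # a round midpoint of the 1,000-5,000 range
needed = days * snapshots_per_day * bits_per_snapshot
best_available = 4_300      # BASELINE, bits per cell

ratio = needed / best_available
print(f"transcriptome-only requirement: {needed:,} bits")  # 960,000 bits
print(f"shortfall vs best DNA recorder: {ratio:.0f}x "
      f"(~{log10(ratio):.1f} orders of magnitude)")
```

Even the transcriptome-only requirement leaves the best current recorder more than two hundred times short, before you add signaling, chromatin, or mechanics.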

Temporal resolution is the deepest bottleneck, and it may be more fundamental than raw information capacity. Most DNA-based recorders operate at cell-division timescales, because both the editing events and the dilution of signals that you might record depend on DNA replication. Signaling dynamics, on the other hand, occur on seconds-to-minutes timescales, which is roughly a thousand times faster. GEMINI and CytoTape partially bridge this gap, running at 15-minute and minutes-scale resolution respectively, but even those systems are recording transcriptional output, meaning the downstream consequences of signaling, rather than the signaling itself. The sub-minute signaling regime, which is where a lot of interesting biology lives, remains entirely unrecorded by any mammalian system. And mechanical signal recorders, meaning systems that would capture force or compression or stretch history, do not exist as published technologies at all. That is a gap I would personally like to close, which is why I am building one.

What actually works right now, honestly, is a combination across all three classes. No single technology captures everything. The systems that work best in vivo are the ones with the lowest information capacity, because information capacity and biological robustness trade off in the current generation of designs. The systems with the highest information capacity have not yet been demonstrated in living animals. The most interesting experiments of the next few years will probably be the ones that co-deploy two or three simultaneously, reading them out together through spatial multi-omics platforms, and start closing the gap between what the cells already know about themselves and what we can learn from them. That integration is where I think the next real breakthroughs are going to come from. The individual recorders have each already done the hard part, which is proving the concept works. Making them work together is the part that is still open.