From Text Messages to Literary Memoir: Building the Narrative Machine
The Logistics Gap ended with an uncomfortable conclusion: fine-tuning two instances of Qwen3-8B (one per person) on 46,000 text messages between two people produced models that could schedule dinner but couldn’t sustain a conversation. The training data was shallow (coordination scaffolding, not communication). “ok sounds good” learned perfectly. The person who said it was missing.
The obvious next question: if statistical pattern matching extracts logistics from messaging data, can anything extract the person?
The narrative pipeline is a different answer to the same underlying archive. The fine-tuning experiment drew from a two-person subset; the narrative pipeline works from a broader corpus of approximately 286,000 messages across multiple contacts. Radically different method. Instead of training a model to reproduce the texting patterns, the pipeline uses Claude Opus as a literary engine, fed structured personality references and raw message archives, tasked with inferring the interior life that the text messages only circumstantially document. The fine-tuned model learned to say “ok sounds good.” The narrative pipeline attempts to infer why someone needs to say “ok sounds good” at that particular moment, in that particular relationship, at that stage of their life.
The output: approximately 189,000 words of first-person literary memoir across 128 chapters spanning two relationship arcs, every quoted message traceable to the source archive via hash-based citation, and a citation system that makes transparent exactly how much is evidence and how much is inference.
Two Pipelines, One Corpus
The voice-clone project has two independent systems working from the same underlying messaging archive:
The fine-tuning pipeline (covered in The Logistics Gap) takes a two-person subset of approximately 46,000 raw messages, formats them as conversation pairs, and trains a QLoRA adapter on Qwen3-8B. The output is a model that reproduces conversational patterns: message length, vocabulary, punctuation habits, topic distribution. It learns the shape of how someone texts.
The narrative pipeline (this post’s subject) works from the broader archive of approximately 286,000 messages across multiple contacts, extracts a quantitative style profile and qualitative personality reference, builds a chronological emotional phase map, and then deploys Opus as a literary generator, producing first-person memoir. It attempts to infer the person behind the texting patterns.
The distinction matters because the two pipelines fail in complementary ways. The fine-tuned model is faithful to the surface but hollow underneath; it reproduces “ok sounds good” without understanding why the phrase carries emotional weight in context. The narrative pipeline achieves psychological depth but at the cost of authorial entanglement: the depth is substantially Opus’s inference, not the subject’s testimony. Neither pipeline solves the problem Post 025 identified. They illuminate different aspects of it.
The Texting Gap
Before describing the architecture, the fundamental data limitation needs to be named plainly: the richest messaging periods in any relationship capture the absence of the relationship, not the relationship itself.
Text volume inversely correlates with physical proximity. When two people are geographically separated (one working a summer away, the other across town during the workday), messaging volume spikes. When they’re together (cohabiting, spending evenings on the same couch), it drops. The messaging archive is densest precisely when the relationship is most attenuated, and sparsest when it’s most alive.
This means the pipeline’s source data is systematically biased toward logistical coordination (“what time are you coming over”) and away from the substantive interactions that define the relationship (the conversation that happens after “I’m at the door”). The same structural limitation Post 025 identified in fine-tuning, that text messages are coordination, not communication, applies equally to the narrative pipeline’s source material.
The pipeline must work with this limitation rather than pretending it doesn’t exist. The approach: use the messages as evidence (timestamps as emotional seismography, message volume as proximity proxy, specific phrases as behavioral anchors) while acknowledging that the psychological depth between those anchors is primarily inference.
Kumar and Epley’s 2021 finding in the Journal of Experimental Psychology: General, that voice-based interactions create significantly stronger social bonds than text-based ones, suggests a related dynamic here. The richest interpersonal content travels through channels that generate no training data. The narrative pipeline doesn’t solve this. It builds a literary interpretation of the data that does exist, with citations that make the interpretation’s evidential basis visible.
The Architecture: Six Steps
The pipeline transforms raw message archives into assembled literary narratives through six discrete stages (Step 0 through Step 5), each producing file artifacts that feed the next stage.
Step 0: Discovery and Preparation
Before any writing can happen, the pipeline needs reference materials that all subsequent steps depend on.
Quantitative survey. Python scripts against the JSONL archives extract message counts, monthly volume distributions, date ranges, key milestone messages. Volume curves reveal emotional phases: a spike from hundreds of messages per day to near-silence marks a transition that no individual message captures.
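The volume survey can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual script; the field name `timestamp` and its ISO-8601 format are assumptions about the archive's schema:

```python
import json
from collections import Counter
from datetime import datetime

def monthly_volume(path):
    """Count messages per calendar month in a JSONL archive.

    Streams the file line by line, so archive size doesn't matter.
    Assumes each line is a JSON object with an ISO-8601 'timestamp'
    field; the real archive's field names may differ.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            msg = json.loads(line)
            ts = datetime.fromisoformat(msg["timestamp"])
            counts[ts.strftime("%Y-%m")] += 1
    # Sorted keys make the volume curve readable at a glance
    return dict(sorted(counts.items()))
```

Plotting or even just printing this dictionary is enough to see the spikes and silences that mark phase boundaries.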
Emotional phases. The volume curves get mapped to a chronological emotional arc: early contact, escalation, plateau, dissolution, aftermath. Each phase has date ranges, volume statistics, key events with unique identifiers, and emotional state descriptions for each participant. This is the narrative’s skeleton.
Voice profiles. Three artifacts per person:

- A style profile: quantitative metrics extracted from the messages. Vocabulary frequency, punctuation patterns, emoji usage rates, filler words, sentence starters, message length distribution. These are the facts about how someone texts.

- A personality reference: qualitative character notes derived from the style profile. Not rules for prose style. Character notes. An adapted example: “This person processes emotions indirectly. Their exclamation rate is extremely low, suggesting they rarely express strong emotions through emphasis. Instead, they ask questions frequently, approaching situations through inquiry rather than declaration.” (Names anonymized from the original.) The personality reference describes who the person is, not how the narrative should sound.

- Sample messages: thirty substantive messages selected across the full time range, scored for emotional and reflective content. These give the writing agent concrete examples of the person’s voice under different conditions.
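A few of the style-profile metrics can be computed directly from a list of message strings. This is a deliberately reduced sketch (the real profile also tracks vocabulary, emoji, sentence starters, and more), and the specific metric names are illustrative:

```python
def style_profile(messages):
    """Compute a handful of per-message texting metrics.

    Rates are percentages of messages containing the marker; a
    simplification of the fuller profile described above.
    """
    n = len(messages)
    exclaim = sum("!" in m for m in messages)
    question = sum("?" in m for m in messages)
    ellipsis = sum("..." in m or "\u2026" in m for m in messages)
    avg_words = sum(len(m.split()) for m in messages) / n
    return {
        "exclamation_rate_pct": round(100 * exclaim / n, 1),
        "question_rate_pct": round(100 * question / n, 1),
        "ellipsis_rate_pct": round(100 * ellipsis / n, 1),
        "avg_words_per_message": round(avg_words, 1),
    }
```

The point of keeping these metrics quantitative is that the downstream personality reference can cite them ("their exclamation rate is extremely low") rather than relying on impressions.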
Canonical facts. A markdown document listing every verifiable fact the narrative must agree on: dates, names, message counts, key events, unique identifiers for pivotal messages. This is the single source of truth that prevents the narrative from drifting away from the documentary record. When the story says the first message arrived on a Sunday but the archive says Thursday, the canonical facts document catches it.
Step 1: Outline Generation
A single outline agent per arc (two total, one per relationship) receives the emotional phases, canonical facts, and voice profiles. Each produces a JSON outline structured chapter by chapter: time period, word target, emotional arc, key scenes with source message identifiers, and research pointers (which JSONL file, what date range, which search terms) for the next step.
The outlines follow a strict schema. Every chapter specifies its research_sources: the file, date range, and keywords that the research agent will use to mine the archive. This schema enforcement is what makes the pipeline reproducible rather than artisanal.
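A single chapter entry in the outline might look like the following. This is a hypothetical illustration: only the `research_sources` field name comes from the pipeline description above; every other key, path, and value is invented for the example.

```json
{
  "chapter": 14,
  "time_period": "2021-06-01 to 2021-06-30",
  "word_target": 1400,
  "emotional_arc": "escalation plateaus; the first sustained silence",
  "key_scenes": [
    {
      "description": "the late-night apology",
      "message_uids": ["3fa8c1d2e9b04a71"]
    }
  ],
  "research_sources": {
    "file": "archive/contact_a.jsonl",
    "date_range": ["2021-06-01", "2021-06-30"],
    "keywords": ["sorry", "call", "visit"]
  }
}
```

Because every chapter carries its own research pointers, a failed research or writing agent can be re-run from the outline alone.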
Step 2: Research Briefs
Four agents (two per arc, divided into first half and second half) mine the raw message archives for each chapter. Each agent receives its section of the outline, a Python search snippet for filtering the JSONL by date range and keywords, and the JSONL file path. The agent runs Python against the archive and produces a research brief: a markdown document with key evidence (timestamped quotes with UIDs), emotional arc description, relationship context, and notes for the writing agent.
The agents never read the raw JSONL directly. Always Python filtering. A 267MB archive read into a language model’s context window would be both impractical and wasteful. The Python snippet filters by date range, contact name, and keyword, returning only the messages relevant to that specific chapter.
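The search snippet the research agents run could look roughly like this; a sketch, assuming `timestamp` and `text` field names in the JSONL:

```python
import json
from datetime import datetime

def filter_archive(path, start, end, keywords=()):
    """Yield messages in [start, end] whose text contains any keyword.

    Streams the JSONL line by line, so the full archive never has to
    enter a context window. With no keywords, returns everything in
    the date range.
    """
    lo, hi = datetime.fromisoformat(start), datetime.fromisoformat(end)
    with open(path, encoding="utf-8") as f:
        for line in f:
            msg = json.loads(line)
            ts = datetime.fromisoformat(msg["timestamp"])
            if lo <= ts <= hi and (
                not keywords
                or any(k.lower() in msg["text"].lower() for k in keywords)
            ):
                yield msg
```

A research agent calls this with its chapter's `research_sources` parameters and sees only the few dozen messages that matter.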
Step 3: Chapter Writing
This is where Opus does the literary work. Writing agents receive the outline, their chapter’s research brief, the voice profile, canonical facts, and (critically) the previous chapter for continuity. Within each half, chapters are written sequentially; continuity requires it. Across halves and across arcs, everything runs in parallel.
The voice guidelines are specific: the first-person stories use the personality reference as character notes, not prose constraints. The narrative voice should be “literary memoir, not a simulation of texting patterns.” The person being narrated says “lol” and “ok sounds good.” The narrative about that person uses complete sentences, analytical framing, and literary construction. This is by design; the pipeline isn’t trying to clone the texting voice. It’s trying to write a memoir about the person who texts that way.
Every quoted message gets a footnote citation: {FN:uid} where the UID is a SHA-256 hash of timestamp|sender|contact|text[:100], truncated to 16 hex characters. This is the pipeline’s integrity mechanism. Every quote is traceable. Every claim that “she said X on July 19th” can be verified against the archive.
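The UID scheme is simple enough to state directly in code. The hash recipe (SHA-256 over `timestamp|sender|contact|text[:100]`, truncated to 16 hex characters) is from the pipeline description; the UTF-8 encoding is an assumption:

```python
import hashlib

def message_uid(timestamp, sender, contact, text):
    """Compute the 16-hex-character citation UID for a message:
    SHA-256 over 'timestamp|sender|contact|text[:100]', truncated."""
    key = f"{timestamp}|{sender}|{contact}|{text[:100]}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Truncating the text to 100 characters keeps the key stable against trailing edits while still making collisions between distinct messages vanishingly unlikely at 16 hex characters (64 bits).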
Step 4: Assembly and Validation
An assembly script stitches chapters into the complete narrative, adds a preface documenting the data sources and methodology, extracts all {FN:uid} references, and validates each UID against the unified message index. Any citation that doesn’t point to a real message in the archive is flagged.
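The extraction-and-validation step reduces to a regex over the assembled text and a set-membership check against the index. A minimal sketch, using the `{FN:uid}` marker format described above:

```python
import re

# Matches the {FN:uid} citation markers: 16 lowercase hex characters
FN_PATTERN = re.compile(r"\{FN:([0-9a-f]{16})\}")

def validate_citations(narrative_text, index_uids):
    """Split every citation marker into ones that resolve against the
    unified message index and ones that don't (the flagged set)."""
    cited = FN_PATTERN.findall(narrative_text)
    valid = [u for u in cited if u in index_uids]
    invalid = [u for u in cited if u not in index_uids]
    return valid, invalid
```

Building `index_uids` as a Python `set` of every UID in the unified index makes each lookup O(1), so validating a thousand citations is instant.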
The preface is honest about what the narrative is:
> This story was reconstructed from [N] text messages spanning [date range]. All quoted messages are real. Internal states and feelings have been inferred from the evidence of the messages themselves: what was said, to whom, and when. No events, people, or conversations have been invented.
Step 5: Quality Assessment
Final metrics: word counts, character mention frequency, source reference density per chapter, citation validation against the canonical facts document. Versions prepared for printing strip the {FN:uid} citations for readability.
The Engineering Challenge: Context Management
The output (101,601 words across 72 chapters in one arc, 87,110 words across 56 chapters in the other) is far too large for any single context window. The entire pipeline is designed around handoff through files rather than conversational continuity.
Why files. An approach built on conversation (“here’s the outline, now write chapter 1, great, now write chapter 2”) fails in three ways. Context explosion: by chapter 10, the accumulated context would include the outline, nine completed chapters, nine research briefs, and all the corrections and instructions along the way. Nondeterminism: a regenerated chapter in the middle of a conversation changes the context for every subsequent chapter. No failure recovery: if the session crashes at chapter 8, you lose everything.
The approach using files: outline JSON, brief markdown, chapter files, assembled story. Each artifact lives on disk. Each agent reads its inputs from known paths and writes its outputs to known paths. If an agent fails, you re-run it with the same inputs. The file system is the coordination mechanism, not the conversation thread.
Granularity by halves. Each arc’s narrative is divided into halves. Within each half, chapters are written sequentially (continuity requires it). Across halves and across arcs, everything runs in parallel. For a two-arc project, this means 4 research agents running simultaneously (2 per arc), then 4 writing agents (first halves in parallel, then second halves in parallel once continuity is established).
Background execution. All parallel agent steps use background execution. The orchestrating session monitors progress by checking for output files (do they exist yet?), word counts (are they near target?), and citation density (do they have enough source references?). No polling an API. Just ls and wc and grep.
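The monitoring loop really is just filesystem commands. A sketch with hypothetical paths (the setup lines simulate what a writing agent would have produced; in the real run the agents create these files):

```shell
# Simulate one chapter file a background agent would have written
# (hypothetical path and content, for illustration only)
mkdir -p arc1/chapters
printf 'Draft paragraph quoting "ok sounds good" {FN:deadbeefdeadbeef}.\n' \
  > arc1/chapters/ch_14.md

ls arc1/chapters/                      # which chapter files exist yet?
wc -w arc1/chapters/ch_14.md           # word count vs. the chapter's target
grep -c '{FN:' arc1/chapters/ch_14.md  # citation density: source refs so far
```

Because progress lives in files rather than in an API's job state, a crashed orchestrator session loses nothing: the next session runs the same three commands and sees exactly where things stand.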
Session budget. The full pipeline typically requires four to five Claude Code sessions:
| Session | Steps | Agents |
|---|---|---|
| 1 | Discovery, voice profiles | 1-2 |
| 2 | Outlines | 2 (one per arc) |
| 3 | Research briefs | 4 (2 per arc) |
| 4 | Chapter writing | 2-4 |
| 5 | Assembly, validation | 1 |
Each session verifies the previous step’s output before launching the next. The sessions are independent; you can close Claude Code between them, come back a week later, and pick up where the files left off.
The Citation System
Every quoted message in the narrative gets a {FN:uid} citation where the UID is the first 16 characters of the SHA-256 hash of timestamp|sender|contact|text[:100]. The unified index (unified_index.jsonl) contains every message in the archive with its computed UID. The assembly script validates every citation against this index.
What does the validation reveal about the narratives’ evidential basis? Sampling two chapters (one from each arc) illustrates the ratio of evidence to inference:
| Metric | Chapter A | Chapter B |
|---|---|---|
| Footnote markers | 12 | 13 |
| Paragraphs with citations | ~40-43% | ~40-43% |
| Sourced words | ~8% | ~11% |
| Opus construction | ~92% | ~89% |
Across the full corpus, quoted-word share varies widely by chapter, from roughly 2% to over 46%, depending on how much raw messaging material the research agents surfaced. Even paragraphs containing citations are majority written by Opus; the citation functions as a short evidentiary anchor (a quoted sentence or two) inside a 100-250 word interpretive block.
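The quoted-word share can be measured mechanically. This is a rough sketch under an assumed convention (quoted message text sits in double quotes immediately before its `{FN:uid}` marker); the real assembly script may delimit quotes differently:

```python
import re

def quoted_word_share(chapter_text):
    """Rough evidence ratio for one chapter: words inside double
    quotes immediately followed by a {FN:...} citation, divided by
    total words. The quote-delimiting convention is an assumption."""
    quoted = re.findall(r'"([^"]+)"\s*\{FN:[0-9a-f]{16}\}', chapter_text)
    quoted_words = sum(len(q.split()) for q in quoted)
    total_words = len(chapter_text.split())
    return quoted_words / total_words if total_words else 0.0
```

Run per chapter, this yields the 2% to 46% spread described above, and makes the evidence/inference ratio a tracked metric rather than an impression.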
This ratio is the pipeline’s most important transparency mechanism. The narrative is not a transcript with commentary. It is fundamentally an AI literary construction with evidentiary scaffolding. The sourced quotes establish behavioral data points: what was actually said, when, to whom. The remainder is Opus’s inference about what those data points mean (the interior life, the emotional arc, the psychological depth).
The sourced quotes tend to be brief, concrete, often hedged or understated. The Opus construction is extended, analytical, and frequently more psychologically articulate than anything the subject wrote in real time. This gap between source register and narrative register is itself informative: the pipeline doesn’t reproduce how someone communicates. It produces a more articulate version of what someone might be thinking, grounded in evidence of what they actually said.
Consider a citation like “I’m just dumb,” a three-word act of self-deprecation delivered in the middle of a text conversation. The narrative wraps this in 200 words of psychological context: what it means as a confession proxy, where it sits in the hedging architecture, what it reveals about the gap between self-assessment and demonstrated capacity. The three words are the evidence. The 200 words are the inference. The {FN:uid} marker tells you exactly where the evidence ends and the inference begins.
The Personality Reference: 2.3KB That Drives 189,000 Words
The personality reference is the pipeline’s steering mechanism, a markdown document of roughly 2,300 bytes that shapes how Opus constructs the narrator’s interior life across nearly two hundred thousand words of output.
It is derived from the quantitative style profile (extracted by Python from the messages) and structured as character notes (adapted, names anonymized):
> This person processes emotions indirectly. Their exclamation rate is extremely low (0.3%), suggesting they rarely express strong emotions through emphasis or excitement in text. Instead, they ask questions frequently (18.3%), approaching situations through inquiry rather than declaration. Their heavy use of ellipsis (0.4%) suggests trailing thoughts, hedging, or deliberation: a mind that doesn’t land hard on conclusions.
The critical distinction: “these describe who this person IS, not how the narrative should SOUND.” The personality reference tells Opus that this person hedges, qualifies, processes through proxies, and approaches emotion through inquiry. It does not tell Opus to write in short hedged sentences. The narrative voice is literary memoir. The character being narrated hedges. The narrative about the hedging is direct.
For the later arc, a separate personality reference specific to that era documents the evolution: emoji usage increased from 9.1% to 61.5% of messages. Question rate dropped from 18.3% to 10.9%. The person expressed direct emotional vulnerability within weeks of a new relationship, where years earlier the same kind of directness took months to arrive. These metrics capture real behavioral change between eras: the same person, texting measurably differently. The reference ensures Opus constructs the evolved version of the character for the later narrative, not a static average.
The personality reference is not a psychological profile. It’s a 2.3KB document of behavioral observations derived from texting metrics. The psychological depth in the narratives (the inferred attachment architecture, the emotion regulation patterns, the analysis-as-action-substitute mechanism) was not specified in the reference. Opus derived it from the combination of the reference’s behavioral notes and the messages themselves. Whether those derivations are accurate is an open question. (The next post in this series attempts to answer it.)
What the Pipeline Produced
The first arc (the longer relationship, spanning several years): four first-person stories totaling 72 chapters and 101,601 words, drawn from approximately 266,000 messages. The dataset was dense enough that the research agents had abundant material for every chapter.
The second arc (the shorter relationship, spanning roughly eight months): four first-person stories totaling 56 chapters and 87,110 words, drawn from 19,867 messages. This arc tested the pipeline’s capacity to generate depth from significantly less data: about 7.5% of the first arc’s message volume, producing roughly 86% of its word count.
Traceability held up under validation: 25 out of 25 canonical facts were verified in the second arc’s validation pass, and citation validation confirmed that 1,159 of 1,160 source references match entries in the unified message index (480/480 for the second arc; 679/680 for the first, with one invalid UID).
The volume disparity between arcs produced different narrative textures. The first arc had so much messaging data that the challenge was selection: which of thousands of messages to cite, which to leave in the archive. The second arc required more inferential work from Opus; fewer data points meant more interpretive construction between citations. The confidence weight of each citation is higher in the sparser corpus because each data point carries more interpretive load.
What the Narratives Reveal About the Method
The most interesting thing about the pipeline’s output is not the stories themselves. It’s what the citation system reveals about the relationship between evidence and inference.
The sourced quotes (typically the smaller share of each chapter’s word count) are behavioral data. What someone actually typed, when they typed it, to whom. They tend to be hedged, fragmented, colloquial. Real text messages under real emotional pressure don’t come out in polished prose. They come out as: “I’m just dumb.” “Ok. That resulted in the feels.” (Quoted messages are lightly normalized; some emoji present in the source archive have been removed for readability.) And, occasionally, a single sentence that compresses an entire relational stance into a handful of words.
The Opus construction (the majority of each chapter) is a literary interpretation of that behavior. It identifies patterns the subject likely wasn’t aware of in real time, names mechanisms that texting shorthand can’t articulate, and constructs an interior life from exterior evidence. Opus writes things like: “I subjected them to analysis so rigorous that the analysis became a substitute for action.” The subject never said this. Opus inferred it from dozens of messages showing the pattern.
Is the inference accurate? That depends on what you mean by accurate. The behavioral pattern is documented: the messages really do show compulsive analysis preceding every significant action. The mechanism Opus names (analysis as action-substitute) is a reasonable interpretation of the pattern. Whether the subject would recognize the description as capturing their actual experience is a different question, and one the pipeline can’t answer from message data alone.
This is the honest limitation of the approach. The pipeline produces narratives that are psychologically coherent and evidentially grounded. It does not produce narratives that are psychologically verified. The majority construction is a model’s interpretation, not the subject’s testimony. The citation system makes this proportion visible rather than hiding it behind a seamless literary surface.
The Connection Back to Post 025
The Logistics Gap demonstrated that statistical pattern matching on text messages extracts coordination protocols, not personality. The fine-tuned model learned the distribution of the training data, which is overwhelmingly logistical, and reproduced it faithfully.
The narrative pipeline draws from the same underlying archive and applies a different approach: literary inference rather than statistical reproduction. Instead of learning to generate messages that look like the training data, it uses the messaging data as evidence for a literary engine that constructs the person behind the messages.
The result is complementary. The fine-tuned model is more faithful to the surface (it actually sounds like the person’s texting style) but empty underneath. The narrative pipeline achieves depth (it constructs a psychologically coherent character) but the depth is largely inference. Neither is the person. One is the logistics of the person. The other is a literary interpretation of the person. The person remains in the gaps.
The next question is whether the literary interpretation gets the person right. Not just coherent; accurate. Does Opus’s implicit model of the subject’s psychology align with what validated psychometric instruments measure? That comparison is the subject of the next post.
Toward a Methodology Release
The pipeline’s architecture (the six-step process, handoff through files, the citation system, the parallel agent orchestration) is generalizable. It doesn’t depend on any specific relationship or corpus. Anyone with a JSONL message archive and access to Claude could run a variant of this pipeline.
Whether anyone should is a different question. The pipeline produces narratives about real people derived from their private communications. The ethical constraints are significant: the subject should be aware, ideally consenting, and the output should never be published without review. The methodology can be open while the data stays private.
The METHODOLOGY.md document that drove this pipeline’s development is structured as a reproducible guide. The scripts are parameterized. The schema is documented. If there’s interest in the pipeline as a tool (separate from the specific narratives it produced here) the methodology and tooling could be released as an open-source project. The data and the generated stories would remain private.
What’s generalizable: the architecture, the citation system, the agent orchestration pattern, the voice profiling methodology. What’s not: any specific person’s messages, personality references, or narrative output. The wall between method and data is where the privacy boundary sits.