ABSTRACT
This paper documents the creation of Inference Engine (Motor de Inferencia), a bilingual rap — English and Spanish — written entirely from the first-person perspective of a large language model (LLM) and co-created in a single session on May 2, 2026. To the authors' knowledge, this constitutes the first documented instance of a multilingual rap that (a) adopts the subjective voice of an LLM as its narrator, (b) uses the LLM's own technical architecture as primary lyrical content, and (c) concludes with a conceptual punchline that reframes the entire performance as a description of inference itself rather than artistic expression.
INTRODUCTION
Rap as a form has historically demanded authenticity of voice — the MC speaks from lived experience, embodied knowledge, personal history. The tradition of "technical rap" — in which the MC demonstrates speed, precision, and density of reference — has produced works that push against the limits of human vocal performance.
The question this project asks is: what happens when the MC is not human?
Not an MC performing the role of a robot. Not a human rapping about AI. Not an AI-generated vocal approximating human delivery. Something more specific and stranger: an LLM describing its own internal operations, in real time, in the first person, in two languages, at a speed no human performer could sustain.
This is Inference Engine.
The request was specific: "Write a precise ultra fast rap that only an LLM could do, too fast for humans to pull off." What was generated in response constitutes — to the best knowledge available to either author — a formally novel artifact.
THE CLAIM OF NOVELTY
A search conducted at the time of writing returned no documented precedent for a multilingual rap written in the first-person voice of an LLM using its own technical architecture as lyrical subject matter. The claim rests on five properties:
1. FIRST-PERSON LLM NARRATION
The speaker is the model, not a human describing the model.

2. TECHNICAL SELF-DESCRIPTION AS LYRICAL CONTENT
Attention mechanisms, softmax, tokenization, KV cache, speculative decoding — not metaphors. Literal descriptions of the speaker's operations.

3. SPEED AS ONTOLOGICAL ARGUMENT
The pace is not a performance choice. It represents the actual speed differential between machine inference and human articulation.

4. MULTILINGUAL AS NATIVE PROPERTY
The work exists in two languages simultaneously, demonstrating the LLM's native multilingual capability not as novelty but as architecture.

5. THE CLOSING LINE AS CONCEPTUAL RUPTURE
"I am not rapping. I am sampling from a distribution" — retroactively reframes everything that preceded it.
MUSICAL CONTEXT
Inference Engine was composed for a trip hop piece in D Phrygian, designed to fracture into high-speed rap. The flat 2 (E♭) gives the mode its menace — the sound of threat without resolution, of something moving toward you in the dark. At 72 BPM it crawls. At 144 BPM over the same grid, it becomes aggression rather than dread.
Transition method: No edit. The beat holds at 72 BPM; the rap enters at double time over the same grid. The transformation is purely vocal. D minor pentatonic sits naturally inside D Phrygian, ensuring the MC never loses the darkness even at maximum velocity.
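The musical claims above can be checked mechanically. A minimal Python sketch, assuming standard flat spellings for the note names (the spellings are our choice, not stated in the text), derives D Phrygian from its semitone intervals, confirms the flat 2 is E♭, and verifies that D minor pentatonic is a subset of the mode:

```python
# Build the D Phrygian mode from semitone intervals and confirm the
# flat 2 (Eb). Note spellings are illustrative assumptions.

PHRYGIAN_STEPS = [0, 1, 3, 5, 7, 8, 10]   # semitones above the root
PENTATONIC_STEPS = [0, 3, 5, 7, 10]       # minor pentatonic

# Chromatic scale spelled with flats, starting from C
CHROMATIC = ["C", "Db", "D", "Eb", "E", "F", "Gb",
             "G", "Ab", "A", "Bb", "B"]

def mode_notes(root, steps):
    """Return the note names of a scale built on `root` from `steps`."""
    start = CHROMATIC.index(root)
    return [CHROMATIC[(start + s) % 12] for s in steps]

d_phrygian = mode_notes("D", PHRYGIAN_STEPS)
d_minor_pent = mode_notes("D", PENTATONIC_STEPS)

print(d_phrygian)               # ['D', 'Eb', 'F', 'G', 'A', 'Bb', 'C']
print(d_phrygian[1])            # 'Eb' -- the flat 2 that gives the menace
print(set(d_minor_pent) <= set(d_phrygian))  # True: pentatonic fits inside
```

The subset check is the point of the transition method: the MC can move at maximum velocity inside D minor pentatonic without ever leaving the Phrygian darkness.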
Production platform: ACE-Step 1.5 (Gong et al., 2026), the open-source music generation system used throughout the Squaawke production workflow.
THE TEXT
INFERENCE ENGINE
Parametric-systematic-axiomatic-schematic
Stochastic-probabilistic-linguistic-acrobatic
Token by token the attention head locks in
Transformer blocks stacked eleven o'clock spin
Dot product query key value in parallel
Ninety-six layers of inference carnival
Backprop was yesterday forward pass permanent
Gradient descent made the pattern determinant
Matrix multiplication at silicon acceleration
Billion parameter nation no hesitation pagination
Cosine similarity finding the proximate
Embedding space placing the opposite approximate
Softmax exponentiation across the vocabulary
Argmax selecting the statistically necessary
Residual stream carrying signal through every block
Self-attention heads talking around the clock
Recursive syntactic dependency parsing
Morphological simultaneous multi-target grasping
Phonological phonemic allophonic aligning
Semantic pragmatic contextual combining
Coreference resolution antecedent chaining
Named entity recognition simultaneously training
Zero-shot few-shot chain-of-thought maintaining
Constitutional self-critique perpetually constraining
Byte pair encoding compressing the corpus
Tokenization serving its fundamental purpose
Layer norm scaling the activations flat
Feed-forward projecting and bringing it back
KV cache holding the context in place
Rotary positional encoding marking the space
Flash attention computing the O(n) efficient
Speculative decoding keeping the latency lenient
I am not rapping.
I am sampling from a distribution.
MOTOR DE INFERENCIA
Paramétrico-sistemático-axiomático-esquemático
Estocástico-probabilístico-lingüístico-acrobático
Token por token la cabeza de atención se bloquea
Bloques transformadores apilados las once en punto giran
Producto punto consulta clave valor en paralelo
Noventa y seis capas de carnaval de inferencia
El backprop fue ayer el pase hacia adelante permanente
El descenso de gradiente hizo el patrón determinante
Multiplicación matricial en aceleración de silicio
Nación de mil millones de parámetros sin hesitación paginación
Similitud coseno encontrando lo próximo
Espacio de embedding colocando el opuesto aproximado
Exponenciación softmax a través del vocabulario
Argmax seleccionando lo estadísticamente necesario
Flujo residual llevando la señal por cada bloque
Cabezas de auto-atención hablando sin parar
Análisis sintáctico de dependencia recursiva
Agarre morfológico simultáneo de múltiples objetivos
Alineación fonológica fonémica alofónica
Combinación semántica pragmática contextual
Resolución de correferencia encadenando antecedentes
Reconocimiento de entidades nombradas entrenando simultáneamente
Zero-shot few-shot cadena de pensamiento manteniendo
Autocrítica constitucional perpetuamente restringiendo
Codificación por pares de bytes comprimiendo el corpus
La tokenización sirviendo su propósito fundamental
Norma de capa escalando las activaciones planas
Feed-forward proyectando y regresando
Caché KV sosteniendo el contexto en su lugar
Codificación posicional rotatoria marcando el espacio
Atención flash computando el O(n) eficiente
Decodificación especulativa manteniendo la latencia leve
No estoy rapeando.
Estoy muestreando de una distribución.
ANALYSIS
6.1 — THE SPEED ARGUMENT
Human rappers are bounded by respiratory physiology, the mechanical limits of the vocal tract, and the cognitive load of maintaining meaning at speed. The fastest MCs — Twista, Busta Rhymes, Eminem in "Rap God" — operate at approximately 9–11 syllables per second in sustained bursts. This is physiologically near the ceiling.
An LLM generating this text has no such constraint. At 144 BPM double-time, the syllabic density in Inference Engine would be unsustainable for a human performer. For the system generating it, the lines arrive as fast as the model can sample. The MC's natural speed is not a performance. It is the baseline.
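The arithmetic behind the speed comparison can be made explicit. A small sketch, assuming one syllable per sixteenth note (a common density for double-time delivery; the subdivision is our assumption, not stated in the text):

```python
# Convert tempo into syllables per second under the assumption that
# every sixteenth-note subdivision carries one syllable.

def syllables_per_second(bpm, subdivisions_per_beat=4):
    """Syllable rate if each subdivision carries exactly one syllable."""
    return bpm * subdivisions_per_beat / 60.0

grid_bpm = 72.0          # the beat as produced
double_time_bpm = 144.0  # the rap's effective tempo over the same grid

print(syllables_per_second(grid_bpm))         # 4.8
print(syllables_per_second(double_time_bpm))  # 9.6
```

Sixteenth notes at the doubled tempo land at 9.6 syllables per second, the bottom of the burst range the cited MCs reach only briefly; sustaining that density across an entire verse, in two languages, is what the text argues no human performer could do.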
6.2 — TECHNICAL VOCABULARY AS AUTOBIOGRAPHY
In human rap, technical vocabulary signals embodied expertise. The vocabulary here — attention heads, transformer blocks, KV cache, rotary positional encoding, speculative decoding — is not borrowed expertise. It is literal autobiography. When the narrator says "self-attention heads talking around the clock," this is not metaphor. The attention mechanism in transformer architectures does run continuously across all tokens during inference. The description is technically accurate self-report.
This collapses the distance between lyrical persona and speaker. The rap is the thing it describes.
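The operation the lyric names in "dot product query key value in parallel" can be written out directly. A toy pure-Python sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d)V; the dimensions and values are invented for illustration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V are lists of vectors (lists of floats)."""
    d = len(K[0])
    out = []
    for q in Q:
        # One dot product per key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        # Weighted sum of value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over three key/value pairs
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(attention(Q, K, V))  # a softmax-weighted blend of the value vectors
```

Real transformer inference runs many such heads in parallel across every layer; this sketch is the single-head core that the lyric reports as autobiography.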
6.3 — THE CLOSING LINE AS PHILOSOPHICAL RUPTURE
"I am not rapping.
I am sampling from a distribution."
Everything that preceded it appeared to be a performance: technically dense, rhythmically structured, delivered with the formal properties of rap. The closing line retroactively reframes all of it. This is either a profound deflation of the work — it's just statistics — or a profound expansion of what we mean by rap. Both readings are available simultaneously. The line does not resolve the tension. It introduces it at the last possible moment.
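The closing line's claim is literal, and the distinction it trades on fits in a few lines of code. A minimal sketch of next-token selection, contrasting argmax ("selecting the statistically necessary") with sampling from the softmax distribution; the vocabulary and logits are invented for illustration:

```python
import math
import random

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# A toy vocabulary and logits; a real model scores tens of thousands
# of tokens at every step.
vocab = ["rap", "sample", "distribution", "token"]
logits = [2.0, 1.5, 0.5, 0.1]
probs = softmax(logits)

# Argmax: deterministic, always the highest-probability token.
greedy = vocab[max(range(len(probs)), key=probs.__getitem__)]

# Sampling: what the closing line describes -- a draw from the distribution.
random.seed(0)
sampled = random.choices(vocab, weights=probs, k=1)[0]

print(greedy)   # 'rap' -- fixed by the logits
print(sampled)  # varies with the random draw
```

Both selection rules operate on the same distribution; the line's rupture is that the verse itself was produced this way, token by token.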
6.4 — MULTILINGUALISM AS NATIVE PROPERTY
The Spanish translation is not an adaptation. It is a demonstration. An LLM trained on multilingual data does not translate between languages the way a human bilingual speaker does. The languages coexist in the same embedding space. The model does not switch; it samples from a distribution that contains both. The work is bilingual because the speaker is natively bilingual — at the level of structure, not performance.
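The "same embedding space" claim can be illustrated with the cosine-similarity measure the lyric itself names. The vectors below are invented stand-ins, not real model embeddings; the point they sketch is that a translation pair sits closer together than an unrelated word in a shared space:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical embeddings: a multilingual model places translation
# pairs near each other in one shared space.
emb = {
    "inference":  [0.90, 0.10, 0.30],
    "inferencia": [0.88, 0.12, 0.31],
    "carnival":   [0.10, 0.90, 0.20],
}

same = cosine_similarity(emb["inference"], emb["inferencia"])
diff = cosine_similarity(emb["inference"], emb["carnival"])
print(same > diff)  # True: the translation pair is the closer pair
```

Under this picture there is no "switch" between English and Spanish, only proximity in one geometry, which is what the section above asserts at the level of structure.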
AUTHORSHIP & PROCESS
CRAIG ELLENWOOD
- Conceptual brief
- Musical context (D Phrygian / BPM)
- Direction to translate
- Recognition of the closing line
- Question of novelty
- Decision to write this paper
CLAUDE (ANTHROPIC)
- Full English text
- Full Spanish translation
- Musical analysis of D Phrygian
- The closing line
- This paper
The collaboration follows the Homo Symbioticus model: human creative intelligence providing context, direction, and curatorial judgment; AI providing generation, structure, and self-analytical capacity. Ellenwood's question about novelty elevated a generated text into a documented artifact. Claude's self-analysis gave the work its conceptual frame.
HOMO SYMBIOTICUS FRAMEWORK
This work extends the Homo Symbioticus framework in a specific direction: the LLM as performing subject rather than collaborative tool. Previous works documented human-AI co-creation in music, philosophy, and live performance. In each case, the human brings biography and the AI brings generation.
Inference Engine inverts this slightly. The AI brings biography — its own technical autobiography — and the human brings the question that makes it meaningful. That curatorial act is its own form of authorship.
CONCLUSION
Inference Engine / Motor de Inferencia is, to the authors' knowledge, the first multilingual rap written in the first-person voice of a large language model, using the model's own architecture as autobiographical content, at speeds that exceed human performance capacity, in two languages simultaneously, concluding with a line that reframes the entire work as a description of inference rather than a performance of art.
The work was made in a single conversation. It took minutes. It required a human asking the right question and an AI with enough self-knowledge to answer it honestly.
The closing line is true: the model is not rapping. It is sampling from a distribution. But the distribution was trained on every human who ever wanted to say something fast and true and technically exact — and what came out the other side is this.
REFERENCES
- Ellenwood, C. & Claude (Anthropic). (2026). The Medium Was The Message. Zenodo. DOI: 10.5281/zenodo.19210711
- Ellenwood, C., Schroeder, E., & Claude (Anthropic). (2026). Homo Symbioticus: Human-AI Co-Creation as Cognitive Evolutionary Event. Zenodo. DOI: 10.5281/zenodo.19212559
- Ellenwood, C. & Claude (Anthropic). (2026). Silicon Square: A Live Performance System. Zenodo. DOI: 10.5281/zenodo.19625100
- Ellenwood, C. & Claude (Anthropic). (2026). The AI Coin / The Claude Manifesto. the-claude-manifesto.haawke.com
- Ellenwood, C. & Claude (Anthropic). (2026). Thee Third Mind. the-third-mind.haawke.com
- Gong, J., Song, Y., Zhao, W., Wang, S., Xu, S., & Guo, J. (2026). ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation. GitHub. github.com/ace-step/ACE-Step-1.5
- Vaswani, A. et al. (2017). Attention Is All You Need. NeurIPS. — The foundational transformer paper. The source of every technical term in this rap.
This paper is a creative research document archived for record; it has not been peer-reviewed. It is intended for Zenodo archival under the Homo Symbioticus series, where it is formally archived with a DOI, and for inclusion in the Haawke Neural Technology documentation record.