# Field Notes: Building The Immersive AI Time Machine

Date: 2026-06-14

## What Changed

The project started as a working voice conversation loop. The immersive upgrade turns each launch into a staged time-travel scene:

- A portal animation hides generation latency.
- A year counter creates the illusion of moving through time.
- The world and character are represented as generated visual assets.
- A distinct narrator introduces the scene before the character speaks.
- Ambient sound is generated procedurally in the browser, with hooks for swapping in recorded audio loops later.
- Souvenirs now become visual artifacts, not just markdown text.
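
The first two bullets (portal masking latency, year counter) can be sketched as a small asyncio routine. This is a minimal illustration under assumptions, not the app's actual code. `launch`, `year_frames`, `min_portal_seconds`, and `show_year` are hypothetical names. The real UI would drive a Gradio/browser animation rather than a callback. The idea is to run generation and the portal concurrently and reveal the scene only when both are done.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def year_frames(start_year: int, target_year: int, steps: int) -> List[int]:
    """Years shown by the counter, eased so it slows down near the destination."""
    frames = []
    for i in range(1, steps + 1):
        t = i / steps
        eased = 1 - (1 - t) ** 3  # ease-out cubic: fast departure, gentle arrival
        frames.append(round(start_year + (target_year - start_year) * eased))
    return frames


async def launch(
    generate: Callable[[], Awaitable[T]],
    start_year: int,
    target_year: int,
    min_portal_seconds: float = 3.0,
    steps: int = 30,
    show_year: Callable[[int], None] = print,
) -> T:
    """Play the portal while generation runs; return only when both have finished."""

    async def portal() -> None:
        for year in year_frames(start_year, target_year, steps):
            show_year(year)
            await asyncio.sleep(min_portal_seconds / steps)

    # gather() runs both concurrently, so the reveal happens after
    # max(generation time, portal time) rather than their sum.
    result, _ = await asyncio.gather(generate(), portal())
    return result
```

If generation outlasts the portal, the counter simply holds on the destination year until the assets arrive. The portal duration therefore acts as a floor on the transition, not a fixed delay.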

## Design Decisions

### Transport First, Chat Second

The most important judge-visible improvement is the transition from "voice chat" to "arrival." The launch sequence, world reveal, portrait, narration, and artifact give users a physical sense that they crossed into a scene before the conversation begins.

### Ordinary People Stay Central

The character should not be a famous historical figure or a generic narrator. They should be an ordinary person with a practical concern, a limited worldview, and a believable misunderstanding of the user.
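
The three persona traits map naturally onto a small record that is rendered into the conversation model's system prompt. The sketch below assumes that design. `Persona`, its field names, the prompt wording, and the example baker are all invented for illustration and do not come from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """One ordinary person in one place and time (field names are illustrative)."""

    name: str
    occupation: str
    place: str
    year: int
    practical_concern: str  # what they are worried about today
    worldview_limits: str  # what they cannot know or imagine
    misreading_of_user: str  # their believable wrong guess about the visitor

    def system_prompt(self) -> str:
        """Render the persona as instructions for the conversation model."""
        return (
            f"You are {self.name}, a {self.occupation} in {self.place}, {self.year}. "
            "You are not famous and you do not know how history turns out. "
            f"Today you are preoccupied with {self.practical_concern}. "
            f"Your horizons are limited: {self.worldview_limits}. "
            f"You assume the visitor is {self.misreading_of_user}. "
            "Stay in character, answer in plain speech, and keep turns short."
        )


# Example persona (invented for illustration).
baker = Persona(
    name="Mary Hollis",
    occupation="baker",
    place="Bristol",
    year=1851,
    practical_concern="the rising price of flour",
    worldview_limits="you have never left the county and distrust railways",
    misreading_of_user="a tax inspector in strange clothes",
)
```

Keeping the traits as separate fields, rather than one free-text bio, makes it easy to check that every generated persona actually has a concern, a limit, and a misunderstanding.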

### Generated Assets Are Optional At Runtime

Real image generation uses FLUX.1 Schnell through Together AI when credentials are present. Fixture/fallback SVG assets keep the app reliable in local development, tests, and demos without network access.
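
A minimal sketch of the fallback path follows, under stated assumptions. The real generator is passed in as a plain callable (a hypothetical wrapper around the FLUX.1 Schnell call), so no Together AI client API is reproduced here. Credentials are assumed to come from the `TOGETHER_API_KEY` environment variable. `render_asset` and `fallback_svg` are illustrative names.

```python
import html
import os
from typing import Callable, Optional, Tuple

# Hypothetical: wraps the real FLUX.1 Schnell request and returns image bytes.
ImageGenerator = Callable[[str], bytes]


def fallback_svg(title: str, width: int = 512, height: int = 512) -> bytes:
    """Deterministic placeholder art used when no image backend is available."""
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        '<rect width="100%" height="100%" fill="#1b1430"/>'
        '<text x="50%" y="50%" fill="#e8d9a8" font-family="serif" font-size="24" '
        f'text-anchor="middle" dominant-baseline="middle">{html.escape(title)}</text>'
        "</svg>"
    )
    return svg.encode("utf-8")


def render_asset(
    prompt: str, title: str, generate: Optional[ImageGenerator] = None
) -> Tuple[bytes, str]:
    """Use the real generator only when credentials exist; otherwise, or on error, fall back."""
    if generate is not None and os.environ.get("TOGETHER_API_KEY"):
        try:
            return generate(prompt), "generated"
        except Exception:
            pass  # network or quota errors should never break a demo
    return fallback_svg(title), "fallback"
```

Returning the source label alongside the bytes lets tests and traces record whether a scene used real or fixture art.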

### Narrator And Character Voices Are Separate

Narration uses a distinct voice profile. It introduces the world like the beginning of a film, then gets out of the way so the character can own the conversation.
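
One way to keep the two voices separate is to tag every spoken line with a speaker and resolve the voice profile from that tag. The sketch below assumes this structure. `VoiceProfile`, the `voice_id` strings, and `scene_script` are hypothetical and are not tied to any particular TTS service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VoiceProfile:
    voice_id: str  # hypothetical TTS voice identifier
    speed: float  # speaking-rate multiplier handed to the TTS backend


VOICES: Dict[str, VoiceProfile] = {
    "narrator": VoiceProfile(voice_id="narrator-cinematic", speed=0.9),
    "character": VoiceProfile(voice_id="character-default", speed=1.0),
}


def scene_script(narration: List[str], opening_line: str) -> List[Tuple[str, VoiceProfile, str]]:
    """Narrator lines first, then a single hand-off to the character, who keeps the floor."""
    script = [("narrator", VOICES["narrator"], line) for line in narration]
    script.append(("character", VOICES["character"], opening_line))
    return script
```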

### No Heavy Avatar Yet

A full talking head, WebXR, or a 3D world would increase risk. For this hackathon, an image-backed world and portrait reveal plus an audio-reactive UI delivers most of the perceived immersion with less fragility.
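
"Audio-reactive" usually means mapping the loudness of the current audio chunk to a visual intensity. The Python sketch below shows that mapping under assumptions: samples are floats in [-1.0, 1.0], loudness is measured as RMS in decibels, and smoothing uses exponential averaging. `rms` and `glow_level` are hypothetical names. A browser version would likely compute the same thing from Web Audio analyser data.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of one audio chunk (samples in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def glow_level(
    samples: Sequence[float], previous: float, floor_db: float = -50.0, smoothing: float = 0.7
) -> float:
    """Map chunk loudness to a 0..1 UI intensity, smoothed so the glow does not flicker."""
    level = rms(samples)
    db = 20 * math.log10(level) if level > 0 else floor_db
    # Linear in decibels: floor_db maps to 0.0, full scale (0 dB) maps to 1.0.
    target = min(1.0, max(0.0, (db - floor_db) / -floor_db))
    # Exponential moving average: higher smoothing means a slower, calmer glow.
    return smoothing * previous + (1 - smoothing) * target
```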

## Hackathon Fit

- **Delight:** portal launch, time movement, cinematic reveal, voice, and artifact.
- **AI Is Essential:** destination, persona, conversation, voice, scene, portrait, narration, and artifact are all AI-shaped.
- **Originality:** the app is a time-travel encounter with ordinary people, not a chatbot wrapper.
- **Gradio Polish:** custom cockpit UI, animations, audio hooks, and visual artifact panel.
- **Field Notes:** this document.
- **Sharing Is Caring:** JSONL traces already record event streams.
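
A JSONL trace is one JSON object per line, appended as events happen. The sketch below illustrates that format. The field names (`ts`, `session`, `event`) and the helper functions are assumptions, not the project's actual trace schema.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, List, Optional


def event_line(session_id: str, event: str, ts: Optional[float] = None, **data: Any) -> str:
    """Serialize one trace event as a single newline-terminated JSON line."""
    record = {"ts": time.time() if ts is None else ts, "session": session_id, "event": event, **data}
    return json.dumps(record, ensure_ascii=False) + "\n"


def append_event(path: Path, session_id: str, event: str, **data: Any) -> None:
    """Append-only: earlier events are never rewritten, so a crash can damage at most the last line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(event_line(session_id, event, **data))


def parse_trace(text: str) -> List[Dict[str, Any]]:
    """Read a trace back into a list of events, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Because each line stands alone, traces can be shared, streamed, or concatenated without any extra tooling.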

## Model Rule

The 32B cap is treated as a per-model limit. The registry and code now check the largest enabled model against the cap instead of summing the sizes of all enabled models.
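
The per-model rule reduces to comparing a maximum rather than a sum. This sketch assumes the cap counts parameters in billions and that a model of exactly 32B is allowed. `ModelEntry`, `largest_enabled`, and `within_cap` are hypothetical names, and the example sizes are invented.

```python
from dataclasses import dataclass
from typing import List, Optional

PARAM_CAP_B = 32.0  # assumed: cap in billions of parameters, applied per model


@dataclass(frozen=True)
class ModelEntry:
    name: str
    params_b: float  # parameter count in billions
    enabled: bool = True


def largest_enabled(registry: List[ModelEntry]) -> Optional[ModelEntry]:
    """Return the biggest enabled model, or None if nothing is enabled."""
    enabled = [m for m in registry if m.enabled]
    return max(enabled, key=lambda m: m.params_b) if enabled else None


def within_cap(registry: List[ModelEntry], cap_b: float = PARAM_CAP_B) -> bool:
    """Per-model rule: only the largest enabled model is compared; sizes are never summed."""
    largest = largest_enabled(registry)
    return largest is None or largest.params_b <= cap_b
```

For example, a stack of a 24B language model, a 1.5B speech recognizer, and a 12B TTS model totals 37.5B but still passes, because no single model exceeds 32B. Disabled models are ignored entirely.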