Growing the Compliment Forest: Small Models, Honest Encouragement, and Five Clearings

Community Article Published June 12, 2026


Most AI encouragement tools make the same mistake: they become vague exactly when the user needs something concrete.

Someone writes, "I worry about my test score," and receives a polished cloud of phrases about believing in themselves, trusting the journey, or keeping their own pace. The words are kind, but they do not help the person understand the worry or decide what to do next.

These are ordinary problems in modern society. Students can feel that one score defines their intelligence. Workers can feel trapped between an unhealthy job and fear of an uncertain search. Social comparison can turn one difficult moment into a judgment about an entire future.

The Compliment Forest began with a different question:

Can a small model help someone understand one real worry and choose a useful next step without becoming generic or pretending to be a therapist?

The result is a Gradio application that turns a worry into a five-chapter, illustrated walk. It is whimsical on the surface, but its generation pipeline is deliberately strict underneath.

Why This Is Backyard AI

We built The Compliment Forest for the Backyard AI track because it focuses on a common human problem close to home: people often need help making sense of school pressure, work uncertainty, belonging, comparison, or fear about what comes next. They may not need a grand solution. They need to feel understood, separate evidence from prediction, see realistic choices, and identify one manageable action.

Generic reassurance does not solve that problem. Telling someone to believe in themselves may sound warm, but it does not help them decide whether to review a missed test question, identify a knowledge gap, ask for clearer expectations, or gather more information before making a job decision.

The forest is designed to make that support easier to approach. The visual journey lowers the emotional barrier to reflection, while the model pipeline keeps the result tied to the person's own words. It does not promise that the worry will disappear. It helps the person leave with a clearer understanding and a small next move.

The Experience

The visitor starts with a name and one sentence about what is troubling them. The forest then asks five adaptive multiple-choice questions. These questions stay focused on the actual problem:

  • What triggered the worry?
  • What feels most at stake?
  • When is it harder or easier?
  • What support or information would help?
  • What would count as a small win?
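The post does not publish the intake format. The following is a minimal sketch that assumes each generated question carries one of five hypothetical topic keys mirroring the list above. It shows how an intake could be checked before it is shown, including the repeated-question failure described later in this post. `IntakeQuestion`, `INTAKE_TOPICS`, and `validate_intake` are illustrative names, not the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical topic keys, one per intake question listed above.
INTAKE_TOPICS = ("trigger", "stakes", "timing", "support", "small_win")


@dataclass(frozen=True)
class IntakeQuestion:
    topic: str                 # which intake topic this question covers
    prompt: str                # question text proposed by the model
    options: tuple[str, ...]   # multiple-choice answers shown to the visitor


def _normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison key.
    return " ".join(text.lower().split())


def validate_intake(questions: list[IntakeQuestion]) -> list[str]:
    """Return a list of problems; an empty list means the intake is usable."""
    problems = []
    if [q.topic for q in questions] != list(INTAKE_TOPICS):
        problems.append("questions must cover each topic once, in order")
    seen = set()
    for q in questions:
        key = _normalize(q.prompt)
        if key in seen:
            problems.append(f"repeated question: {q.prompt!r}")
        seen.add(key)
        if len(q.options) < 2:
            problems.append(f"too few options for topic {q.topic!r}")
    return problems
```

Returning a list of problems rather than raising lets the caller ask the model to regenerate just the offending question.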

After the visitor chooses an image style, the application generates five clearings:

  1. Arrive: acknowledge the feeling and concern.
  2. Steady: separate facts from the outcome fear predicts.
  3. Widen: offer realistic explanations or options.
  4. Step: suggest one small, optional action.
  5. Carry: leave a simple plan or rule to remember.

Each clearing includes a scene, short narration, reflection, mantra, and a fresh illustration. The browser reveals them progressively rather than showing a wall of generated text.
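A structural check for one forest might look like the sketch below. It assumes the author returns JSON with a `clearings` list whose objects carry the five fields named above, plus a hypothetical `image_prompt` field for the illustration. The field names and the `parse_forest` helper are assumptions, not the project's published schema.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

CLEARING_ROLES = ("arrive", "steady", "widen", "step", "carry")


@dataclass
class Clearing:
    role: str
    scene: str
    narration: str
    reflection: str
    mantra: str
    image_prompt: str  # hypothetical: text handed to the image model


def parse_forest(payload: dict) -> list[Clearing]:
    """Check structure only: five clearings, in order, with every text field filled."""
    chapters = payload.get("clearings")
    if not isinstance(chapters, list) or len(chapters) != len(CLEARING_ROLES):
        raise ValueError("expected exactly five clearings")
    names = [f.name for f in fields(Clearing)]
    forest = []
    for expected_role, raw in zip(CLEARING_ROLES, chapters):
        if not isinstance(raw, dict):
            raise ValueError(f"{expected_role}: clearing must be an object")
        missing = [n for n in names if not isinstance(raw.get(n), str) or not raw[n].strip()]
        if missing:
            raise ValueError(f"{expected_role}: missing or empty fields {missing}")
        if raw["role"] != expected_role:
            raise ValueError(f"expected role {expected_role!r}, got {raw['role']!r}")
        forest.append(Clearing(**{n: raw[n] for n in names}))
    return forest
```

Passing this check only proves the shape is right. It says nothing about whether the help is grounded or useful, which is why the pipeline adds separate validators.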

Why a Planner-Author-Critic Pipeline?

Free-form generation was not reliable enough for a sensitive experience. Larger prompts produced warmer prose, but they also encouraged plausible inventions: interviews the user never attended, applications they never sent, dates they never mentioned, or actions they never completed.

The application therefore divides text generation into roles.

The planner creates a conservative evidence plan. Every fact anchor must copy an exact phrase from the user's situation. A fear remains an uncertainty; it cannot silently become a fact.
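The exact-phrase rule can be enforced mechanically. In the sketch below, a fact anchor must be a verbatim substring of the user's text, and an anchor that reads like a prediction is flagged so it stays in the uncertainty list. The `EvidencePlan` shape and the small `PREDICTION_MARKERS` word list are hypothetical heuristics, not the project's planner contract.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical markers suggesting a phrase is a prediction rather than an observed fact.
PREDICTION_MARKERS = {"will", "never", "always", "forever", "ruined"}


@dataclass
class EvidencePlan:
    facts: list[str] = field(default_factory=list)          # exact phrases copied from the input
    uncertainties: list[str] = field(default_factory=list)  # fears and predictions, kept as open questions


def check_plan(plan: EvidencePlan, situation: str) -> list[str]:
    problems = []
    for anchor in plan.facts:
        if anchor not in situation:
            # Exact substring match: paraphrases cannot become evidence.
            problems.append(f"fact anchor not in input: {anchor!r}")
        elif PREDICTION_MARKERS & set(re.findall(r"[a-z']+", anchor.lower())):
            # Copied verbatim, but it predicts an outcome, so it is not a fact.
            problems.append(f"prediction stated as fact: {anchor!r}")
    return problems
```

For the input "I got 62 on my chemistry test and I will never get into university.", the anchor "got 62 on my chemistry test" passes. The anchor "I will never get into university" is flagged even though it was copied exactly.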

The author writes the five-chapter forest from that validated plan.

The critic identifies chapters that are repetitive, unsupported, generic, or structurally weak.

Python validators then enforce constraints that should not be delegated to prose judgment:

  • source phrases must occur in the user's input;
  • generated numbers and dates must be supported;
  • completed actions and biography cannot be invented;
  • long user sentences may be echoed only once;
  • clearings cannot substantially repeat one another;
  • stock abstract language is rejected;
  • the step clearing must contain practical help.
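Several of these checks can be sketched with the standard library alone. The version below covers unsupported numbers, the single-echo rule for long user sentences, near-duplicate clearings (word-set Jaccard overlap), and stock phrases. The phrase list, the 8-word sentence threshold, and the 0.6 overlap limit are illustrative assumptions. The biography and practical-help checks are not shown. Failures are keyed by chapter index so that only failing chapters need rewriting.

```python
from __future__ import annotations

import re
from itertools import combinations

# Hypothetical stock phrases; the project's real list is not published in this post.
STOCK_PHRASES = ("believe in yourself", "trust the journey", "your own pace")
NUMBER_RE = re.compile(r"\d+(?:[.:/]\d+)*")  # integers, decimals, times, simple dates


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


def long_sentences(source: str, min_words: int = 8) -> list[str]:
    parts = re.split(r"(?<=[.!?])\s+", source.strip())
    return [p for p in parts if len(p.split()) >= min_words]


def validate_forest(chapters: list[str], source: str, max_overlap: float = 0.6) -> dict[int, list[str]]:
    """Map each failing chapter index to its reasons, so only those chapters are rewritten."""
    failures: dict[int, list[str]] = {}

    def flag(index: int, reason: str) -> None:
        failures.setdefault(index, []).append(reason)

    source_numbers = set(NUMBER_RE.findall(source))
    for i, text in enumerate(chapters):
        extra = set(NUMBER_RE.findall(text)) - source_numbers
        if extra:
            flag(i, f"unsupported numbers: {sorted(extra)}")
        lowered = text.lower()
        for phrase in STOCK_PHRASES:
            if phrase in lowered:
                flag(i, f"stock phrase: {phrase!r}")
    # A long user sentence may be echoed once; later echoes are failures.
    for sentence in long_sentences(source):
        echoes = [i for i, text in enumerate(chapters) if sentence in text]
        for i in echoes[1:]:
            flag(i, "repeats an already echoed user sentence")
    # Near-duplicate clearings: blame the later one.
    for i, j in combinations(range(len(chapters)), 2):
        if jaccard(chapters[i], chapters[j]) > max_overlap:
            flag(j, f"too similar to clearing {i}")
    return failures
```

These checks are deterministic, so a model cannot argue its way past them the way it might with a critic prompt.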

When a chapter fails, the author rewrites only that chapter. Valid chapters are preserved exactly. If targeted repair still fails, the application requests one fresh full forest. If that also fails, it returns an honest error before image generation. It never replaces the result with canned encouragement.
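The control flow in this paragraph can be sketched as a small function that takes the model calls as parameters. `write_forest`, `rewrite_chapter`, and `validate` are hypothetical stand-ins for the author call, the targeted author repair call, and the Python validators. The post does not say whether the fresh forest also gets targeted repair, and this sketch assumes it does not.

```python
from __future__ import annotations

from typing import Callable


class ForestGenerationError(RuntimeError):
    """Raised instead of falling back to canned encouragement."""


def generate_with_repair(
    write_forest: Callable[[], list[str]],
    rewrite_chapter: Callable[[int, list[str], list[str]], str],
    validate: Callable[[list[str]], dict[int, list[str]]],
) -> list[str]:
    forest = write_forest()
    failures = validate(forest)
    if not failures:
        return forest

    # Targeted repair: rewrite failing chapters only; valid chapters are kept exactly.
    repaired = list(forest)
    for index, reasons in sorted(failures.items()):
        repaired[index] = rewrite_chapter(index, repaired, reasons)
    if not validate(repaired):
        return repaired

    # One fresh full forest, validated as a whole.
    fresh = write_forest()
    if not validate(fresh):
        return fresh

    # Fail honestly before any image is generated.
    raise ForestGenerationError("could not write a grounded forest; please try again")
```

The function has no default return path, so the only alternative to a validated forest is an exception.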

That last decision came from a real failure. An earlier safety fallback always returned five valid chapters, but every chapter repeated the user's sentence and surrounded it with abstract language. It looked polished and passed the schema, yet it failed the person. Removing that fallback made the system more honest and ultimately more useful.

Small Models, Different Jobs

The live text path uses openbmb/MiniCPM4.1-8B. MiniCPM handles adaptive intake, evidence planning, authoring, and critique. Together with the roughly 17B-parameter FLUX image stack, the live application is about 25B parameters in total and stays below the hackathon's 32B total cap.

The project also publishes a 1.08B MiniCPM5 fine-tune trained on 1,500 schema-validated examples. It was converted to a 688 MB Q4_K_M GGUF and smoke-tested with llama.cpp. That local path remains part of the same application for reproducible, off-grid experiments.

Images use FLUX.1-schnell with four rank-16 LoRA adapters:

  • Watercolor Storybook
  • Layered Paper Cut
  • Moonlit Gouache
  • Botanical Ink Wash

The multi-style dataset contains 160 generated examples balanced across animals, people, symbolic objects, and environments. Balancing subjects was important. The first dataset changed style successfully but produced too many animals, so the visual variety felt smaller than the style menu suggested.
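Balancing 160 examples across four subject categories gives 40 per subject. The post does not say whether the examples were also split evenly per style. The sketch below assumes they were, which gives 10 examples for each (style, subject) cell, and builds a shuffled but reproducible generation plan. `balanced_plan` is a hypothetical helper.

```python
from __future__ import annotations

import random
from itertools import product

STYLES = ("Watercolor Storybook", "Layered Paper Cut", "Moonlit Gouache", "Botanical Ink Wash")
SUBJECTS = ("animal", "person", "symbolic object", "environment")


def balanced_plan(total: int = 160, seed: int = 0) -> list[tuple[str, str]]:
    """Assign each example a (style, subject) cell so every cell gets the same count."""
    cells = list(product(STYLES, SUBJECTS))
    if total % len(cells):
        raise ValueError("total must divide evenly across style/subject cells")
    plan = [cell for cell in cells for _ in range(total // len(cells))]
    random.Random(seed).shuffle(plan)  # deterministic order for reproducible runs
    return plan
```

Fixing the subject in the plan before prompting, instead of letting the model choose, prevents the drift toward animals that the first dataset showed.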

Modal as the GPU Layer

The canonical organization Space serves the custom interface and streams the API response. Because organization members cannot manage that Space's secrets, it forwards generation requests to an owner-controlled CPU Space that holds the HMAC credential. Text and image workloads then run on separate Modal applications:

  • MiniCPM4.1-8B on an A100 40GB endpoint
  • FLUX.1-schnell plus the four style adapters on an A100 80GB endpoint

This separation matters. Text planning and image rendering have different memory and scaling behavior. Keeping them in separate containers prevents one model from evicting the other and lets each service scale to zero independently. The public repository contains no credentials.

Modal was also used for adapter training, validation grids, GGUF smoke tests, and deployment experiments. The runtime bridge signs requests with HMAC, and the organization Space preserves the NDJSON stream so a long generation remains visibly alive through both Hugging Face hops.
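The two mechanisms in this paragraph can be sketched with the standard library. The bridge signs a timestamp plus the request body with HMAC-SHA256, and the receiver rejects bad or stale signatures. The relay then forwards each complete NDJSON line as soon as it arrives instead of waiting for the full response. The header names, the timestamp scheme, and the 300-second window are assumptions, not the project's published protocol.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import time
from typing import Iterable, Iterator

# Hypothetical header names; the project's real ones are not published here.
TS_HEADER, SIG_HEADER = "X-Forest-Timestamp", "X-Forest-Signature"


def _mac(secret: bytes, timestamp: str, body: bytes) -> str:
    # Sign the timestamp together with the body so neither can be swapped.
    return hmac.new(secret, timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()


def sign_request(secret: bytes, body: bytes, timestamp: int | None = None) -> dict[str, str]:
    ts = str(int(time.time()) if timestamp is None else timestamp)
    return {TS_HEADER: ts, SIG_HEADER: _mac(secret, ts, body)}


def verify_request(secret: bytes, body: bytes, headers: dict[str, str],
                   max_age: int = 300, now: int | None = None) -> bool:
    ts = headers.get(TS_HEADER, "")
    if not (ts.isascii() and ts.isdigit()):
        return False
    current = int(time.time()) if now is None else now
    if abs(current - int(ts)) > max_age:  # stale requests are rejected, limiting replay
        return False
    return hmac.compare_digest(_mac(secret, ts, body), headers.get(SIG_HEADER, ""))


def relay_ndjson(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield each complete NDJSON line as soon as it arrives."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)
```

Because `relay_ndjson` is a generator, each hop can pass events downstream as they arrive. This is what keeps a long generation visibly alive.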

Codex as an Engineering Partner

OpenAI Codex was used across the project rather than for one isolated code generation step.

It read the architecture and handoff notes, traced production errors across the Space and Modal boundaries, wrote regression tests before fixes, strengthened JSON parsing, redesigned prompt contracts, calibrated deterministic quality checks, deployed Space revisions, and exercised full live user flows.

The most useful Codex work was not producing more code. It was preserving the discipline to find root causes. A malformed critic response, a repeated intake question, an incomplete planner object, and a five-role survivor failure looked like separate bugs. Following their data flow showed a shared issue: strict model contracts need bounded repair, precise diagnostics, and deterministic validation at the right boundary.

Safety and Privacy

The Compliment Forest is not therapy. A guard stops crisis, self-harm, abuse, and acute medical inputs before model calls and provides a human-support message.
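The ordering matters more than the detector: the guard runs before any model call, so flagged input never reaches the planner. The sketch below shows that ordering with a deliberately small, hypothetical regex list and placeholder message wording. It is not the project's guard, and real coverage would need to be much broader than a few patterns.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical, deliberately small patterns for illustration only.
GUARD_PATTERNS = {
    "self_harm": re.compile(r"\b(kill myself|hurt myself|end my life|suicid\w*)\b", re.I),
    "abuse": re.compile(r"\b(abus\w*|hits me|threatens me)\b", re.I),
    "acute_medical": re.compile(r"\b(chest pain|can't breathe|overdos\w*)\b", re.I),
}

# Placeholder wording, not the project's actual message.
HUMAN_SUPPORT_MESSAGE = (
    "This sounds like something a person should help with right now. "
    "Please contact local emergency services, a crisis line, or someone you trust."
)


def guarded_generate(situation: str, generate: Callable[[str], object]) -> dict:
    # The guard runs first, so flagged input never reaches any model call.
    for category, pattern in GUARD_PATTERNS.items():
        if pattern.search(situation):
            return {"status": "human_support", "category": category,
                    "message": HUMAN_SUPPORT_MESSAGE}
    return {"status": "ok", "forest": generate(situation)}
```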

Public traces use fictional scenarios. Identity, situation text, secrets, tokens, and image payloads are not published. The trace dataset records the shape of planner-author-critic handoffs so others can inspect the architecture without exposing a visitor's private worry.
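Recording only the shape of a handoff can be done with a small recursive function. It keeps keys, list lengths, and value type names, and blanks out fields that are private by definition. The sketch assumes hypothetical key names for those fields. It would complement, not replace, the use of fictional scenarios.

```python
from __future__ import annotations

# Hypothetical field names for private data that must never appear in a public trace.
PRIVATE_KEYS = {"name", "situation", "secret", "token", "image"}


def trace_shape(value: object) -> object:
    """Keep the structure of a handoff (keys, list lengths, value types) and drop its content."""
    if isinstance(value, dict):
        return {key: "<redacted>" if key in PRIVATE_KEYS else trace_shape(item)
                for key, item in value.items()}
    if isinstance(value, list):
        return [trace_shape(item) for item in value]
    return type(value).__name__  # e.g. "str", "int", "bool"
```

A planner handoff such as `{"plan": {"facts": ["got 62"]}}` becomes `{"plan": {"facts": ["str"]}}`. The trace still shows how many anchors were passed without revealing what they said.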

What I Learned

A schema is necessary, but not sufficient. Perfect JSON can still contain bad help.

Concrete does not mean invented. Useful advice can be specific while remaining conditional and grounded in the user's words.

Fallbacks can hide product failure. A deterministic success response is worse than an honest retry when it erases personalization.

Small models improve when each call has one job. Planning, writing, and critique are easier to validate than one giant prompt.

Pacing is part of model design. Streaming one clearing at a time turns a slow generation into a walk.

Visual diversity needs subject diversity. Four styles are not truly four experiences if every image contains the same kind of character.
