Building MatchWise🃏: Turning a 1B Model into a Tiny Game Director
When I started building MatchWise, the idea was simple:
What if a memory card game could keep creating new learning boards forever?
Most memory games are static. You flip a few cards, match the pairs, and eventually the board repeats. That is fun for a few minutes, but it rarely feels alive.
I wanted MatchWise to feel different. I wanted the game to behave like a tiny AI game director sitting behind the board, deciding what the next theme should be, what the player should memorize, when the game should become harder, and when the player has must face a challenge.
The result is MatchWise, an AI-powered educational memory game built with Gradio, MiniCPM5-1B-GGUF, and llama.cpp. It is a small game, powered by a small model, but the model is not just decoration. It is part of the gameplay loop.
I also wanted the build process itself to reflect the same idea that small, practical AI systems helping at the right moments. I used Codex as a coding partner while building the app, especially when the project grew from a simple card-flip prototype into a full game loop with state management, leaderboard persistence, local model inference, validation, and custom frontend behavior.
The Core Idea
MatchWise starts like a familiar memory game.
You see a board of emoji cards. You get a short preview timer. The cards flip down. Then you try to match each pair from memory.
But the important difference is that the levels are not fixed.
Instead of hand-authoring a long list of boards, MatchWise asks a local small language model to help create fresh learning content as the player progresses. The model helps generate level themes, titles, messages, facts, and challenge content.
A board might start with simple themes like:
- school supplies
- garden plants
- weather signs
- musical instruments
- kitchen tools
Then the game continues creating new boards as the session goes on.
That made the project interesting: I was not just building a memory game. I was building a small AI system that had to behave like a reliable game designer.
Why a Memory Game?
Memory games are easy to understand, but they can still become surprisingly deep.
For kids, they combine several useful skills:
- visual attention
- short-term memory
- pattern recognition
- patience
- careful observation
I wanted the game to stay playful, not feel like a quiz app wearing a game costume.
So the normal levels are still simple and physical: look, remember, flip, match.
The learning layer is lightweight. The player sees themed boards, short educational messages, and simple facts without being forced into a lecture.
The game’s tagline became:
Learn. Remember. Progress.
That is exactly the loop I wanted.
The Performance Meter: Easy → Challenge Me
One of the biggest design changes was adding a session-wise Performance Meter.
The meter moves from:
Easy → Challenge Me
Instead of randomly throwing harder levels at the player, MatchWise watches how the player performs.
The game can consider things like:
- how quickly the level was completed
- how cleanly the player matched cards
- how many mistakes were made
- whether the player is building momentum
As the player performs better, the meter slowly moves toward Challenge Me.
When it reaches the challenge side, MatchWise unlocks a special challenge level. Challenge levels are not placed at fixed level numbers. They are triggered dynamically when the player’s Performance Meter reaches Challenge Me.
That means the game reacts to the player. A fast, clean player may unlock a challenge sooner, while a player who takes more time or makes more mistakes can stay in normal memory-card levels longer. This made the difficulty feel adaptive instead of scripted.
That small change made the game feel much more personal.
Challenge Levels
Normal levels are memory-card boards where players match identical emoji pairs. Challenge levels are different.
A challenge level appears only when the session-wise Performance Meter reaches the Challenge Me side. It is not random and not hardcoded to a fixed stage. The player earns it through gameplay.
In challenge levels, the game switches from simple identical matching to an educational matching task.
There are two challenge styles:
Fact Match: match an emoji to a short fact keyword or concept
Example: 🐝 ↔ HoneyCategory Match: match an emoji to its group or category
Example: 🦅 ↔ Bird
This keeps the memory-game feel while adding a small twist. Instead of simply matching two identical cards, players match a symbol with its corresponding meaning, making the gameplay a little more varied.
It also helps break up the repetition of standard matching rounds. The regular levels rely on finding identical pairs, while the challenge levels introduce a different matching mechanic. This change keeps the game feeling more dynamic and engaging without moving away from its core memory-based gameplay.
Why MiniCPM5-1B-GGUF Was the Right Fit
For this project, I used:
openbmb/MiniCPM5-1B-GGUF
MiniCPM5-1B-Q4_K_M.gguf
The model runs locally through:
llama-server
I chose MiniCPM5-1B-GGUF because the project needed a model that could fit the “small model, real app” constraint while still being useful for short creative generation.
MatchWise does not need long essays or deep reasoning chains. It needs fast, compact, structured outputs:
- a short level theme
- a few matching emojis
- a simple learning message
- a tiny fact
- a child-friendly challenge question
That makes a 1B GGUF model a good fit.
A larger model might produce richer text, but that was not the point. The fun challenge was designing the game so a small model could do meaningful work without being overloaded.
MiniCPM5-1B-GGUF worked well because it could run through llama.cpp, stay local-first, and generate short JSON-style content quickly enough for a game loop.
The model choice also shaped the product design. I did not want to use a large model to hide weak app logic. I wanted the game to be designed honestly around a small model’s strengths.
MiniCPM5-1B was best at small creative jobs:
- naming a compact theme
- writing a short title
- suggesting simple educational text
- creating lightweight challenge content
- returning small JSON objects
So MatchWise avoids asking the model for long explanations or complex multi-step reasoning. The gameplay system breaks the work into small pieces, then Python validates and assembles the final level.
For example, instead of asking the model to become a full teacher, the app asks for focused outputs like:
{
"level_title": "Weather Signs Match",
"emoji_pairs": ["🌧️", "🌈"],
"victory_message": "Nice work. You matched all the weather signs.",
"failure_message": "Try again and watch where the weather cards land."
}
That kind of compact generation is a good fit for a 1B model. The model adds freshness, while the game engine keeps the rules stable.
This was important for the spirit of the project. MatchWise does not depend on a large hosted API model. The game is designed around a small local model and its constraints.
That changed how I had to build.
A 1B model can be creative, but it is also easy to overload. If you ask it to generate a theme, emoji list, title, fact, challenge text, and strict JSON all at once, mismatch becomes likely.
For example, the model might generate a weather theme, flower emojis, and a title about school supplies. That is not because the model is useless. It is because the request is too crowded for a tiny model.
So the architecture had to become more careful.
The Hardest Part: Making Small Outputs Reliable
The most difficult part of MatchWise was not the card flip animation. It was making the AI outputs reliable enough to become part of gameplay.
A game is less forgiving than a chat demo. If a model returns a broken JSON object, the level cannot load. If it returns duplicate emojis, the board becomes unfair. If the text and emojis do not match, the learning experience feels broken.
So I had to design around failure.
A Real Example: When the Model and the Game Disagreed
One early challenge was that the model could generate pieces that were individually valid but did not belong together.
For example, a generation might look almost correct:
{
"theme": "weather signs",
"level_title": "Garden Flower Match",
"emoji_pairs": ["🌧️", "🌈"]
}
The emojis fit weather. The theme fits weather. But the title suddenly talks about flowers.
In a chatbot, this kind of mismatch is just a small mistake. In a game, it breaks the feeling of trust. The player sees one thing, reads another thing, and the level feels randomly assembled.
That pushed me to change the architecture. I started treating the LLM as a creative generator, not as the final authority. Python became the game director that checks structure, repairs safe pieces, rejects broken outputs, and keeps the final player experience consistent.
Some of the challenges were:
- keeping the model’s JSON valid
- preventing duplicate emoji pairs
- keeping themes and emojis aligned
- avoiding vague themes like “learning” or “fun”
- keeping prompts short enough for a small model
- reducing token usage so levels could generate faster
- falling back gracefully when generation failed
- balancing AI creativity with deterministic game rules
This is where the project became more like engineering a small game system than just prompting a model.
Prompting as Game Engineering
One of the biggest lessons from MatchWise was this:
Prompting a small model is not just writing instructions. It is game systems engineering.
At first, I treated prompting like a writing problem: make the instruction clearer, add more rules, add more examples. But with a 1B model, longer prompts did not always make the output better. Sometimes they made the task heavier.
So I started treating prompts like game mechanics. Each prompt had to be small, testable, and connected to validation logic.
I had to think carefully about what the model should decide and what Python should enforce.
The model is good at:
- proposing themes
- writing short playful text
- generating challenge ideas
- making the experience feel fresh
Python is better at:
- validating JSON
- checking emoji counts
- enforcing grid sizes
- managing lives and score
- tracking performance
- saving leaderboard results
- preventing broken gameplay states
A good example was token budgeting. A tiny prompt that only asks for one theme can use a very small output budget. A larger prompt that creates a full playable level needs more room. Splitting responsibilities this way made the system faster and easier to debug.
The final lesson was simple:
Do not ask the small model to carry everything. Ask it to carry the part that makes the game feel alive.
That split became the backbone of the app. The model creates. The game engine verifies.
This was especially important because I wanted the AI to be load-bearing without making the whole game fragile. The final design gives the model creative responsibility, while Python protects the player experience.
Making AI Load-Bearing
For the Build Small Hackathon, I wanted the AI to be load-bearing.
That means the model should not just write a cute welcome message. It should affect the actual experience.
In MatchWise, the AI affects:
- level themes
- board identity
- educational text
- challenge content
- learning facts
- freshness of the session
For example, two players may both start with the same memory rules, but the generated boards can feel different. One session might move through weather signs, kitchen tools, and space objects. Another might explore garden plants, school supplies, and musical instruments.
The rules stay stable, but the content keeps changing. That is the role of AI in MatchWise: not to replace the game, but to keep the game from becoming predictable.
Without the model, the game would become a fixed memory game. With the model, every session can feel slightly different. That is the part I like most. The model is not replacing the game logic instead, its giving the game a small creative engine.
Custom UI Beyond Default Gradio
MatchWise is built in Gradio, but I wanted it to feel like a real game rather than a form-based demo.
So the frontend uses custom HTML, CSS, and JavaScript inside the Gradio app.
The UI includes:
- Animated landing screen
- Emoji card grid
- Preview countdown
- Flip animations
- Score/lives/hints display
- Performance Meter
- Challenge states
- Leaderboard UI
- Polished game-style visuals
The goal was to push past the default Gradio look while still keeping the deployment simple on Hugging Face Spaces.
The interface matters because this is a memory game. The player should feel like they are entering a playful space, not filling out a model inference form.
Local-First and llama.cpp
MatchWise runs its model through llama.cpp.
The app downloads or uses the configured GGUF model, starts llama-server, and sends JSON-only generation requests to it.
This gives MatchWise a local-first architecture:
- no cloud LLM API needed
- no external inference service required
- small GGUF model
- llama.cpp runtime
- Gradio Space frontend
That also made the project more challenging. A local small model means you have to be disciplined with prompts, output tokens, validation, and fallbacks.
But that constraint is what made the project interesting.
It also made the app feel honest for the hackathon theme. The model running inside the Space is not a remote black box. It is the actual small model doing the work in front of the player.
Leaderboard and Hugging Face Login
I also added a leaderboard system.
The Space uses Hugging Face OAuth so players can save high scores with their HF username. Scores are stored with SQLite under persistent HF Bucket storage.
This gave the game a small community layer.
Memory games become more fun when there is a reason to replay, improve, and compare.
What Changed During Development
The first versions were closer to a normal emoji matching game.
Then I added:
- AI-generated themes
- Infinite level progression
- Lives
- Peeks
- Leaderboard
- Performance tracking
- Challenge levels
- Custom UI polish
The app changed a lot through iteration.
At first, I tried to make the model generate too much in one step. That created mismatches. Then I started moving more responsibility into Python: grid rules, validation, performance logic, fallback text, and leaderboard handling.
That made the system much more stable.
The hardest part was not the card flip logic.
The hardest part was reliability.
Small models can produce charming ideas, but games need structure. A broken JSON object or mismatched emoji list can break the experience.
So the final app became a mix of:
- Creative LLM generation
- Strict Python validation
- Deterministic game rules
- Graceful fallbacks
- Custom frontend behavior
That balance is the project.
Building With Codex and ChatGPT
This project was also built with AI assistance.
I used Codex heavily during implementation. It helped most when the project moved beyond a simple prototype and became a stateful game system.
MatchWise had many connected parts:
- Python backend state
- Gradio callbacks
- local llama.cpp server startup
- JSON-only model calls
- card grid rendering
- JavaScript flip logic
- lives and peek systems
- performance meter updates
- challenge-level transitions
- leaderboard persistence with Hugging Face login
Changing one feature often touched several of these areas at once. For example, adding the Performance Meter was not just a UI change. It required backend scoring logic, session state updates, frontend meter animation, challenge unlock rules, and prompt changes so the LLM knew when to generate challenge content.
Codex helped with that kind of multi-file thinking. I used it as a coding partner rather than a one-shot code generator.
The workflow usually looked like this:
- Describe the gameplay change clearly.
- Ask Codex to inspect the current code structure.
- Modify only the required functions.
- Keep the existing UI and state flow intact.
- Test the logic mentally and through small iterations.
- Tighten prompts or validation when the model output became unstable.
One concrete example was the challenge-level system. The first idea was to randomly insert special matching modes. After iteration, the game logic became more performance-driven: the player earns progress on a meter, reaches Challenge Me, and then receives a special challenge. That made the feature feel earned instead of random.
Another example was output reliability. When the small model produced mismatched text and emojis, the solution was not simply “write a longer prompt.” The better fix was to separate responsibilities: let the model create short creative pieces, then let Python validate, repair, and assemble the final playable level.
The visual direction also used AI assistance. Images, mascot ideas, and UI concepts were explored with ChatGPT, while the final game UI was implemented as custom HTML, CSS, and JavaScript inside Gradio.
That felt fitting for the project: AI helped build a game where a small AI model becomes part of the gameplay itself.
The development agent trace is available here: MatchWise Agent Trace.
What I Learned
The biggest lesson was that tiny models are not worse versions of giant models. They are different materials.
You have to design around them.
A small model works best when:
- the task is narrow
- the output is short
- the schema is strict
- Python handles validation
- the model is asked to do the fun creative part
That is why MatchWise does not ask MiniCPM5 to be a full teacher, full game engine, full database, and full UI writer at once.
Instead, it gives the model small creative jobs inside a larger game system.
That made the app more reliable and more fun.
Another lesson was that “AI-powered” does not have to mean “chatbot.” Some of the most interesting AI apps happen when the model is quietly inside the product, shaping the experience without becoming the whole interface.
Why MatchWise Fits “Thousand Token Wood”
The Thousand Token Wood track is about building something delightful that would not exist without AI.
MatchWise fits that spirit because the AI is not hidden behind a chatbox. It is inside the game loop. It decides what the next board feels like. It keeps the themes fresh. It helps create challenge content. It gives the game an endless learning loop.
The result is small, playful, and a little strange in the best way: a memory game that keeps inventing itself.
Try MatchWise
MatchWise is available as a Hugging Face Space.
🤗 Space: Match Wise
Model: MiniCPM5-1B-GGUF
Runtime: llama.cpp (llama-server)
Frontend: Gradio with custom HTML/CSS/JS
Demo video: Watch Match Wise in action
Built for: The Build Small Hackathon
