Building MatchWise🃏: Turning a 1B Model into a Tiny Game Director

Community Article

Published June 15, 2026

Upvote

Tejas Shinde

tejasashinde

build-small-hackathon

By Tejas Shinde

🚀Play Match Wise

When I started building MatchWise, the idea was simple:

What if a memory card game could keep creating new learning boards forever?

Most memory games are static. You flip a few cards, match the pairs, and eventually the board repeats. That is fun for a few minutes, but it rarely feels alive.

I wanted MatchWise to feel different. I wanted the game to behave like a tiny AI game director sitting behind the board, deciding what the next theme should be, what the player should memorize, when the game should become harder, and when the player has must face a challenge.

The result is MatchWise, an AI-powered educational memory game built with Gradio, MiniCPM5-1B-GGUF, and llama.cpp. It is a small game, powered by a small model, but the model is not just decoration. It is part of the gameplay loop.

I also wanted the build process itself to reflect the same idea that small, practical AI systems helping at the right moments. I used Codex as a coding partner while building the app, especially when the project grew from a simple card-flip prototype into a full game loop with state management, leaderboard persistence, local model inference, validation, and custom frontend behavior.

The Core Idea

MatchWise starts like a familiar memory game.

You see a board of emoji cards. You get a short preview timer. The cards flip down. Then you try to match each pair from memory.

But the important difference is that the levels are not fixed.

Instead of hand-authoring a long list of boards, MatchWise asks a local small language model to help create fresh learning content as the player progresses. The model helps generate level themes, titles, messages, facts, and challenge content.

A board might start with simple themes like:

school supplies
garden plants
weather signs
musical instruments
kitchen tools

Then the game continues creating new boards as the session goes on.

That made the project interesting: I was not just building a memory game. I was building a small AI system that had to behave like a reliable game designer.

Why a Memory Game?

Memory games are easy to understand, but they can still become surprisingly deep.

For kids, they combine several useful skills:

visual attention
short-term memory
pattern recognition
patience
careful observation

I wanted the game to stay playful, not feel like a quiz app wearing a game costume.

So the normal levels are still simple and physical: look, remember, flip, match.

The learning layer is lightweight. The player sees themed boards, short educational messages, and simple facts without being forced into a lecture.

The game’s tagline became:

Learn. Remember. Progress.

That is exactly the loop I wanted.

The Performance Meter: Easy → Challenge Me

One of the biggest design changes was adding a session-wise Performance Meter.

The meter moves from:

Easy → Challenge Me

Instead of randomly throwing harder levels at the player, MatchWise watches how the player performs.

The game can consider things like:

how quickly the level was completed
how cleanly the player matched cards
how many mistakes were made
whether the player is building momentum

As the player performs better, the meter slowly moves toward Challenge Me.

When it reaches the challenge side, MatchWise unlocks a special challenge level. Challenge levels are not placed at fixed level numbers. They are triggered dynamically when the player’s Performance Meter reaches Challenge Me.

That means the game reacts to the player. A fast, clean player may unlock a challenge sooner, while a player who takes more time or makes more mistakes can stay in normal memory-card levels longer. This made the difficulty feel adaptive instead of scripted.

That small change made the game feel much more personal.

Challenge Levels

Normal levels are memory-card boards where players match identical emoji pairs. Challenge levels are different.

A challenge level appears only when the session-wise Performance Meter reaches the Challenge Me side. It is not random and not hardcoded to a fixed stage. The player earns it through gameplay.

In challenge levels, the game switches from simple identical matching to an educational matching task.

There are two challenge styles:

Fact Match: match an emoji to a short fact keyword or concept
Example: 🐝 ↔ Honey
Category Match: match an emoji to its group or category
Example: 🦅 ↔ Bird

This keeps the memory-game feel while adding a small twist. Instead of simply matching two identical cards, players match a symbol with its corresponding meaning, making the gameplay a little more varied.

It also helps break up the repetition of standard matching rounds. The regular levels rely on finding identical pairs, while the challenge levels introduce a different matching mechanic. This change keeps the game feeling more dynamic and engaging without moving away from its core memory-based gameplay.

Why MiniCPM5-1B-GGUF Was the Right Fit

For this project, I used:

openbmb/MiniCPM5-1B-GGUF
MiniCPM5-1B-Q4_K_M.gguf

The model runs locally through:

llama-server

I chose MiniCPM5-1B-GGUF because the project needed a model that could fit the “small model, real app” constraint while still being useful for short creative generation.

MatchWise does not need long essays or deep reasoning chains. It needs fast, compact, structured outputs:

a short level theme
a few matching emojis
a simple learning message
a tiny fact
a child-friendly challenge question

That makes a 1B GGUF model a good fit.

A larger model might produce richer text, but that was not the point. The fun challenge was designing the game so a small model could do meaningful work without being overloaded.

MiniCPM5-1B-GGUF worked well because it could run through llama.cpp, stay local-first, and generate short JSON-style content quickly enough for a game loop.

The model choice also shaped the product design. I did not want to use a large model to hide weak app logic. I wanted the game to be designed honestly around a small model’s strengths.

MiniCPM5-1B was best at small creative jobs:

naming a compact theme
writing a short title
suggesting simple educational text
creating lightweight challenge content
returning small JSON objects

So MatchWise avoids asking the model for long explanations or complex multi-step reasoning. The gameplay system breaks the work into small pieces, then Python validates and assembles the final level.

For example, instead of asking the model to become a full teacher, the app asks for focused outputs like:

{
  "level_title": "Weather Signs Match",
  "emoji_pairs": ["🌧️", "🌈"],
  "victory_message": "Nice work. You matched all the weather signs.",
  "failure_message": "Try again and watch where the weather cards land."
}

That kind of compact generation is a good fit for a 1B model. The model adds freshness, while the game engine keeps the rules stable.

This was important for the spirit of the project. MatchWise does not depend on a large hosted API model. The game is designed around a small local model and its constraints.

That changed how I had to build.

A 1B model can be creative, but it is also easy to overload. If you ask it to generate a theme, emoji list, title, fact, challenge text, and strict JSON all at once, mismatch becomes likely.

For example, the model might generate a weather theme, flower emojis, and a title about school supplies. That is not because the model is useless. It is because the request is too crowded for a tiny model.

So the architecture had to become more careful.

The Hardest Part: Making Small Outputs Reliable

The most difficult part of MatchWise was not the card flip animation. It was making the AI outputs reliable enough to become part of gameplay.

A game is less forgiving than a chat demo. If a model returns a broken JSON object, the level cannot load. If it returns duplicate emojis, the board becomes unfair. If the text and emojis do not match, the learning experience feels broken.

So I had to design around failure.

A Real Example: When the Model and the Game Disagreed

One early challenge was that the model could generate pieces that were individually valid but did not belong together.

For example, a generation might look almost correct:

{
  "theme": "weather signs",
  "level_title": "Garden Flower Match",
  "emoji_pairs": ["🌧️", "🌈"]
}

The emojis fit weather. The theme fits weather. But the title suddenly talks about flowers.

In a chatbot, this kind of mismatch is just a small mistake. In a game, it breaks the feeling of trust. The player sees one thing, reads another thing, and the level feels randomly assembled.

That pushed me to change the architecture. I started treating the LLM as a creative generator, not as the final authority. Python became the game director that checks structure, repairs safe pieces, rejects broken outputs, and keeps the final player experience consistent.

Some of the challenges were:

keeping the model’s JSON valid
preventing duplicate emoji pairs
keeping themes and emojis aligned
avoiding vague themes like “learning” or “fun”
keeping prompts short enough for a small model
reducing token usage so levels could generate faster
falling back gracefully when generation failed
balancing AI creativity with deterministic game rules

This is where the project became more like engineering a small game system than just prompting a model.

Prompting as Game Engineering

One of the biggest lessons from MatchWise was this:

Prompting a small model is not just writing instructions. It is game systems engineering.

At first, I treated prompting like a writing problem: make the instruction clearer, add more rules, add more examples. But with a 1B model, longer prompts did not always make the output better. Sometimes they made the task heavier.

So I started treating prompts like game mechanics. Each prompt had to be small, testable, and connected to validation logic.

I had to think carefully about what the model should decide and what Python should enforce.

The model is good at:

proposing themes
writing short playful text
generating challenge ideas
making the experience feel fresh

Python is better at:

validating JSON
checking emoji counts
enforcing grid sizes
managing lives and score
tracking performance
saving leaderboard results
preventing broken gameplay states

A good example was token budgeting. A tiny prompt that only asks for one theme can use a very small output budget. A larger prompt that creates a full playable level needs more room. Splitting responsibilities this way made the system faster and easier to debug.

The final lesson was simple:

Do not ask the small model to carry everything. Ask it to carry the part that makes the game feel alive.

That split became the backbone of the app. The model creates. The game engine verifies.

This was especially important because I wanted the AI to be load-bearing without making the whole game fragile. The final design gives the model creative responsibility, while Python protects the player experience.

Making AI Load-Bearing

For the Build Small Hackathon, I wanted the AI to be load-bearing.

That means the model should not just write a cute welcome message. It should affect the actual experience.

In MatchWise, the AI affects:

level themes
board identity
educational text
challenge content
learning facts
freshness of the session

For example, two players may both start with the same memory rules, but the generated boards can feel different. One session might move through weather signs, kitchen tools, and space objects. Another might explore garden plants, school supplies, and musical instruments.

The rules stay stable, but the content keeps changing. That is the role of AI in MatchWise: not to replace the game, but to keep the game from becoming predictable.

Without the model, the game would become a fixed memory game. With the model, every session can feel slightly different. That is the part I like most. The model is not replacing the game logic instead, its giving the game a small creative engine.

Custom UI Beyond Default Gradio

MatchWise is built in Gradio, but I wanted it to feel like a real game rather than a form-based demo.

So the frontend uses custom HTML, CSS, and JavaScript inside the Gradio app.

The UI includes:

Animated landing screen
Emoji card grid
Preview countdown
Flip animations
Score/lives/hints display
Performance Meter
Challenge states
Leaderboard UI
Polished game-style visuals

The goal was to push past the default Gradio look while still keeping the deployment simple on Hugging Face Spaces.

The interface matters because this is a memory game. The player should feel like they are entering a playful space, not filling out a model inference form.

Local-First and llama.cpp

MatchWise runs its model through llama.cpp.

The app downloads or uses the configured GGUF model, starts llama-server, and sends JSON-only generation requests to it.

This gives MatchWise a local-first architecture:

no cloud LLM API needed
no external inference service required
small GGUF model
llama.cpp runtime
Gradio Space frontend

That also made the project more challenging. A local small model means you have to be disciplined with prompts, output tokens, validation, and fallbacks.

But that constraint is what made the project interesting.

It also made the app feel honest for the hackathon theme. The model running inside the Space is not a remote black box. It is the actual small model doing the work in front of the player.

Leaderboard and Hugging Face Login

I also added a leaderboard system.

The Space uses Hugging Face OAuth so players can save high scores with their HF username. Scores are stored with SQLite under persistent HF Bucket storage.

This gave the game a small community layer.

Memory games become more fun when there is a reason to replay, improve, and compare.

What Changed During Development

The first versions were closer to a normal emoji matching game.

Then I added:

AI-generated themes
Infinite level progression
Lives
Peeks
Leaderboard
Performance tracking
Challenge levels
Custom UI polish

The app changed a lot through iteration.

At first, I tried to make the model generate too much in one step. That created mismatches. Then I started moving more responsibility into Python: grid rules, validation, performance logic, fallback text, and leaderboard handling.

That made the system much more stable.

The hardest part was not the card flip logic.

The hardest part was reliability.

Small models can produce charming ideas, but games need structure. A broken JSON object or mismatched emoji list can break the experience.

So the final app became a mix of:

Creative LLM generation
Strict Python validation
Deterministic game rules
Graceful fallbacks
Custom frontend behavior

That balance is the project.

Building With Codex and ChatGPT

This project was also built with AI assistance.

I used Codex heavily during implementation. It helped most when the project moved beyond a simple prototype and became a stateful game system.

MatchWise had many connected parts:

Python backend state
Gradio callbacks
local llama.cpp server startup
JSON-only model calls
card grid rendering
JavaScript flip logic
lives and peek systems
performance meter updates
challenge-level transitions
leaderboard persistence with Hugging Face login

Changing one feature often touched several of these areas at once. For example, adding the Performance Meter was not just a UI change. It required backend scoring logic, session state updates, frontend meter animation, challenge unlock rules, and prompt changes so the LLM knew when to generate challenge content.

Codex helped with that kind of multi-file thinking. I used it as a coding partner rather than a one-shot code generator.

The workflow usually looked like this:

Describe the gameplay change clearly.
Ask Codex to inspect the current code structure.
Modify only the required functions.
Keep the existing UI and state flow intact.
Test the logic mentally and through small iterations.
Tighten prompts or validation when the model output became unstable.

One concrete example was the challenge-level system. The first idea was to randomly insert special matching modes. After iteration, the game logic became more performance-driven: the player earns progress on a meter, reaches Challenge Me, and then receives a special challenge. That made the feature feel earned instead of random.

Another example was output reliability. When the small model produced mismatched text and emojis, the solution was not simply “write a longer prompt.” The better fix was to separate responsibilities: let the model create short creative pieces, then let Python validate, repair, and assemble the final playable level.

The visual direction also used AI assistance. Images, mascot ideas, and UI concepts were explored with ChatGPT, while the final game UI was implemented as custom HTML, CSS, and JavaScript inside Gradio.

That felt fitting for the project: AI helped build a game where a small AI model becomes part of the gameplay itself.

The development agent trace is available here: MatchWise Agent Trace.

What I Learned

The biggest lesson was that tiny models are not worse versions of giant models. They are different materials.

You have to design around them.

A small model works best when:

the task is narrow
the output is short
the schema is strict
Python handles validation
the model is asked to do the fun creative part

That is why MatchWise does not ask MiniCPM5 to be a full teacher, full game engine, full database, and full UI writer at once.

Instead, it gives the model small creative jobs inside a larger game system.

That made the app more reliable and more fun.

Another lesson was that “AI-powered” does not have to mean “chatbot.” Some of the most interesting AI apps happen when the model is quietly inside the product, shaping the experience without becoming the whole interface.

Why MatchWise Fits “Thousand Token Wood”

The Thousand Token Wood track is about building something delightful that would not exist without AI.

MatchWise fits that spirit because the AI is not hidden behind a chatbox. It is inside the game loop. It decides what the next board feels like. It keeps the themes fresh. It helps create challenge content. It gives the game an endless learning loop.

The result is small, playful, and a little strange in the best way: a memory game that keeps inventing itself.