Hermes Bot committed on
Commit
648bf8b
·
1 Parent(s): ac9568a

Co-Authored-By: Nanoboozhoo <nanoboozhoo@git-aaaki>

llms.txt CHANGED
@@ -13,8 +13,7 @@
13
  ## Overview
14
 
15
  - **Name:** Build Small Hackathon 2026 — Team nbiish
16
- - **Version:** 0.4.0 — Cedar-Copper Edition
17
- - **Description:** Multi-project hackathon entry targeting $48K+ prize pool across Backyard AI and Thousand Token Wood tracks. Three Gradio apps using small models (≤32B) with maximum bonus badge coverage.
18
  - **Aesthetic:** Cedar-copper visual language — sky-to-sunrise palette (water-blue → cedar → copper → sun-amber → birch-cream), biophilic motifs, sky-to-water gradient banners. Shared CSS variables live in `shared/cedar_copper_tokens.py`.
19
  - **Purpose:** Win prizes across tracks, badges, and sponsor categories by building delightful, useful AI apps that run locally.
20
  - **UX:** Gradio web apps (gr.Blocks + mount_gradio_app custom frontends), hosted on HF Spaces.
@@ -73,20 +72,30 @@
73
  - HF README metadata: `colorTo` must be one of `[red, yellow, green, blue, indigo, purple, pink, gray]` (no `emerald`/`amber`).
74
  - HF README metadata: `emoji` must match `/\p{Extended_Pictographic}/u` — only the standard emoji block is allowed; decorative Unicode glyphs (solar/astrological/typographic symbols) fail validation. Use a real emoji.
75
 
76
  ### Local Test Environment
77
 
78
  - Python: miniconda3 (Python 3.12)
79
  - Gradio: 6.0.0
80
- - llama-cpp-python: installed via conda-forge (v0.3.16)
81
- - Available GGUF models:
82
- - VibeThinker-1.5B.Q8_0.gguf (in HF cache)
83
- - LFM2-1.2B-Q4_K_M.gguf (in HF cache)
84
- - LFM2-8B-A1B-Q4_K_M.gguf (in ggufy/models/)
85
- - Missing GGUF models (need download): Gemma 4 12B, Dolphin-X1-8B
86
 
87
- ### Local Servers
88
 
89
- All 3 apps run simultaneously on different ports for visual inspection:
90
 
91
  | Project | URL | Stack | HF Space |
92
  |---|---|---|---|
@@ -100,34 +109,27 @@ All 3 apps run simultaneously on different ports for visual inspection:
100
 
101
  ### 1. CritterCalm (Backyard AI)
102
 
103
- - **Status:** Code complete. Deployed. Locally tested. Cedar-copper UI applied.
104
- - **Stack:** OmniVoice (0.6B) + Dolphin-X1-8B (8B) + Kokoro TTS (82M) = 8.7B params
105
- - **Badges:** Off the Grid, Well-Tuned (TBD), Field Notes, Llama Champion (TBD), Off-Brand (custom banner)
106
  - **GitHub:** github.com/nbiish/crittercalm
107
  - **HF Space:** huggingface.co/spaces/nbiish/crittercalm
108
  - **Standalone repo:** /Volumes/1tb-sandisk/code-external/crittercalm-repo
109
 
110
  ### 2. FocusFriend (Thousand Token Wood)
111
 
112
- - **Status:** Code complete. Deployed. Locally tested. Cedar-copper UI applied. Gradio 6 Chatbot dict-format fixed.
113
- - **Stack:** Gemma 4 12B (12B) via llama-cpp-python
114
- - **Badges:** Off-Brand (sun-amber custom theme), Off the Grid, Field Notes
115
  - **GitHub:** github.com/nbiish/focusfriend
116
  - **HF Space:** huggingface.co/spaces/nbiish/focusfriend
117
  - **Standalone repo:** /Volumes/1tb-sandisk/code-external/focusfriend-repo
118
- - **Note:** Gemma 4 12B GGUF not yet downloaded. Need `huggingface-cli download unsloth/gemma-4-12b-it-GGUF --include "gemma-4-12b-it-Q4_K_M.gguf" --local-dir ./models`
119
 
120
  ### 3. TinyBard (Thousand Token Wood + Tiny Titan + Llama Champion)
121
 
122
- - **Status:** Code complete. Deployed. Locally tested end-to-end (game loop). Cedar-copper CRT UI applied.
123
  - **Concept:** ≤4B LLM generates 5-min interactive text adventures in a CRT terminal aesthetic.
124
- - **Stack:** VibeThinker 1.5B (1.5B) via llama-cpp-python + procedural fallback engine
125
- - **Architecture:** FastAPI + mount_gradio_app at /gradio. Custom HTML/CSS/JS frontend. MCP tools: start_game, make_choice.
126
- - **Badges:** Llama Champion, Tiny Titan (1.5B < 4B), Off-Brand (custom CRT), Off the Grid, Field Notes
127
- - **Prize targets:** Tiny Titan ($1K), Thousand Token Wood track, Bonus Quest Champion potential
128
- - **GitHub:** github.com/nbiish/tinybard
129
- - **HF Space:** huggingface.co/spaces/nbiish/tinybard
130
- - **Standalone repo:** /Volumes/1tb-sandisk/code-external/tinybard
131
 
132
  ---
133
 
@@ -155,17 +157,21 @@ All 3 apps run simultaneously on different ports for visual inspection:
155
  - [x] INTELLIGENCE.md — full hackathon landscape analysis
156
  - [x] SUBMISSION_DRAFTS.md — social posts + Field Notes drafts
157
  - [x] HF CLI installed + skills configured (`hf skills add --global`)
158
- - [x] llama-cpp-python installed (conda-forge v0.3.16)
159
  - [x] Local verification: all 3 apps run on ports 7861/7862/7863
160
  - [x] TinyBard end-to-end game loop verified (start → choose → next scene)
161
  - [x] FocusFriend chat verified (user message → Pip reply)
162
  - [x] CritterCalm UI navigation verified (all 3 tabs render)
 
 
 
163
 
164
  ---
165
 
166
  ## Short-term Goals
167
 
168
- - Test all 3 apps locally with real GGUF models (currently running with procedural fallbacks)
 
169
  - Record demo videos and post to social media
170
  - Write and publish Field Notes blog posts
171
  - Share agent traces for Sharing is Caring badge
@@ -179,6 +185,7 @@ All 3 apps run simultaneously on different ports for visual inspection:
179
  - FocusFriend: projects/focusfriend/ + github.com/nbiish/focusfriend
180
  - TinyBard: projects/tinybard/ + github.com/nbiish/tinybard
181
  - Aesthetic module: shared/cedar_copper_tokens.py
 
182
  - ML Intern: github.com/huggingface/ml-intern
183
  - HF Agents CLI: huggingface.co/docs/hub/en/agents-cli
184
  - Gradio MCP: gradio.app/guides/model-context-protocol
 
13
  ## Overview
14
 
15
  - **Name:** Build Small Hackathon 2026 — Team nbiish
16
+ - **Version:** 0.5.0 — Cedar-Copper Edition (HF Inference API)
 
17
  - **Aesthetic:** Cedar-copper visual language — sky-to-sunrise palette (water-blue → cedar → copper → sun-amber → birch-cream), biophilic motifs, sky-to-water gradient banners. Shared CSS variables live in `shared/cedar_copper_tokens.py`.
18
  - **Purpose:** Win prizes across tracks, badges, and sponsor categories by building delightful, useful AI apps that run locally.
19
  - **UX:** Gradio web apps (gr.Blocks + mount_gradio_app custom frontends), hosted on HF Spaces.
 
72
  - HF README metadata: `colorTo` must be one of `[red, yellow, green, blue, indigo, purple, pink, gray]` (no `emerald`/`amber`).
73
  - HF README metadata: `emoji` must match `/\p{Extended_Pictographic}/u` — only the standard emoji block is allowed; decorative Unicode glyphs (solar/astrological/typographic symbols) fail validation. Use a real emoji.
74
 
75
+ ### Inference Architecture (v0.5+)
76
+
77
+ - **All LLM inference** is now via the **Hugging Face Inference API** (serverless). No more local GGUF, no `llama-cpp-python` compile step.
78
+ - Shared module: `shared/inference_client.py` provides `cooldown_status()`, `cooldown_active()`, `generate()`, and `chat_messages()`.
79
+ - Default model: `Qwen/Qwen2.5-1.5B-Instruct` (free tier, fast, well-suited to chat). Override via `INFERENCE_MODEL`.
80
+ - Per-project model override: `TINYBARD_MODEL`, `FOCUSFRIEND_MODEL`, `CRITTERCALM_MODEL`.
81
+ - **Cooldowns** enforce a per-project minimum gap between inference calls (protects HF/Modal credit budget):
82
+ - `tinybard`: 6s
83
+ - `focusfriend`: 10s
84
+ - `crittercalm`: 12s
85
+ - Override via `TINYBARD_COOLDOWN_SECONDS`, etc., or global `INFERENCE_COOLDOWN_SECONDS`.
86
+ - **Always-fallback:** every LLM call falls back to procedural / template output if inference fails or is in cooldown. No LLM call ever blocks the UX.
87
+ - HF Spaces are the dev/test environment — iterate live at `huggingface.co/spaces/nbiish/{tinybard,focusfriend,crittercalm}` rather than localhost.
88
+
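The cooldown rule described in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of `shared/inference_client.py`, which this diff does not show. The default gaps and environment variable names come from the list. The `active` and `window_seconds` keys match what `get_model_status()` reads. Assumptions made for the sketch: the `mark_call()` helper, the `remaining_seconds` key, the in-memory `_last_call` store, and the precedence order (per-project variable before the global one).

```python
import os
import time
from typing import Dict, Optional

# Default per-project gaps in seconds, as listed in llms.txt.
DEFAULT_COOLDOWNS = {"tinybard": 6.0, "focusfriend": 10.0, "crittercalm": 12.0}

# Hypothetical in-memory record of the last inference call per project.
_last_call: Dict[str, float] = {}


def cooldown_window(project: str) -> float:
    """Resolve the gap: per-project env var, then the global one, then the default.

    The precedence order is an assumption of this sketch.
    """
    for name in (f"{project.upper()}_COOLDOWN_SECONDS", "INFERENCE_COOLDOWN_SECONDS"):
        raw = os.environ.get(name)
        if raw:
            return float(raw)
    return DEFAULT_COOLDOWNS.get(project, 10.0)


def cooldown_status(project: str, now: Optional[float] = None) -> dict:
    """Report whether the project must still wait before its next call."""
    now = time.monotonic() if now is None else now
    window = cooldown_window(project)
    last = _last_call.get(project)
    remaining = 0.0 if last is None else max(0.0, window - (now - last))
    return {
        "active": remaining > 0,
        "window_seconds": window,
        "remaining_seconds": remaining,  # assumed key, not confirmed by the diff
    }


def cooldown_active(project: str, now: Optional[float] = None) -> bool:
    return cooldown_status(project, now)["active"]


def mark_call(project: str, now: Optional[float] = None) -> None:
    """Record that an inference call was just made (hypothetical helper)."""
    _last_call[project] = time.monotonic() if now is None else now
```

The optional `now` argument exists only to make the timing testable. Using `time.monotonic()` rather than wall-clock time keeps the gap stable if the system clock is adjusted.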
89
  ### Local Test Environment
90
 
91
  - Python: miniconda3 (Python 3.12)
92
  - Gradio: 6.0.0
93
+ - `huggingface_hub` (for Inference API client)
94
+ - Inference is serverless — no local model files needed unless you opt in to local mode
 
 
 
 
95
 
96
+ ### Local Servers (optional)
97
 
98
+ Local servers were used during v0.4 development for visual inspection. v0.5+ prefers iterating on the live HF Spaces (which use your HF/Modal compute credits). Local servers can still be run for dev:
99
 
100
  | Project | URL | Stack | HF Space |
101
  |---|---|---|---|
 
109
 
110
  ### 1. CritterCalm (Backyard AI)
111
 
112
+ - **Status:** Code complete. Deployed. HF Inference API + cooldowns wired for script generation. OmniVoice voice cloning still requires local install.
113
+ - **Stack:** OmniVoice (0.6B, local optional) + Kokoro TTS (82M, local optional) + Qwen2.5-7B (default) via HF Inference API
114
+ - **Badges:** Off the Grid, Well-Tuned (TBD), Field Notes, Off-Brand
115
  - **GitHub:** github.com/nbiish/crittercalm
116
  - **HF Space:** huggingface.co/spaces/nbiish/crittercalm
117
  - **Standalone repo:** /Volumes/1tb-sandisk/code-external/crittercalm-repo
118
 
119
  ### 2. FocusFriend (Thousand Token Wood)
120
 
121
+ - **Status:** Code complete. Deployed. HF Inference API + cooldowns wired. Gradio 6 Chatbot dict-format fixed.
122
+ - **Stack:** Qwen2.5-7B (default) via HF Inference API
123
+ - **Badges:** Off-Brand (sun-amber custom theme), Field Notes, Cooldowns badge
124
  - **GitHub:** github.com/nbiish/focusfriend
125
  - **HF Space:** huggingface.co/spaces/nbiish/focusfriend
126
  - **Standalone repo:** /Volumes/1tb-sandisk/code-external/focusfriend-repo
 
127
 
128
  ### 3. TinyBard (Thousand Token Wood + Tiny Titan + Llama Champion)
129
 
130
+ - **Status:** Code complete. Deployed. HF Inference API + cooldowns wired. Local test verified (procedural fallback + cooldown UI).
131
  - **Concept:** ≤4B LLM generates 5-min interactive text adventures in a CRT terminal aesthetic.
132
+ - **Stack:** Qwen2.5-1.5B (default) via HF Inference API + procedural fallback engine
 
 
 
 
 
 
133
 
134
  ---
135
 
 
157
  - [x] INTELLIGENCE.md — full hackathon landscape analysis
158
  - [x] SUBMISSION_DRAFTS.md — social posts + Field Notes drafts
159
  - [x] HF CLI installed + skills configured (`hf skills add --global`)
160
+ - [x] llama-cpp-python installed (conda-forge v0.3.16) — for reference; v0.5+ uses HF Inference API
161
  - [x] Local verification: all 3 apps run on ports 7861/7862/7863
162
  - [x] TinyBard end-to-end game loop verified (start → choose → next scene)
163
  - [x] FocusFriend chat verified (user message → Pip reply)
164
  - [x] CritterCalm UI navigation verified (all 3 tabs render)
165
+ - [x] **v0.5: HF Inference API wired into all 3 apps** (no local GGUF, no build step)
166
+ - [x] **v0.5: Cooldown system** in `shared/inference_client.py` to protect HF/Modal credit budget
167
+ - [x] **v0.5: TinyBard local test** — procedural fallback works when no HF_TOKEN; cooldown UI shows in footer
168
 
169
  ---
170
 
171
  ## Short-term Goals
172
 
173
+ - Iterate on the live HF Spaces (nbiish/tinybard, nbiish/focusfriend, nbiish/crittercalm)
174
+ - Set HF_TOKEN + INFERENCE_MODEL Space secrets to enable real LLM-backed adventures
175
  - Record demo videos and post to social media
176
  - Write and publish Field Notes blog posts
177
  - Share agent traces for Sharing is Caring badge
 
185
  - FocusFriend: projects/focusfriend/ + github.com/nbiish/focusfriend
186
  - TinyBard: projects/tinybard/ + github.com/nbiish/tinybard
187
  - Aesthetic module: shared/cedar_copper_tokens.py
188
+ - Inference client: shared/inference_client.py
189
  - ML Intern: github.com/huggingface/ml-intern
190
  - HF Agents CLI: huggingface.co/docs/hub/en/agents-cli
191
  - Gradio MCP: gradio.app/guides/model-context-protocol
projects/crittercalm/README.md CHANGED
@@ -19,6 +19,8 @@ tags:
19
  - off-the-grid
20
  - anishinaabe
21
  - solarpunk
 
 
22
  ---
23
 
24
  # ◈──◆──◇ ᐴ CRITTERCALM ᔔ MAANAMEWIN / VOICE-COMFORT FOR THE FOUR-LEGGEDS ◇──◆──◈
@@ -53,7 +55,15 @@ git clone https://github.com/nbiish/crittercalm.git
53
  cd crittercalm
54
  pip install -r requirements.txt
55
 
56
- # Models auto-download on first run from Hugging Face Hub
 
 
 
 
 
 
 
 
57
  python app.py
58
  ```
59
 
@@ -61,13 +71,18 @@ Then open <http://localhost:7863/>.
61
 
62
  ## ☼ ZHOONIYAAWICHIGEWIN / MODEL STACK ◈
63
 
64
- | Model | Size | Purpose | License |
65
- |-------|------|---------|---------|
66
- | OmniVoice | 0.6B | Voice cloning + TTS | Apache 2.0 |
67
- | Dolphin-X1-8B | 8B | Calming script generation | Llama 3.1 |
68
- | Kokoro TTS | 82M | Built-in soothing voices (fallback) | Apache 2.0 |
 
 
 
 
69
 
70
- **Total: ~8.7B params** (well under the 32B limit)
 
71
 
72
  ## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
73
 
@@ -79,10 +94,11 @@ Runs with `mcp_server=True` — Streamable HTTP MCP server at `/gradio/gradio_ap
79
 
80
  ## ☼ GIIZHIITAA / BADGES ◈
81
 
82
- - 🔌 **Off the Grid** — Fully local, no API calls
83
  - 🎯 **Well-Tuned** — Fine-tuned voice embeddings for pet-directed speech
84
  - 📓 **Field Notes** — Blog post on animal psychoacoustics + voice cloning
85
  - 🎨 **Off-Brand** — Anishinaabe-Solarpunk theme with sky-to-sunrise palette
 
86
 
87
  ## ☼ INA-WAABANDA'IWEWIN / PROJECT STRUCTURE ◈
88
 
 
19
  - off-the-grid
20
  - anishinaabe
21
  - solarpunk
22
+ - inference-api
23
+ - cooldowns
24
  ---
25
 
26
  # ◈──◆──◇ ᐴ CRITTERCALM ᔔ MAANAMEWIN / VOICE-COMFORT FOR THE FOUR-LEGGEDS ◇──◆──◈
 
55
  cd crittercalm
56
  pip install -r requirements.txt
57
 
58
+ # Optional: pick a model (default: Qwen/Qwen2.5-7B-Instruct)
59
+ export INFERENCE_MODEL="Qwen/Qwen2.5-7B-Instruct"
60
+
61
+ # Optional: set the HF token
62
+ export HF_TOKEN="hf_..."
63
+
64
+ # Optional: tune the cooldown
65
+ export CRITTERCALM_COOLDOWN_SECONDS=12
66
+
67
  python app.py
68
  ```
69
 
 
71
 
72
  ## ☼ ZHOONIYAAWICHIGEWIN / MODEL STACK ◈
73
 
74
+ | Component | Source | Purpose | License |
75
+ |---|---|---|---|
76
+ | OmniVoice | local (when installed) | Voice cloning + TTS | Apache 2.0 |
77
+ | Kokoro TTS | local (when installed) | Built-in soothing voices (fallback) | Apache 2.0 |
78
+ | Script LLM | **HF Inference API** (serverless) | Calming script generation | varies |
79
+
80
+ The script LLM uses the HF Inference API — no local GGUF build, configurable per-Space.
81
+ Default: `Qwen/Qwen2.5-7B-Instruct` (small + fast + free tier friendly).
82
+ Override: `CRITTERCALM_MODEL` env var.
83
 
84
+ **Local components:** 0.6B (OmniVoice) + 82M (Kokoro) = ~0.7B (when installed).
85
+ **API LLM:** 1.5B-9B depending on `INFERENCE_MODEL` choice.
86
 
87
  ## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
88
 
 
94
 
95
  ## ☼ GIIZHIITAA / BADGES ◈
96
 
97
+ - 🔌 **Off the Grid** — Voice cloning + TTS run locally; only the script LLM uses Inference API
98
  - 🎯 **Well-Tuned** — Fine-tuned voice embeddings for pet-directed speech
99
  - 📓 **Field Notes** — Blog post on animal psychoacoustics + voice cloning
100
  - 🎨 **Off-Brand** — Anishinaabe-Solarpunk theme with sky-to-sunrise palette
101
+ - 🌀 **Cooldowns** — Serverless inference with built-in credit protection
102
 
103
  ## ☼ INA-WAABANDA'IWEWIN / PROJECT STRUCTURE ◈
104
 
projects/crittercalm/app.py CHANGED
@@ -47,10 +47,9 @@ log = logging.getLogger("crittercalm")
47
  MODEL_DIR = Path(os.environ.get("CRITTERCALM_MODEL_DIR", Path(__file__).parent / "models"))
48
  MODEL_DIR.mkdir(parents=True, exist_ok=True)
49
 
50
- DOLPHIN_MODEL_PATH = os.environ.get(
51
- "DOLPHIN_MODEL_PATH",
52
- str(MODEL_DIR / "Dolphin-X1-8B-Q4_K_M.gguf"),
53
- )
54
  OMNIVOICE_MODEL_ID = os.environ.get("OMNIVOICE_MODEL_ID", "k2-fsa/OmniVoice")
55
  KOKORO_MODEL_PATH = os.environ.get(
56
  "KOKORO_MODEL_PATH",
@@ -96,34 +95,12 @@ def get_omnivoice():
96
  return None
97
 
98
 
 
 
 
99
  def get_dolphin_llm():
100
- """Lazy-load Dolphin-X1-8B via llama.cpp (8B params, Llama 3.1 license)."""
101
- global _dolphin_llm
102
- if _dolphin_llm is not None:
103
- return _dolphin_llm
104
- gguf_path = Path(DOLPHIN_MODEL_PATH)
105
- if not gguf_path.exists():
106
- log.warning(f"Dolphin GGUF not found at {gguf_path}. "
107
- "Download from https://huggingface.co/dphn/Dolphin-X1-8B-GGUF")
108
- return None
109
- try:
110
- from llama_cpp import Llama
111
-
112
- log.info(f"Loading Dolphin-X1-8B from {gguf_path} …")
113
- _dolphin_llm = Llama(
114
- model_path=str(gguf_path),
115
- n_ctx=4096,
116
- n_threads=os.cpu_count() or 4,
117
- verbose=False,
118
- )
119
- log.info("Dolphin-X1-8B loaded ✓")
120
- return _dolphin_llm
121
- except ImportError:
122
- log.warning("llama-cpp-python not installed.")
123
- return None
124
- except Exception as exc:
125
- log.error(f"Dolphin load failed: {exc}")
126
- return None
127
 
128
 
129
  def get_kokoro():
@@ -406,8 +383,19 @@ def generate_calming_audio(
406
  # ---------------------------------------------------------------------------
407
 
408
  def get_model_status() -> str:
409
- """Return a markdown summary of which models are available."""
410
- lines = ["| Model | Status | Purpose |", "|-------|--------|---------|"]
 
411
 
412
  omni = get_omnivoice()
413
  lines.append(
 
47
  MODEL_DIR = Path(os.environ.get("CRITTERCALM_MODEL_DIR", Path(__file__).parent / "models"))
48
  MODEL_DIR.mkdir(parents=True, exist_ok=True)
49
 
50
+ # Deprecated: Dolphin GGUF path kept as None. Script generation now uses
51
+ # the HF Inference API via content.script_generator (no local GGUF build).
52
+ DOLPHIN_MODEL_PATH = None
 
53
  OMNIVOICE_MODEL_ID = os.environ.get("OMNIVOICE_MODEL_ID", "k2-fsa/OmniVoice")
54
  KOKORO_MODEL_PATH = os.environ.get(
55
  "KOKORO_MODEL_PATH",
 
95
  return None
96
 
97
 
98
+ # Dolphin LLM is no longer used locally. Script generation now uses the
99
+ # HF Inference API via content.script_generator. The shim below preserves
100
+ # the call sites in case any external MCP tool references it.
101
  def get_dolphin_llm():
102
+ """Deprecated. Returns None; use the HF Inference API via script_generator."""
103
+ return None
 
104
 
105
 
106
  def get_kokoro():
 
383
  # ---------------------------------------------------------------------------
384
 
385
  def get_model_status() -> str:
386
+ """Return a markdown summary of which models are available + cooldown."""
387
+ # Get the current cooldown snapshot from the script generator
388
+ try:
389
+ from content.script_generator import cooldown_snapshot
390
+ snap = cooldown_snapshot()
391
+ cooldown_line = (
392
+ f"Inference model: `{snap['model']}` · "
393
+ f"cooldown: {snap['cooldown']['active']} · "
394
+ f"window: {snap['cooldown']['window_seconds']}s"
395
+ )
396
+ except Exception as e:
397
+ cooldown_line = f"cooldown status unavailable: {e}"
398
+ lines = [f"> {cooldown_line}\n", "| Model | Status | Purpose |", "|-------|--------|---------|"]
399
 
400
  omni = get_omnivoice()
401
  lines.append(
projects/crittercalm/content/script_generator.py CHANGED
@@ -1,14 +1,39 @@
1
  """
2
- Calming script generation using Dolphin-X1-8B (via llama.cpp) or templates.
 
3
 
4
- Provides:
5
- - generate_calming_script(): LLM-based generation with template fallback
6
- - CALMING_SYSTEM_PROMPT: The system prompt for Dolphin
7
- - create_script_prompt(): Build the user prompt for script generation
 
 
 
 
8
  """
 
9
 
10
  import logging
11
- from content.templates import get_template
 
12
 
13
  log = logging.getLogger("crittercalm.content")
14
 
@@ -33,6 +58,10 @@ Guidelines:
33
  Output ONLY the spoken script — no stage directions, no explanations."""
34
 
35
 
 
 
 
 
36
  def create_script_prompt(
37
  animal: str,
38
  situation: str,
@@ -40,36 +69,12 @@ def create_script_prompt(
40
  pet_name: str = "",
41
  custom_message: str = "",
42
  ) -> str:
43
- """
44
- Build the user prompt for the LLM to generate a calming script.
45
-
46
- Args:
47
- animal: Animal type (Dog, Cat, Chicken, etc.)
48
- situation: The stress situation
49
- duration_minutes: Target session length in minutes
50
- pet_name: Optional pet name
51
- custom_message: Optional custom message to include
52
-
53
- Returns:
54
- Formatted prompt string
55
- """
56
- duration_words = (
57
- "very brief, about 30 seconds"
58
- if duration_minutes <= 1
59
- else f"about {duration_minutes} minutes when read aloud slowly"
60
- )
61
- name_clause = f"named {pet_name}" if pet_name.strip() else ""
62
- custom_clause = (
63
- f"\nInclude this personal message naturally: \"{custom_message}\""
64
- if custom_message.strip()
65
- else ""
66
- )
67
-
68
  return (
69
- f"Write a calming spoken message for a {animal} {name_clause}.\n"
70
- f"Situation: {situation}.\n"
71
- f"Length: {duration_words}.{custom_clause}\n"
72
- f"Make it warm, soothing, and specifically tailored to a {animal}'s needs."
73
  )
74
 
75
 
@@ -79,10 +84,9 @@ def generate_calming_script(
79
  duration_minutes: int,
80
  custom_message: str = "",
81
  pet_name: str = "",
82
- dolphin_llm=None,
83
  ) -> str:
84
- """
85
- Generate a calming script using Dolphin-X1-8B or fallback templates.
86
 
87
  Args:
88
  animal: Animal type
@@ -90,12 +94,12 @@ def generate_calming_script(
90
  duration_minutes: Target session length
91
  custom_message: Optional custom message
92
  pet_name: Optional pet name
93
- dolphin_llm: Optional pre-loaded llama_cpp.Llama instance
94
 
95
  Returns:
96
  Generated calming script as a string
97
  """
98
- prompt = create_script_prompt(
99
  animal=animal,
100
  situation=situation,
101
  duration_minutes=duration_minutes,
@@ -103,22 +107,34 @@ def generate_calming_script(
103
  custom_message=custom_message,
104
  )
105
 
106
- # Try LLM generation
107
- if dolphin_llm is not None:
108
  try:
109
- response = dolphin_llm.create_chat_completion(
110
- messages=[
111
- {"role": "system", "content": CALMING_SYSTEM_PROMPT},
112
- {"role": "user", "content": prompt},
113
- ],
114
  temperature=0.7,
115
- max_tokens=1024,
116
  )
117
- script = response["choices"][0]["message"]["content"].strip()
118
- log.info(f"LLM script generated: {len(script)} chars")
119
- return script
 
 
 
 
120
  except Exception as exc:
121
  log.warning(f"LLM generation failed, using template: {exc}")
 
 
122
 
123
  # Fallback: pre-written templates
124
  return get_template(animal, situation, pet_name, custom_message)
 
 
 
 
 
 
 
 
1
  """
2
+ Calming script generation using the Hugging Face Inference API
3
+ or pre-written templates.
4
 
5
+ The previous version used Dolphin-X1-8B via llama-cpp-python locally. That
6
+ required a heavy build step on HF Spaces. This version uses the serverless
7
+ HF Inference API and enforces a per-project cooldown via
8
+ `shared.inference_client` to protect credit budgets.
9
+
10
+ Override model: set `CRITTERCALM_MODEL` env var. Default is
11
+ `Qwen/Qwen2.5-7B-Instruct` (small, fast, free-tier friendly). The
12
+ system prompt is unchanged — output format is identical.
13
  """
14
+ from __future__ import annotations
15
 
16
  import logging
17
+ import os
18
+ import sys
19
+ from pathlib import Path
20
+ from typing import List, Dict, Optional
21
+
22
+ # Repo-root path setup so we can import shared.inference_client
23
+ _THIS = Path(__file__).resolve()
24
+ _REPO_ROOT = _THIS.parent.parent.parent
25
+ if str(_REPO_ROOT) not in sys.path:
26
+ sys.path.insert(0, str(_REPO_ROOT))
27
+
28
+ from shared.inference_client import ( # noqa: E402
29
+ chat_messages,
30
+ cooldown_active,
31
+ cooldown_status,
32
+ generate as _client_generate,
33
+ INFERENCE_MODEL as DEFAULT_MODEL,
34
+ )
35
+
36
+ from content.templates import get_template # noqa: E402
37
 
38
  log = logging.getLogger("crittercalm.content")
39
 
 
58
  Output ONLY the spoken script — no stage directions, no explanations."""
59
 
60
 
61
+ def _model() -> str:
62
+ return os.environ.get("CRITTERCALM_MODEL", DEFAULT_MODEL)
63
+
64
+
65
  def create_script_prompt(
66
  animal: str,
67
  situation: str,
 
69
  pet_name: str = "",
70
  custom_message: str = "",
71
  ) -> str:
72
+ """Build the user prompt for script generation."""
73
+ pet_part = f" The pet's name is \"{pet_name}\"." if pet_name else ""
74
+ custom_part = f" Incorporate this personal note: \"{custom_message}\"" if custom_message else ""
 
75
  return (
76
+ f"Write a {duration_minutes}-minute calming spoken message for a {animal} "
77
+ f"that is experiencing {situation}.{pet_part}{custom_part}"
 
 
78
  )
79
 
80
 
 
84
  duration_minutes: int,
85
  custom_message: str = "",
86
  pet_name: str = "",
87
+ dolphin_llm=None, # legacy param — ignored; we use the HF Inference API
88
  ) -> str:
89
+ """Generate a calming script using HF Inference API or fallback templates.
 
90
 
91
  Args:
92
  animal: Animal type
 
94
  duration_minutes: Target session length
95
  custom_message: Optional custom message
96
  pet_name: Optional pet name
97
+ dolphin_llm: Legacy parameter (ignored)
98
 
99
  Returns:
100
  Generated calming script as a string
101
  """
102
+ user_prompt = create_script_prompt(
103
  animal=animal,
104
  situation=situation,
105
  duration_minutes=duration_minutes,
 
107
  custom_message=custom_message,
108
  )
109
 
110
+ # Try inference (cooldown-aware)
111
+ if not cooldown_active("crittercalm"):
112
  try:
113
+ messages = chat_messages(CALMING_SYSTEM_PROMPT, user_prompt)
114
+ result = _client_generate(
115
+ project="crittercalm",
116
+ messages=messages,
117
+ max_new_tokens=int(duration_minutes * 200), # rough token budget
118
  temperature=0.7,
 
119
  )
120
+ script = result.text.strip()
121
+ if script:
122
+ log.info(f"LLM script generated: {len(script)} chars")
123
+ return script
124
+ except RuntimeError:
125
+ # Cooldown — fall through to template
126
+ log.info("crittercalm inference cooldown; using template")
127
  except Exception as exc:
128
  log.warning(f"LLM generation failed, using template: {exc}")
129
+ else:
130
+ log.info("crittercalm inference cooldown active; using template")
131
 
132
  # Fallback: pre-written templates
133
  return get_template(animal, situation, pet_name, custom_message)
134
+
135
+
136
+ def cooldown_snapshot() -> dict:
137
+ return {
138
+ "model": _model(),
139
+ "cooldown": cooldown_status("crittercalm"),
140
+ }
projects/crittercalm/requirements.txt CHANGED
@@ -1,24 +1,18 @@
1
  # CritterCalm — AI Voice Cloning Animal Soother
2
  # Python 3.10+
 
3
 
4
- # === Core ===
5
  gradio>=5.0
6
  numpy>=1.24
7
- soundfile>=0.12
8
- torch>=2.0
9
-
10
- # === Voice Cloning ===
11
- omnivoice>=0.1.0
12
-
13
- # === LLM Inference (Dolphin-X1-8B) ===
14
- llama-cpp-python>=0.3.0
15
-
16
- # === Fallback TTS ===
17
- kokoro-onnx>=0.2.0
18
-
19
- # === Audio Processing ===
20
- librosa>=0.10
21
- scipy>=1.10
22
-
23
- # === Utilities ===
24
  huggingface_hub>=0.20
 
1
  # CritterCalm — AI Voice Cloning Animal Soother
2
  # Python 3.10+
3
+ #
4
+ # Inference is via the Hugging Face Inference API. No local GGUF,
5
+ # no llama-cpp-python compile step. Cooldown is enforced in
6
+ # `shared/inference_client.py` to protect your credit budget.
7
+ #
8
+ # Space env vars (Settings → Variables and secrets):
9
+ # HF_TOKEN — your HF token (anonymous works for many models)
10
+ # INFERENCE_MODEL — default model id
11
+ # CRITTERCALM_MODEL — override the model for this project
12
+ # CRITTERCALM_COOLDOWN_SECONDS — gap between inference calls (default 12)
13
+ # INFERENCE_PROVIDER — "hf-inference" (default) or paid provider
14
+ # INFERENCE_MAX_TOKENS — per-call cap (default 220)
15
 
 
16
  gradio>=5.0
17
  numpy>=1.24
 
18
  huggingface_hub>=0.20
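The Space variables listed in the header comment above could be resolved along these lines. This is a hedged sketch, not the actual logic in `shared/inference_client.py`. The variable names and the defaults of `hf-inference` and `220` come from that comment. Assumptions made for the sketch: the function name `resolve_settings`, the returned dictionary shape, and the use of the README's `Qwen/Qwen2.5-7B-Instruct` as the built-in fallback.

```python
import os
from typing import Mapping, Optional


def resolve_settings(
    project: str,
    env: Optional[Mapping[str, str]] = None,
    default_model: str = "Qwen/Qwen2.5-7B-Instruct",  # assumed built-in fallback
) -> dict:
    """Resolve inference settings, preferring the per-project model override."""
    env = os.environ if env is None else env
    prefix = project.upper()
    model = env.get(f"{prefix}_MODEL") or env.get("INFERENCE_MODEL") or default_model
    return {
        "model": model,
        "provider": env.get("INFERENCE_PROVIDER", "hf-inference"),
        "max_tokens": int(env.get("INFERENCE_MAX_TOKENS", "220")),
        "has_token": bool(env.get("HF_TOKEN")),  # report presence only, never the value
    }
```

Passing `env` explicitly makes the precedence easy to check without touching the real process environment.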
projects/focusfriend/README.md CHANGED
@@ -1,6 +1,5 @@
1
  ---
2
  title: ᐴ FocusFriend ᔔ
3
- emoji: ☼
4
  colorFrom: indigo
5
  colorTo: yellow
6
  sdk: gradio
@@ -20,6 +19,8 @@ tags:
20
  - tiny-titan
21
  - anishinaabe
22
  - solarpunk
 
 
23
  ---
24
 
25
  # ◈──◆──◇ ᐴ FOCUSFRIEND ᔔ PIP, YOUR CEDAR-AND-SUN COMPANION ON THE LAKE ◇──◆──◈
@@ -44,8 +45,8 @@ therapy and wants you to actually feel better, not just hear platitudes."
44
  ## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
45
 
46
  - Python 3.10+
47
- - ~7.7GB disk for GGUF model
48
- - ~12GB RAM (CPU inference) or Metal/CUDA for GPU
49
 
50
  ## ☼ AABAJITOOWINAN / INSTALLATION ◈
51
 
@@ -54,10 +55,14 @@ git clone https://github.com/nbiish/focusfriend.git
54
  cd focusfriend
55
  pip install -r requirements.txt
56
 
57
- # Download Gemma 4 12B GGUF model
58
- huggingface-cli download unsloth/gemma-4-12b-it-GGUF \
59
- --include "gemma-4-12b-it-Q4_K_M.gguf" \
60
- --local-dir ./models
 
 
 
 
61
 
62
  python app.py
63
  ```
@@ -66,9 +71,11 @@ Then open <http://localhost:7862/>.
66
 
67
  ## ☼ ZHOONIYAAWICHIGEWIN / MODEL ◈
68
 
69
- | Model | Size | Purpose | License |
70
- |-------|------|---------|---------|
71
- | Gemma 4 12B (Q4_K_M) | 12B params, ~7.7GB | Conversational AI + wellness guidance | Apache 2.0 (Gemma) |
 
 
72
 
73
  ## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
74
 
@@ -83,9 +90,10 @@ Runs with `mcp_server=True` — Streamable HTTP MCP server at `/gradio/gradio_ap
83
  ## ☼ GIIZHIITAA / BADGES ◈
84
 
85
  - 🎨 **Off-Brand** — Anishinaabe-Solarpunk CSS theme with sun-amber gradients
86
- - 🔌 **Off the Grid** — Fully local, no API calls
87
  - 📓 **Field Notes** — Blog post about AI wellness companions
88
- - 🦙 **Tiny Titan** — Model option ≤4B available
 
 
89
 
90
  ## ☼ GANAWAABANDAAN / MEET PIP ◈
91
 
 
1
  ---
2
  title: ᐴ FocusFriend ᔔ
 
3
  colorFrom: indigo
4
  colorTo: yellow
5
  sdk: gradio
 
19
  - tiny-titan
20
  - anishinaabe
21
  - solarpunk
22
+ - inference-api
23
+ - cooldowns
24
  ---
25
 
26
  # ◈──◆──◇ ᐴ FOCUSFRIEND ᔔ PIP, YOUR CEDAR-AND-SUN COMPANION ON THE LAKE ◇──◆──◈
 
45
  ## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
46
 
47
  - Python 3.10+
48
+ - A Hugging Face token (optional; anonymous access works for many small models)
49
+ - ~100MB disk, ~256MB RAM (inference is serverless)
50
 
51
  ## ☼ AABAJITOOWINAN / INSTALLATION ◈
52
 
 
55
  cd focusfriend
56
  pip install -r requirements.txt
57
 
58
+ # Optional: pick a model (default: Qwen/Qwen2.5-7B-Instruct)
59
+ export INFERENCE_MODEL="Qwen/Qwen2.5-7B-Instruct"
60
+
61
+ # Optional: set the HF token
62
+ export HF_TOKEN="hf_..."
63
+
64
+ # Optional: tune the cooldown
65
+ export FOCUSFRIEND_COOLDOWN_SECONDS=10
66
 
67
  python app.py
68
  ```
 
71
 
72
  ## ☼ ZHOONIYAAWICHIGEWIN / MODEL ◈
73
 
74
+ | Model (default) | Size | Purpose | License |
75
+ |---|---|---|---|
76
+ | Qwen2.5-7B-Instruct | 7B | Wellness companion chat | Apache 2.0 |
77
+ | Meta-Llama-3-8B-Instruct | 8B | Alternative | Llama 3 Community |
78
+ | gemma-2-9b-it | 9B | Alternative | Gemma License |
79
 
80
  ## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
81
 
 
90
  ## ☼ GIIZHIITAA / BADGES ◈
91
 
92
  - 🎨 **Off-Brand** — Anishinaabe-Solarpunk CSS theme with sun-amber gradients
 
93
  - 📓 **Field Notes** — Blog post about AI wellness companions
94
+ - 🦙 **Tiny Titan** — Default model is 7B; can switch to 1.5B Qwen for true Tiny Titan
95
+ - 🌀 **Cooldowns** — Serverless inference with built-in credit protection
96
+ - ☁ **HF Inference API** — Uses Hugging Face serverless backend (no local GGUF build)
97
 
98
  ## ☼ GANAWAABANDAAN / MEET PIP ◈
99
 
projects/focusfriend/inference/llm.py CHANGED
@@ -1,103 +1,72 @@
1
  """
2
- LLM inference wrapper for FocusFriend using llama.cpp + Gemma 4 12B.
3
-
4
- Handles lazy loading, streaming, and fallback behavior.
 
 
 
 
 
 
 
 
 
5
  """
 
6
 
 
7
  import os
 
8
  import threading
9
- import logging
10
  from pathlib import Path
11
- from typing import Optional, Generator, List, Dict
12
 
13
  log = logging.getLogger("focusfriend.inference")
14
 
15
- # Singleton
16
- _llm = None
17
- _llm_lock = threading.Lock()
18
-
19
- # Default model path
20
- DEFAULT_MODEL_DIR = Path(os.environ.get("FOCUSFRIEND_MODEL_DIR", Path(__file__).parent.parent / "models"))
21
- DEFAULT_MODEL_PATH = os.environ.get(
22
- "GEMMA_MODEL_PATH",
23
- str(DEFAULT_MODEL_DIR / "gemma-4-12b-it-Q4_K_M.gguf"),
 
 
 
 
 
24
  )
25
- DEFAULT_N_CTX = int(os.environ.get("GEMMA_N_CTX", "8192"))
26
- DEFAULT_N_THREADS = int(os.environ.get("GEMMA_N_THREADS", str(os.cpu_count() or 4)))
27
 
28
 
29
- def load_model(
30
- model_path: str = None,
31
- n_ctx: int = None,
32
- n_threads: int = None,
33
- ) -> Optional[object]:
34
- """
35
- Load the Gemma 4 12B GGUF model via llama.cpp.
36
 
37
- Args:
38
- model_path: Path to GGUF file. Uses env var / default if not provided.
39
- n_ctx: Context window size. Default 8192.
40
- n_threads: CPU threads. Default all cores.
41
 
42
- Returns:
43
- Llama instance or None if loading fails.
44
- """
45
- global _llm
46
-
47
- if _llm is not None:
48
- return _llm
49
-
50
- with _llm_lock:
51
- if _llm is not None:
52
- return _llm
53
-
54
- model_path = model_path or DEFAULT_MODEL_PATH
55
- n_ctx = n_ctx or DEFAULT_N_CTX
56
- n_threads = n_threads or DEFAULT_N_THREADS
57
-
58
- gguf_path = Path(model_path)
59
- if not gguf_path.exists():
60
- log.warning(
61
- f"Model not found at {gguf_path}. "
62
- f"Download: huggingface-cli download unsloth/gemma-4-12b-it-GGUF "
63
- f"--include 'gemma-4-12b-it-Q4_K_M.gguf' --local-dir {DEFAULT_MODEL_DIR}"
64
- )
65
- return None
66
-
67
- try:
68
- from llama_cpp import Llama
69
-
70
- log.info(f"Loading Gemma 4 12B from {gguf_path}")
71
- log.info(f" n_ctx={n_ctx}, n_threads={n_threads}")
72
-
73
- _llm = Llama(
74
- model_path=str(gguf_path),
75
- n_ctx=n_ctx,
76
- n_threads=n_threads,
77
- verbose=False,
78
- )
79
- log.info("Gemma 4 12B loaded successfully ✓")
80
- return _llm
81
-
82
- except ImportError:
83
- log.warning("llama-cpp-python not installed. pip install llama-cpp-python")
84
- return None
85
- except Exception as exc:
86
- log.error(f"Failed to load Gemma 4 12B: {exc}")
87
- return None
88
-
89
-
90
- def get_model() -> Optional[object]:
91
- """Get the current LLM instance (lazy-loads if needed)."""
92
- global _llm
93
- if _llm is not None:
94
- return _llm
95
- return load_model()
96
 
97
 
98
- def is_model_available() -> bool:
99
- """Check if the LLM is loaded and ready."""
100
- return _llm is not None
 
 
 
 
 
 
 
 
 
 
101
 
102
 
103
  def generate_response(
@@ -105,30 +74,24 @@ def generate_response(
105
  temperature: float = 0.8,
106
  max_tokens: int = 300,
107
  ) -> Optional[str]:
108
- """
109
- Generate a non-streaming response from the model.
110
-
111
- Args:
112
- messages: List of {'role': ..., 'content': ...} dicts
113
- temperature: Generation temperature
114
- max_tokens: Max output tokens
115
 
116
- Returns:
117
- Generated text or None on failure
118
  """
119
- model = get_model()
120
- if model is None:
121
  return None
122
-
123
  try:
124
- response = model.create_chat_completion(
 
125
  messages=messages,
 
126
  temperature=temperature,
127
- max_tokens=max_tokens,
128
  )
129
- return response["choices"][0]["message"]["content"]
130
  except Exception as exc:
131
- log.error(f"Generation error: {exc}")
132
  return None
133
 
134
 
@@ -137,42 +100,40 @@ def generate_stream(
137
  temperature: float = 0.8,
138
  max_tokens: int = 300,
139
  ) -> Generator[str, None, None]:
140
- """
141
- Generate a streaming response from the model.
142
-
143
- Args:
144
- messages: List of {'role': ..., 'content': ...} dicts
145
- temperature: Generation temperature
146
- max_tokens: Max output tokens
147
 
148
- Yields:
149
- Text chunks as they arrive
 
150
  """
151
- model = get_model()
152
- if model is None:
153
- yield "⚠️ Model not loaded. I'm running on fallback mode right now."
154
  return
155
-
156
  try:
157
- stream = model.create_chat_completion(
 
158
  messages=messages,
 
159
  temperature=temperature,
160
- max_tokens=max_tokens,
161
- stream=True,
162
  )
163
-
164
- for chunk in stream:
165
- delta = chunk["choices"][0].get("delta", {})
166
- content = delta.get("content", "")
167
- if content:
168
- yield content
169
-
 
 
 
170
  except Exception as exc:
171
- log.error(f"Streaming error: {exc}")
172
  yield f"\n\n⚠️ Something went wrong: {exc}"
173
 
174
 
175
  def unload_model():
176
- """Release the model from memory."""
177
- global _llm
178
- _llm = None
 
 
 
 
1
  """
2
+ LLM inference wrapper for FocusFriend using the Hugging Face Inference API.
3
+
4
+ The previous version loaded a local GGUF (Gemma 4 12B Q4_K_M) via llama-cpp-python.
5
+ That required a heavy compile step on HF Spaces and tied us to a single model. This
6
+ version uses `huggingface_hub.InferenceClient` (serverless) and enforces a
7
+ project-scoped cooldown via `shared.inference_client` to protect your credit budget.
8
+
9
+ To override the model: set the `FOCUSFRIEND_MODEL` env var (falls back to the shared `INFERENCE_MODEL` default).
10
+ Common picks:
11
+ - "Qwen/Qwen2.5-7B-Instruct" (default; sweet spot for chat)
12
+ - "meta-llama/Meta-Llama-3-8B-Instruct"
13
+ - "google/gemma-2-9b-it"
14
  """
15
+ from __future__ import annotations
16
 
17
+ import logging
18
  import os
19
+ import sys
20
  import threading
 
21
  from pathlib import Path
22
+ from typing import Generator, List, Dict, Optional
23
 
24
  log = logging.getLogger("focusfriend.inference")
25
 
26
+ # Add monorepo root so we can import shared.inference_client
27
+ _THIS = Path(__file__).resolve()
28
+ _PROJECT = _THIS.parent.parent
29
+ _REPO_ROOT = _PROJECT.parent.parent  # projects/focusfriend -> projects -> monorepo root (where shared/ lives)
30
+ if str(_REPO_ROOT) not in sys.path:
31
+ sys.path.insert(0, str(_REPO_ROOT))
32
+
33
+ from shared.inference_client import ( # noqa: E402
34
+ InferenceResult,
35
+ chat_messages,
36
+ cooldown_status,
37
+ cooldown_active,
38
+ generate as _client_generate,
39
+ INFERENCE_MODEL as DEFAULT_MODEL,
40
  )
 
 
41
 
42
 
43
+ def _model() -> str:
44
+ """Pick the FocusFriend-specific model, falling back to the default."""
45
+ return os.environ.get("FOCUSFRIEND_MODEL", DEFAULT_MODEL)
 
 
 
 
46
 
 
 
 
 
47
 
48
+ def is_model_available() -> bool:
49
+ """True if a model id is configured and the project cooldown is not active."""
50
+ if cooldown_active("focusfriend"):
51
+ return False
52
+ has_token = bool(os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
53
+ # Many small models work anonymously; don't gate hard.
54
+ return bool(_model())
55
 
56
 
57
+ def get_model() -> Optional[str]:
58
+ """Return the model id we plan to use. None if no model configured."""
59
+ if not _model():
60
+ return None
61
+ return _model()
62
+
63
+
64
+ def cooldown_snapshot() -> dict:
65
+ """Public status snapshot for the UI."""
66
+ return {
67
+ "model": _model(),
68
+ "cooldown": cooldown_status("focusfriend"),
69
+ }
70
 
71
 
72
  def generate_response(
 
74
  temperature: float = 0.8,
75
  max_tokens: int = 300,
76
  ) -> Optional[str]:
77
+ """One-shot generation. Returns text or None on cooldown/failure.
 
 
 
 
 
 
78
 
79
+ `messages` follows OpenAI chat format. Caller is responsible for system prompt
80
+ and prior turns.
81
  """
82
+ if cooldown_active("focusfriend"):
83
+ log.info("focusfriend inference skipped (cooldown active)")
84
  return None
 
85
  try:
86
+ result = _client_generate(
87
+ project="focusfriend",
88
  messages=messages,
89
+ max_new_tokens=max_tokens,
90
  temperature=temperature,
 
91
  )
92
+ return result.text
93
  except Exception as exc:
94
+ log.warning(f"HF Inference error: {exc}")
95
  return None
96
 
97
 
 
100
  temperature: float = 0.8,
101
  max_tokens: int = 300,
102
  ) -> Generator[str, None, None]:
103
+ """Streaming generator. Yields the full response in chunks.
 
 
 
 
 
 
104
 
105
+ The HF Inference API doesn't return true token-level streams from chat_completion
106
+ in the python client, so we yield the full text and let the UI's natural
107
+ chunking handle the appearance of streaming. Falls back to graceful error.
108
  """
109
+ if cooldown_active("focusfriend"):
110
+ yield "\n\n⏳ Pip is resting. (Inference cooldown — try again in a moment.)"
 
111
  return
 
112
  try:
113
+ result = _client_generate(
114
+ project="focusfriend",
115
  messages=messages,
116
+ max_new_tokens=max_tokens,
117
  temperature=temperature,
 
 
118
  )
119
+ # Simulate streaming by chunking the response on word boundaries
120
+ text = result.text
121
+ if not text:
122
+ yield "\n\n[No response]"
123
+ return
124
+ # Yield in word-sized chunks for natural reading pace
125
+ words = text.split(" ")
126
+ for i, word in enumerate(words):
127
+ chunk = word if i == 0 else " " + word
128
+ yield chunk
129
  except Exception as exc:
 
130
  yield f"\n\n⚠️ Something went wrong: {exc}"
131
 
132
 
133
  def unload_model():
134
+ """No-op for serverless inference (kept for API compat)."""
135
+ return
136
+
137
+
138
+ # Re-export for callers that still expect this
139
+ load_model = lambda *args, **kwargs: get_model() # noqa: E731
projects/focusfriend/requirements.txt CHANGED
@@ -1,12 +1,18 @@
1
- # FocusFriend — ASCII Wellness Companion
2
  # Python 3.10+
 
 
 
 
 
 
 
 
 
 
 
 
3
 
4
- # === Core ===
5
  gradio>=5.0
6
  numpy>=1.24
7
-
8
- # === LLM Inference (Gemma 4 12B) ===
9
- llama-cpp-python>=0.3.0
10
-
11
- # === Utilities ===
12
  huggingface_hub>=0.20
 
1
+ # FocusFriend — ASCII Wellness Companion "Pip"
2
  # Python 3.10+
3
+ #
4
+ # Inference is via the Hugging Face Inference API. No local GGUF, no
5
+ # llama-cpp-python compile step. Cooldown is enforced in
6
+ # `shared/inference_client.py` to protect your credit budget.
7
+ #
8
+ # Space env vars (Settings → Variables and secrets):
9
+ # HF_TOKEN — your HF token (anonymous works for many models)
10
+ # INFERENCE_MODEL — default model id
11
+ # FOCUSFRIEND_MODEL — override the model for this project
12
+ # FOCUSFRIEND_COOLDOWN_SECONDS — gap between inference calls (default 10)
13
+ # INFERENCE_PROVIDER — "hf-inference" (default) or paid provider
14
+ # INFERENCE_MAX_TOKENS — per-call cap (default 220)
15
 
 
16
  gradio>=5.0
17
  numpy>=1.24
 
 
 
 
 
18
  huggingface_hub>=0.20
projects/tinybard/README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: ᐴ TinyBard ᔔ
3
- emoji:
4
  colorFrom: blue
5
  colorTo: yellow
6
  sdk: gradio
@@ -11,25 +11,24 @@ license: apache-2.0
11
  tags:
12
  - text-adventure
13
  - interactive-fiction
14
- - llama-cpp
15
  - thousand-token-wood
16
  - build-small-hackathon
17
  - tiny-titan
18
- - llama-champion
19
  - off-brand
20
- - off-the-grid
21
  - mcp-server
22
  - anishinaabe
23
  - solarpunk
 
 
24
  ---
25
 
26
  # ◈──◆──◇ ᐴ TINYBARD ᔔ AADIZOOKAAN-AKINOOMAAGEWIN / STORY-TELLING ENGINE ◇──◆──◈
27
 
28
- > **A ≤4B LLM fires five-minute interactive text adventures in a cedar-and-copper CRT terminal.**
29
  >
30
  > ᐴ The land remembers the stories. ᔔ ☼ ☘ ≈
31
 
32
- TinyBard uses FastAPI + `mount_gradio_app` (Gradio 6.0) with a fully custom HTML/CSS/JS frontend and **MCP server mode** enabled. Every adventure is procedurally generated — rooms, NPCs, items, and branching narratives on the fly.
33
 
34
  ## ◆ GASHKITOONAN / CAPABILITIES ◈
35
 
@@ -37,15 +36,16 @@ TinyBard uses FastAPI + `mount_gradio_app` (Gradio 6.0) with a fully custom HTML
37
  - **◇ Three Aadizookaanan / Genres ◇** — Aadizookaan (Fantasy), Ish piming (Sci-Fi), Mashkodewaazibi (Cyberpunk)
38
  - **◇ Misko-Aki / CRT Terminal ◇** — Cedar-copper cabinet, sun-amber phosphor, frost-on-glass scanlines
39
  - **◇ MCP Kinoomaagewinan / Tools ◇** — `start_game` and `make_choice` exposed as MCP tools
40
- - **◇ Zhooniyaa / 100% Local ◇** — No cloud APIs. Runs on llama.cpp with GGUF quantized models
41
- - **◇ Bmaad-ziibi / Procedural Fallback ◇** — Full engine works without the LLM model loaded
42
- - **◇ Anishinaabe-Solarpunk ◇** — Sky-to-sunrise palette, Anishinaabe syllabic framings, biophilic motifs
 
43
 
44
  ## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
45
 
46
  - Python 3.10+
47
- - ~1GB disk for GGUF model
48
- - ~2GB RAM (CPU inference) or Metal/CUDA for GPU
49
 
50
  ## ◇ AABAJITOOWINAN / INSTALLATION ◈
51
 
@@ -54,12 +54,16 @@ git clone https://github.com/nbiish/tinybard.git
54
  cd tinybard
55
  pip install -r requirements.txt
56
 
57
- # Download model (Q8_0 quant, ~1.6GB)
58
- huggingface-cli download mradermacher/VibeThinker-1.5B-GGUF \
59
- --include "VibeThinker-1.5B.Q8_0.gguf" \
60
- --local-dir ./models
61
 
62
- export TINYBARD_MODEL_PATH=./models/VibeThinker-1.5B.Q8_0.gguf
 
 
 
 
63
 
64
  python app.py
65
  ```
@@ -82,42 +86,58 @@ hum with a faint violet energy...
82
  ## ☼ NAANAAGADAWENINDIZOWIN / VERIFICATION ◈
83
 
84
  ```bash
85
- curl -X POST http://localhost:7860/gradio/gradio_api/call/start_game \
86
  -H "Content-Type: application/json" \
87
- -d '{"data":["fantasy"]}'
 
 
 
 
 
 
88
  ```
89
 
90
- Returns SSE event stream with `story`, `choices`, `health`, `step`, `game_over`, `history_json`.
91
 
92
  ## ◈ MODEL ◇
93
 
94
- | Model | Size | Purpose | License |
95
- |-------|------|---------|---------|
96
- | VibeThinker 1.5B (Q8_0) | 1.5B params, ~1.6GB | Interactive story generation | Apache 2.0 |
 
97
 
98
- Fits the **Tiny Titan** badge (≤4B params). Runs on any laptop.
99
 
100
  ## ◇ MCP KINOOMAAGEWINAN / TOOLS ◈
101
 
102
- TinyBard runs with `mcp_server=True`, exposing these tools:
103
 
104
- - **`start_game(genre: str)`** Start a new adventure. Genre: `fantasy` / `scifi` / `cyberpunk`
105
- - **`make_choice(choice, genre, step, health, history_json)`** — Submit a player choice to advance the story
 
106
 
107
  Connect from any MCP client (Claude Desktop, Cursor, etc.) to the SSE endpoint at `/gradio/gradio_api/mcp/`.
108
 
109
- ## GIIZHIITAA / BADGE TARGETS ◇
110
 
111
- - **◆ Llama Champion** — Uses llama.cpp runtime
112
- - **◆ Tiny Titan** — Model is 1.5B (well under 4B limit)
113
  - **◆ Off-Brand** — Fully custom FastAPI+Gradio frontend
114
- - **◆ Off the Grid** — Fully local, no API calls
115
  - **◆ Field Notes** — Blog post about tiny model interactive fiction
116
 
117
  ## ☼ GANAWENDAAGWAD / SECURITY ◈
118
 
119
- PQC standard for any future API keys via the `pqc-secrets` skill (ML-KEM-768 + AES-256-GCM). At present, the model is loaded from local GGUF no key material in flight.
 
 
 
 
 
 
 
 
 
 
120
 
121
  ---
122
 
123
- ◈──◆──◇ ☼ TinyBard v1.0 · Cedar Edition · Anishinaabe Solarpunk ◇──◆──◈
 
1
  ---
2
  title: ᐴ TinyBard ᔔ
3
+ emoji: ☀️
4
  colorFrom: blue
5
  colorTo: yellow
6
  sdk: gradio
 
11
  tags:
12
  - text-adventure
13
  - interactive-fiction
 
14
  - thousand-token-wood
15
  - build-small-hackathon
16
  - tiny-titan
 
17
  - off-brand
 
18
  - mcp-server
19
  - anishinaabe
20
  - solarpunk
21
+ - inference-api
22
+ - cooldowns
23
  ---
24
 
25
  # ◈──◆──◇ ᐴ TINYBARD ᔔ AADIZOOKAAN-AKINOOMAAGEWIN / STORY-TELLING ENGINE ◇──◆──◈
26
 
27
+ > **A small LLM fires five-minute interactive text adventures in a cedar-and-copper CRT terminal.**
28
  >
29
  > ᐴ The land remembers the stories. ᔔ ☼ ☘ ≈
30
 
31
+ TinyBard uses FastAPI + `mount_gradio_app` (Gradio 6.0) with a fully custom HTML/CSS/JS frontend, **MCP server mode** enabled, and an **HF Inference API** backend. Every adventure is procedurally generated — rooms, NPCs, items, and branching narratives on the fly.
32
 
33
  ## ◆ GASHKITOONAN / CAPABILITIES ◈
34
 
 
36
  - **◇ Three Aadizookaanan / Genres ◇** — Aadizookaan (Fantasy), Ish piming (Sci-Fi), Mashkodewaazibi (Cyberpunk)
37
  - **◇ Misko-Aki / CRT Terminal ◇** — Cedar-copper cabinet, sun-amber phosphor, frost-on-glass scanlines
38
  - **◇ MCP Kinoomaagewinan / Tools ◇** — `start_game` and `make_choice` exposed as MCP tools
39
+ - **◇ Giiwenaabik / Inference API ◇** — Serverless HF Inference API; no local GGUF, no build step
40
+ - **◇ Asabiikesiwin / Cooldown ◇** — 6s default between inference calls to protect your credit budget
41
+ - **◇ Bmaad-ziibi / Procedural Fallback ◇** — Full engine works without the LLM
42
+ - **◇ Anishinaabe-Solarpunk ◇** — Sky-to-sunrise palette, syllabic framings, biophilic motifs
43
 
44
  ## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
45
 
46
  - Python 3.10+
47
+ - A Hugging Face token (for the Inference API; many small models work anonymously)
48
+ - ~100MB disk, ~256MB RAM (the model is serverless, not local)
49
 
50
  ## ◇ AABAJITOOWINAN / INSTALLATION ◈
51
 
 
54
  cd tinybard
55
  pip install -r requirements.txt
56
 
57
+ # Optional: pick a model (default: Qwen/Qwen2.5-1.5B-Instruct — small + fast + free)
58
+ export INFERENCE_MODEL="Qwen/Qwen2.5-1.5B-Instruct"
59
+ # Or for the originally-intended VibeThinker 1.5B:
60
+ # export INFERENCE_MODEL="mradermacher/VibeThinker-1.5B-GGUF"
61
 
62
+ # Optional: set the HF token (anonymous works for many models)
63
+ export HF_TOKEN="hf_..."
64
+
65
+ # Optional: tune the cooldown
66
+ export TINYBARD_COOLDOWN_SECONDS=6
67
 
68
  python app.py
69
  ```
 
86
  ## ☼ NAANAAGADAWENINDIZOWIN / VERIFICATION ◈
87
 
88
  ```bash
89
+ curl -X POST http://localhost:7860/api/game/start \
90
  -H "Content-Type: application/json" \
91
+ -d '{"genre": "fantasy"}'
92
+ ```
93
+
94
+ Returns clean JSON: `{"story", "choices", "health", "step", "game_over", "history"}`.
95
+
96
+ ```bash
97
+ curl http://localhost:7860/api/model_status
98
  ```
99
 
100
+ Returns: `{"model": "...", "cooldown": {"active": bool, "remaining_seconds": float, "window_seconds": float}}`.
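A minimal sketch of how a client might read the `/api/model_status` payload described above. The field names come from the README line; the helper name `describe_status` and the sample values are illustrative assumptions, not part of TinyBard.

```python
import json


def describe_status(payload: str) -> str:
    """Summarize a model_status JSON payload (hypothetical helper)."""
    status = json.loads(payload)
    cooldown = status["cooldown"]
    if cooldown["active"]:
        # remaining_seconds is a float according to the README description.
        return f"{status['model']}: cooling down, retry in {cooldown['remaining_seconds']:.1f}s"
    return f"{status['model']}: ready"


# Illustrative payload only; real values depend on the running app.
sample = (
    '{"model": "Qwen/Qwen2.5-1.5B-Instruct", '
    '"cooldown": {"active": true, "remaining_seconds": 2.5, "window_seconds": 6.0}}'
)
print(describe_status(sample))
```

A frontend could use a check like this to show a short wait message while the cooldown is active, before falling back to the procedural engine.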
101
 
102
  ## ◈ MODEL ◇
103
 
104
+ | Model (default) | Size | Purpose | License |
105
+ |---|---|---|---|
106
+ | Qwen2.5-1.5B-Instruct | 1.5B | Interactive story generation | Apache 2.0 |
107
+ | VibeThinker 1.5B | 1.5B | Alternative — also tiny | Apache 2.0 |
108
 
109
+ Override `INFERENCE_MODEL` to any model that supports `chat_completion` on the HF Inference API. The 1.5B defaults fit the **Tiny Titan** badge.
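The override order can be sketched as follows. This assumes the lookup is `TINYBARD_MODEL`, then `INFERENCE_MODEL`, then the README default; the function name `resolve_model` is hypothetical, and the real default lives in `shared/inference_client.py`, which this diff does not show.

```python
import os
from typing import Mapping, Optional

# Default quoted in the README; the value in shared/inference_client.py may differ.
README_DEFAULT = "Qwen/Qwen2.5-1.5B-Instruct"


def resolve_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model id: project override, then shared override, then default."""
    env = os.environ if env is None else env
    # `or` treats an empty string as unset, which is an assumption of this sketch.
    return env.get("TINYBARD_MODEL") or env.get("INFERENCE_MODEL") or README_DEFAULT


print(resolve_model({"INFERENCE_MODEL": "google/gemma-2-9b-it"}))
```

Note that `app.py` uses `os.environ.get("TINYBARD_MODEL", INFERENCE_MODEL)`, which would keep an empty `TINYBARD_MODEL` as-is; the sketch deliberately skips empty values.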
110
 
111
  ## ◇ MCP KINOOMAAGEWINAN / TOOLS ◈
112
 
113
+ TinyBard runs with `mcp_server=True`, exposing these tools (also available as FastAPI endpoints):
114
 
115
+ - **`/api/game/start`** (POST `{"genre": "fantasy|scifi|cyberpunk"}`) Start an adventure
116
+ - **`/api/game/choice`** (POST `{choice, genre, step, health, history}`) — Submit a player choice
117
+ - **`/api/model_status`** (GET) — Check the inference model + cooldown state
118
 
119
  Connect from any MCP client (Claude Desktop, Cursor, etc.) to the SSE endpoint at `/gradio/gradio_api/mcp/`.
120
 
121
+ ## GIIZHIITAA / BADGE TARGETS ◇
122
 
123
+ - **◆ Tiny Titan** — Model ≤ 1.5B (well under 4B limit)
 
124
  - **◆ Off-Brand** — Fully custom FastAPI+Gradio frontend
 
125
  - **◆ Field Notes** — Blog post about tiny model interactive fiction
126
 
127
  ## ☼ GANAWENDAAGWAD / SECURITY ◈
128
 
129
+ PQC standard for any future API keys via the `pqc-secrets` skill (ML-KEM-768 + AES-256-GCM). At present, only the HF token is in flight (read from env var, never written to disk).
130
+
131
+ ## ◇ AABAAJICHIGANAN / COOLDOWNS ◈
132
+
133
+ The `shared/inference_client.py` module enforces per-project cooldowns. Cooldown protects your HF/Modal credit budget from runaway re-rolls. Defaults:
134
+
135
+ - `tinybard`: 6s
136
+ - `focusfriend`: 10s
137
+ - `crittercalm`: 12s
138
+
139
+ Override per project via Space env vars (`TINYBARD_COOLDOWN_SECONDS`, etc.).
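A per-project cooldown of the kind described above could look roughly like this. It is a sketch under stated assumptions, not the actual `shared/inference_client.py`: the class `CooldownGate`, its methods, and the 10-second fallback for unknown projects are hypothetical; only the three defaults and the `<PROJECT>_COOLDOWN_SECONDS` naming come from the README.

```python
import os
import time
from typing import Callable, Dict

# Defaults listed in the README.
DEFAULT_COOLDOWNS: Dict[str, float] = {
    "tinybard": 6.0,
    "focusfriend": 10.0,
    "crittercalm": 12.0,
}


class CooldownGate:
    """Hypothetical per-project cooldown tracker."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_call: Dict[str, float] = {}

    def window(self, project: str) -> float:
        # Env override, e.g. TINYBARD_COOLDOWN_SECONDS=6.
        raw = os.environ.get(f"{project.upper()}_COOLDOWN_SECONDS")
        if raw is None:
            return DEFAULT_COOLDOWNS.get(project, 10.0)  # assumed fallback
        return float(raw)

    def remaining(self, project: str) -> float:
        last = self._last_call.get(project)
        if last is None:
            return 0.0
        return max(0.0, self.window(project) - (self._clock() - last))

    def active(self, project: str) -> bool:
        return self.remaining(project) > 0.0

    def mark(self, project: str) -> None:
        """Record that an inference call was just made for this project."""
        self._last_call[project] = self._clock()

    def status(self, project: str) -> dict:
        remaining = self.remaining(project)
        return {
            "active": remaining > 0.0,
            "remaining_seconds": remaining,
            "window_seconds": self.window(project),
        }
```

A caller would check `active(project)` before each inference request, fall back to the procedural engine while it is true, and call `mark(project)` after a request goes out. The injectable clock only exists to make the gate easy to test.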
140
 
141
  ---
142
 
143
+ ◈──◆──◇ ☼ TinyBard v1.1 · Cedar Edition · Anishinaabe Solarpunk · Inference API ◇──◆──◈
projects/tinybard/app.py CHANGED
@@ -17,6 +17,7 @@ import os
17
  import json
18
  import random
19
  import logging
 
20
  from pathlib import Path
21
  from typing import Optional, Dict, List
22
 
@@ -26,6 +27,19 @@ from fastapi.responses import HTMLResponse
26
  from fastapi.staticfiles import StaticFiles
27
  from gradio import mount_gradio_app
28
 
29
  logging.basicConfig(
30
  level=logging.INFO,
31
  format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
@@ -38,46 +52,34 @@ log = logging.getLogger("tinybard")
38
  BASE_DIR = Path(__file__).parent
39
  STATIC_DIR = BASE_DIR / "static"
40
 
41
- MODEL_PATH = os.environ.get(
42
- "TINYBARD_MODEL_PATH",
43
- str(Path("/Volumes/1tb-sandisk/ml-models/huggingface/models--mradermacher--VibeThinker-1.5B-GGUF/snapshots/d0d66139a78030a92a582f966b0f7cbbb3b19406/VibeThinker-1.5B.Q8_0.gguf"))
44
- )
45
 
46
  # ---------------------------------------------------------------------------
47
  # Llama.cpp Inference Setup
48
  # ---------------------------------------------------------------------------
49
- _llm = None
50
- _llm_failed = False
51
 
52
 
53
- def get_llm():
54
- """Lazy-load the GGUF model via llama-cpp-python."""
55
- global _llm, _llm_failed
56
- if _llm is not None:
57
- return _llm
58
- if _llm_failed:
59
- return None
 
60
 
61
- if not Path(MODEL_PATH).exists():
62
- log.warning(f"Model file not found at {MODEL_PATH}. Fallback mode active.")
63
- _llm_failed = True
64
- return None
65
 
66
- try:
67
- from llama_cpp import Llama
68
- log.info(f"Loading VibeThinker-1.5B from {MODEL_PATH} ...")
69
- _llm = Llama(
70
- model_path=MODEL_PATH,
71
- n_ctx=2048,
72
- n_threads=int(os.environ.get("TINYBARD_THREADS", "4")),
73
- verbose=False,
74
- )
75
- log.info("Model loaded successfully ✓")
76
- return _llm
77
- except Exception as e:
78
- log.error(f"Failed to load LLM model: {e}")
79
- _llm_failed = True
80
- return None
81
 
82
 
83
  # ---------------------------------------------------------------------------
@@ -189,44 +191,72 @@ def generate_procedural_step(genre: str, step: int, health: int, choice: str = "
189
 
190
 
191
  # ---------------------------------------------------------------------------
192
- # LLM Generation Logic
193
  # ---------------------------------------------------------------------------
194
- def generate_llm_story(prompt: str, max_tokens: int = 150) -> str:
195
- """Generate story text via llama.cpp."""
196
- llm = get_llm()
197
- if not llm:
198
  return ""
199
  try:
200
- response = llm(
201
- prompt,
202
- max_tokens=max_tokens,
 
 
203
  temperature=0.7,
204
- stop=["\n\n", "User:", "Narrator:"],
205
  )
206
- return response["choices"][0]["text"].strip()
 
 
 
207
  except Exception as e:
208
- log.error(f"LLM generation error: {e}")
209
  return ""
210
 
211
 
212
- def format_prompt(genre: str, history: List[Dict[str, str]], next_instruction: str) -> str:
213
- """Build the narrative prompt for the LLM."""
214
- prompt = (
215
- "You are the narrator of an interactive text adventure game.\n"
216
- f"Genre: {genre}\n"
217
- "Rules:\n"
218
- "1. Write in the second person ('You...').\n"
219
- "2. Keep descriptions highly atmospheric, but short (under 3 sentences).\n"
220
- "3. Focus on action, mystery, and choice.\n\n"
221
  )
222
- for h in history:
223
- if h["role"] == "player":
224
- prompt += f"Player choice: {h['text']}\n"
225
- else:
226
- prompt += f"Narrator: {h['text']}\n"
227
-
228
- prompt += f"{next_instruction}\n"
229
- return prompt
 
 
 
230
 
231
 
232
  # ---------------------------------------------------------------------------
@@ -257,18 +287,11 @@ def create_gradio_app() -> gr.Blocks:
257
  if genre not in ["fantasy", "scifi", "cyberpunk"]:
258
  genre = "fantasy"
259
 
260
- llm = get_llm()
261
- if not llm:
262
- result = generate_procedural_step(genre, 0, 100)
263
- return (
264
- result["story"], result["choices"], result["health"],
265
- result["step"], result["game_over"],
266
- json.dumps(result.get("history", []))
267
- )
268
-
269
  instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
270
- story = generate_llm_story(format_prompt(genre, [], instruction))
271
  if not story:
 
272
  result = generate_procedural_step(genre, 0, 100)
273
  return (
274
  result["story"], result["choices"], result["health"],
@@ -277,15 +300,11 @@ def create_gradio_app() -> gr.Blocks:
277
  )
278
 
279
  history = [{"role": "narrator", "text": story}]
280
- choices_instruction = (
281
- "Provide exactly 3 short, distinct choices for the player. "
282
- "Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
283
- )
284
- choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
285
-
286
- choices = _parse_choices(choices_text)
287
  if len(choices) < 2:
288
- choices = ["Explore the area", "Check your equipment", "Proceed carefully"]
 
 
289
 
290
  return (story, choices[:3], 100, 1, False, json.dumps(history))
291
 
@@ -296,18 +315,10 @@ def create_gradio_app() -> gr.Blocks:
296
  except Exception:
297
  history = []
298
 
299
- llm = get_llm()
300
  step = int(step)
301
  health = int(health)
302
 
303
- if not llm:
304
- result = generate_procedural_step(genre, step, health, choice)
305
- return (
306
- result["story"], result["choices"], result["health"],
307
- result["step"], result["game_over"],
308
- json.dumps(result.get("history", history))
309
- )
310
-
311
  history.append({"role": "player", "text": choice})
312
 
313
  health_delta = random.choice([-15, 0, 10])
@@ -315,7 +326,7 @@ def create_gradio_app() -> gr.Blocks:
315
 
316
  if new_health <= 0:
317
  instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
318
- story = generate_llm_story(format_prompt(genre, history, instruction))
319
  return (
320
  story or "Your strength fails. The adventure ends in darkness.",
321
  [], 0, step + 1, True, json.dumps(history)
@@ -323,14 +334,14 @@ def create_gradio_app() -> gr.Blocks:
323
 
324
  if step >= 4:
325
  instruction = "Narrate the final glorious victory. The adventure ends in success."
326
- story = generate_llm_story(format_prompt(genre, history, instruction))
327
  return (
328
  story or "You have achieved your goal! You are victorious!",
329
  [], new_health, step + 1, True, json.dumps(history)
330
  )
331
 
332
  instruction = "Narrate what happens next as a result of the player's choice."
333
- story = generate_llm_story(format_prompt(genre, history, instruction))
334
  if not story:
335
  result = generate_procedural_step(genre, step, health, choice)
336
  return (
@@ -341,13 +352,7 @@ def create_gradio_app() -> gr.Blocks:
341
 
342
  history.append({"role": "narrator", "text": story})
343
 
344
- choices_instruction = (
345
- "Provide exactly 3 short, distinct choices. "
346
- "Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
347
- )
348
- choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
349
-
350
- choices = _parse_choices(choices_text)
351
  if len(choices) < 2:
352
  choices = ["Move forward", "Look around", "Rest a moment"]
353
 
@@ -400,13 +405,8 @@ async def homepage():
400
  return HTMLResponse("<h1>TinyBard retro terminal under construction!</h1>")
401
  @fastapi_app.get("/api/model_status")
402
  async def model_status():
403
- """Check if the LLM is loaded."""
404
- llm = get_llm()
405
- return {
406
- "available": llm is not None,
407
- "model_path": MODEL_PATH,
408
- "fallback": _llm_failed
409
- }
410
 
411
 
412
  # ---------------------------------------------------------------------------
@@ -418,23 +418,20 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
418
  Returns a dict the frontend can consume directly. Used by both the
419
  FastAPI /api/game/* endpoints and the Gradio MCP tools.
420
  """
421
- llm = get_llm()
 
 
422
 
423
  if step == 0:
424
  # New game
425
- if not llm:
426
  return generate_procedural_step(genre, 0, 100)
427
  instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
428
- story = generate_llm_story(format_prompt(genre, [], instruction))
429
  if not story:
430
  return generate_procedural_step(genre, 0, 100)
431
  history = [{"role": "narrator", "text": story}]
432
- choices_instruction = (
433
- "Provide exactly 3 short, distinct choices for the player. "
434
- "Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
435
- )
436
- choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
437
- choices = _parse_choices(choices_text)
438
  if len(choices) < 2:
439
  choices = ["Explore the area", "Check your equipment", "Proceed carefully"]
440
  return {
@@ -443,7 +440,7 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
443
  }
444
 
445
  # Subsequent turn
446
- if not llm:
447
  return generate_procedural_step(genre, step, health, choice)
448
 
449
  history.append({"role": "player", "text": choice})
@@ -452,7 +449,7 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
452
 
453
  if new_health <= 0:
454
  instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
455
- story = generate_llm_story(format_prompt(genre, history, instruction))
456
  return {
457
  "story": story or "Your strength fails. The adventure ends in darkness.",
458
  "choices": [], "health": 0, "step": step + 1, "game_over": True,
@@ -461,7 +458,7 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
461
 
462
  if step >= 4:
463
  instruction = "Narrate the final glorious victory. The adventure ends in success."
464
- story = generate_llm_story(format_prompt(genre, history, instruction))
465
  return {
466
  "story": story or "You have achieved your goal! You are victorious!",
467
  "choices": [], "health": new_health, "step": step + 1, "game_over": True,
@@ -469,17 +466,12 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
469
  }
470
 
471
  instruction = "Narrate what happens next as a result of the player's choice."
472
- story = generate_llm_story(format_prompt(genre, history, instruction))
473
  if not story:
474
  return generate_procedural_step(genre, step, health, choice)
475
  history.append({"role": "narrator", "text": story})
476
 
477
- choices_instruction = (
478
- "Provide exactly 3 short, distinct choices. "
479
- "Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
480
- )
481
- choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
482
- choices = _parse_choices(choices_text)
483
  if len(choices) < 2:
484
  choices = ["Move forward", "Look around", "Rest a moment"]
485
  return {
 
17
  import json
18
  import random
19
  import logging
20
+ import sys
21
  from pathlib import Path
22
  from typing import Optional, Dict, List
23
 
 
27
  from fastapi.staticfiles import StaticFiles
28
  from gradio import mount_gradio_app
29
 
30
+ # Inference client with cooldown (no local GGUF, no llama-cpp-python build!)
31
+ # Path layout: monorepo/shared/inference_client.py; the monorepo root is three `.parent` hops up from this file.
32
+ sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
33
+ from shared.inference_client import (
34
+ InferenceResult,
35
+ cooldown_status,
36
+ cooldown_remaining,
37
+ cooldown_active,
38
+ generate as inference_generate,
39
+ chat_messages,
40
+ INFERENCE_MODEL,
41
+ )
42
+
43
  logging.basicConfig(
44
  level=logging.INFO,
45
  format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
 
52
  BASE_DIR = Path(__file__).parent
53
  STATIC_DIR = BASE_DIR / "static"
54
 
55
+ # Use the HF Inference API; the default model id comes from shared.inference_client (INFERENCE_MODEL).
56
+ # Override via Space env vars: TINYBARD_MODEL (this project) or INFERENCE_MODEL (shared default).
57
+ # Cooldown enforced in shared.inference_client.
58
+ TINYBARD_MODEL = os.environ.get("TINYBARD_MODEL", INFERENCE_MODEL)
59
 
60
  # ---------------------------------------------------------------------------
61
  # Llama.cpp Inference Setup
62
  # ---------------------------------------------------------------------------
63
+ # No local LLM state — every inference call goes through the HF Inference API
64
+ # with cooldown enforcement. Procedural fallback is always available.
65
 
66
 
67
+ def llm_available() -> bool:
68
+ """True if we *might* succeed at an inference call (cooldown not active,
69
+ model id is set). A missing HF_TOKEN does not gate availability."""
71
+ if not os.environ.get("HF_TOKEN") and not os.environ.get("HUGGINGFACEHUB_API_TOKEN"):
72
+ # Inference API still works anonymously for some models, so don't gate hard.
73
+ pass
74
+ return bool(TINYBARD_MODEL) and not cooldown_active("tinybard")
75
 
76
 
77
+ def last_inference_status() -> dict:
78
+ """Snapshot of the current cooldown + model for /api/model_status."""
79
+ return {
80
+ "model": TINYBARD_MODEL,
81
+ "cooldown": cooldown_status("tinybard"),
82
+ }
83
 
84
 
85
  # ---------------------------------------------------------------------------
 
191
 
192
 
193
  # ---------------------------------------------------------------------------
194
+ # LLM Generation Logic (HF Inference API + cooldown)
195
  # ---------------------------------------------------------------------------
196
+ def _parse_messages(genre: str, history: List[Dict[str, str]], next_instruction: str) -> list[Dict[str, str]]:
197
+ """Translate internal history into OpenAI-style chat messages."""
198
+ system = (
199
+ "You are the narrator of an interactive text adventure game. "
200
+ f"Genre: {genre}. Write in the second person ('You...'). "
201
+ "Keep descriptions highly atmospheric but short (under 3 sentences). "
202
+ "Focus on action, mystery, and choice. Do not offer numbered choices unless asked."
203
+ )
204
+ msgs: List[Dict[str, str]] = [{"role": "system", "content": system}]
205
+ for h in (history or []):
206
+ if h.get("role") == "player":
207
+ msgs.append({"role": "user", "content": h["text"]})
208
+ elif h.get("role") == "narrator":
209
+ msgs.append({"role": "assistant", "content": h["text"]})
210
+ msgs.append({"role": "user", "content": next_instruction})
211
+ return msgs
212
+
213
+
214
+ def generate_llm_story(
215
+ genre: str,
216
+ history: List[Dict[str, str]],
217
+ next_instruction: str,
218
+ max_tokens: int = 180,
219
+ ) -> str:
220
+ """Generate story text via HF Inference API (with cooldown)."""
221
+ if cooldown_active("tinybard"):
222
+ log.info("tinybard inference skipped (cooldown active)")
223
  return ""
224
  try:
225
+ msgs = _parse_messages(genre, history, next_instruction)
226
+ result = inference_generate(
227
+ project="tinybard",
228
+ messages=msgs,
229
+ max_new_tokens=max_tokens,
230
  temperature=0.7,
 
231
  )
232
+ return result.text
233
+ except RuntimeError:
234
+ # Cooldown — let caller fall back
235
+ return ""
236
  except Exception as e:
237
+ log.warning(f"HF Inference error (fallback to procedural): {e}")
238
  return ""
239
 
240
 
241
+ def generate_llm_choices(genre: str, story_context: str) -> List[str]:
242
+ """Ask the LLM to produce 3 short distinct choices for the player."""
243
+ if cooldown_active("tinybard"):
244
+ return []
245
+ system = (
246
+ "You generate 3 short, distinct player choices for an interactive text adventure. "
247
+ "Output exactly in the format: 1. <choice> | 2. <choice> | 3. <choice>"
 
 
248
  )
249
+ user = f"Genre: {genre}. Last story beat: {story_context[:400]}. Give 3 choices."
250
+ try:
251
+ result = inference_generate(
252
+ project="tinybard",
253
+ messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
254
+ max_new_tokens=80,
255
+ temperature=0.8,
256
+ )
257
+ return _parse_choices(result.text)
258
+ except Exception:
259
+ return []
260
 
261
 
262
  # ---------------------------------------------------------------------------
 
287
  if genre not in ["fantasy", "scifi", "cyberpunk"]:
288
  genre = "fantasy"
289
 
290
+ # Try LLM first (will skip if cooldown is active)
291
  instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
292
+ story = generate_llm_story(genre, [], instruction)
293
  if not story:
294
+ # Procedural fallback
295
  result = generate_procedural_step(genre, 0, 100)
296
  return (
297
  result["story"], result["choices"], result["health"],
 
300
  )
301
 
302
  history = [{"role": "narrator", "text": story}]
303
+ choices = generate_llm_choices(genre, story)
304
  if len(choices) < 2:
305
+ # Use the procedural choices
306
+ fallback = generate_procedural_step(genre, 0, 100)
307
+ choices = fallback["choices"]
308
 
309
  return (story, choices[:3], 100, 1, False, json.dumps(history))
310
 
 
315
  except Exception:
316
  history = []
317
 
 
318
  step = int(step)
319
  health = int(health)
320
 
321
+ # First try LLM narration
322
  history.append({"role": "player", "text": choice})
323
 
324
  health_delta = random.choice([-15, 0, 10])
 
326
 
327
  if new_health <= 0:
328
  instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
329
+ story = generate_llm_story(genre, history, instruction)
330
  return (
331
  story or "Your strength fails. The adventure ends in darkness.",
332
  [], 0, step + 1, True, json.dumps(history)
 
334
 
335
  if step >= 4:
336
  instruction = "Narrate the final glorious victory. The adventure ends in success."
337
+ story = generate_llm_story(genre, history, instruction)
338
  return (
339
  story or "You have achieved your goal! You are victorious!",
340
  [], new_health, step + 1, True, json.dumps(history)
341
  )
342
 
343
  instruction = "Narrate what happens next as a result of the player's choice."
344
+ story = generate_llm_story(genre, history, instruction)
345
  if not story:
346
  result = generate_procedural_step(genre, step, health, choice)
347
  return (
 
352
 
353
  history.append({"role": "narrator", "text": story})
354
 
355
+ choices = generate_llm_choices(genre, story)
356
  if len(choices) < 2:
357
  choices = ["Move forward", "Look around", "Rest a moment"]
358
 
 
405
  return HTMLResponse("<h1>TinyBard retro terminal under construction!</h1>")
406
  @fastapi_app.get("/api/model_status")
407
  async def model_status():
408
+ """Check the inference client + cooldown status."""
409
+ return last_inference_status()
410
 
411
 
412
  # ---------------------------------------------------------------------------
 
418
  Returns a dict the frontend can consume directly. Used by both the
419
  FastAPI /api/game/* endpoints and the Gradio MCP tools.
420
  """
421
+ # Cooldown short-circuit: if active, the game just uses the procedural
422
+ # engine for this turn. This protects your HF/Modal credit budget.
423
+ in_cooldown = cooldown_active("tinybard")
424
 
425
  if step == 0:
426
  # New game
427
+ if in_cooldown:
428
  return generate_procedural_step(genre, 0, 100)
429
  instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
430
+ story = generate_llm_story(genre, [], instruction)
431
  if not story:
432
  return generate_procedural_step(genre, 0, 100)
433
  history = [{"role": "narrator", "text": story}]
434
+ choices = generate_llm_choices(genre, story)
435
  if len(choices) < 2:
436
  choices = ["Explore the area", "Check your equipment", "Proceed carefully"]
437
  return {
 
440
  }
441
 
442
  # Subsequent turn
443
+ if in_cooldown:
444
  return generate_procedural_step(genre, step, health, choice)
445
 
446
  history.append({"role": "player", "text": choice})
 
449
 
450
  if new_health <= 0:
451
  instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
452
+ story = generate_llm_story(genre, history, instruction)
453
  return {
454
  "story": story or "Your strength fails. The adventure ends in darkness.",
455
  "choices": [], "health": 0, "step": step + 1, "game_over": True,
 
458
 
459
  if step >= 4:
460
  instruction = "Narrate the final glorious victory. The adventure ends in success."
461
+ story = generate_llm_story(genre, history, instruction)
462
  return {
463
  "story": story or "You have achieved your goal! You are victorious!",
464
  "choices": [], "health": new_health, "step": step + 1, "game_over": True,
 
466
  }
467
 
468
  instruction = "Narrate what happens next as a result of the player's choice."
469
+ story = generate_llm_story(genre, history, instruction)
470
  if not story:
471
  return generate_procedural_step(genre, step, health, choice)
472
  history.append({"role": "narrator", "text": story})
473
 
474
+ choices = generate_llm_choices(genre, story)
475
  if len(choices) < 2:
476
  choices = ["Move forward", "Look around", "Rest a moment"]
477
  return {
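
`generate_llm_choices` asks the model for `1. <choice> | 2. <choice> | 3. <choice>` and hands the reply to `_parse_choices`, whose body is not part of this diff. The block below is a minimal sketch of what such a parser could look like, not the project's implementation: the name `parse_choices_sketch`, the `limit` parameter, and the decision to also accept one choice per line are assumptions made for illustration.

```python
import re
from typing import List

# Matches a leading list marker such as "1." or "2)" with optional whitespace.
_NUMBER_PREFIX = re.compile(r"^\s*\d+\s*[.)]\s*")


def parse_choices_sketch(text: str, limit: int = 3) -> List[str]:
    """Split '1. a | 2. b | 3. c' (or one choice per line) into clean strings.

    Hypothetical helper: the real `_parse_choices` is not shown in the diff.
    """
    choices: List[str] = []
    # Small models often ignore the "|" separator and emit newlines instead,
    # so split on either.
    for part in re.split(r"[|\n]", text or ""):
        cleaned = _NUMBER_PREFIX.sub("", part).strip().strip('"')
        # Skip empty fragments and exact duplicates.
        if cleaned and cleaned not in choices:
            choices.append(cleaned)
    return choices[:limit]
```

Returning an empty list for empty or unusable replies keeps the existing `if len(choices) < 2:` fallback in the callers meaningful.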
projects/tinybard/requirements.txt CHANGED
@@ -1,7 +1,18 @@
  # TinyBard — Micro Text Adventure Generator
  # Python 3.10+
+ #
+ # Inference is via the Hugging Face Inference API (no local GGUF,
+ # no llama-cpp-python compile). Cooldown is enforced in
+ # `shared/inference_client.py` to protect your credit budget.
+ #
+ # Set these Space secrets/variables to configure:
+ # HF_TOKEN — your HF token (anonymous works for many small models)
+ # INFERENCE_MODEL — model id (default: Qwen/Qwen2.5-1.5B-Instruct)
+ # TINYBARD_COOLDOWN_SECONDS — gap between inference calls (default 6)
+ # INFERENCE_PROVIDER — "hf-inference" (default, free serverless) or paid
+ # INFERENCE_MAX_TOKENS — per-call token cap (default 220)
 
  gradio>=5.0
- numpy>=1.24
- llama-cpp-python>=0.3.0
+ fastapi>=0.110
  huggingface_hub>=0.20
+ uvicorn[standard]>=0.27
projects/tinybard/static/main.js CHANGED
@@ -36,18 +36,26 @@ async function checkModelStatus() {
    try {
      const resp = await fetch(`${GRADIO_CLIENT_URL}/api/model_status`);
      if (!resp.ok) return;
-     const status = await resp.json();
-     if (status.available) {
-       modelStatus.textContent = "☘ MODEL: MII-GIIWETA / READY";
+     const s = await resp.json();
+     const model = s.model || "inference";
+     const cd = s.cooldown || { active: false, remaining_seconds: 0, window_seconds: 0 };
+     if (cd.active) {
+       modelStatus.textContent = `☘ ${model} / COOLDOWN ${cd.remaining_seconds.toFixed(1)}s`;
+       modelStatus.style.color = "var(--asp-ember)";
+     } else if (model) {
+       modelStatus.textContent = `☘ ${model} / READY`;
        modelStatus.style.color = "var(--asp-sun)";
      } else {
-       modelStatus.textContent = "☘ MODEL: GIIZHIK-WIIKI / FALLBACK";
+       modelStatus.textContent = "☘ NO MODEL / FALLBACK";
        modelStatus.style.color = "var(--asp-frost)";
      }
    } catch {
      modelStatus.textContent = "☘ MODEL: ?";
    }
  }
+
+ // Poll model status every 2s so cooldown countdown updates
+ setInterval(checkModelStatus, 2000);
  async function apiCall(endpoint, payload) {
    // Use the FastAPI clean-JSON endpoints (returns a dict directly).
    // /api/game/start -> start_game
shared/inference_client.py ADDED
@@ -0,0 +1,209 @@
+ """
+ Shared HF Inference Client + Cooldown
+ ======================================
+ Lightweight wrapper around `huggingface_hub.InferenceClient` with:
+
+ - Per-call cooldown to prevent credit burn on live HF Spaces
+ - Async-friendly API
+ - Raises on failure so the caller can fall back to procedural/story-template engines
+ - Environment-driven config (works in HF Spaces and local)
+
+ The cooldown model:
+ - Each project has its own cooldown window (default 8s for cheap inference APIs)
+ - The window is process-wide: after a successful inference, no new call for that project can run until the cooldown expires
+ - Failed inference does not start a cooldown (allow quick retry)
+ - `cooldown_active()` is the public check; FastAPI handlers short-circuit on active cooldown
+ """
+ from __future__ import annotations
+
+ import os
+ import time
+ import logging
+ import threading
+ from dataclasses import dataclass, field
+ from typing import Optional, Dict, Any, Callable, List
+
+ log = logging.getLogger("inference")
+
+ # ── Environment knobs ─────────────────────────────────────────────────────────
+ # Override these in your Space's "Settings → Variables and secrets".
+
+ # The HF model id used for text generation (VibeThinker 1.5B, Gemma 4 12B, etc.)
+ INFERENCE_MODEL = os.environ.get(
+     "INFERENCE_MODEL",
+     "Qwen/Qwen2.5-1.5B-Instruct",  # small, fast, free-tier friendly
+ )
+
+ # Provider: "hf-inference" (free serverless), "together", "fal-ai", "replicate"
+ # Free HF inference works for many small models; otherwise use a paid provider.
+ INFERENCE_PROVIDER = os.environ.get("INFERENCE_PROVIDER", "hf-inference")
+
+ # Token — read from HF Space secrets at runtime.
+ HF_TOKEN = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
+
+ # Default cooldown between inferences, in seconds.
+ COOLDOWN_SECONDS = float(os.environ.get("INFERENCE_COOLDOWN_SECONDS", "8"))
+
+ # Per-project override (keyed by app name)
+ PROJECT_COOLDOWN_OVERRIDES = {
+     "tinybard": float(os.environ.get("TINYBARD_COOLDOWN_SECONDS", "6")),
+     "focusfriend": float(os.environ.get("FOCUSFRIEND_COOLDOWN_SECONDS", "10")),
+     "crittercalm": float(os.environ.get("CRITTERCALM_COOLDOWN_SECONDS", "12")),
+ }
+
+ # Max tokens to request (keeps costs bounded)
+ MAX_NEW_TOKENS = int(os.environ.get("INFERENCE_MAX_TOKENS", "220"))
+
+
+ # ── Cooldown registry ────────────────────────────────────────────────────────
+ @dataclass
+ class _CooldownState:
+     last_call: float = 0.0
+     lock: threading.Lock = field(default_factory=threading.Lock)
+
+
+ _states: Dict[str, _CooldownState] = {}
+
+
+ def _state(project: str) -> _CooldownState:
+     if project not in _states:
+         _states[project] = _CooldownState()
+     return _states[project]
+
+
+ def cooldown_seconds_for(project: str) -> float:
+     return PROJECT_COOLDOWN_OVERRIDES.get(project, COOLDOWN_SECONDS)
+
+
+ def cooldown_active(project: str) -> bool:
+     """Return True if the project is currently in cooldown (cannot run inference)."""
+     state = _state(project)
+     now = time.time()
+     if now - state.last_call < cooldown_seconds_for(project):
+         return True
+     return False
+
+
+ def cooldown_remaining(project: str) -> float:
+     """Seconds left in the cooldown window (0 if not in cooldown)."""
+     state = _state(project)
+     elapsed = time.time() - state.last_call
+     remaining = cooldown_seconds_for(project) - elapsed
+     return max(0.0, remaining)
+
+
+ def cooldown_status(project: str) -> dict:
+     """Snapshot of cooldown state for the UI."""
+     return {
+         "active": cooldown_active(project),
+         "remaining_seconds": round(cooldown_remaining(project), 2),
+         "window_seconds": cooldown_seconds_for(project),
+     }
+
+
+ def _mark_called(project: str) -> None:
+     state = _state(project)
+     with state.lock:
+         state.last_call = time.time()
+
+
+ # ── Inference client wrapper ─────────────────────────────────────────────────
+ class InferenceResult:
+     """A small wrapper so callers don't need to know which API returned text."""
+     def __init__(self, text: str, model: str, provider: str, latency_s: float):
+         self.text = text
+         self.model = model
+         self.provider = provider
+         self.latency_s = latency_s
+
+     def __repr__(self) -> str:
+         return f"InferenceResult(text={self.text[:50]!r}…, model={self.model!r}, latency={self.latency_s:.2f}s)"
+
+
+ def _get_client():
+     """Lazy-load the InferenceClient to keep boot fast."""
+     from huggingface_hub import InferenceClient
+     return InferenceClient(
+         model=INFERENCE_MODEL,
+         token=HF_TOKEN,
+         provider=INFERENCE_PROVIDER,
+     )
+
+
+ def generate(
+     project: str,
+     messages: List[Dict[str, str]],
+     *,
+     max_new_tokens: Optional[int] = None,
+     temperature: float = 0.7,
+ ) -> InferenceResult:
+     """Run a chat-style inference call, with cooldown enforcement.
+
+     `messages` follows OpenAI chat format: [{"role": "user|assistant|system", "content": "..."}].
+     Returns InferenceResult with `.text` (string) on success, or raises on failure.
+     Caller is responsible for fallback handling.
+     """
+     if cooldown_active(project):
+         remaining = cooldown_remaining(project)
+         raise RuntimeError(
+             f"cooldown active for {project!r}: {remaining:.1f}s remaining. "
+             f"This protects your HF/Modal credit budget."
+         )
+
+     max_new_tokens = max_new_tokens or MAX_NEW_TOKENS
+     client = _get_client()
+     start = time.time()
+     response = client.chat_completion(
+         messages=messages,
+         max_tokens=max_new_tokens,
+         temperature=temperature,
+     )
+     latency = time.time() - start
+     text = response.choices[0].message.content or ""
+     text = text.strip()
+     _mark_called(project)
+     return InferenceResult(
+         text=text,
+         model=INFERENCE_MODEL,
+         provider=INFERENCE_PROVIDER,
+         latency_s=latency,
+     )
+
+
+ def force_clear_cooldown(project: str) -> None:
+     """Manual escape hatch (e.g. for testing or admin overrides)."""
+     _state(project).last_call = 0.0
+
+
+ # ── Convenience: build messages + format result ──────────────────────────────
+ def chat_messages(system: str, user: str, history: Optional[List[Dict[str, str]]] = None) -> List[Dict[str, str]]:
+     """Build an OpenAI-style message list with optional prior turns.
+
+     `history` is in the same [{role, content}, ...] format. New turns are appended.
+     """
+     msgs: List[Dict[str, str]] = [{"role": "system", "content": system}]
+     if history:
+         msgs.extend(history)
+     msgs.append({"role": "user", "content": user})
+     return msgs
+
+
+ __all__ = [
+     "InferenceResult",
+     "cooldown_active",
+     "cooldown_remaining",
+     "cooldown_seconds_for",
+     "cooldown_status",
+     "force_clear_cooldown",
+     "generate",
+     "chat_messages",
+     "INFERENCE_MODEL",
+     "INFERENCE_PROVIDER",
+     "MAX_NEW_TOKENS",
+ ]
+
+
+ if __name__ == "__main__":
+     # Smoke test
+     for p in ("tinybard", "focusfriend", "crittercalm"):
+         print(p, "cooldown:", cooldown_status(p))
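
The cooldown model in the module docstring (a per-project window that starts only after a successful call) can also be sketched in isolation. This is a minimal sketch, not the module's code: `CooldownGate`, `run`, `remaining`, and the injectable `clock` parameter are hypothetical names introduced here. Unlike the module, which calls `cooldown_active()` and later `_mark_called()` as separate steps, the sketch does the check under a lock and tracks in-flight calls so two concurrent requests cannot both pass the check.

```python
import threading
import time
from typing import Callable, Dict, Set, TypeVar

T = TypeVar("T")


class CooldownGate:
    """Per-project cooldown that starts only after a successful call (sketch)."""

    def __init__(self, window_s: float, clock: Callable[[], float] = time.monotonic):
        self._window_s = window_s
        self._clock = clock  # injectable so tests do not need to sleep
        self._last_ok: Dict[str, float] = {}
        self._in_flight: Set[str] = set()
        self._lock = threading.Lock()

    def remaining(self, project: str) -> float:
        """Seconds left in the window (0.0 when a call is allowed)."""
        with self._lock:
            last = self._last_ok.get(project)
            if last is None:
                return 0.0
            return max(0.0, self._window_s - (self._clock() - last))

    def run(self, project: str, fn: Callable[[], T]) -> T:
        """Call fn() unless the project is cooling down or already busy."""
        with self._lock:
            last = self._last_ok.get(project)
            cooling = last is not None and self._clock() - last < self._window_s
            if cooling or project in self._in_flight:
                raise RuntimeError(f"cooldown active for {project!r}")
            self._in_flight.add(project)
        succeeded = False
        try:
            result = fn()  # the lock is not held during the slow call
            succeeded = True
            return result
        finally:
            with self._lock:
                self._in_flight.discard(project)
                if succeeded:
                    # Only a successful call starts the window.
                    self._last_ok[project] = self._clock()
```

A caller would wrap the network request, for example `gate.run("tinybard", lambda: client.chat_completion(...))`, and treat `RuntimeError` as the signal to use the procedural fallback. Using a monotonic clock avoids surprises if the wall clock is adjusted while the Space is running.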