Hermes Bot committed on
Commit ·
648bf8b
Parent(s): ac9568a
Co-Authored-By: Nanoboozhoo <nanoboozhoo@git-aaaki>
- llms.txt +34 -27
- projects/crittercalm/README.md +24 -8
- projects/crittercalm/app.py +21 -33
- projects/crittercalm/content/script_generator.py +67 -51
- projects/crittercalm/requirements.txt +12 -18
- projects/focusfriend/README.md +20 -12
- projects/focusfriend/inference/llm.py +88 -127
- projects/focusfriend/requirements.txt +13 -7
- projects/tinybard/README.md +52 -32
- projects/tinybard/app.py +115 -123
- projects/tinybard/requirements.txt +13 -2
- projects/tinybard/static/main.js +12 -4
- shared/inference_client.py +209 -0
llms.txt
CHANGED
|
@@ -13,8 +13,7 @@
|
|
| 13 |
## Overview
|
| 14 |
|
| 15 |
- **Name:** Build Small Hackathon 2026 — Team nbiish
|
| 16 |
-
- **Version:** 0.
|
| 17 |
-
- **Description:** Multi-project hackathon entry targeting $48K+ prize pool across Backyard AI and Thousand Token Wood tracks. Three Gradio apps using small models (≤32B) with maximum bonus badge coverage.
|
| 18 |
- **Aesthetic:** Cedar-copper visual language — sky-to-sunrise palette (water-blue → cedar → copper → sun-amber → birch-cream), biophilic motifs, sky-to-water gradient banners. Shared CSS variables live in `shared/cedar_copper_tokens.py`.
|
| 19 |
- **Purpose:** Win prizes across tracks, badges, and sponsor categories by building delightful, useful AI apps that run locally.
|
| 20 |
- **UX:** Gradio web apps (gr.Blocks + mount_gradio_app custom frontends), hosted on HF Spaces.
|
|
@@ -73,20 +72,30 @@
|
|
| 73 |
- HF README metadata: `colorTo` must be one of `[red, yellow, green, blue, indigo, purple, pink, gray]` (no `emerald`/`amber`).
|
| 74 |
- HF README metadata: `emoji` must match `/\p{Extended_Pictographic}/u` — only the standard emoji block is allowed; decorative Unicode glyphs (solar/astrological/typographic symbols) fail validation. Use a real emoji.
|
| 75 |
|
| 76 |
### Local Test Environment
|
| 77 |
|
| 78 |
- Python: miniconda3 (Python 3.12)
|
| 79 |
- Gradio: 6.0.0
|
| 80 |
-
-
|
| 81 |
-
-
|
| 82 |
-
- VibeThinker-1.5B.Q8_0.gguf (in HF cache)
|
| 83 |
-
- LFM2-1.2B-Q4_K_M.gguf (in HF cache)
|
| 84 |
-
- LFM2-8B-A1B-Q4_K_M.gguf (in ggufy/models/)
|
| 85 |
-
- Missing GGUF models (need download): Gemma 4 12B, Dolphin-X1-8B
|
| 86 |
|
| 87 |
-
### Local Servers
|
| 88 |
|
| 89 |
-
|
| 90 |
|
| 91 |
| Project | URL | Stack | HF Space |
|
| 92 |
|---|---|---|---|
|
|
@@ -100,34 +109,27 @@ All 3 apps run simultaneously on different ports for visual inspection:
|
|
| 100 |
|
| 101 |
### 1. CritterCalm (Backyard AI)
|
| 102 |
|
| 103 |
-
- **Status:** Code complete. Deployed.
|
| 104 |
-
- **Stack:** OmniVoice (0.6B
|
| 105 |
-
- **Badges:** Off the Grid, Well-Tuned (TBD), Field Notes,
|
| 106 |
- **GitHub:** github.com/nbiish/crittercalm
|
| 107 |
- **HF Space:** huggingface.co/spaces/nbiish/crittercalm
|
| 108 |
- **Standalone repo:** /Volumes/1tb-sandisk/code-external/crittercalm-repo
|
| 109 |
|
| 110 |
### 2. FocusFriend (Thousand Token Wood)
|
| 111 |
|
| 112 |
-
- **Status:** Code complete. Deployed.
|
| 113 |
-
- **Stack:**
|
| 114 |
-
- **Badges:** Off-Brand (sun-amber custom theme),
|
| 115 |
- **GitHub:** github.com/nbiish/focusfriend
|
| 116 |
- **HF Space:** huggingface.co/spaces/nbiish/focusfriend
|
| 117 |
- **Standalone repo:** /Volumes/1tb-sandisk/code-external/focusfriend-repo
|
| 118 |
-
- **Note:** Gemma 4 12B GGUF not yet downloaded. Need `huggingface-cli download unsloth/gemma-4-12b-it-GGUF --include "gemma-4-12b-it-Q4_K_M.gguf" --local-dir ./models`
|
| 119 |
|
| 120 |
### 3. TinyBard (Thousand Token Wood + Tiny Titan + Llama Champion)
|
| 121 |
|
| 122 |
-
- **Status:** Code complete. Deployed.
|
| 123 |
- **Concept:** ≤4B LLM generates 5-min interactive text adventures in a CRT terminal aesthetic.
|
| 124 |
-
- **Stack:**
|
| 125 |
-
- **Architecture:** FastAPI + mount_gradio_app at /gradio. Custom HTML/CSS/JS frontend. MCP tools: start_game, make_choice.
|
| 126 |
-
- **Badges:** Llama Champion, Tiny Titan (1.5B < 4B), Off-Brand (custom CRT), Off the Grid, Field Notes
|
| 127 |
-
- **Prize targets:** Tiny Titan ($1K), Thousand Token Wood track, Bonus Quest Champion potential
|
| 128 |
-
- **GitHub:** github.com/nbiish/tinybard
|
| 129 |
-
- **HF Space:** huggingface.co/spaces/nbiish/tinybard
|
| 130 |
-
- **Standalone repo:** /Volumes/1tb-sandisk/code-external/tinybard
|
| 131 |
|
| 132 |
---
|
| 133 |
|
|
@@ -155,17 +157,21 @@ All 3 apps run simultaneously on different ports for visual inspection:
|
|
| 155 |
- [x] INTELLIGENCE.md — full hackathon landscape analysis
|
| 156 |
- [x] SUBMISSION_DRAFTS.md — social posts + Field Notes drafts
|
| 157 |
- [x] HF CLI installed + skills configured (`hf skills add --global`)
|
| 158 |
-
- [x] llama-cpp-python installed (conda-forge v0.3.16)
|
| 159 |
- [x] Local verification: all 3 apps run on ports 7861/7862/7863
|
| 160 |
- [x] TinyBard end-to-end game loop verified (start → choose → next scene)
|
| 161 |
- [x] FocusFriend chat verified (user message → Pip reply)
|
| 162 |
- [x] CritterCalm UI navigation verified (all 3 tabs render)
|
|
|
|
|
|
|
|
|
|
| 163 |
|
| 164 |
---
|
| 165 |
|
| 166 |
## Short-term Goals
|
| 167 |
|
| 168 |
-
-
|
|
|
|
| 169 |
- Record demo videos and post to social media
|
| 170 |
- Write and publish Field Notes blog posts
|
| 171 |
- Share agent traces for Sharing is Caring badge
|
|
@@ -179,6 +185,7 @@ All 3 apps run simultaneously on different ports for visual inspection:
|
|
| 179 |
- FocusFriend: projects/focusfriend/ + github.com/nbiish/focusfriend
|
| 180 |
- TinyBard: projects/tinybard/ + github.com/nbiish/tinybard
|
| 181 |
- Aesthetic module: shared/cedar_copper_tokens.py
|
|
|
|
| 182 |
- ML Intern: github.com/huggingface/ml-intern
|
| 183 |
- HF Agents CLI: huggingface.co/docs/hub/en/agents-cli
|
| 184 |
- Gradio MCP: gradio.app/guides/model-context-protocol
|
|
|
|
| 13 |
## Overview
|
| 14 |
|
| 15 |
- **Name:** Build Small Hackathon 2026 — Team nbiish
|
| 16 |
+
- **Version:** 0.5.0 — Cedar-Copper Edition (HF Inference API)
|
|
|
|
| 17 |
- **Aesthetic:** Cedar-copper visual language — sky-to-sunrise palette (water-blue → cedar → copper → sun-amber → birch-cream), biophilic motifs, sky-to-water gradient banners. Shared CSS variables live in `shared/cedar_copper_tokens.py`.
|
| 18 |
- **Purpose:** Win prizes across tracks, badges, and sponsor categories by building delightful, useful AI apps that run locally.
|
| 19 |
- **UX:** Gradio web apps (gr.Blocks + mount_gradio_app custom frontends), hosted on HF Spaces.
|
|
|
|
| 72 |
- HF README metadata: `colorTo` must be one of `[red, yellow, green, blue, indigo, purple, pink, gray]` (no `emerald`/`amber`).
|
| 73 |
- HF README metadata: `emoji` must match `/\p{Extended_Pictographic}/u` — only the standard emoji block is allowed; decorative Unicode glyphs (solar/astrological/typographic symbols) fail validation. Use a real emoji.
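A small local check for the `colorTo` rule above can catch an invalid value before the Space build rejects it. This is a minimal sketch: `ALLOWED_COLORS` copies the list quoted above, and `check_color` is a hypothetical helper, not part of any Hub tooling. The `emoji` rule is left out because Python's standard `re` module does not support `\p{Extended_Pictographic}`.

```python
# Hypothetical helper: validate a Space README color value locally.
# The allowed values are copied from the note above.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}


def check_color(value: str) -> bool:
    """Return True if `value` is one of the colors listed in the note above."""
    return value in ALLOWED_COLORS


if __name__ == "__main__":
    for candidate in ("yellow", "amber", "emerald"):
        print(candidate, "ok" if check_color(candidate) else "rejected")
```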
|
| 74 |
|
| 75 |
+
### Inference Architecture (v0.5+)
|
| 76 |
+
|
| 77 |
+
- **All LLM inference** is now via the **Hugging Face Inference API** (serverless). No more local GGUF, no `llama-cpp-python` compile step.
|
| 78 |
+
- Shared module: `shared/inference_client.py` provides `cooldown_status()`, `cooldown_active()`, `generate()`, and `chat_messages()`.
|
| 79 |
+
- Default model: `Qwen/Qwen2.5-1.5B-Instruct` (free tier, fast, well-suited to chat). Override via `INFERENCE_MODEL`.
|
| 80 |
+
- Per-project model override: `TINYBARD_MODEL`, `FOCUSFRIEND_MODEL`, `CRITTERCALM_MODEL`.
|
| 81 |
+
- **Cooldowns** enforce a per-project minimum gap between inference calls (protects HF/Modal credit budget):
|
| 82 |
+
- `tinybard`: 6s
|
| 83 |
+
- `focusfriend`: 10s
|
| 84 |
+
- `crittercalm`: 12s
|
| 85 |
+
- Override via `TINYBARD_COOLDOWN_SECONDS`, etc., or global `INFERENCE_COOLDOWN_SECONDS`.
|
| 86 |
+
- **Always-fallback:** every LLM call falls back to procedural / template output if inference fails or is in cooldown. No LLM call ever blocks the UX.
|
| 87 |
+
- HF Spaces are the dev/test environment — iterate live at `huggingface.co/spaces/nbiish/{tinybard,focusfriend,crittercalm}` rather than localhost.
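The cooldown behaviour described above can be pictured as a per-project timestamp check. The sketch below is illustrative only: it is not the contents of `shared/inference_client.py`, and the class `CooldownGate`, its methods, and the injectable `clock` are assumptions made for this example. Only the default windows (6s, 10s, 12s) come from the list above.

```python
import time
from typing import Callable, Dict

# Default windows in seconds, taken from the list above.
DEFAULT_WINDOWS: Dict[str, float] = {
    "tinybard": 6.0,
    "focusfriend": 10.0,
    "crittercalm": 12.0,
}


class CooldownGate:
    """Hypothetical per-project minimum-gap limiter (not the real module)."""

    def __init__(self, windows: Dict[str, float], clock: Callable[[], float] = time.monotonic):
        self._windows = dict(windows)
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_call: Dict[str, float] = {}

    def remaining(self, project: str) -> float:
        """Seconds left before `project` may call inference again (0.0 if ready)."""
        last = self._last_call.get(project)
        if last is None:
            return 0.0
        elapsed = self._clock() - last
        return max(0.0, self._windows.get(project, 0.0) - elapsed)

    def active(self, project: str) -> bool:
        """True while `project` is still inside its cooldown window."""
        return self.remaining(project) > 0.0

    def mark(self, project: str) -> None:
        """Record that an inference call was just made for `project`."""
        self._last_call[project] = self._clock()
```

A caller would check `gate.active("tinybard")` before contacting the API and call `gate.mark("tinybard")` once a request is sent; while the gate is active, the app would serve its procedural fallback instead.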
|
| 88 |
+
|
| 89 |
### Local Test Environment
|
| 90 |
|
| 91 |
- Python: miniconda3 (Python 3.12)
|
| 92 |
- Gradio: 6.0.0
|
| 93 |
+
- `huggingface_hub` (for Inference API client)
|
| 94 |
+
- Inference is serverless — no local model files needed unless you opt in to local mode
|
| 95 |
|
| 96 |
+
### Local Servers (optional)
|
| 97 |
|
| 98 |
+
Local servers were used during v0.4 development for visual inspection. From v0.5 onward, the preferred workflow is to iterate on the live HF Spaces (which consume your HF/Modal compute credits). The local servers can still be run for development:
|
| 99 |
|
| 100 |
| Project | URL | Stack | HF Space |
|
| 101 |
|---|---|---|---|
|
|
|
|
| 109 |
|
| 110 |
### 1. CritterCalm (Backyard AI)
|
| 111 |
|
| 112 |
+
- **Status:** Code complete. Deployed. HF Inference API + cooldowns wired for script generation. OmniVoice voice cloning still requires local install.
|
| 113 |
+
- **Stack:** OmniVoice (0.6B, local optional) + Kokoro TTS (82M, local optional) + Qwen2.5-7B (default) via HF Inference API
|
| 114 |
+
- **Badges:** Off the Grid, Well-Tuned (TBD), Field Notes, Off-Brand
|
| 115 |
- **GitHub:** github.com/nbiish/crittercalm
|
| 116 |
- **HF Space:** huggingface.co/spaces/nbiish/crittercalm
|
| 117 |
- **Standalone repo:** /Volumes/1tb-sandisk/code-external/crittercalm-repo
|
| 118 |
|
| 119 |
### 2. FocusFriend (Thousand Token Wood)
|
| 120 |
|
| 121 |
+
- **Status:** Code complete. Deployed. HF Inference API + cooldowns wired. Gradio 6 Chatbot dict-format fixed.
|
| 122 |
+
- **Stack:** Qwen2.5-7B (default) via HF Inference API
|
| 123 |
+
- **Badges:** Off-Brand (sun-amber custom theme), Field Notes, Cooldowns badge
|
| 124 |
- **GitHub:** github.com/nbiish/focusfriend
|
| 125 |
- **HF Space:** huggingface.co/spaces/nbiish/focusfriend
|
| 126 |
- **Standalone repo:** /Volumes/1tb-sandisk/code-external/focusfriend-repo
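The "Chatbot dict-format" fix mentioned in the status line refers to chat history being a list of `role`/`content` dictionaries rather than `(user, bot)` pairs. A minimal sketch of that conversion, assuming a hypothetical helper `pairs_to_messages`; it does not import Gradio and is not the code from this commit.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One legacy history entry: (user text, assistant text); either side may be None.
Pair = Tuple[Optional[str], Optional[str]]


def pairs_to_messages(history: Sequence[Pair]) -> List[Dict[str, str]]:
    """Convert [(user, bot), ...] pairs into role/content dicts."""
    messages: List[Dict[str, str]] = []
    for user_text, bot_text in history:
        if user_text:
            messages.append({"role": "user", "content": user_text})
        if bot_text:
            messages.append({"role": "assistant", "content": bot_text})
    return messages
```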
|
|
|
|
| 127 |
|
| 128 |
### 3. TinyBard (Thousand Token Wood + Tiny Titan + Llama Champion)
|
| 129 |
|
| 130 |
+
- **Status:** Code complete. Deployed. HF Inference API + cooldowns wired. Local test verified (procedural fallback + cooldown UI).
|
| 131 |
- **Concept:** ≤4B LLM generates 5-min interactive text adventures in a CRT terminal aesthetic.
|
| 132 |
+
- **Stack:** Qwen2.5-1.5B (default) via HF Inference API + procedural fallback engine
|
| 133 |
|
| 134 |
---
|
| 135 |
|
|
|
|
| 157 |
- [x] INTELLIGENCE.md — full hackathon landscape analysis
|
| 158 |
- [x] SUBMISSION_DRAFTS.md — social posts + Field Notes drafts
|
| 159 |
- [x] HF CLI installed + skills configured (`hf skills add --global`)
|
| 160 |
+
- [x] llama-cpp-python installed (conda-forge v0.3.16) — for reference; v0.5+ uses HF Inference API
|
| 161 |
- [x] Local verification: all 3 apps run on ports 7861/7862/7863
|
| 162 |
- [x] TinyBard end-to-end game loop verified (start → choose → next scene)
|
| 163 |
- [x] FocusFriend chat verified (user message → Pip reply)
|
| 164 |
- [x] CritterCalm UI navigation verified (all 3 tabs render)
|
| 165 |
+
- [x] **v0.5: HF Inference API wired into all 3 apps** (no local GGUF, no build step)
|
| 166 |
+
- [x] **v0.5: Cooldown system** in `shared/inference_client.py` to protect HF/Modal credit budget
|
| 167 |
+
- [x] **v0.5: TinyBard local test** — procedural fallback works when no HF_TOKEN; cooldown UI shows in footer
|
| 168 |
|
| 169 |
---
|
| 170 |
|
| 171 |
## Short-term Goals
|
| 172 |
|
| 173 |
+
- Iterate on the live HF Spaces (nbiish/tinybard, nbiish/focusfriend, nbiish/crittercalm)
|
| 174 |
+
- Set HF_TOKEN + INFERENCE_MODEL Space secrets to enable real LLM-backed adventures
|
| 175 |
- Record demo videos and post to social media
|
| 176 |
- Write and publish Field Notes blog posts
|
| 177 |
- Share agent traces for Sharing is Caring badge
|
|
|
|
| 185 |
- FocusFriend: projects/focusfriend/ + github.com/nbiish/focusfriend
|
| 186 |
- TinyBard: projects/tinybard/ + github.com/nbiish/tinybard
|
| 187 |
- Aesthetic module: shared/cedar_copper_tokens.py
|
| 188 |
+
- Inference client: shared/inference_client.py
|
| 189 |
- ML Intern: github.com/huggingface/ml-intern
|
| 190 |
- HF Agents CLI: huggingface.co/docs/hub/en/agents-cli
|
| 191 |
- Gradio MCP: gradio.app/guides/model-context-protocol
|
projects/crittercalm/README.md
CHANGED
|
@@ -19,6 +19,8 @@ tags:
|
|
| 19 |
- off-the-grid
|
| 20 |
- anishinaabe
|
| 21 |
- solarpunk
|
|
|
|
|
|
|
| 22 |
---
|
| 23 |
|
| 24 |
# ◈──◆──◇ ᐴ CRITTERCALM ᔔ MAANAMEWIN / VOICE-COMFORT FOR THE FOUR-LEGGEDS ◇──◆──◈
|
|
@@ -53,7 +55,15 @@ git clone https://github.com/nbiish/crittercalm.git
|
|
| 53 |
cd crittercalm
|
| 54 |
pip install -r requirements.txt
|
| 55 |
|
| 56 |
-
#
|
| 57 |
python app.py
|
| 58 |
```
|
| 59 |
|
|
@@ -61,13 +71,18 @@ Then open <http://localhost:7863/>.
|
|
| 61 |
|
| 62 |
## ☼ ZHOONIYAAWICHIGEWIN / MODEL STACK ◈
|
| 63 |
|
| 64 |
-
|
|
| 65 |
-
|---
|
| 66 |
-
| OmniVoice |
|
| 67 |
-
|
|
| 68 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
|
| 70 |
-
**
|
|
|
|
| 71 |
|
| 72 |
## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
|
| 73 |
|
|
@@ -79,10 +94,11 @@ Runs with `mcp_server=True` — Streamable HTTP MCP server at `/gradio/gradio_ap
|
|
| 79 |
|
| 80 |
## ☼ GIIZHIITAA / BADGES ◈
|
| 81 |
|
| 82 |
-
- 🔌 **Off the Grid** —
|
| 83 |
- 🎯 **Well-Tuned** — Fine-tuned voice embeddings for pet-directed speech
|
| 84 |
- 📓 **Field Notes** — Blog post on animal psychoacoustics + voice cloning
|
| 85 |
- 🎨 **Off-Brand** — Anishinaabe-Solarpunk theme with sky-to-sunrise palette
|
|
|
|
| 86 |
|
| 87 |
## ☼ INA-WAABANDA'IWEWIN / PROJECT STRUCTURE ◈
|
| 88 |
|
|
|
|
| 19 |
- off-the-grid
|
| 20 |
- anishinaabe
|
| 21 |
- solarpunk
|
| 22 |
+
- inference-api
|
| 23 |
+
- cooldowns
|
| 24 |
---
|
| 25 |
|
| 26 |
# ◈──◆──◇ ᐴ CRITTERCALM ᔔ MAANAMEWIN / VOICE-COMFORT FOR THE FOUR-LEGGEDS ◇──◆──◈
|
|
|
|
| 55 |
cd crittercalm
|
| 56 |
pip install -r requirements.txt
|
| 57 |
|
| 58 |
+
# Optional: pick a model (default: Qwen/Qwen2.5-7B-Instruct)
|
| 59 |
+
export INFERENCE_MODEL="Qwen/Qwen2.5-7B-Instruct"
|
| 60 |
+
|
| 61 |
+
# Optional: set the HF token
|
| 62 |
+
export HF_TOKEN="hf_..."
|
| 63 |
+
|
| 64 |
+
# Optional: tune the cooldown
|
| 65 |
+
export CRITTERCALM_COOLDOWN_SECONDS=12
|
| 66 |
+
|
| 67 |
python app.py
|
| 68 |
```
|
| 69 |
|
|
|
|
| 71 |
|
| 72 |
## ☼ ZHOONIYAAWICHIGEWIN / MODEL STACK ◈
|
| 73 |
|
| 74 |
+
| Component | Source | Purpose | License |
|
| 75 |
+
|---|---|---|---|
|
| 76 |
+
| OmniVoice | local (when installed) | Voice cloning + TTS | Apache 2.0 |
|
| 77 |
+
| Kokoro TTS | local (when installed) | Built-in soothing voices (fallback) | Apache 2.0 |
|
| 78 |
+
| Script LLM | **HF Inference API** (serverless) | Calming script generation | varies |
|
| 79 |
+
|
| 80 |
+
The script LLM uses the HF Inference API — no local GGUF build, configurable per-Space.
|
| 81 |
+
Default: `Qwen/Qwen2.5-7B-Instruct` (small + fast + free tier friendly).
|
| 82 |
+
Override: `CRITTERCALM_MODEL` env var.
|
| 83 |
|
| 84 |
+
**Local components:** 0.6B (OmniVoice) + 82M (Kokoro) = ~0.7B (when installed).
|
| 85 |
+
**API LLM:** 1.5B to 9B, depending on the model chosen via `CRITTERCALM_MODEL` or `INFERENCE_MODEL`.
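The override order implied above (a per-project variable, then the global `INFERENCE_MODEL`, then a built-in default) can be sketched as follows. `resolve_model` is a hypothetical helper written for this example, and the exact precedence inside the real client is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "Qwen/Qwen2.5-7B-Instruct"  # default named in this README


def resolve_model(project: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a model id: <PROJECT>_MODEL, then INFERENCE_MODEL, then the default."""
    env = os.environ if env is None else env
    for key in (f"{project.upper()}_MODEL", "INFERENCE_MODEL"):
        value = env.get(key, "").strip()
        if value:  # ignore unset or blank variables
            return value
    return DEFAULT_MODEL
```

Passing `env` explicitly keeps the function easy to test; in the app it would simply be called as `resolve_model("crittercalm")`.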
|
| 86 |
|
| 87 |
## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
|
| 88 |
|
|
|
|
| 94 |
|
| 95 |
## ☼ GIIZHIITAA / BADGES ◈
|
| 96 |
|
| 97 |
+
- 🔌 **Off the Grid** — Voice cloning + TTS run locally; only the script LLM uses the Inference API
|
| 98 |
- 🎯 **Well-Tuned** — Fine-tuned voice embeddings for pet-directed speech
|
| 99 |
- 📓 **Field Notes** — Blog post on animal psychoacoustics + voice cloning
|
| 100 |
- 🎨 **Off-Brand** — Anishinaabe-Solarpunk theme with sky-to-sunrise palette
|
| 101 |
+
- 🌀 **Cooldowns** — Serverless inference with built-in credit protection
|
| 102 |
|
| 103 |
## ☼ INA-WAABANDA'IWEWIN / PROJECT STRUCTURE ◈
|
| 104 |
|
projects/crittercalm/app.py
CHANGED
|
@@ -47,10 +47,9 @@ log = logging.getLogger("crittercalm")
|
|
| 47 |
MODEL_DIR = Path(os.environ.get("CRITTERCALM_MODEL_DIR", Path(__file__).parent / "models"))
|
| 48 |
MODEL_DIR.mkdir(parents=True, exist_ok=True)
|
| 49 |
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
)
|
| 54 |
OMNIVOICE_MODEL_ID = os.environ.get("OMNIVOICE_MODEL_ID", "k2-fsa/OmniVoice")
|
| 55 |
KOKORO_MODEL_PATH = os.environ.get(
|
| 56 |
"KOKORO_MODEL_PATH",
|
|
@@ -96,34 +95,12 @@ def get_omnivoice():
|
|
| 96 |
return None
|
| 97 |
|
| 98 |
|
| 99 |
def get_dolphin_llm():
|
| 100 |
-
"""
|
| 101 |
-
|
| 102 |
-
if _dolphin_llm is not None:
|
| 103 |
-
return _dolphin_llm
|
| 104 |
-
gguf_path = Path(DOLPHIN_MODEL_PATH)
|
| 105 |
-
if not gguf_path.exists():
|
| 106 |
-
log.warning(f"Dolphin GGUF not found at {gguf_path}. "
|
| 107 |
-
"Download from https://huggingface.co/dphn/Dolphin-X1-8B-GGUF")
|
| 108 |
-
return None
|
| 109 |
-
try:
|
| 110 |
-
from llama_cpp import Llama
|
| 111 |
-
|
| 112 |
-
log.info(f"Loading Dolphin-X1-8B from {gguf_path} …")
|
| 113 |
-
_dolphin_llm = Llama(
|
| 114 |
-
model_path=str(gguf_path),
|
| 115 |
-
n_ctx=4096,
|
| 116 |
-
n_threads=os.cpu_count() or 4,
|
| 117 |
-
verbose=False,
|
| 118 |
-
)
|
| 119 |
-
log.info("Dolphin-X1-8B loaded ✓")
|
| 120 |
-
return _dolphin_llm
|
| 121 |
-
except ImportError:
|
| 122 |
-
log.warning("llama-cpp-python not installed.")
|
| 123 |
-
return None
|
| 124 |
-
except Exception as exc:
|
| 125 |
-
log.error(f"Dolphin load failed: {exc}")
|
| 126 |
-
return None
|
| 127 |
|
| 128 |
|
| 129 |
def get_kokoro():
|
|
@@ -406,8 +383,19 @@ def generate_calming_audio(
|
|
| 406 |
# ---------------------------------------------------------------------------
|
| 407 |
|
| 408 |
def get_model_status() -> str:
|
| 409 |
-
"""Return a markdown summary of which models are available."""
|
| 410 |
-
|
| 411 |
|
| 412 |
omni = get_omnivoice()
|
| 413 |
lines.append(
|
|
|
|
| 47 |
MODEL_DIR = Path(os.environ.get("CRITTERCALM_MODEL_DIR", Path(__file__).parent / "models"))
|
| 48 |
MODEL_DIR.mkdir(parents=True, exist_ok=True)
|
| 49 |
|
| 50 |
+
# Deprecated: Dolphin GGUF path kept as None. Script generation now uses
|
| 51 |
+
# the HF Inference API via content.script_generator (no local GGUF build).
|
| 52 |
+
DOLPHIN_MODEL_PATH = None
|
|
|
|
| 53 |
OMNIVOICE_MODEL_ID = os.environ.get("OMNIVOICE_MODEL_ID", "k2-fsa/OmniVoice")
|
| 54 |
KOKORO_MODEL_PATH = os.environ.get(
|
| 55 |
"KOKORO_MODEL_PATH",
|
|
|
|
| 95 |
return None
|
| 96 |
|
| 97 |
|
| 98 |
+
# Dolphin LLM is no longer used locally. Script generation now uses the
|
| 99 |
+
# HF Inference API via content.script_generator. The shim below preserves
|
| 100 |
+
# the call sites in case any external MCP tool references it.
|
| 101 |
def get_dolphin_llm():
|
| 102 |
+
"""Deprecated. Returns None — use the HF Inference API via script_generator."""
|
| 103 |
+
return None
|
| 104 |
|
| 105 |
|
| 106 |
def get_kokoro():
|
|
|
|
| 383 |
# ---------------------------------------------------------------------------
|
| 384 |
|
| 385 |
def get_model_status() -> str:
|
| 386 |
+
"""Return a markdown summary of which models are available + cooldown."""
|
| 387 |
+
# Get the current cooldown snapshot from the script generator
|
| 388 |
+
try:
|
| 389 |
+
from content.script_generator import cooldown_snapshot
|
| 390 |
+
snap = cooldown_snapshot()
|
| 391 |
+
cooldown_line = (
|
| 392 |
+
f"Inference model: `{snap['model']}` · "
|
| 393 |
+
f"cooldown: {snap['cooldown']['active']} · "
|
| 394 |
+
f"window: {snap['cooldown']['window_seconds']}s"
|
| 395 |
+
)
|
| 396 |
+
except Exception as e:
|
| 397 |
+
cooldown_line = f"cooldown status unavailable: {e}"
|
| 398 |
+
lines = [f"> {cooldown_line}\n", "| Model | Status | Purpose |", "|-------|--------|---------|"]
|
| 399 |
|
| 400 |
omni = get_omnivoice()
|
| 401 |
lines.append(
|
projects/crittercalm/content/script_generator.py
CHANGED
|
@@ -1,14 +1,39 @@
|
|
| 1 |
"""
|
| 2 |
-
Calming script generation using
|
|
|
|
| 3 |
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
"""
|
|
|
|
| 9 |
|
| 10 |
import logging
|
| 11 |
-
|
| 12 |
|
| 13 |
log = logging.getLogger("crittercalm.content")
|
| 14 |
|
|
@@ -33,6 +58,10 @@ Guidelines:
|
|
| 33 |
Output ONLY the spoken script — no stage directions, no explanations."""
|
| 34 |
|
| 35 |
| 36 |
def create_script_prompt(
|
| 37 |
animal: str,
|
| 38 |
situation: str,
|
|
@@ -40,36 +69,12 @@ def create_script_prompt(
|
|
| 40 |
pet_name: str = "",
|
| 41 |
custom_message: str = "",
|
| 42 |
) -> str:
|
| 43 |
-
"""
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
Args:
|
| 47 |
-
animal: Animal type (Dog, Cat, Chicken, etc.)
|
| 48 |
-
situation: The stress situation
|
| 49 |
-
duration_minutes: Target session length in minutes
|
| 50 |
-
pet_name: Optional pet name
|
| 51 |
-
custom_message: Optional custom message to include
|
| 52 |
-
|
| 53 |
-
Returns:
|
| 54 |
-
Formatted prompt string
|
| 55 |
-
"""
|
| 56 |
-
duration_words = (
|
| 57 |
-
"very brief, about 30 seconds"
|
| 58 |
-
if duration_minutes <= 1
|
| 59 |
-
else f"about {duration_minutes} minutes when read aloud slowly"
|
| 60 |
-
)
|
| 61 |
-
name_clause = f"named {pet_name}" if pet_name.strip() else ""
|
| 62 |
-
custom_clause = (
|
| 63 |
-
f"\nInclude this personal message naturally: \"{custom_message}\""
|
| 64 |
-
if custom_message.strip()
|
| 65 |
-
else ""
|
| 66 |
-
)
|
| 67 |
-
|
| 68 |
return (
|
| 69 |
-
f"Write a calming spoken message for a {animal}
|
| 70 |
-
f"
|
| 71 |
-
f"Length: {duration_words}.{custom_clause}\n"
|
| 72 |
-
f"Make it warm, soothing, and specifically tailored to a {animal}'s needs."
|
| 73 |
)
|
| 74 |
|
| 75 |
|
|
@@ -79,10 +84,9 @@ def generate_calming_script(
|
|
| 79 |
duration_minutes: int,
|
| 80 |
custom_message: str = "",
|
| 81 |
pet_name: str = "",
|
| 82 |
-
dolphin_llm=None,
|
| 83 |
) -> str:
|
| 84 |
-
"""
|
| 85 |
-
Generate a calming script using Dolphin-X1-8B or fallback templates.
|
| 86 |
|
| 87 |
Args:
|
| 88 |
animal: Animal type
|
|
@@ -90,12 +94,12 @@ def generate_calming_script(
|
|
| 90 |
duration_minutes: Target session length
|
| 91 |
custom_message: Optional custom message
|
| 92 |
pet_name: Optional pet name
|
| 93 |
-
dolphin_llm:
|
| 94 |
|
| 95 |
Returns:
|
| 96 |
Generated calming script as a string
|
| 97 |
"""
|
| 98 |
-
|
| 99 |
animal=animal,
|
| 100 |
situation=situation,
|
| 101 |
duration_minutes=duration_minutes,
|
|
@@ -103,22 +107,34 @@ def generate_calming_script(
|
|
| 103 |
custom_message=custom_message,
|
| 104 |
)
|
| 105 |
|
| 106 |
-
# Try
|
| 107 |
-
if
|
| 108 |
try:
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
temperature=0.7,
|
| 115 |
-
max_tokens=1024,
|
| 116 |
)
|
| 117 |
-
script =
|
| 118 |
-
|
| 119 |
-
|
| 120 |
except Exception as exc:
|
| 121 |
log.warning(f"LLM generation failed, using template: {exc}")
|
|
|
|
|
|
|
| 122 |
|
| 123 |
# Fallback: pre-written templates
|
| 124 |
return get_template(animal, situation, pet_name, custom_message)
|
|
|
|
| 1 |
"""
|
| 2 |
+
Calming script generation using the Hugging Face Inference API
|
| 3 |
+
or pre-written templates.
|
| 4 |
|
| 5 |
+
The previous version used Dolphin-X1-8B via llama-cpp-python locally. That
|
| 6 |
+
required a heavy build step on HF Spaces. This version uses the serverless
|
| 7 |
+
HF Inference API and enforces a per-project cooldown via
|
| 8 |
+
`shared.inference_client` to protect credit budgets.
|
| 9 |
+
|
| 10 |
+
Override model: set `CRITTERCALM_MODEL` env var. Default is
|
| 11 |
+
`Qwen/Qwen2.5-7B-Instruct` (small, fast, free-tier friendly). The
|
| 12 |
+
system prompt is unchanged — output format is identical.
|
| 13 |
"""
|
| 14 |
+
from __future__ import annotations
|
| 15 |
|
| 16 |
import logging
|
| 17 |
+
import os
|
| 18 |
+
import sys
|
| 19 |
+
from pathlib import Path
|
| 20 |
+
from typing import List, Dict, Optional
|
| 21 |
+
|
| 22 |
+
# Repo-root path setup so we can import shared.inference_client
|
| 23 |
+
_THIS = Path(__file__).resolve()
|
| 24 |
+
_REPO_ROOT = _THIS.parent.parent.parent
|
| 25 |
+
if str(_REPO_ROOT) not in sys.path:
|
| 26 |
+
sys.path.insert(0, str(_REPO_ROOT))
|
| 27 |
+
|
| 28 |
+
from shared.inference_client import ( # noqa: E402
|
| 29 |
+
chat_messages,
|
| 30 |
+
cooldown_active,
|
| 31 |
+
cooldown_status,
|
| 32 |
+
generate as _client_generate,
|
| 33 |
+
INFERENCE_MODEL as DEFAULT_MODEL,
|
| 34 |
+
)
|
| 35 |
+
|
| 36 |
+
from content.templates import get_template # noqa: E402
|
| 37 |
|
| 38 |
log = logging.getLogger("crittercalm.content")
|
| 39 |
|
|
|
|
| 58 |
Output ONLY the spoken script — no stage directions, no explanations."""
|
| 59 |
|
| 60 |
|
| 61 |
+
def _model() -> str:
|
| 62 |
+
return os.environ.get("CRITTERCALM_MODEL", DEFAULT_MODEL)
|
| 63 |
+
|
| 64 |
+
|
| 65 |
def create_script_prompt(
|
| 66 |
animal: str,
|
| 67 |
situation: str,
|
|
|
|
| 69 |
pet_name: str = "",
|
| 70 |
custom_message: str = "",
|
| 71 |
) -> str:
|
| 72 |
+
"""Build the user prompt for script generation."""
|
| 73 |
+
pet_part = f" The pet's name is \"{pet_name}\"." if pet_name else ""
|
| 74 |
+
custom_part = f" Incorporate this personal note: \"{custom_message}\"" if custom_message else ""
|
| 75 |
return (
|
| 76 |
+
f"Write a {duration_minutes}-minute calming spoken message for a {animal} "
|
| 77 |
+
f"that is experiencing {situation}.{pet_part}{custom_part}"
|
|
|
|
|
|
|
| 78 |
)
|
| 79 |
|
| 80 |
|
|
|
|
| 84 |
duration_minutes: int,
|
| 85 |
custom_message: str = "",
|
| 86 |
pet_name: str = "",
|
| 87 |
+
dolphin_llm=None, # legacy param — ignored; we use the HF Inference API
|
| 88 |
) -> str:
|
| 89 |
+
"""Generate a calming script using HF Inference API or fallback templates.
|
|
|
|
| 90 |
|
| 91 |
Args:
|
| 92 |
animal: Animal type
|
|
|
|
| 94 |
duration_minutes: Target session length
|
| 95 |
custom_message: Optional custom message
|
| 96 |
pet_name: Optional pet name
|
| 97 |
+
dolphin_llm: Legacy parameter (ignored)
|
| 98 |
|
| 99 |
Returns:
|
| 100 |
Generated calming script as a string
|
| 101 |
"""
|
| 102 |
+
user_prompt = create_script_prompt(
|
| 103 |
animal=animal,
|
| 104 |
situation=situation,
|
| 105 |
duration_minutes=duration_minutes,
|
|
|
|
| 107 |
custom_message=custom_message,
|
| 108 |
)
|
| 109 |
|
| 110 |
+
# Try inference (cooldown-aware)
|
| 111 |
+
if not cooldown_active("crittercalm"):
|
| 112 |
try:
|
| 113 |
+
messages = chat_messages(CALMING_SYSTEM_PROMPT, user_prompt)
|
| 114 |
+
result = _client_generate(
|
| 115 |
+
project="crittercalm",
|
| 116 |
+
messages=messages,
|
| 117 |
+
max_new_tokens=int(duration_minutes * 200), # rough token budget
|
| 118 |
temperature=0.7,
|
|
|
|
| 119 |
)
|
| 120 |
+
script = result.text.strip()
|
| 121 |
+
if script:
|
| 122 |
+
log.info(f"LLM script generated: {len(script)} chars")
|
| 123 |
+
return script
|
| 124 |
+
except RuntimeError:
|
| 125 |
+
# Cooldown — fall through to template
|
| 126 |
+
log.info("crittercalm inference cooldown; using template")
|
| 127 |
except Exception as exc:
|
| 128 |
log.warning(f"LLM generation failed, using template: {exc}")
|
| 129 |
+
else:
|
| 130 |
+
log.info("crittercalm inference cooldown active; using template")
|
| 131 |
|
| 132 |
# Fallback: pre-written templates
|
| 133 |
return get_template(animal, situation, pet_name, custom_message)
|
| 134 |
+
|
| 135 |
+
|
| 136 |
+
def cooldown_snapshot() -> dict:
|
| 137 |
+
return {
|
| 138 |
+
"model": _model(),
|
| 139 |
+
"cooldown": cooldown_status("crittercalm"),
|
| 140 |
+
}
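The control flow of `generate_calming_script` above (skip inference during cooldown, try the API otherwise, and always end at a template) can be isolated into a small reusable shape. This is a sketch under assumptions: `with_fallback` and its callable parameters are hypothetical names, not part of `shared.inference_client`.

```python
import logging
from typing import Callable

log = logging.getLogger("fallback_sketch")


def with_fallback(
    in_cooldown: Callable[[], bool],
    infer: Callable[[], str],
    fallback: Callable[[], str],
) -> str:
    """Return LLM text when available, otherwise the fallback text."""
    if in_cooldown():
        log.info("cooldown active; using fallback")
        return fallback()
    try:
        text = infer().strip()
        if text:  # an empty reply is treated the same as a failure
            return text
    except Exception as exc:  # any inference failure falls through to the fallback
        log.warning("inference failed, using fallback: %s", exc)
    return fallback()
```

Because every path ends in either non-empty LLM text or `fallback()`, the caller never has to handle an inference error itself.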
|
projects/crittercalm/requirements.txt
CHANGED
|
@@ -1,24 +1,18 @@
|
|
| 1 |
# CritterCalm — AI Voice Cloning Animal Soother
|
| 2 |
# Python 3.10+
|
| 3 |
|
| 4 |
-
# === Core ===
|
| 5 |
gradio>=5.0
|
| 6 |
numpy>=1.24
|
| 7 |
-
soundfile>=0.12
|
| 8 |
-
torch>=2.0
|
| 9 |
-
|
| 10 |
-
# === Voice Cloning ===
|
| 11 |
-
omnivoice>=0.1.0
|
| 12 |
-
|
| 13 |
-
# === LLM Inference (Dolphin-X1-8B) ===
|
| 14 |
-
llama-cpp-python>=0.3.0
|
| 15 |
-
|
| 16 |
-
# === Fallback TTS ===
|
| 17 |
-
kokoro-onnx>=0.2.0
|
| 18 |
-
|
| 19 |
-
# === Audio Processing ===
|
| 20 |
-
librosa>=0.10
|
| 21 |
-
scipy>=1.10
|
| 22 |
-
|
| 23 |
-
# === Utilities ===
|
| 24 |
huggingface_hub>=0.20
|
|
|
|
| 1 |
# CritterCalm — AI Voice Cloning Animal Soother
|
| 2 |
# Python 3.10+
|
| 3 |
+
#
|
| 4 |
+
# Inference is via the Hugging Face Inference API. No local GGUF,
|
| 5 |
+
# no llama-cpp-python compile step. Cooldown is enforced in
|
| 6 |
+
# `shared/inference_client.py` to protect your credit budget.
|
| 7 |
+
#
|
| 8 |
+
# Space env vars (Settings → Variables and secrets):
|
| 9 |
+
# HF_TOKEN — your HF token (optional; anonymous access works for many models)
|
| 10 |
+
# INFERENCE_MODEL — default model id
|
| 11 |
+
# CRITTERCALM_MODEL — override the model for this project
|
| 12 |
+
# CRITTERCALM_COOLDOWN_SECONDS — gap between inference calls (default 12)
|
| 13 |
+
# INFERENCE_PROVIDER — "hf-inference" (default) or paid provider
|
| 14 |
+
# INFERENCE_MAX_TOKENS — per-call cap (default 220)
|
| 15 |
|
|
|
|
| 16 |
gradio>=5.0
|
| 17 |
numpy>=1.24
|
| 18 |
huggingface_hub>=0.20
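The numeric Space variables listed in the header comment above (`CRITTERCALM_COOLDOWN_SECONDS`, `INFERENCE_MAX_TOKENS`) arrive as strings, so whatever reads them has to parse defensively. A minimal sketch, assuming a hypothetical helper `env_number`; the defaults 12 and 220 are the ones stated in the comment, and the real client may parse these values differently.

```python
import os
from typing import Mapping, Optional


def env_number(name: str, default: float, env: Optional[Mapping[str, str]] = None) -> float:
    """Read a non-negative number from the environment, or return `default`."""
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    try:
        value = float(raw)
    except ValueError:  # unset, blank, or non-numeric
        return default
    return value if value >= 0 else default
```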
|
projects/focusfriend/README.md
CHANGED
|
@@ -1,6 +1,5 @@
|
|
| 1 |
---
|
| 2 |
title: ᐴ FocusFriend ᔔ
|
| 3 |
-
emoji: ☼
|
| 4 |
colorFrom: indigo
|
| 5 |
colorTo: yellow
|
| 6 |
sdk: gradio
|
|
@@ -20,6 +19,8 @@ tags:
|
|
| 20 |
- tiny-titan
|
| 21 |
- anishinaabe
|
| 22 |
- solarpunk
|
|
|
|
|
|
|
| 23 |
---
|
| 24 |
|
| 25 |
# ◈──◆──◇ ᐴ FOCUSFRIEND ᔔ PIP, YOUR CEDAR-AND-SUN COMPANION ON THE LAKE ◇──◆──◈
|
|
@@ -44,8 +45,8 @@ therapy and wants you to actually feel better, not just hear platitudes."
|
|
| 44 |
## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
|
| 45 |
|
| 46 |
- Python 3.10+
|
| 47 |
-
-
|
| 48 |
-
- ~
|
| 49 |
|
| 50 |
## ☼ AABAJITOOWINAN / INSTALLATION ◈
|
| 51 |
|
|
@@ -54,10 +55,14 @@ git clone https://github.com/nbiish/focusfriend.git
|
|
| 54 |
cd focusfriend
|
| 55 |
pip install -r requirements.txt
|
| 56 |
|
| 57 |
-
#
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 61 |
|
| 62 |
python app.py
|
| 63 |
```
|
|
@@ -66,9 +71,11 @@ Then open <http://localhost:7862/>.
|
|
| 66 |
|
| 67 |
## ☼ ZHOONIYAAWICHIGEWIN / MODEL ◈
|
| 68 |
|
| 69 |
-
| Model | Size | Purpose | License |
|
| 70 |
-
|---
|
| 71 |
-
|
|
|
|
|
|
|
|
| 72 |
|
| 73 |
## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
|
| 74 |
|
|
@@ -83,9 +90,10 @@ Runs with `mcp_server=True` — Streamable HTTP MCP server at `/gradio/gradio_ap
|
|
| 83 |
## ☼ GIIZHIITAA / BADGES ◈
|
| 84 |
|
| 85 |
- 🎨 **Off-Brand** — Anishinaabe-Solarpunk CSS theme with sun-amber gradients
|
| 86 |
-
- 🔌 **Off the Grid** — Fully local, no API calls
|
| 87 |
- 📓 **Field Notes** — Blog post about AI wellness companions
|
| 88 |
-
- 🦙 **Tiny Titan** —
|
|
|
|
|
|
|
| 89 |
|
| 90 |
## ☼ GANAWAABANDAAN / MEET PIP ◈
|
| 91 |
|
|
|
|
| 1 |
---
|
| 2 |
title: ᐴ FocusFriend ᔔ
|
|
|
|
| 3 |
colorFrom: indigo
|
| 4 |
colorTo: yellow
|
| 5 |
sdk: gradio
|
|
|
|
| 19 |
- tiny-titan
|
| 20 |
- anishinaabe
|
| 21 |
- solarpunk
|
| 22 |
+
- inference-api
|
| 23 |
+
- cooldowns
|
| 24 |
---
|
| 25 |
|
| 26 |
# ◈──◆──◇ ᐴ FOCUSFRIEND ᔔ PIP, YOUR CEDAR-AND-SUN COMPANION ON THE LAKE ◇──◆──◈
|
|
|
|
| 45 |
## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
|
| 46 |
|
| 47 |
- Python 3.10+
|
| 48 |
+
- A Hugging Face token (optional; many small models also work anonymously)
|
| 49 |
+
- ~100MB disk, ~256MB RAM — inference is serverless
|
| 50 |
|
| 51 |
## ☼ AABAJITOOWINAN / INSTALLATION ◈
|
| 52 |
|
|
|
|
| 55 |
cd focusfriend
|
| 56 |
pip install -r requirements.txt
|
| 57 |
|
| 58 |
+
# Optional: pick a model (default: Qwen/Qwen2.5-7B-Instruct)
|
| 59 |
+
export INFERENCE_MODEL="Qwen/Qwen2.5-7B-Instruct"
|
| 60 |
+
|
| 61 |
+
# Optional: set the HF token
|
| 62 |
+
export HF_TOKEN="hf_..."
|
| 63 |
+
|
| 64 |
+
# Optional: tune the cooldown
|
| 65 |
+
export FOCUSFRIEND_COOLDOWN_SECONDS=10
|
| 66 |
|
| 67 |
python app.py
|
| 68 |
```
|
|
|
|
| 71 |
|
| 72 |
## ☼ ZHOONIYAAWICHIGEWIN / MODEL ◈
|
| 73 |
|
| 74 |
+
| Model (default) | Size | Purpose | License |
|
| 75 |
+
|---|---|---|---|
|
| 76 |
+
| Qwen2.5-7B-Instruct | 7B | Wellness companion chat | Apache 2.0 |
|
| 77 |
+
| Meta-Llama-3-8B-Instruct | 8B | Alternative | Llama 3 Community |
|
| 78 |
+
| gemma-2-9b-it | 9B | Alternative | Gemma License |
|
| 79 |
|
| 80 |
## ☼ MCP KINOOMAAGEWINAN / MCP TOOLS ◈
|
| 81 |
|
|
|
|
| 90 |
## ☼ GIIZHIITAA / BADGES ◈
|
| 91 |
|
| 92 |
- 🎨 **Off-Brand** — Anishinaabe-Solarpunk CSS theme with sun-amber gradients
|
|
|
|
| 93 |
- 📓 **Field Notes** — Blog post about AI wellness companions
|
| 94 |
+
- 🦙 **Tiny Titan** — Default model is 7B; can switch to 1.5B Qwen for true Tiny Titan
|
| 95 |
+
- 🌀 **Cooldowns** — Serverless inference with built-in credit protection
|
| 96 |
+
- ☁ **HF Inference API** — Uses Hugging Face serverless backend (no local GGUF build)
|
| 97 |
|
| 98 |
## ☼ GANAWAABANDAAN / MEET PIP ◈
|
| 99 |
|
projects/focusfriend/inference/llm.py
CHANGED
|
@@ -1,103 +1,72 @@
|
|
| 1 |
"""
|
| 2 |
-
LLM inference wrapper for FocusFriend using
|
| 3 |
-
|
| 4 |
-
|
| 5 |
"""
|
|
|
|
| 6 |
|
|
|
|
| 7 |
import os
|
|
|
|
| 8 |
import threading
|
| 9 |
-
import logging
|
| 10 |
from pathlib import Path
|
| 11 |
-
from typing import
|
| 12 |
|
| 13 |
log = logging.getLogger("focusfriend.inference")
|
| 14 |
|
| 15 |
-
#
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 24 |
)
|
| 25 |
-
DEFAULT_N_CTX = int(os.environ.get("GEMMA_N_CTX", "8192"))
|
| 26 |
-
DEFAULT_N_THREADS = int(os.environ.get("GEMMA_N_THREADS", str(os.cpu_count() or 4)))
|
| 27 |
|
| 28 |
|
| 29 |
-
def
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
n_threads: int = None,
|
| 33 |
-
) -> Optional[object]:
|
| 34 |
-
"""
|
| 35 |
-
Load the Gemma 4 12B GGUF model via llama.cpp.
|
| 36 |
|
| 37 |
-
Args:
|
| 38 |
-
model_path: Path to GGUF file. Uses env var / default if not provided.
|
| 39 |
-
n_ctx: Context window size. Default 8192.
|
| 40 |
-
n_threads: CPU threads. Default all cores.
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
""
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
with _llm_lock:
|
| 51 |
-
if _llm is not None:
|
| 52 |
-
return _llm
|
| 53 |
-
|
| 54 |
-
model_path = model_path or DEFAULT_MODEL_PATH
|
| 55 |
-
n_ctx = n_ctx or DEFAULT_N_CTX
|
| 56 |
-
n_threads = n_threads or DEFAULT_N_THREADS
|
| 57 |
-
|
| 58 |
-
gguf_path = Path(model_path)
|
| 59 |
-
if not gguf_path.exists():
|
| 60 |
-
log.warning(
|
| 61 |
-
f"Model not found at {gguf_path}. "
|
| 62 |
-
f"Download: huggingface-cli download unsloth/gemma-4-12b-it-GGUF "
|
| 63 |
-
f"--include 'gemma-4-12b-it-Q4_K_M.gguf' --local-dir {DEFAULT_MODEL_DIR}"
|
| 64 |
-
)
|
| 65 |
-
return None
|
| 66 |
-
|
| 67 |
-
try:
|
| 68 |
-
from llama_cpp import Llama
|
| 69 |
-
|
| 70 |
-
log.info(f"Loading Gemma 4 12B from {gguf_path}")
|
| 71 |
-
log.info(f" n_ctx={n_ctx}, n_threads={n_threads}")
|
| 72 |
-
|
| 73 |
-
_llm = Llama(
|
| 74 |
-
model_path=str(gguf_path),
|
| 75 |
-
n_ctx=n_ctx,
|
| 76 |
-
n_threads=n_threads,
|
| 77 |
-
verbose=False,
|
| 78 |
-
)
|
| 79 |
-
log.info("Gemma 4 12B loaded successfully ✓")
|
| 80 |
-
return _llm
|
| 81 |
-
|
| 82 |
-
except ImportError:
|
| 83 |
-
log.warning("llama-cpp-python not installed. pip install llama-cpp-python")
|
| 84 |
-
return None
|
| 85 |
-
except Exception as exc:
|
| 86 |
-
log.error(f"Failed to load Gemma 4 12B: {exc}")
|
| 87 |
-
return None
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
def get_model() -> Optional[object]:
|
| 91 |
-
"""Get the current LLM instance (lazy-loads if needed)."""
|
| 92 |
-
global _llm
|
| 93 |
-
if _llm is not None:
|
| 94 |
-
return _llm
|
| 95 |
-
return load_model()
|
| 96 |
|
| 97 |
|
| 98 |
-
def
|
| 99 |
-
"""
|
| 100 |
-
|
| 101 |
|
| 102 |
|
| 103 |
def generate_response(
|
|
@@ -105,30 +74,24 @@ def generate_response(
|
|
| 105 |
temperature: float = 0.8,
|
| 106 |
max_tokens: int = 300,
|
| 107 |
) -> Optional[str]:
|
| 108 |
-
"""
|
| 109 |
-
Generate a non-streaming response from the model.
|
| 110 |
-
|
| 111 |
-
Args:
|
| 112 |
-
messages: List of {'role': ..., 'content': ...} dicts
|
| 113 |
-
temperature: Generation temperature
|
| 114 |
-
max_tokens: Max output tokens
|
| 115 |
|
| 116 |
-
|
| 117 |
-
|
| 118 |
"""
|
| 119 |
-
|
| 120 |
-
|
| 121 |
return None
|
| 122 |
-
|
| 123 |
try:
|
| 124 |
-
|
|
|
|
| 125 |
messages=messages,
|
|
|
|
| 126 |
temperature=temperature,
|
| 127 |
-
max_tokens=max_tokens,
|
| 128 |
)
|
| 129 |
-
return
|
| 130 |
except Exception as exc:
|
| 131 |
-
log.
|
| 132 |
return None
|
| 133 |
|
| 134 |
|
|
@@ -137,42 +100,40 @@ def generate_stream(
|
|
| 137 |
temperature: float = 0.8,
|
| 138 |
max_tokens: int = 300,
|
| 139 |
) -> Generator[str, None, None]:
|
| 140 |
-
"""
|
| 141 |
-
Generate a streaming response from the model.
|
| 142 |
-
|
| 143 |
-
Args:
|
| 144 |
-
messages: List of {'role': ..., 'content': ...} dicts
|
| 145 |
-
temperature: Generation temperature
|
| 146 |
-
max_tokens: Max output tokens
|
| 147 |
|
| 148 |
-
|
| 149 |
-
|
|
|
|
| 150 |
"""
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
yield "⚠️ Model not loaded. I'm running on fallback mode right now."
|
| 154 |
return
|
| 155 |
-
|
| 156 |
try:
|
| 157 |
-
|
|
|
|
| 158 |
messages=messages,
|
|
|
|
| 159 |
temperature=temperature,
|
| 160 |
-
max_tokens=max_tokens,
|
| 161 |
-
stream=True,
|
| 162 |
)
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
|
|
|
|
|
|
|
|
|
| 170 |
except Exception as exc:
|
| 171 |
-
log.error(f"Streaming error: {exc}")
|
| 172 |
yield f"\n\n⚠️ Something went wrong: {exc}"
|
| 173 |
|
| 174 |
|
| 175 |
def unload_model():
|
| 176 |
-
"""
|
| 177 |
-
|
| 178 |
-
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
"""
|
| 2 |
+
LLM inference wrapper for FocusFriend using the Hugging Face Inference API.
|
| 3 |
+
|
| 4 |
+
The previous version loaded a local GGUF (Gemma 4 12B Q4_K_M) via llama-cpp-python.
|
| 5 |
+
That required a heavy compile step on HF Spaces and tied us to a single model. This
|
| 6 |
+
version uses `huggingface_hub.InferenceClient` (serverless) and enforces a
|
| 7 |
+
project-scoped cooldown via `shared.inference_client` to protect your credit budget.
|
| 8 |
+
|
| 9 |
+
To override the model: set `INFERENCE_MODEL` env var.
|
| 10 |
+
Common picks:
|
| 11 |
+
- "Qwen/Qwen2.5-7B-Instruct" (default; sweet spot for chat)
|
| 12 |
+
- "meta-llama/Meta-Llama-3-8B-Instruct"
|
| 13 |
+
- "google/gemma-2-9b-it"
|
| 14 |
"""
|
| 15 |
+
from __future__ import annotations
|
| 16 |
|
| 17 |
+
import logging
|
| 18 |
import os
|
| 19 |
+
import sys
|
| 20 |
import threading
|
|
|
|
| 21 |
from pathlib import Path
|
| 22 |
+
from typing import Generator, List, Dict, Optional
|
| 23 |
|
| 24 |
log = logging.getLogger("focusfriend.inference")
|
| 25 |
|
| 26 |
+
# Add monorepo root so we can import shared.inference_client
|
| 27 |
+
_THIS = Path(__file__).resolve()
|
| 28 |
+
_PROJECT = _THIS.parent.parent
|
| 29 |
+
_REPO_ROOT = _PROJECT.parent
|
| 30 |
+
if str(_REPO_ROOT) not in sys.path:
|
| 31 |
+
sys.path.insert(0, str(_REPO_ROOT))
|
| 32 |
+
|
| 33 |
+
from shared.inference_client import ( # noqa: E402
|
| 34 |
+
InferenceResult,
|
| 35 |
+
chat_messages,
|
| 36 |
+
cooldown_status,
|
| 37 |
+
cooldown_active,
|
| 38 |
+
generate as _client_generate,
|
| 39 |
+
INFERENCE_MODEL as DEFAULT_MODEL,
|
| 40 |
)
|
|
|
|
|
|
|
| 41 |
|
| 42 |
|
| 43 |
+
def _model() -> str:
|
| 44 |
+
"""Pick the FocusFriend-specific model, falling back to the default."""
|
| 45 |
+
return os.environ.get("FOCUSFRIEND_MODEL", DEFAULT_MODEL)
|
|
|
|
|
|
|
|
|
|
|
|
|
| 46 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 47 |
|
| 48 |
+
def is_model_available() -> bool:
|
| 49 |
+
"""True if a model is configured and no cooldown is active (a token is optional)."""
|
| 50 |
+
if cooldown_active("focusfriend"):
|
| 51 |
+
return False
|
| 52 |
+
has_token = bool(os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
|
| 53 |
+
# Many small models work anonymously; don't gate hard.
|
| 54 |
+
return bool(_model())
|
| 55 |
|
| 56 |
|
| 57 |
+
def get_model() -> Optional[str]:
|
| 58 |
+
"""Return the model id we plan to use. None if no model configured."""
|
| 59 |
+
if not _model():
|
| 60 |
+
return None
|
| 61 |
+
return _model()
|
| 62 |
+
|
| 63 |
+
|
| 64 |
+
def cooldown_snapshot() -> dict:
|
| 65 |
+
"""Public status snapshot for the UI."""
|
| 66 |
+
return {
|
| 67 |
+
"model": _model(),
|
| 68 |
+
"cooldown": cooldown_status("focusfriend"),
|
| 69 |
+
}
|
| 70 |
|
| 71 |
|
| 72 |
def generate_response(
|
|
|
|
| 74 |
temperature: float = 0.8,
|
| 75 |
max_tokens: int = 300,
|
| 76 |
) -> Optional[str]:
|
| 77 |
+
"""One-shot generation. Returns text or None on cooldown/failure.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 78 |
|
| 79 |
+
`messages` follows OpenAI chat format. Caller is responsible for system prompt
|
| 80 |
+
and prior turns.
|
| 81 |
"""
|
| 82 |
+
if cooldown_active("focusfriend"):
|
| 83 |
+
log.info("focusfriend inference skipped (cooldown active)")
|
| 84 |
return None
|
|
|
|
| 85 |
try:
|
| 86 |
+
result = _client_generate(
|
| 87 |
+
project="focusfriend",
|
| 88 |
messages=messages,
|
| 89 |
+
max_new_tokens=max_tokens,
|
| 90 |
temperature=temperature,
|
|
|
|
| 91 |
)
|
| 92 |
+
return result.text
|
| 93 |
except Exception as exc:
|
| 94 |
+
log.warning(f"HF Inference error: {exc}")
|
| 95 |
return None
|
| 96 |
|
| 97 |
|
|
|
|
| 100 |
temperature: float = 0.8,
|
| 101 |
max_tokens: int = 300,
|
| 102 |
) -> Generator[str, None, None]:
|
| 103 |
+
"""Streaming generator. Yields the full response in chunks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 104 |
|
| 105 |
+
This wrapper requests the complete reply in one call rather than a token-level
|
| 106 |
+
stream, then yields it word by word so the UI shows the text arriving gradually.
|
| 107 |
+
On failure it yields a short error message instead of raising.
|
| 108 |
"""
|
| 109 |
+
if cooldown_active("focusfriend"):
|
| 110 |
+
yield "\n\n⏳ Pip is resting. (Inference cooldown — try again in a moment.)"
|
|
|
|
| 111 |
return
|
|
|
|
| 112 |
try:
|
| 113 |
+
result = _client_generate(
|
| 114 |
+
project="focusfriend",
|
| 115 |
messages=messages,
|
| 116 |
+
max_new_tokens=max_tokens,
|
| 117 |
temperature=temperature,
|
|
|
|
|
|
|
| 118 |
)
|
| 119 |
+
# Simulate streaming by chunking the response on word boundaries
|
| 120 |
+
text = result.text
|
| 121 |
+
if not text:
|
| 122 |
+
yield "\n\n[No response]"
|
| 123 |
+
return
|
| 124 |
+
# Yield in word-sized chunks for natural reading pace
|
| 125 |
+
words = text.split(" ")
|
| 126 |
+
for i, word in enumerate(words):
|
| 127 |
+
chunk = word if i == 0 else " " + word
|
| 128 |
+
yield chunk
|
| 129 |
except Exception as exc:
|
|
|
|
| 130 |
yield f"\n\n⚠️ Something went wrong: {exc}"
|
| 131 |
|
| 132 |
|
| 133 |
def unload_model():
|
| 134 |
+
"""No-op for serverless inference (kept for API compat)."""
|
| 135 |
+
return
|
| 136 |
+
|
| 137 |
+
|
| 138 |
+
# Re-export for callers that still expect this
|
| 139 |
+
load_model = lambda *args, **kwargs: get_model() # noqa: E731
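The simulated streaming in `generate_stream` can be illustrated in isolation. What follows is a minimal sketch, not the module itself: `word_chunks` and `accumulate` are hypothetical helper names, and the logic only mirrors the split-on-space loop in the diff above. It shows that concatenating the chunks reproduces the original text, which is what a chat handler that accumulates chunks would rely on.

```python
from typing import Iterable, Iterator


def word_chunks(text: str) -> Iterator[str]:
    """Yield `text` word by word; every chunk after the first carries its leading space."""
    for i, word in enumerate(text.split(" ")):
        yield word if i == 0 else " " + word


def accumulate(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the growing partial reply, as a chat UI would display it."""
    partial = ""
    for chunk in chunks:
        partial += chunk
        yield partial


if __name__ == "__main__":
    reply = "Take a slow breath with me."
    for partial in accumulate(word_chunks(reply)):
        print(partial)
```

Splitting on a single space (rather than calling `split()` with no argument) keeps repeated spaces and newlines intact, so the joined chunks match the model's reply exactly.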
|
projects/focusfriend/requirements.txt
CHANGED
|
@@ -1,12 +1,18 @@
|
|
| 1 |
-
# FocusFriend — ASCII Wellness Companion
|
| 2 |
# Python 3.10+
|
| 3 |
|
| 4 |
-
# === Core ===
|
| 5 |
gradio>=5.0
|
| 6 |
numpy>=1.24
|
| 7 |
-
|
| 8 |
-
# === LLM Inference (Gemma 4 12B) ===
|
| 9 |
-
llama-cpp-python>=0.3.0
|
| 10 |
-
|
| 11 |
-
# === Utilities ===
|
| 12 |
huggingface_hub>=0.20
|
|
|
|
| 1 |
+
# FocusFriend — ASCII Wellness Companion "Pip"
|
| 2 |
# Python 3.10+
|
| 3 |
+
#
|
| 4 |
+
# Inference is via the Hugging Face Inference API. No local GGUF, no
|
| 5 |
+
# llama-cpp-python compile step. Cooldown is enforced in
|
| 6 |
+
# `shared/inference_client.py` to protect your credit budget.
|
| 7 |
+
#
|
| 8 |
+
# Space env vars (Settings → Variables and secrets):
|
| 9 |
+
# HF_TOKEN — your HF token (anonymous works for many models)
|
| 10 |
+
# INFERENCE_MODEL — default model id
|
| 11 |
+
# FOCUSFRIEND_MODEL — override the model for this project
|
| 12 |
+
# FOCUSFRIEND_COOLDOWN_SECONDS — gap between inference calls (default 10)
|
| 13 |
+
# INFERENCE_PROVIDER — "hf-inference" (default) or paid provider
|
| 14 |
+
# INFERENCE_MAX_TOKENS — per-call cap (default 220)
|
| 15 |
|
|
|
|
| 16 |
gradio>=5.0
|
| 17 |
numpy>=1.24
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 18 |
huggingface_hub>=0.20
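The environment variables listed in the comments above could be resolved roughly as follows. This is a sketch under stated assumptions, not the contents of `shared/inference_client.py` (which this diff does not show): `FocusFriendSettings` and `load_settings` are hypothetical names, the default model id is taken from the FocusFriend README, and the other defaults come from the comments in this file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FocusFriendSettings:
    model: str
    token: Optional[str]
    cooldown_seconds: float
    provider: str
    max_tokens: int


def load_settings(env: Mapping[str, str] = os.environ) -> FocusFriendSettings:
    """Resolve settings from the environment, falling back to the documented defaults."""
    default_model = env.get("INFERENCE_MODEL", "Qwen/Qwen2.5-7B-Instruct")
    return FocusFriendSettings(
        # FOCUSFRIEND_MODEL wins over the shared INFERENCE_MODEL default.
        model=env.get("FOCUSFRIEND_MODEL", default_model),
        # Either variable name may hold the token; None means anonymous access.
        token=env.get("HF_TOKEN") or env.get("HUGGINGFACEHUB_API_TOKEN"),
        cooldown_seconds=float(env.get("FOCUSFRIEND_COOLDOWN_SECONDS", "10")),
        provider=env.get("INFERENCE_PROVIDER", "hf-inference"),
        max_tokens=int(env.get("INFERENCE_MAX_TOKENS", "220")),
    )
```

Passing the environment in as a mapping keeps the function easy to exercise without touching the real process environment.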
|
projects/tinybard/README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
title: ᐴ TinyBard ᔔ
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: blue
|
| 5 |
colorTo: yellow
|
| 6 |
sdk: gradio
|
|
@@ -11,25 +11,24 @@ license: apache-2.0
|
|
| 11 |
tags:
|
| 12 |
- text-adventure
|
| 13 |
- interactive-fiction
|
| 14 |
-
- llama-cpp
|
| 15 |
- thousand-token-wood
|
| 16 |
- build-small-hackathon
|
| 17 |
- tiny-titan
|
| 18 |
-
- llama-champion
|
| 19 |
- off-brand
|
| 20 |
-
- off-the-grid
|
| 21 |
- mcp-server
|
| 22 |
- anishinaabe
|
| 23 |
- solarpunk
|
|
|
|
|
|
|
| 24 |
---
|
| 25 |
|
| 26 |
# ◈──◆──◇ ᐴ TINYBARD ᔔ AADIZOOKAAN-AKINOOMAAGEWIN / STORY-TELLING ENGINE ◇──◆──◈
|
| 27 |
|
| 28 |
-
> **A
|
| 29 |
>
|
| 30 |
> ᐴ The land remembers the stories. ᔔ ☼ ☘ ≈
|
| 31 |
|
| 32 |
-
TinyBard uses FastAPI + `mount_gradio_app` (Gradio 6.0) with a fully custom HTML/CSS/JS frontend
|
| 33 |
|
| 34 |
## ◆ GASHKITOONAN / CAPABILITIES ◈
|
| 35 |
|
|
@@ -37,15 +36,16 @@ TinyBard uses FastAPI + `mount_gradio_app` (Gradio 6.0) with a fully custom HTML
|
|
| 37 |
- **◇ Three Aadizookaanan / Genres ◇** — Aadizookaan (Fantasy), Ish piming (Sci-Fi), Mashkodewaazibi (Cyberpunk)
|
| 38 |
- **◇ Misko-Aki / CRT Terminal ◇** — Cedar-copper cabinet, sun-amber phosphor, frost-on-glass scanlines
|
| 39 |
- **◇ MCP Kinoomaagewinan / Tools ◇** — `start_game` and `make_choice` exposed as MCP tools
|
| 40 |
-
- **◇
|
| 41 |
-
- **◇
|
| 42 |
-
- **◇
|
|
|
|
| 43 |
|
| 44 |
## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
|
| 45 |
|
| 46 |
- Python 3.10+
|
| 47 |
-
-
|
| 48 |
-
- ~
|
| 49 |
|
| 50 |
## ◇ AABAJITOOWINAN / INSTALLATION ◈
|
| 51 |
|
|
@@ -54,12 +54,16 @@ git clone https://github.com/nbiish/tinybard.git
|
|
| 54 |
cd tinybard
|
| 55 |
pip install -r requirements.txt
|
| 56 |
|
| 57 |
-
#
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
|
| 62 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 63 |
|
| 64 |
python app.py
|
| 65 |
```
|
|
@@ -82,42 +86,58 @@ hum with a faint violet energy...
|
|
| 82 |
## ☼ NAANAAGADAWENINDIZOWIN / VERIFICATION ◈
|
| 83 |
|
| 84 |
```bash
|
| 85 |
-
curl -X POST http://localhost:7860/
|
| 86 |
-H "Content-Type: application/json" \
|
| 87 |
-
-d '{"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 88 |
```
|
| 89 |
|
| 90 |
-
Returns
|
| 91 |
|
| 92 |
## ◈ MODEL ◇
|
| 93 |
|
| 94 |
-
| Model | Size | Purpose | License |
|
| 95 |
-
|---
|
| 96 |
-
|
|
|
|
|
| 97 |
|
| 98 |
-
|
| 99 |
|
| 100 |
## ◇ MCP KINOOMAAGEWINAN / TOOLS ◈
|
| 101 |
|
| 102 |
-
TinyBard runs with `mcp_server=True`, exposing these tools:
|
| 103 |
|
| 104 |
-
- **`
|
| 105 |
-
- **`
|
|
|
|
| 106 |
|
| 107 |
Connect from any MCP client (Claude Desktop, Cursor, etc.) to the SSE endpoint at `/gradio/gradio_api/mcp/`.
|
| 108 |
|
| 109 |
-
##
|
| 110 |
|
| 111 |
-
- **◆
|
| 112 |
-
- **◆ Tiny Titan** — Model is 1.5B (well under 4B limit)
|
| 113 |
- **◆ Off-Brand** — Fully custom FastAPI+Gradio frontend
|
| 114 |
-
- **◆ Off the Grid** — Fully local, no API calls
|
| 115 |
- **◆ Field Notes** — Blog post about tiny model interactive fiction
|
| 116 |
|
| 117 |
## ☼ GANAWENDAAGWAD / SECURITY ◈
|
| 118 |
|
| 119 |
-
PQC standard for any future API keys via the `pqc-secrets` skill (ML-KEM-768 + AES-256-GCM). At present, the
|
| 120 |
|
| 121 |
---
|
| 122 |
|
| 123 |
-
◈──◆──◇ ☼ TinyBard v1.
|
|
|
|
| 1 |
---
|
| 2 |
title: ᐴ TinyBard ᔔ
|
| 3 |
+
emoji: ☀️
|
| 4 |
colorFrom: blue
|
| 5 |
colorTo: yellow
|
| 6 |
sdk: gradio
|
|
|
|
| 11 |
tags:
|
| 12 |
- text-adventure
|
| 13 |
- interactive-fiction
|
|
|
|
| 14 |
- thousand-token-wood
|
| 15 |
- build-small-hackathon
|
| 16 |
- tiny-titan
|
|
|
|
| 17 |
- off-brand
|
|
|
|
| 18 |
- mcp-server
|
| 19 |
- anishinaabe
|
| 20 |
- solarpunk
|
| 21 |
+
- inference-api
|
| 22 |
+
- cooldowns
|
| 23 |
---
|
| 24 |
|
| 25 |
# ◈──◆──◇ ᐴ TINYBARD ᔔ AADIZOOKAAN-AKINOOMAAGEWIN / STORY-TELLING ENGINE ◇──◆──◈
|
| 26 |
|
| 27 |
+
> **A small LLM generates five-minute interactive text adventures in a cedar-and-copper CRT terminal.**
|
| 28 |
>
|
| 29 |
> ᐴ The land remembers the stories. ᔔ ☼ ☘ ≈
|
| 30 |
|
| 31 |
+
TinyBard uses FastAPI + `mount_gradio_app` (Gradio 6.0) with a fully custom HTML/CSS/JS frontend, **MCP server mode** enabled, and an **HF Inference API** backend. Every adventure is procedurally generated — rooms, NPCs, items, and branching narratives on the fly.
|
| 32 |
|
| 33 |
## ◆ GASHKITOONAN / CAPABILITIES ◈
|
| 34 |
|
|
|
|
| 36 |
- **◇ Three Aadizookaanan / Genres ◇** — Aadizookaan (Fantasy), Ish piming (Sci-Fi), Mashkodewaazibi (Cyberpunk)
|
| 37 |
- **◇ Misko-Aki / CRT Terminal ◇** — Cedar-copper cabinet, sun-amber phosphor, frost-on-glass scanlines
|
| 38 |
- **◇ MCP Kinoomaagewinan / Tools ◇** — `start_game` and `make_choice` exposed as MCP tools
|
| 39 |
+
- **◇ Giiwenaabik / Inference API ◇** — Serverless HF Inference API; no local GGUF, no build step
|
| 40 |
+
- **◇ Asabiikesiwin / Cooldown ◇** — 6s default between inference calls to protect your credit budget
|
| 41 |
+
- **◇ Bmaad-ziibi / Procedural Fallback ◇** — Full engine works without the LLM
|
| 42 |
+
- **◇ Anishinaabe-Solarpunk ◇** — Sky-to-sunrise palette, syllabic framings, biophilic motifs
|
| 43 |
|
| 44 |
## ☼ NITAM-AABAJICHIGANAN / PREREQUISITES ◈
|
| 45 |
|
| 46 |
- Python 3.10+
|
| 47 |
+
- A Hugging Face token (for the Inference API; many small models work anonymously)
|
| 48 |
+
- ~100MB disk, ~256MB RAM — the model is serverless, not local
|
| 49 |
|
| 50 |
## ◇ AABAJITOOWINAN / INSTALLATION ◈
|
| 51 |
|
|
|
|
| 54 |
cd tinybard
|
| 55 |
pip install -r requirements.txt
|
| 56 |
|
| 57 |
+
# Optional: pick a model (default: Qwen/Qwen2.5-1.5B-Instruct — small + fast + free)
|
| 58 |
+
export INFERENCE_MODEL="Qwen/Qwen2.5-1.5B-Instruct"
|
| 59 |
+
# Or for the originally-intended VibeThinker 1.5B:
|
| 60 |
+
# export INFERENCE_MODEL="mradermacher/VibeThinker-1.5B-GGUF"
|
| 61 |
|
| 62 |
+
# Optional: set the HF token (anonymous works for many models)
|
| 63 |
+
export HF_TOKEN="hf_..."
|
| 64 |
+
|
| 65 |
+
# Optional: tune the cooldown
|
| 66 |
+
export TINYBARD_COOLDOWN_SECONDS=6
|
| 67 |
|
| 68 |
python app.py
|
| 69 |
```
|
|
|
|
| 86 |
## ☼ NAANAAGADAWENINDIZOWIN / VERIFICATION ◈
|
| 87 |
|
| 88 |
```bash
|
| 89 |
+
curl -X POST http://localhost:7860/api/game/start \
|
| 90 |
-H "Content-Type: application/json" \
|
| 91 |
+
-d '{"genre": "fantasy"}'
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
Returns clean JSON: `{"story", "choices", "health", "step", "game_over", "history"}`.
|
| 95 |
+
|
| 96 |
+
```bash
|
| 97 |
+
curl http://localhost:7860/api/model_status
|
| 98 |
```
|
| 99 |
|
| 100 |
+
Returns: `{"model": "...", "cooldown": {"active": bool, "remaining_seconds": float, "window_seconds": float}}`.
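A client can use the `/api/model_status` payload to decide whether to wait before the next turn. The sketch below assumes only the JSON shape documented above; `seconds_until_ready` is a hypothetical helper, and the sample payload is illustrative rather than captured output.

```python
import json


def seconds_until_ready(status_json: str) -> float:
    """Return how long to wait before the next inference call, 0.0 if none is needed."""
    status = json.loads(status_json)
    cooldown = status.get("cooldown", {})
    if not cooldown.get("active", False):
        return 0.0
    # Clamp at zero in case the server reports a slightly negative remainder.
    return max(0.0, float(cooldown.get("remaining_seconds", 0.0)))


if __name__ == "__main__":
    sample = json.dumps({
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "cooldown": {"active": True, "remaining_seconds": 2.5, "window_seconds": 6.0},
    })
    print(seconds_until_ready(sample))
```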
|
| 101 |
|
| 102 |
## ◈ MODEL ◇
|
| 103 |
|
| 104 |
+
| Model (default) | Size | Purpose | License |
|
| 105 |
+
|---|---|---|---|
|
| 106 |
+
| Qwen2.5-1.5B-Instruct | 1.5B | Interactive story generation | Apache 2.0 |
|
| 107 |
+
| VibeThinker 1.5B | 1.5B | Alternative — also tiny | Apache 2.0 |
|
| 108 |
|
| 109 |
+
Override `INFERENCE_MODEL` to any model that supports `chat_completion` on the HF Inference API. The 1.5B defaults fit the **Tiny Titan** badge.
|
| 110 |
|
| 111 |
## ◇ MCP KINOOMAAGEWINAN / TOOLS ◈
|
| 112 |
|
| 113 |
+
TinyBard runs with `mcp_server=True`, exposing these tools (also available as FastAPI endpoints):
|
| 114 |
|
| 115 |
+
- **`/api/game/start`** (POST `{"genre": "fantasy|scifi|cyberpunk"}`) — Start an adventure
|
| 116 |
+
- **`/api/game/choice`** (POST `{choice, genre, step, health, history}`) — Submit a player choice
|
| 117 |
+
- **`/api/model_status`** (GET) — Check the inference model + cooldown state
|
| 118 |
|
| 119 |
Connect from any MCP client (Claude Desktop, Cursor, etc.) to the SSE endpoint at `/gradio/gradio_api/mcp/`.
|
| 120 |
|
| 121 |
+
## ◇ GIIZHIITAA / BADGE TARGETS ◇
|
| 122 |
|
| 123 |
+
- **◆ Tiny Titan** — Model ≤ 1.5B (well under 4B limit)
|
|
|
|
| 124 |
- **◆ Off-Brand** — Fully custom FastAPI+Gradio frontend
|
|
|
|
| 125 |
- **◆ Field Notes** — Blog post about tiny model interactive fiction
|
| 126 |
|
| 127 |
## ☼ GANAWENDAAGWAD / SECURITY ◈
|
| 128 |
|
| 129 |
+
PQC standard for any future API keys via the `pqc-secrets` skill (ML-KEM-768 + AES-256-GCM). At present, only the HF token is in flight (read from env var, never written to disk).
|
| 130 |
+
|
| 131 |
+
## ◇ AABAAJICHIGANAN / COOLDOWNS ◈
|
| 132 |
+
|
| 133 |
+
The `shared/inference_client.py` module enforces a per-project cooldown: a minimum gap between inference calls that protects your HF/Modal credit budget from runaway re-rolls. Defaults:
|
| 134 |
+
|
| 135 |
+
- `tinybard`: 6s
|
| 136 |
+
- `focusfriend`: 10s
|
| 137 |
+
- `crittercalm`: 12s
|
| 138 |
+
|
| 139 |
+
Override per project via Space env vars (`TINYBARD_COOLDOWN_SECONDS`, etc.).
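One way to picture a per-project cooldown is a small gate that remembers when each project last called the API. This is an illustrative sketch, not the code in `shared/inference_client.py`: `CooldownGate` and its methods are hypothetical, only the default windows come from the list above, and thread safety is left out for brevity.

```python
import time
from typing import Callable, Dict


class CooldownGate:
    """Track the last call time per project and refuse calls made too soon."""

    def __init__(self, windows: Dict[str, float], clock: Callable[[], float] = time.monotonic):
        self._windows = windows
        self._clock = clock
        self._last_call: Dict[str, float] = {}

    def remaining(self, project: str) -> float:
        """Seconds left before `project` may call again (0.0 when ready)."""
        last = self._last_call.get(project)
        if last is None:
            return 0.0
        return max(0.0, self._windows[project] - (self._clock() - last))

    def try_acquire(self, project: str) -> bool:
        """Record a call and return True, or return False while the cooldown is active."""
        if self.remaining(project) > 0.0:
            return False
        self._last_call[project] = self._clock()
        return True


# Default windows in seconds, taken from the list above.
DEFAULT_WINDOWS = {"tinybard": 6.0, "focusfriend": 10.0, "crittercalm": 12.0}
```

A caller would check `try_acquire("tinybard")` before each inference request and fall back to the procedural engine when it returns `False`. Using a monotonic clock avoids surprises if the wall clock is adjusted while the app is running.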
|
| 140 |
|
| 141 |
---
|
| 142 |
|
| 143 |
+
◈──◆──◇ ☼ TinyBard v1.1 · Cedar Edition · Anishinaabe Solarpunk · Inference API ◇──◆──◈
|
projects/tinybard/app.py
CHANGED
|
@@ -17,6 +17,7 @@ import os
|
|
| 17 |
import json
|
| 18 |
import random
|
| 19 |
import logging
|
|
|
|
| 20 |
from pathlib import Path
|
| 21 |
from typing import Optional, Dict, List
|
| 22 |
|
|
@@ -26,6 +27,19 @@ from fastapi.responses import HTMLResponse
|
|
| 26 |
from fastapi.staticfiles import StaticFiles
|
| 27 |
from gradio import mount_gradio_app
|
| 28 |
|
| 29 |
logging.basicConfig(
|
| 30 |
level=logging.INFO,
|
| 31 |
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
|
|
@@ -38,46 +52,34 @@ log = logging.getLogger("tinybard")
|
|
| 38 |
BASE_DIR = Path(__file__).parent
|
| 39 |
STATIC_DIR = BASE_DIR / "static"
|
| 40 |
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
)
|
| 45 |
|
| 46 |
# ---------------------------------------------------------------------------
|
| 47 |
# Llama.cpp Inference Setup
|
| 48 |
# ---------------------------------------------------------------------------
|
| 49 |
-
|
| 50 |
-
|
| 51 |
|
| 52 |
|
| 53 |
-
def
|
| 54 |
-
"""
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
|
|
|
| 60 |
|
| 61 |
-
if not Path(MODEL_PATH).exists():
|
| 62 |
-
log.warning(f"Model file not found at {MODEL_PATH}. Fallback mode active.")
|
| 63 |
-
_llm_failed = True
|
| 64 |
-
return None
|
| 65 |
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
n_threads=int(os.environ.get("TINYBARD_THREADS", "4")),
|
| 73 |
-
verbose=False,
|
| 74 |
-
)
|
| 75 |
-
log.info("Model loaded successfully ✓")
|
| 76 |
-
return _llm
|
| 77 |
-
except Exception as e:
|
| 78 |
-
log.error(f"Failed to load LLM model: {e}")
|
| 79 |
-
_llm_failed = True
|
| 80 |
-
return None
|
| 81 |
|
| 82 |
|
| 83 |
# ---------------------------------------------------------------------------
|
|
@@ -189,44 +191,72 @@ def generate_procedural_step(genre: str, step: int, health: int, choice: str = "
|
|
| 189 |
|
| 190 |
|
| 191 |
# ---------------------------------------------------------------------------
|
| 192 |
-
# LLM Generation Logic
|
| 193 |
# ---------------------------------------------------------------------------
|
| 194 |
-
def
|
| 195 |
-
"""
|
| 196 |
-
|
| 197 |
-
|
| 198 |
return ""
|
| 199 |
try:
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
|
|
|
|
|
|
| 203 |
temperature=0.7,
|
| 204 |
-
stop=["\n\n", "User:", "Narrator:"],
|
| 205 |
)
|
| 206 |
-
return
|
|
|
|
|
|
|
|
|
|
| 207 |
except Exception as e:
|
| 208 |
-
log.
|
| 209 |
return ""
|
| 210 |
|
| 211 |
|
| 212 |
-
def
|
| 213 |
-
"""
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
"
|
| 218 |
-
"
|
| 219 |
-
"2. Keep descriptions highly atmospheric, but short (under 3 sentences).\n"
|
| 220 |
-
"3. Focus on action, mystery, and choice.\n\n"
|
| 221 |
)
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
|
|
|
|
|
|
|
|
|
| 230 |
|
| 231 |
|
| 232 |
# ---------------------------------------------------------------------------
|
|
@@ -257,18 +287,11 @@ def create_gradio_app() -> gr.Blocks:
|
|
| 257 |
if genre not in ["fantasy", "scifi", "cyberpunk"]:
|
| 258 |
genre = "fantasy"
|
| 259 |
|
| 260 |
-
|
| 261 |
-
if not llm:
|
| 262 |
-
result = generate_procedural_step(genre, 0, 100)
|
| 263 |
-
return (
|
| 264 |
-
result["story"], result["choices"], result["health"],
|
| 265 |
-
result["step"], result["game_over"],
|
| 266 |
-
json.dumps(result.get("history", []))
|
| 267 |
-
)
|
| 268 |
-
|
| 269 |
instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
|
| 270 |
-
story = generate_llm_story(
|
| 271 |
if not story:
|
|
|
|
| 272 |
result = generate_procedural_step(genre, 0, 100)
|
| 273 |
return (
|
| 274 |
result["story"], result["choices"], result["health"],
|
|
@@ -277,15 +300,11 @@ def create_gradio_app() -> gr.Blocks:
|
|
| 277 |
)
|
| 278 |
|
| 279 |
history = [{"role": "narrator", "text": story}]
|
| 280 |
-
|
| 281 |
-
"Provide exactly 3 short, distinct choices for the player. "
|
| 282 |
-
"Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
|
| 283 |
-
)
|
| 284 |
-
choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
|
| 285 |
-
|
| 286 |
-
choices = _parse_choices(choices_text)
|
| 287 |
if len(choices) < 2:
|
| 288 |
-
|
|
|
|
|
|
|
| 289 |
|
| 290 |
return (story, choices[:3], 100, 1, False, json.dumps(history))
|
| 291 |
|
|
@@ -296,18 +315,10 @@ def create_gradio_app() -> gr.Blocks:
|
|
| 296 |
except Exception:
|
| 297 |
history = []
|
| 298 |
|
| 299 |
-
llm = get_llm()
|
| 300 |
step = int(step)
|
| 301 |
health = int(health)
|
| 302 |
|
| 303 |
-
|
| 304 |
-
result = generate_procedural_step(genre, step, health, choice)
|
| 305 |
-
return (
|
| 306 |
-
result["story"], result["choices"], result["health"],
|
| 307 |
-
result["step"], result["game_over"],
|
| 308 |
-
json.dumps(result.get("history", history))
|
| 309 |
-
)
|
| 310 |
-
|
| 311 |
history.append({"role": "player", "text": choice})
|
| 312 |
|
| 313 |
health_delta = random.choice([-15, 0, 10])
|
|
@@ -315,7 +326,7 @@ def create_gradio_app() -> gr.Blocks:
|
|
| 315 |
|
| 316 |
if new_health <= 0:
|
| 317 |
instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
|
| 318 |
-
story = generate_llm_story(
|
| 319 |
return (
|
| 320 |
story or "Your strength fails. The adventure ends in darkness.",
|
| 321 |
[], 0, step + 1, True, json.dumps(history)
|
|
@@ -323,14 +334,14 @@ def create_gradio_app() -> gr.Blocks:
|
|
| 323 |
|
| 324 |
if step >= 4:
|
| 325 |
instruction = "Narrate the final glorious victory. The adventure ends in success."
|
| 326 |
-
story = generate_llm_story(
|
| 327 |
return (
|
| 328 |
story or "You have achieved your goal! You are victorious!",
|
| 329 |
[], new_health, step + 1, True, json.dumps(history)
|
| 330 |
)
|
| 331 |
|
| 332 |
instruction = "Narrate what happens next as a result of the player's choice."
|
| 333 |
-
story = generate_llm_story(
|
| 334 |
if not story:
|
| 335 |
result = generate_procedural_step(genre, step, health, choice)
|
| 336 |
return (
|
|
@@ -341,13 +352,7 @@ def create_gradio_app() -> gr.Blocks:
|
|
| 341 |
|
| 342 |
history.append({"role": "narrator", "text": story})
|
| 343 |
|
| 344 |
-
|
| 345 |
-
"Provide exactly 3 short, distinct choices. "
|
| 346 |
-
"Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
|
| 347 |
-
)
|
| 348 |
-
choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
|
| 349 |
-
|
| 350 |
-
choices = _parse_choices(choices_text)
|
| 351 |
if len(choices) < 2:
|
| 352 |
choices = ["Move forward", "Look around", "Rest a moment"]
|
| 353 |
|
|
@@ -400,13 +405,8 @@ async def homepage():
|
|
| 400 |
return HTMLResponse("<h1>TinyBard retro terminal under construction!</h1>")
|
| 401 |
@fastapi_app.get("/api/model_status")
|
| 402 |
async def model_status():
|
| 403 |
-
"""Check
|
| 404 |
-
|
| 405 |
-
return {
|
| 406 |
-
"available": llm is not None,
|
| 407 |
-
"model_path": MODEL_PATH,
|
| 408 |
-
"fallback": _llm_failed
|
| 409 |
-
}
|
| 410 |
|
| 411 |
|
| 412 |
# ---------------------------------------------------------------------------
|
|
@@ -418,23 +418,20 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
|
|
| 418 |
Returns a dict the frontend can consume directly. Used by both the
|
| 419 |
FastAPI /api/game/* endpoints and the Gradio MCP tools.
|
| 420 |
"""
|
| 421 |
-
|
|
|
|
|
|
|
| 422 |
|
| 423 |
if step == 0:
|
| 424 |
# New game
|
| 425 |
-
if
|
| 426 |
return generate_procedural_step(genre, 0, 100)
|
| 427 |
instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
|
| 428 |
-
story = generate_llm_story(
|
| 429 |
if not story:
|
| 430 |
return generate_procedural_step(genre, 0, 100)
|
| 431 |
history = [{"role": "narrator", "text": story}]
|
| 432 |
-
|
| 433 |
-
"Provide exactly 3 short, distinct choices for the player. "
|
| 434 |
-
"Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
|
| 435 |
-
)
|
| 436 |
-
choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
|
| 437 |
-
choices = _parse_choices(choices_text)
|
| 438 |
if len(choices) < 2:
|
| 439 |
choices = ["Explore the area", "Check your equipment", "Proceed carefully"]
|
| 440 |
return {
|
|
@@ -443,7 +440,7 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
|
|
| 443 |
}
|
| 444 |
|
| 445 |
# Subsequent turn
|
| 446 |
-
if
|
| 447 |
return generate_procedural_step(genre, step, health, choice)
|
| 448 |
|
| 449 |
history.append({"role": "player", "text": choice})
|
|
@@ -452,7 +449,7 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
|
|
| 452 |
|
| 453 |
if new_health <= 0:
|
| 454 |
instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
|
| 455 |
-
story = generate_llm_story(
|
| 456 |
return {
|
| 457 |
"story": story or "Your strength fails. The adventure ends in darkness.",
|
| 458 |
"choices": [], "health": 0, "step": step + 1, "game_over": True,
|
|
@@ -461,7 +458,7 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
|
|
| 461 |
|
| 462 |
if step >= 4:
|
| 463 |
instruction = "Narrate the final glorious victory. The adventure ends in success."
|
| 464 |
-
story = generate_llm_story(
|
| 465 |
return {
|
| 466 |
"story": story or "You have achieved your goal! You are victorious!",
|
| 467 |
"choices": [], "health": new_health, "step": step + 1, "game_over": True,
|
|
@@ -469,17 +466,12 @@ def _run_turn(choice: str, genre: str, step: int, health: int, history: List[Dic
|
|
| 469 |
}
|
| 470 |
|
| 471 |
instruction = "Narrate what happens next as a result of the player's choice."
|
| 472 |
-
story = generate_llm_story(
|
| 473 |
if not story:
|
| 474 |
return generate_procedural_step(genre, step, health, choice)
|
| 475 |
history.append({"role": "narrator", "text": story})
|
| 476 |
|
| 477 |
-
|
| 478 |
-
"Provide exactly 3 short, distinct choices. "
|
| 479 |
-
"Format: 1. [choice 1] | 2. [choice 2] | 3. [choice 3]"
|
| 480 |
-
)
|
| 481 |
-
choices_text = generate_llm_story(format_prompt(genre, history, choices_instruction), max_tokens=60)
|
| 482 |
-
choices = _parse_choices(choices_text)
|
| 483 |
if len(choices) < 2:
|
| 484 |
choices = ["Move forward", "Look around", "Rest a moment"]
|
| 485 |
return {
|
|
|
|
| 17 |
import json
|
| 18 |
import random
|
| 19 |
import logging
|
| 20 |
+
import sys
|
| 21 |
from pathlib import Path
|
| 22 |
from typing import Optional, Dict, List
|
| 23 |
|
|
|
|
| 27 |
from fastapi.staticfiles import StaticFiles
|
| 28 |
from gradio import mount_gradio_app
|
| 29 |
|
| 30 |
+
# Inference client with cooldown (no local GGUF, no llama-cpp-python build!)
|
| 31 |
+
# Path layout: monorepo/shared/inference_client.py; the repo root is three levels up from this file.
|
| 32 |
+
sys.path.insert(0, str(Path(__file__).resolve().parent.parent.parent))
|
| 33 |
+
from shared.inference_client import (
|
| 34 |
+
InferenceResult,
|
| 35 |
+
cooldown_status,
|
| 36 |
+
cooldown_remaining,
|
| 37 |
+
cooldown_active,
|
| 38 |
+
generate as inference_generate,
|
| 39 |
+
chat_messages,
|
| 40 |
+
INFERENCE_MODEL,
|
| 41 |
+
)
|
| 42 |
+
|
| 43 |
logging.basicConfig(
|
| 44 |
level=logging.INFO,
|
| 45 |
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
|
|
|
|
| 52 |
BASE_DIR = Path(__file__).parent
|
| 53 |
STATIC_DIR = BASE_DIR / "static"
|
| 54 |
|
| 55 |
+
# Use the HF Inference API (default model comes from shared.inference_client: Qwen/Qwen2.5-1.5B-Instruct; small, fast, free tier).
|
| 56 |
+
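# Override via Space env var TINYBARD_MODEL (falls back to INFERENCE_MODEL). Note: shared.inference_client.generate() itself only reads INFERENCE_MODEL.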
|
| 57 |
+
# Cooldown enforced in shared.inference_client.
|
| 58 |
+
TINYBARD_MODEL = os.environ.get("TINYBARD_MODEL", INFERENCE_MODEL)
|
| 59 |
|
| 60 |
# ---------------------------------------------------------------------------
|
| 61 |
# Llama.cpp Inference Setup
|
| 62 |
# ---------------------------------------------------------------------------
|
| 63 |
+
# No local LLM state — every inference call goes through the HF Inference API
|
| 64 |
+
# with cooldown enforcement. Procedural fallback is always available.
|
| 65 |
|
| 66 |
|
| 67 |
+
def llm_available() -> bool:
|
| 68 |
+
"""True if we *might* succeed at an inference call (cooldown not active,
|
| 69 |
+
model id is set). A missing HF_TOKEN is deliberately not a hard gate."""
|
| 70 |
+
import os
|
| 71 |
+
if not os.environ.get("HF_TOKEN") and not os.environ.get("HUGGINGFACEHUB_API_TOKEN"):
|
| 72 |
+
# Inference API still works anonymously for some models, so don't gate hard.
|
| 73 |
+
pass
|
| 74 |
+
return bool(TINYBARD_MODEL) and not cooldown_active("tinybard")
|
| 75 |
|
| 76 |
|
| 77 |
+
def last_inference_status() -> dict:
|
| 78 |
+
"""Snapshot of the current cooldown + model for /api/model_status."""
|
| 79 |
+
return {
|
| 80 |
+
"model": TINYBARD_MODEL,
|
| 81 |
+
"cooldown": cooldown_status("tinybard"),
|
| 82 |
+
}
|
| 83 |
|
| 84 |
|
| 85 |
# ---------------------------------------------------------------------------
|
|
|
|
| 191 |
|
| 192 |
|
| 193 |
# ---------------------------------------------------------------------------
|
| 194 |
+
# LLM Generation Logic (HF Inference API + cooldown)
|
| 195 |
# ---------------------------------------------------------------------------
|
| 196 |
+
def _parse_messages(genre: str, history: List[Dict[str, str]], next_instruction: str) -> list[Dict[str, str]]:
|
| 197 |
+
"""Translate internal history into OpenAI-style chat messages."""
|
| 198 |
+
system = (
|
| 199 |
+
"You are the narrator of an interactive text adventure game. "
|
| 200 |
+
f"Genre: {genre}. Write in the second person ('You...'). "
|
| 201 |
+
"Keep descriptions highly atmospheric but short (under 3 sentences). "
|
| 202 |
+
"Focus on action, mystery, and choice. Do not offer numbered choices unless asked."
|
| 203 |
+
)
|
| 204 |
+
msgs: List[Dict[str, str]] = [{"role": "system", "content": system}]
|
| 205 |
+
for h in (history or []):
|
| 206 |
+
if h.get("role") == "player":
|
| 207 |
+
msgs.append({"role": "user", "content": h["text"]})
|
| 208 |
+
elif h.get("role") == "narrator":
|
| 209 |
+
msgs.append({"role": "assistant", "content": h["text"]})
|
| 210 |
+
msgs.append({"role": "user", "content": next_instruction})
|
| 211 |
+
return msgs
|
| 212 |
+
|
| 213 |
+
|
| 214 |
+
def generate_llm_story(
|
| 215 |
+
genre: str,
|
| 216 |
+
history: List[Dict[str, str]],
|
| 217 |
+
next_instruction: str,
|
| 218 |
+
max_tokens: int = 180,
|
| 219 |
+
) -> str:
|
| 220 |
+
"""Generate story text via HF Inference API (with cooldown)."""
|
| 221 |
+
if cooldown_active("tinybard"):
|
| 222 |
+
log.info("tinybard inference skipped (cooldown active)")
|
| 223 |
return ""
|
| 224 |
try:
|
| 225 |
+
msgs = _parse_messages(genre, history, next_instruction)
|
| 226 |
+
result = inference_generate(
|
| 227 |
+
project="tinybard",
|
| 228 |
+
messages=msgs,
|
| 229 |
+
max_new_tokens=max_tokens,
|
| 230 |
temperature=0.7,
|
|
|
|
| 231 |
)
|
| 232 |
+
return result.text
|
| 233 |
+
except RuntimeError:
|
| 234 |
+
# Cooldown — let caller fall back
|
| 235 |
+
return ""
|
| 236 |
except Exception as e:
|
| 237 |
+
log.warning(f"HF Inference error (fallback to procedural): {e}")
|
| 238 |
return ""
|
| 239 |
|
| 240 |
|
| 241 |
+
def generate_llm_choices(genre: str, story_context: str) -> List[str]:
|
| 242 |
+
"""Ask the LLM to produce 3 short distinct choices for the player."""
|
| 243 |
+
if cooldown_active("tinybard"):
|
| 244 |
+
return []
|
| 245 |
+
system = (
|
| 246 |
+
"You generate 3 short, distinct player choices for an interactive text adventure. "
|
| 247 |
+
"Output exactly in the format: 1. <choice> | 2. <choice> | 3. <choice>"
|
|
|
|
|
|
|
| 248 |
)
|
| 249 |
+
user = f"Genre: {genre}. Last story beat: {story_context[:400]}. Give 3 choices."
|
| 250 |
+
try:
|
| 251 |
+
result = inference_generate(
|
| 252 |
+
project="tinybard",
|
| 253 |
+
messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
|
| 254 |
+
max_new_tokens=80,
|
| 255 |
+
temperature=0.8,
|
| 256 |
+
)
|
| 257 |
+
return _parse_choices(result.text)
|
| 258 |
+
except Exception:
|
| 259 |
+
return []
|
| 260 |
|
| 261 |
|
| 262 |
# ---------------------------------------------------------------------------
|
|
|
|
| 287 |
if genre not in ["fantasy", "scifi", "cyberpunk"]:
|
| 288 |
genre = "fantasy"
|
| 289 |
|
| 290 |
+
# Try LLM first (will skip if cooldown is active)
|
| 291 |
instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
|
| 292 |
+
story = generate_llm_story(genre, [], instruction)
|
| 293 |
if not story:
|
| 294 |
+
# Procedural fallback
|
| 295 |
result = generate_procedural_step(genre, 0, 100)
|
| 296 |
return (
|
| 297 |
result["story"], result["choices"], result["health"],
|
|
|
|
| 300 |
)
|
| 301 |
|
| 302 |
history = [{"role": "narrator", "text": story}]
|
| 303 |
+
choices = generate_llm_choices(genre, story)
|
| 304 |
if len(choices) < 2:
|
| 305 |
+
# Use the procedural choices
|
| 306 |
+
fallback = generate_procedural_step(genre, 0, 100)
|
| 307 |
+
choices = fallback["choices"]
|
| 308 |
|
| 309 |
return (story, choices[:3], 100, 1, False, json.dumps(history))
|
| 310 |
|
|
|
|
| 315 |
except Exception:
|
| 316 |
history = []
|
| 317 |
|
|
|
|
| 318 |
step = int(step)
|
| 319 |
health = int(health)
|
| 320 |
|
| 321 |
+
# First try LLM narration
|
| 322 |
history.append({"role": "player", "text": choice})
|
| 323 |
|
| 324 |
health_delta = random.choice([-15, 0, 10])
|
|
|
|
| 326 |
|
| 327 |
if new_health <= 0:
|
| 328 |
instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
|
| 329 |
+
story = generate_llm_story(genre, history, instruction)
|
| 330 |
return (
|
| 331 |
story or "Your strength fails. The adventure ends in darkness.",
|
| 332 |
[], 0, step + 1, True, json.dumps(history)
|
|
|
|
| 334 |
|
| 335 |
if step >= 4:
|
| 336 |
instruction = "Narrate the final glorious victory. The adventure ends in success."
|
| 337 |
+
story = generate_llm_story(genre, history, instruction)
|
| 338 |
return (
|
| 339 |
story or "You have achieved your goal! You are victorious!",
|
| 340 |
[], new_health, step + 1, True, json.dumps(history)
|
| 341 |
)
|
| 342 |
|
| 343 |
instruction = "Narrate what happens next as a result of the player's choice."
|
| 344 |
+
story = generate_llm_story(genre, history, instruction)
|
| 345 |
if not story:
|
| 346 |
result = generate_procedural_step(genre, step, health, choice)
|
| 347 |
return (
|
|
|
|
| 352 |
|
| 353 |
history.append({"role": "narrator", "text": story})
|
| 354 |
|
| 355 |
+
choices = generate_llm_choices(genre, story)
|
| 356 |
if len(choices) < 2:
|
| 357 |
choices = ["Move forward", "Look around", "Rest a moment"]
|
| 358 |
|
|
|
|
| 405 |
return HTMLResponse("<h1>TinyBard retro terminal under construction!</h1>")
|
| 406 |
@fastapi_app.get("/api/model_status")
|
| 407 |
async def model_status():
|
| 408 |
+
"""Check the inference client + cooldown status."""
|
| 409 |
+
return last_inference_status()
|
| 410 |
|
| 411 |
|
| 412 |
# ---------------------------------------------------------------------------
|
|
|
|
| 418 |
Returns a dict the frontend can consume directly. Used by both the
|
| 419 |
FastAPI /api/game/* endpoints and the Gradio MCP tools.
|
| 420 |
"""
|
| 421 |
+
# Cooldown short-circuit: if active, the game just uses the procedural
|
| 422 |
+
# engine for this turn. This protects your HF/Modal credit budget.
|
| 423 |
+
in_cooldown = cooldown_active("tinybard")
|
| 424 |
|
| 425 |
if step == 0:
|
| 426 |
# New game
|
| 427 |
+
if in_cooldown:
|
| 428 |
return generate_procedural_step(genre, 0, 100)
|
| 429 |
instruction = "Narrate the beginning of the adventure. What happens first? Do not offer choices yet."
|
| 430 |
+
story = generate_llm_story(genre, [], instruction)
|
| 431 |
if not story:
|
| 432 |
return generate_procedural_step(genre, 0, 100)
|
| 433 |
history = [{"role": "narrator", "text": story}]
|
| 434 |
+
choices = generate_llm_choices(genre, story)
|
| 435 |
if len(choices) < 2:
|
| 436 |
choices = ["Explore the area", "Check your equipment", "Proceed carefully"]
|
| 437 |
return {
|
|
|
|
| 440 |
}
|
| 441 |
|
| 442 |
# Subsequent turn
|
| 443 |
+
if in_cooldown:
|
| 444 |
return generate_procedural_step(genre, step, health, choice)
|
| 445 |
|
| 446 |
history.append({"role": "player", "text": choice})
|
|
|
|
| 449 |
|
| 450 |
if new_health <= 0:
|
| 451 |
instruction = "The player has run out of health. Narrate a quick, dramatic end. Game Over."
|
| 452 |
+
story = generate_llm_story(genre, history, instruction)
|
| 453 |
return {
|
| 454 |
"story": story or "Your strength fails. The adventure ends in darkness.",
|
| 455 |
"choices": [], "health": 0, "step": step + 1, "game_over": True,
|
|
|
|
| 458 |
|
| 459 |
if step >= 4:
|
| 460 |
instruction = "Narrate the final glorious victory. The adventure ends in success."
|
| 461 |
+
story = generate_llm_story(genre, history, instruction)
|
| 462 |
return {
|
| 463 |
"story": story or "You have achieved your goal! You are victorious!",
|
| 464 |
"choices": [], "health": new_health, "step": step + 1, "game_over": True,
|
|
|
|
| 466 |
}
|
| 467 |
|
| 468 |
instruction = "Narrate what happens next as a result of the player's choice."
|
| 469 |
+
story = generate_llm_story(genre, history, instruction)
|
| 470 |
if not story:
|
| 471 |
return generate_procedural_step(genre, step, health, choice)
|
| 472 |
history.append({"role": "narrator", "text": story})
|
| 473 |
|
| 474 |
+
choices = generate_llm_choices(genre, story)
|
| 475 |
if len(choices) < 2:
|
| 476 |
choices = ["Move forward", "Look around", "Rest a moment"]
|
| 477 |
return {
|
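The choice prompt in `generate_llm_choices` asks the model for the format `1. <choice> | 2. <choice> | 3. <choice>`, and the result is handed to `_parse_choices`, whose body is not shown in this diff. The following is only a sketch of how such a parser might work; `parse_choices_sketch` and its `limit` parameter are hypothetical names, and the tolerance for newlines and square brackets is an assumption about what small models tend to emit.

```python
import re
from typing import List

# Matches a leading "1." or "2)" style enumeration marker.
_NUMBER_PREFIX = re.compile(r"^\s*\d+\s*[.)]\s*")


def parse_choices_sketch(text: str, limit: int = 3) -> List[str]:
    """Hypothetical parser for '1. a | 2. b | 3. c' style model output."""
    if not text:
        return []
    choices: List[str] = []
    # Accept either pipes or newlines as separators (assumption).
    for part in re.split(r"\||\n", text):
        cleaned = _NUMBER_PREFIX.sub("", part).strip().strip("[]").strip()
        # Drop empty fragments and exact duplicates.
        if cleaned and cleaned not in choices:
            choices.append(cleaned)
    return choices[:limit]
```

A result with fewer than two entries would trigger the hard-coded fallback choices, as in the `len(choices) < 2` checks above.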
projects/tinybard/requirements.txt
CHANGED
|
@@ -1,7 +1,18 @@
|
|
| 1 |
# TinyBard — Micro Text Adventure Generator
|
| 2 |
# Python 3.10+
|
| 3 |
|
| 4 |
gradio>=5.0
|
| 5 |
-
|
| 6 |
-
llama-cpp-python>=0.3.0
|
| 7 |
huggingface_hub>=0.20
|
|
|
|
|
|
| 1 |
# TinyBard — Micro Text Adventure Generator
|
| 2 |
# Python 3.10+
|
| 3 |
+
#
|
| 4 |
+
# Inference is via the Hugging Face Inference API (no local GGUF,
|
| 5 |
+
# no llama-cpp-python compile). Cooldown is enforced in
|
| 6 |
+
# `shared/inference_client.py` to protect your credit budget.
|
| 7 |
+
#
|
| 8 |
+
# Set these Space secrets/variables to configure:
|
| 9 |
+
# HF_TOKEN — your HF token (anonymous works for many small models)
|
| 10 |
+
# INFERENCE_MODEL — model id (default: Qwen/Qwen2.5-1.5B-Instruct)
|
| 11 |
+
# TINYBARD_COOLDOWN_SECONDS — gap between inference calls (default 6)
|
| 12 |
+
# INFERENCE_PROVIDER — "hf-inference" (default, free serverless) or paid
|
| 13 |
+
# INFERENCE_MAX_TOKENS — per-call token cap (default 220)
|
| 14 |
|
| 15 |
gradio>=5.0
|
| 16 |
+
fastapi>=0.110
|
|
|
|
| 17 |
huggingface_hub>=0.20
|
| 18 |
+
uvicorn[standard]>=0.27
|
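The variables listed in the requirements comment are read from the environment with defaults. As a rough sketch of that resolution order, assuming the defaults quoted in this commit, the snippet below collects them in one place. `resolve_settings` and `DEFAULT_MODEL` are hypothetical names, and unlike the commit's `os.environ.get("TINYBARD_MODEL", INFERENCE_MODEL)` this sketch treats an empty string as unset.

```python
import os
from typing import Mapping, Optional

# Default model id named in the requirements comment.
DEFAULT_MODEL = "Qwen/Qwen2.5-1.5B-Instruct"


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: resolve TinyBard inference settings from env vars."""
    env = os.environ if env is None else env
    return {
        # Project-specific override first, then the shared variable, then the default.
        "model": env.get("TINYBARD_MODEL") or env.get("INFERENCE_MODEL") or DEFAULT_MODEL,
        "provider": env.get("INFERENCE_PROVIDER", "hf-inference"),
        "cooldown_seconds": float(env.get("TINYBARD_COOLDOWN_SECONDS", "6")),
        "max_tokens": int(env.get("INFERENCE_MAX_TOKENS", "220")),
        # Either token variable counts; anonymous access may still work for some models.
        "has_token": bool(env.get("HF_TOKEN") or env.get("HUGGINGFACEHUB_API_TOKEN")),
    }
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise locally without touching real Space secrets.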
projects/tinybard/static/main.js
CHANGED
|
@@ -36,18 +36,26 @@ async function checkModelStatus() {
|
|
| 36 |
try {
|
| 37 |
const resp = await fetch(`${GRADIO_CLIENT_URL}/api/model_status`);
|
| 38 |
if (!resp.ok) return;
|
| 39 |
-
const
|
| 40 |
-
|
| 41 |
-
|
| 42 |
modelStatus.style.color = "var(--asp-sun)";
|
| 43 |
} else {
|
| 44 |
-
modelStatus.textContent = "☘ MODEL
|
| 45 |
modelStatus.style.color = "var(--asp-frost)";
|
| 46 |
}
|
| 47 |
} catch {
|
| 48 |
modelStatus.textContent = "☘ MODEL: ?";
|
| 49 |
}
|
| 50 |
}
|
| 51 |
async function apiCall(endpoint, payload) {
|
| 52 |
// Use the FastAPI clean-JSON endpoints (returns a dict directly).
|
| 53 |
// /api/game/start -> start_game
|
|
|
|
| 36 |
try {
|
| 37 |
const resp = await fetch(`${GRADIO_CLIENT_URL}/api/model_status`);
|
| 38 |
if (!resp.ok) return;
|
| 39 |
+
const s = await resp.json();
|
| 40 |
+
const model = s.model || "inference";
|
| 41 |
+
const cd = s.cooldown || { active: false, remaining_seconds: 0, window_seconds: 0 };
|
| 42 |
+
if (cd.active) {
|
| 43 |
+
modelStatus.textContent = `☘ ${model} / COOLDOWN ${cd.remaining_seconds.toFixed(1)}s`;
|
| 44 |
+
modelStatus.style.color = "var(--asp-ember)";
|
| 45 |
+
} else if (s.model) {
|
| 46 |
+
modelStatus.textContent = `☘ ${model} / READY`;
|
| 47 |
modelStatus.style.color = "var(--asp-sun)";
|
| 48 |
} else {
|
| 49 |
+
modelStatus.textContent = "☘ NO MODEL / FALLBACK";
|
| 50 |
modelStatus.style.color = "var(--asp-frost)";
|
| 51 |
}
|
| 52 |
} catch {
|
| 53 |
modelStatus.textContent = "☘ MODEL: ?";
|
| 54 |
}
|
| 55 |
}
|
| 56 |
+
|
| 57 |
+
// Poll model status every 2s so cooldown countdown updates
|
| 58 |
+
setInterval(checkModelStatus, 2000);
|
| 59 |
async function apiCall(endpoint, payload) {
|
| 60 |
// Use the FastAPI clean-JSON endpoints (returns a dict directly).
|
| 61 |
// /api/game/start -> start_game
|
shared/inference_client.py
ADDED
|
@@ -0,0 +1,209 @@
|
|
| 1 |
+
"""
|
| 2 |
+
Shared HF Inference Client + Cooldown
|
| 3 |
+
======================================
|
| 4 |
+
Lightweight wrapper around `huggingface_hub.InferenceClient` with:
|
| 5 |
+
|
| 6 |
+
- Per-call cooldown to prevent credit burn on live HF Spaces
|
| 7 |
+
- Synchronous (blocking) API, callable from sync handlers or a worker thread
|
| 8 |
+
- Raises on failure so callers can fall back to procedural/story-template engines
|
| 9 |
+
- Environment-driven config (works in HF Spaces and local)
|
| 10 |
+
|
| 11 |
+
The cooldown model:
|
| 12 |
+
- Each project has its own cooldown window (default 8s for cheap inference APIs)
|
| 13 |
+
- Per project (state is process-wide, not per user session), after a successful inference, no new call can run until the cooldown expires
|
| 14 |
+
- Failed inference does not start a cooldown (allow quick retry)
|
| 15 |
+
- `cooldown_active()` is the public check; FastAPI handlers short-circuit on active cooldown
|
| 16 |
+
"""
|
| 17 |
+
from __future__ import annotations
|
| 18 |
+
|
| 19 |
+
import os
|
| 20 |
+
import time
|
| 21 |
+
import logging
|
| 22 |
+
import threading
|
| 23 |
+
from dataclasses import dataclass, field
|
| 24 |
+
from typing import Optional, Dict, Any, Callable, List
|
| 25 |
+
|
| 26 |
+
log = logging.getLogger("inference")
|
| 27 |
+
|
| 28 |
+
# ── Environment knobs ─────────────────────────────────────────────────────────
|
| 29 |
+
# Override these in your Space's "Settings → Variables and secrets".
|
| 30 |
+
|
| 31 |
+
# The HF model id used for text generation (VibeThinker 1.5B, Gemma 4 12B, etc.)
|
| 32 |
+
INFERENCE_MODEL = os.environ.get(
|
| 33 |
+
"INFERENCE_MODEL",
|
| 34 |
+
"Qwen/Qwen2.5-1.5B-Instruct", # small, fast, free-tier friendly
|
| 35 |
+
)
|
| 36 |
+
|
| 37 |
+
# Provider: "hf-inference" (free serverless), "together", "fal-ai", "replicate"
|
| 38 |
+
# Free HF inference works for many small models; otherwise use a paid provider.
|
| 39 |
+
INFERENCE_PROVIDER = os.environ.get("INFERENCE_PROVIDER", "hf-inference")
|
| 40 |
+
|
| 41 |
+
# Token — read from HF Space secrets at runtime.
|
| 42 |
+
HF_TOKEN = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
|
| 43 |
+
|
| 44 |
+
# Default cooldown between inferences, in seconds.
|
| 45 |
+
COOLDOWN_SECONDS = float(os.environ.get("INFERENCE_COOLDOWN_SECONDS", "8"))
|
| 46 |
+
|
| 47 |
+
# Per-project override (keyed by app name)
|
| 48 |
+
PROJECT_COOLDOWN_OVERRIDES = {
|
| 49 |
+
"tinybard": float(os.environ.get("TINYBARD_COOLDOWN_SECONDS", "6")),
|
| 50 |
+
"focusfriend": float(os.environ.get("FOCUSFRIEND_COOLDOWN_SECONDS", "10")),
|
| 51 |
+
"crittercalm": float(os.environ.get("CRITTERCALM_COOLDOWN_SECONDS", "12")),
|
| 52 |
+
}
|
| 53 |
+
|
| 54 |
+
# Max tokens to request (keeps costs bounded)
|
| 55 |
+
MAX_NEW_TOKENS = int(os.environ.get("INFERENCE_MAX_TOKENS", "220"))
|
| 56 |
+
|
| 57 |
+
|
| 58 |
+
# ── Cooldown registry ────────────────────────────────────────────────────────
|
| 59 |
+
@dataclass
|
| 60 |
+
class _CooldownState:
|
| 61 |
+
last_call: float = 0.0
|
| 62 |
+
lock: threading.Lock = field(default_factory=threading.Lock)
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
_states: Dict[str, _CooldownState] = {}
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
def _state(project: str) -> _CooldownState:
|
| 69 |
+
if project not in _states:
|
| 70 |
+
_states[project] = _CooldownState()
|
| 71 |
+
return _states[project]
|
| 72 |
+
|
| 73 |
+
|
| 74 |
+
def cooldown_seconds_for(project: str) -> float:
|
| 75 |
+
return PROJECT_COOLDOWN_OVERRIDES.get(project, COOLDOWN_SECONDS)
|
| 76 |
+
|
| 77 |
+
|
| 78 |
+
def cooldown_active(project: str) -> bool:
|
| 79 |
+
"""Return True if the project is currently in cooldown (cannot run inference)."""
|
| 80 |
+
state = _state(project)
|
| 81 |
+
now = time.time()
|
| 82 |
+
if now - state.last_call < cooldown_seconds_for(project):
|
| 83 |
+
return True
|
| 84 |
+
return False
|
| 85 |
+
|
| 86 |
+
|
| 87 |
+
def cooldown_remaining(project: str) -> float:
|
| 88 |
+
"""Seconds left in the cooldown window (0 if not in cooldown)."""
|
| 89 |
+
state = _state(project)
|
| 90 |
+
elapsed = time.time() - state.last_call
|
| 91 |
+
remaining = cooldown_seconds_for(project) - elapsed
|
| 92 |
+
return max(0.0, remaining)
|
| 93 |
+
|
| 94 |
+
|
| 95 |
+
def cooldown_status(project: str) -> dict:
|
| 96 |
+
"""Snapshot of cooldown state for the UI."""
|
| 97 |
+
return {
|
| 98 |
+
"active": cooldown_active(project),
|
| 99 |
+
"remaining_seconds": round(cooldown_remaining(project), 2),
|
| 100 |
+
"window_seconds": cooldown_seconds_for(project),
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
|
| 104 |
+
def _mark_called(project: str) -> None:
|
| 105 |
+
state = _state(project)
|
| 106 |
+
with state.lock:
|
| 107 |
+
state.last_call = time.time()
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
# ── Inference client wrapper ─────────────────────────────────────────────────
|
| 111 |
+
class InferenceResult:
|
| 112 |
+
"""A small wrapper so callers don't need to know which API returned text."""
|
| 113 |
+
def __init__(self, text: str, model: str, provider: str, latency_s: float):
|
| 114 |
+
self.text = text
|
| 115 |
+
self.model = model
|
| 116 |
+
self.provider = provider
|
| 117 |
+
self.latency_s = latency_s
|
| 118 |
+
|
| 119 |
+
def __repr__(self) -> str:
|
| 120 |
+
return f"InferenceResult(text={self.text[:50]!r}…, model={self.model!r}, latency={self.latency_s:.2f}s)"
|
| 121 |
+
|
| 122 |
+
|
| 123 |
+
def _get_client():
|
| 124 |
+
"""Lazy-load the InferenceClient to keep boot fast."""
|
| 125 |
+
from huggingface_hub import InferenceClient
|
| 126 |
+
return InferenceClient(
|
| 127 |
+
model=INFERENCE_MODEL,
|
| 128 |
+
token=HF_TOKEN,
|
| 129 |
+
provider=INFERENCE_PROVIDER,
|
| 130 |
+
)
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
def generate(
|
| 134 |
+
project: str,
|
| 135 |
+
messages: List[Dict[str, str]],
|
| 136 |
+
*,
|
| 137 |
+
max_new_tokens: Optional[int] = None,
|
| 138 |
+
temperature: float = 0.7,
|
| 139 |
+
) -> InferenceResult:
|
| 140 |
+
"""Run a chat-style inference call, with cooldown enforcement.
|
| 141 |
+
|
| 142 |
+
`messages` follows OpenAI chat format: [{"role": "user|assistant|system", "content": "..."}].
|
| 143 |
+
Returns InferenceResult with `.text` (string) on success, or raises on failure.
|
| 144 |
+
Caller is responsible for fallback handling.
|
| 145 |
+
"""
|
| 146 |
+
if cooldown_active(project):
|
| 147 |
+
remaining = cooldown_remaining(project)
|
| 148 |
+
raise RuntimeError(
|
| 149 |
+
f"cooldown active for {project!r}: {remaining:.1f}s remaining. "
|
| 150 |
+
f"This protects your HF/Modal credit budget."
|
| 151 |
+
)
|
| 152 |
+
|
| 153 |
+
max_new_tokens = max_new_tokens or MAX_NEW_TOKENS
|
| 154 |
+
client = _get_client()
|
| 155 |
+
start = time.time()
|
| 156 |
+
response = client.chat_completion(
|
| 157 |
+
messages=messages,
|
| 158 |
+
max_tokens=max_new_tokens,
|
| 159 |
+
temperature=temperature,
|
| 160 |
+
)
|
| 161 |
+
latency = time.time() - start
|
| 162 |
+
text = response.choices[0].message.content or ""
|
| 163 |
+
text = text.strip()
|
| 164 |
+
_mark_called(project)
|
| 165 |
+
return InferenceResult(
|
| 166 |
+
text=text,
|
| 167 |
+
model=INFERENCE_MODEL,
|
| 168 |
+
provider=INFERENCE_PROVIDER,
|
| 169 |
+
latency_s=latency,
|
| 170 |
+
)
|
| 171 |
+
|
| 172 |
+
|
| 173 |
+
def force_clear_cooldown(project: str) -> None:
|
| 174 |
+
"""Manual escape hatch (e.g. for testing or admin overrides)."""
|
| 175 |
+
_state(project).last_call = 0.0
|
| 176 |
+
|
| 177 |
+
|
| 178 |
+
# ── Convenience: build messages + format result ──────────────────────────────
|
| 179 |
+
def chat_messages(system: str, user: str, history: Optional[List[Dict[str, str]]] = None) -> List[Dict[str, str]]:
|
| 180 |
+
"""Build an OpenAI-style message list with optional prior turns.
|
| 181 |
+
|
| 182 |
+
`history` is in the same [{role, content}, ...] format. New turns are appended.
|
| 183 |
+
"""
|
| 184 |
+
msgs: List[Dict[str, str]] = [{"role": "system", "content": system}]
|
| 185 |
+
if history:
|
| 186 |
+
msgs.extend(history)
|
| 187 |
+
msgs.append({"role": "user", "content": user})
|
| 188 |
+
return msgs
|
| 189 |
+
|
| 190 |
+
|
| 191 |
+
__all__ = [
|
| 192 |
+
"InferenceResult",
|
| 193 |
+
"cooldown_active",
|
| 194 |
+
"cooldown_remaining",
|
| 195 |
+
"cooldown_seconds_for",
|
| 196 |
+
"cooldown_status",
|
| 197 |
+
"force_clear_cooldown",
|
| 198 |
+
"generate",
|
| 199 |
+
"chat_messages",
|
| 200 |
+
"INFERENCE_MODEL",
|
| 201 |
+
"INFERENCE_PROVIDER",
|
| 202 |
+
"MAX_NEW_TOKENS",
|
| 203 |
+
]
|
| 204 |
+
|
| 205 |
+
|
| 206 |
+
if __name__ == "__main__":
|
| 207 |
+
# Smoke test
|
| 208 |
+
for p in ("tinybard", "focusfriend", "crittercalm"):
|
| 209 |
+
print(p, "cooldown:", cooldown_status(p))
|
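In `shared/inference_client.py`, `generate()` checks `cooldown_active()` and only calls `_mark_called()` after the response arrives, so two concurrent requests could both pass the check before either records its call. A minimal sketch of an atomic check-and-reserve variant follows. `CooldownGate`, `try_acquire`, `release`, and `remaining` are hypothetical names that are not part of the commit, and the use of `time.monotonic()` instead of `time.time()` is an assumption about what suits interval measurement.

```python
import threading
import time
from typing import Callable, Optional


class CooldownGate:
    """Hypothetical atomic cooldown gate (not part of the commit)."""

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._last_call: Optional[float] = None
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Check the window and reserve the slot under a single lock."""
        with self._lock:
            now = self._clock()
            if (self._last_call is not None
                    and now - self._last_call < self.window_seconds):
                return False
            self._last_call = now
            return True

    def release(self) -> None:
        """Clear the reservation after a failed call, so failures allow a quick retry."""
        with self._lock:
            self._last_call = None

    def remaining(self) -> float:
        """Seconds left in the current window (0.0 when no cooldown is active)."""
        with self._lock:
            if self._last_call is None:
                return 0.0
            elapsed = self._clock() - self._last_call
            return max(0.0, self.window_seconds - elapsed)
```

A caller would invoke `try_acquire()` before the API request and `release()` in its exception handler, which keeps the documented rule that a failed inference does not start a cooldown.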