Nomos42 committed on
Commit c613cce · verified · 1 Parent(s): b40c51f

fix: README.md (gemma3-1b default + examples list-of-lists)

Files changed (1):
  README.md +13 -21
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: Nomos42 Gemma 4 Chat
+title: Nomos42 Gemma Chat
 emoji: 🏀
 colorFrom: blue
 colorTo: indigo
@@ -7,39 +7,31 @@ sdk: gradio
 sdk_version: 5.9.1
 app_file: app.py
 pinned: true
-short_description: Nomos42 owned Gemma 4 E2B-IT endpoint for NBA council agents
+short_description: Nomos42 owned Gemma chat endpoint for council agents
 ---
 
-# Nomos42 — Gemma 4 E2B-IT
+# Nomos42 — Gemma chat (council endpoint)
 
 Owned LLM inference endpoint for Nomos42 NBA Quant AI and Political Alpha council agents. Replaces external HF Inference Providers as the primary LLM for D1-D9 Hermes councils and the 5 trading-floor agents.
 
-- **Model:** `google/gemma-4-E2B-it` (smallest Gemma 4 flavor, fits free cpu-basic 16GB)
-- **Effective params:** 2B
-- **Context window:** 4K (app-level cap; model supports 128K but CPU latency is prohibitive)
-- **Multimodal:** text + image + audio input, text output
-- **Hardware:** `cpu-basic` (free, 16 GB RAM, 2 vCPU) — slow inference but stable
-- **Paired with:** `Nomos42/qwen35-chat` for dual-provider redundancy
+## Models
 
-## Upgrade path
-To run `google/gemma-4-E4B-it` (4B) or `gemma-4-26B-A4B-it` (MoE): upgrade Nomos42 account to **HF PRO** ($9/mo), switch `hardware: zero-a10g` in this README's frontmatter, and set `MODEL_ID` env var. ZeroGPU gives free A100 bursts.
+| Tier | MODEL_ID env var | Size on disk | Hardware | Status |
+|---|---|---|---|---|
+| **Default (free)** | `google/gemma-3-1b-it` | ~2 GB | `cpu-basic` (16 GB RAM, 2 vCPU) | Active |
+| Gemma 4 E2B (multimodal) | `google/gemma-4-E2B-it` | ~10 GB | needs ZeroGPU/A100 (HF PRO) | Pending PRO upgrade |
+| Gemma 4 E4B (multimodal) | `google/gemma-4-E4B-it` | ~16 GB | needs ZeroGPU/A100 (HF PRO) | Pending PRO upgrade |
+| Gemma 4 MoE | `google/gemma-4-26B-A4B-it` | ~50 GB | needs zero-a10g + PRO | Aspirational |
 
-## API (programmatic, from the council runner)
+The Space name is `gemma4-chat` because the **target** model is Gemma 4. The current default is `gemma-3-1b-it` because the smallest Gemma 4 (`E2B-it`) is 10 GB on disk and OOM-kills `cpu-basic`. Once Nomos42 is upgraded to HF PRO, switch the `MODEL_ID` env var and set `hardware: zero-a10g` in the README frontmatter — no code changes needed.
 
-```bash
-curl -X POST https://nomos42-gemma4-chat.hf.space/api/predict \
-  -H "Content-Type: application/json" \
-  -d '{"data": ["Return JSON: {ok:true}", [], 256, 0.3, 0.9]}'
-```
-
-Or via `gradio_client`:
+## API
 
 ```python
 from gradio_client import Client
 client = Client("Nomos42/gemma4-chat")
 result = client.predict(
     message="Return JSON: {brier:0.22}",
-    history=[],
     max_new_tokens=256,
     temperature=0.3,
     top_p=0.9,
@@ -49,4 +41,4 @@ result = client.predict(
 
 ## License
 
-Gemma 4 is governed by the Gemma Terms of Use. This Space is for Nomos42 internal council orchestration only.
+Gemma is governed by the Gemma Terms of Use. This Space is for Nomos42 internal council orchestration only.
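
The new README states that moving from the free default to a Gemma 4 model needs no code changes, only the `MODEL_ID` env var. A minimal sketch of that resolution logic, assuming a helper name (`resolve_model_id`) and a fallback mirroring the README's model table — this is illustrative, not the Space's actual `app.py`:

```python
import os

# Free-tier default from the README's model table (assumption: the app
# hard-codes this as its fallback when MODEL_ID is unset).
DEFAULT_MODEL_ID = "google/gemma-3-1b-it"

def resolve_model_id() -> str:
    """Pick the model to load: the MODEL_ID env var wins, otherwise the
    cpu-basic default. Upgrading to Gemma 4 then amounts to setting
    MODEL_ID=google/gemma-4-E2B-it in the Space settings."""
    return os.environ.get("MODEL_ID", DEFAULT_MODEL_ID)
```

With this pattern, the commit's "Pending PRO upgrade" rows become reachable by flipping one Space variable rather than editing code.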