Nomos42 committed on
Commit c613cce · verified · 1 Parent(s): b40c51f

fix: README.md (gemma3-1b default + examples list-of-lists)

Files changed (1):
  README.md +13 -21
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: Nomos42 Gemma 4 Chat
+title: Nomos42 Gemma Chat
 emoji: 🏀
 colorFrom: blue
 colorTo: indigo
@@ -7,39 +7,31 @@ sdk: gradio
 sdk_version: 5.9.1
 app_file: app.py
 pinned: true
-short_description: Nomos42 owned Gemma 4 E2B-IT endpoint for NBA council agents
+short_description: Nomos42 owned Gemma chat endpoint for council agents
 ---
 
-# Nomos42 — Gemma 4 E2B-IT
+# Nomos42 — Gemma chat (council endpoint)
 
 Owned LLM inference endpoint for Nomos42 NBA Quant AI and Political Alpha council agents. Replaces external HF Inference Providers as the primary LLM for D1-D9 Hermes councils and the 5 trading-floor agents.
 
-- **Model:** `google/gemma-4-E2B-it` (smallest Gemma 4 flavor, fits free cpu-basic 16GB)
-- **Effective params:** 2B
-- **Context window:** 4K (app-level cap; model supports 128K but CPU latency is prohibitive)
-- **Multimodal:** text + image + audio input, text output
-- **Hardware:** `cpu-basic` (free, 16 GB RAM, 2 vCPU) — slow inference but stable
-- **Paired with:** `Nomos42/qwen35-chat` for dual-provider redundancy
+## Models
 
-## Upgrade path
-To run `google/gemma-4-E4B-it` (4B) or `gemma-4-26B-A4B-it` (MoE): upgrade Nomos42 account to **HF PRO** ($9/mo), switch `hardware: zero-a10g` in this README's frontmatter, and set `MODEL_ID` env var. ZeroGPU gives free A100 bursts.
+| Tier | MODEL_ID env var | Size on disk | Hardware | Status |
+|---|---|---|---|---|
+| **Default (free)** | `google/gemma-3-1b-it` | ~2 GB | `cpu-basic` (16 GB RAM, 2 vCPU) | Active |
+| Gemma 4 E2B (multimodal) | `google/gemma-4-E2B-it` | ~10 GB | needs ZeroGPU/A100 (HF PRO) | Pending PRO upgrade |
+| Gemma 4 E4B (multimodal) | `google/gemma-4-E4B-it` | ~16 GB | needs ZeroGPU/A100 (HF PRO) | Pending PRO upgrade |
+| Gemma 4 MoE | `google/gemma-4-26B-A4B-it` | ~50 GB | needs zero-a10g + PRO | Aspirational |
 
-## API (programmatic, from the council runner)
+The Space name is `gemma4-chat` because the **target** model is Gemma 4. The current default is `gemma-3-1b-it` because the smallest Gemma 4 (`E2B-it`) is 10 GB on disk and OOM-kills `cpu-basic`. Once Nomos42 is upgraded to HF PRO, switch the `MODEL_ID` env var and set `hardware: zero-a10g` in the README frontmatter — no code changes needed.
 
-```bash
-curl -X POST https://nomos42-gemma4-chat.hf.space/api/predict \
-  -H "Content-Type: application/json" \
-  -d '{"data": ["Return JSON: {ok:true}", [], 256, 0.3, 0.9]}'
-```
-
-Or via `gradio_client`:
+## API
 
 ```python
 from gradio_client import Client
 client = Client("Nomos42/gemma4-chat")
 result = client.predict(
     message="Return JSON: {brier:0.22}",
-    history=[],
     max_new_tokens=256,
     temperature=0.3,
     top_p=0.9,
@@ -49,4 +41,4 @@ result = client.predict(
 
 ## License
 
-Gemma 4 is governed by the Gemma Terms of Use. This Space is for Nomos42 internal council orchestration only.
+Gemma is governed by the Gemma Terms of Use. This Space is for Nomos42 internal council orchestration only.
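
The new README states that moving from the free default to a Gemma 4 model needs no code changes, only the `MODEL_ID` env var. A minimal sketch of that resolution logic, assuming a helper name (`resolve_model_id`) and a fallback mirroring the README's model table — this is illustrative, not the Space's actual `app.py`:

```python
import os

# Free-tier default from the README's model table (assumption: the app
# hard-codes this as its fallback when MODEL_ID is unset).
DEFAULT_MODEL_ID = "google/gemma-3-1b-it"

def resolve_model_id() -> str:
    """Pick the model to load: the MODEL_ID env var wins, otherwise the
    cpu-basic default. Upgrading to Gemma 4 then amounts to setting
    MODEL_ID=google/gemma-4-E2B-it in the Space settings."""
    return os.environ.get("MODEL_ID", DEFAULT_MODEL_ID)
```

With this pattern, the commit's "Pending PRO upgrade" rows become reachable by flipping one Space variable rather than editing code.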