# Figment Prerequisites

This page captures the setup contract for building and demoing Figment v1.

## Eligibility And Repos

Required for the Build Small Hackathon:

* Hugging Face account registered for the hackathon.
* Membership in the `build-small-hackathon` Hugging Face org.
* Gradio Space hosted under that org:
  `https://huggingface.co/spaces/build-small-hackathon/figment`
* Public repo for code and documentation.
* Final submission assets: Space link, demo video, and social post.
* Total model parameter count at or below 32B.

## Accounts And Tokens

Required:

* Hugging Face token with write access for repo/Space pushes.
* NVIDIA API Catalog key for hosted Nemotron 3 Nano Omni live mode.
* Hugging Face Inference Endpoint access, only if using a dedicated HF endpoint.
* Modal account with credits for optional future fine-tuning and batch eval.

Build-time optional, depending on the synthetic-data path:

* Mistral API access for teacher generation or critique.
* MiniMax API access for teacher generation or critique.

## Local Machine

Reference local demo machine:

* macOS dev machine with 48 GB unified memory.
* Enough disk/RAM headroom for the local 4B text model, optional quantized weights, and Parakeet ASR dependencies.
* Internet access for initial model/tool downloads.

Local/offline proof target:

* `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16` for local text navigation and first fine-tune target.
* `nvidia/parakeet-rnnt-1.1b` for offline ASR after the local ASR gate passes.
* Local OpenAI-compatible server on `http://127.0.0.1:8001`.
* 16k context by default, 8k fallback.
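
A quick way to confirm that the local server is reachable is to request its model list. The sketch below uses only the standard library and assumes the server exposes the usual OpenAI-style `GET /v1/models` route. The helper names `models_url` and `server_is_up` are hypothetical and not part of Figment.

```python
import urllib.error
import urllib.request


def models_url(base_url):
    """Build the model-list URL, whether or not base_url already ends in /v1."""
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        return base + "/models"
    return base + "/v1/models"


def server_is_up(base_url="http://127.0.0.1:8001", timeout=2.0):
    """Return True if the server answers the model-list request with HTTP 200."""
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling `server_is_up()` before launching the local Gradio app gives an early, readable failure instead of a timeout in the middle of a demo.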

## CLI Tools

Verify that each tool is installed:

```bash
git --version
python3 --version
uv --version
hf auth whoami
modal --version
docker --version
llama-server --help
```
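
The same check can be scripted so that missing tools are reported together. This is a minimal sketch that only tests whether each executable is on `PATH`; it does not check versions or login state. The tool list is copied from the commands above, and `missing_tools` is a hypothetical helper name.

```python
import shutil

# Executables named in the verification commands above.
REQUIRED_TOOLS = ["git", "python3", "uv", "hf", "modal", "docker", "llama-server"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be swapped out in tests. Normal use is `missing_tools()` with no arguments.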

Recommended install commands on macOS:

```bash
brew install llama.cpp
python3 -m pip install --upgrade huggingface_hub modal
```

`uvx --from huggingface_hub hf ...` is also acceptable when the `hf` executable is not installed globally.

## Python Dependencies

Runtime dependencies live in `requirements.txt`.

Development, testing, and training dependencies live in `requirements-dev.txt`.

Install:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt -r requirements-dev.txt
```

## Environment Variables

Copy `.env.example` to `.env` locally and fill secrets there. Do not commit `.env`.

Required or expected variables:

* `FIGMENT_MODE`: `hosted`, `local`, or `canned`.
* `MODEL_STACK`: `omni_native` for hosted demo mode or `local_4b_parakeet` for the gated local/offline path.
* `MODEL_BACKEND`: `hosted_omni`, `llama_cpp`, or `canned`.
* `AUDIO_BACKEND`: `omni_native`, `parakeet_nemo`, `canned`, or `none`.
* `ALLOW_LOCAL_ASR`: set `true` only after Parakeet local ASR is proven and gated.
* `HF_MODEL_ID`: defaults to `nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16`.
* `NVIDIA_API_KEY`: NVIDIA API Catalog key for hosted Omni mode.
* `NVIDIA_BASE_URL`: defaults to `https://integrate.api.nvidia.com/v1`.
* `NVIDIA_MODEL_ID`: defaults to `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning`.
* `LOCAL_MODEL_ID`: local OpenAI-compatible model id or alias; default target is `nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16`.
* `HF_TOKEN`: Hugging Face token for Space pushes or optional HF endpoint access.
* `HF_ENDPOINT_URL`: optional dedicated HF Inference Endpoint URL.
* `LLAMA_BASE_URL`: local OpenAI-compatible endpoint.
* `FIGMENT_TRACE_DIR`: trace export directory.
* `MODAL_PROFILE`: optional Modal profile name.
* `MISTRAL_API_KEY` / `MINIMAX_API_KEY`: optional teacher-model keys.
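
A startup check can catch misspelled enum values before the app launches. The sketch below is illustrative only. The allowed values are copied from the list above, but treating an unset enum variable as an error is an assumption, and `check_env` is a hypothetical helper rather than part of Figment.

```python
import os

# Allowed values copied from the variable list above.
ALLOWED = {
    "FIGMENT_MODE": {"hosted", "local", "canned"},
    "MODEL_STACK": {"omni_native", "local_4b_parakeet"},
    "MODEL_BACKEND": {"hosted_omni", "llama_cpp", "canned"},
    "AUDIO_BACKEND": {"omni_native", "parakeet_nemo", "canned", "none"},
}


def check_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = env.get(name)
        if value is None:
            # Assumption: every enum variable must be set explicitly.
            problems.append(f"{name} is not set")
        elif value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    # The NVIDIA key is documented as the credential for hosted Omni mode.
    if env.get("FIGMENT_MODE") == "hosted" and not env.get("NVIDIA_API_KEY"):
        problems.append("NVIDIA_API_KEY is required for hosted mode")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print("config problem:", problem)
```

Returning a list rather than raising on the first error lets a single run report every misconfigured variable.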

## Runtime Modes

Hosted live demo:

* Gradio Space under `build-small-hackathon/figment`.
* Hosted NVIDIA API Catalog / NIM-compatible Nemotron Omni powers live navigator output.
* Rules, retrieval, validation, and trace rendering run in the Space.

Local/offline proof:

* Local Gradio app.
* Local protocol cards and SQLite retrieval.
* Local deterministic rules and validators.
* Local OpenAI-compatible server with Nemotron 3 Nano 4B.
* Optional Parakeet ASR only after `ALLOW_LOCAL_ASR=true` and the local gate passes.

Fallback only:

* Canned traces if hosted model, quota, or Space cold-start reliability fails.
* Canned navigator output if the live model returns invalid JSON or violates validation.
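
The second fallback rule can be sketched as a small guard around the model response. This is one possible shape under stated assumptions, not the project's actual validator: `navigator_output` and the `is_valid` callback are hypothetical names, and the default validity check (the payload is a JSON object) stands in for the real validation rules.

```python
import json


def navigator_output(raw_text, canned, is_valid=lambda obj: isinstance(obj, dict)):
    """Return (output, source), where source is "live" or "canned".

    Falls back to the canned output if the model text is not valid JSON
    or if the parsed payload fails the validation callback.
    """
    try:
        parsed = json.loads(raw_text)
    except (TypeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return canned, "canned"
    if not is_valid(parsed):
        return canned, "canned"
    return parsed, "live"
```

Returning the source alongside the output makes it easy to record in the trace whether a given response was live or canned.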

## Verification Checklist

Before implementation starts:

```bash
hf auth whoami
hf repos list --namespace build-small-hackathon --type space --search figment --limit 10
modal token info || modal setup
llama-server --help
python -m pip install -r requirements.txt -r requirements-dev.txt
```

Before submission:

```text
Space boots cold under build-small-hackathon/figment.
Hosted live mode returns validated NVIDIA-hosted Nemotron output.
Local 4B mode runs the same demo case without internet.
No patient PHI is used, logged, or committed.
```