munyew committed on
Commit
b25c445
·
verified ·
1 Parent(s): c15ab4d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +207 -0
README.md ADDED
@@ -0,0 +1,207 @@
+ ---
+ language:
+ - en
+ - zh
+ - ms
+ - ta
+ license: apache-2.0
+ tags:
+ - singapore
+ - sovereign-ai
+ - edge-ai
+ - meralion
+ - singlish
+ - agentic-innovation
+ - android
+ - flask
+ - whisper
+ - on-device
+ pipeline_tag: text-generation
+ library_name: custom
+ ---
+
+ # MINA Bridge v4: Sovereign Edge AI Gateway
+
+ **MINA** (My Intelligent National Assistant) is Singapore's sovereign edge AI companion, built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA.
+
+ `mina-bridge` is the intelligence gateway between the MINA Android APK and the on-device MERaLiON model. It is a lightweight Flask server that handles speech transcription, rule-based agent routing, response generation, and autonomous gap logging, all running locally in a Termux environment with no cloud dependency for inference.
+
+ ---
+
+ ## Architecture
+
+ ```
+ Android APK
+     │  base64 WAV / pre-transcribed text
+     ▼
+ mina-bridge (Flask :8081)
+     ├── whisper-cli          ← speech-to-text (offline)
+     ├── route_agent()        ← rule-based ARIA routing (no LLM call)
+     ├── build_prompt()       ← agent-specific focused prompt
+     ├── llama-server :8080   ← MERaLiON-2-3B GGUF inference
+     ├── append_resources()   ← hotlines from mina_knowledge.json
+     └── log_gap() + ntfy     ← autonomous cloud sync
+ ```
+
+ **Option 3 architecture**: routing is pure Python, so it is deterministic, adds negligible latency, and cannot hallucinate a routing decision. The LLM is called exactly once per turn, only to generate the response text.
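
The single LLM call per turn could look roughly like the sketch below. It assumes llama-server's `/completion` endpoint (a JSON body with `prompt` and `n_predict`, and a reply carrying `content`). The instruction strings, `respond()`, and `post_json()` are hypothetical stand-ins, not the contents of `bridge.py`, and using 256 as the generation limit is likewise an assumption.

```python
import json
import urllib.request

# Hypothetical per-agent instructions; the real prompts live in bridge.py.
AGENT_INSTRUCTIONS = {
    "VITA": "You are VITA. Respond calmly and supportively.",
    "SENTINEL": "You are SENTINEL. Help the user judge whether this is a scam.",
    "KRONOS": "You are KRONOS. Help the user with their schedule.",
    "MINA": "You are MINA. Offer friendly, general support.",
}


def build_prompt(agent: str, transcript: str) -> str:
    """Compose one focused prompt for the agent chosen by the router."""
    instruction = AGENT_INSTRUCTIONS.get(agent, AGENT_INSTRUCTIONS["MINA"])
    return f"{instruction}\nUser: {transcript}\nAssistant:"


def post_json(url: str, payload: dict, timeout_s: float = 60.0) -> dict:
    """POST a JSON body and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def respond(agent, transcript, llama_url="http://localhost:8080",
            max_tokens=256, post=post_json) -> str:
    """Call the model exactly once; routing has already happened in Python."""
    prompt = build_prompt(agent, transcript)
    body = post(f"{llama_url}/completion",
                {"prompt": prompt, "n_predict": max_tokens})
    return body.get("content", "").strip()
```

Passing `post` as a parameter keeps the sketch testable without a running llama-server.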
+
+ ---
+
+ ## Features
+
+ ### 🎙️ Whisper.cpp STT Integration
+ Offline speech-to-text via `whisper-cli` subprocess. Accepts base64-encoded WAV from the Android APK, decodes to a temp file, runs `ggml-base.bin`, strips noise tokens (`[BLANK_AUDIO]`, `debugfs`, `MEMPROF`), and returns clean transcript text. No cloud STT dependency.
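
A minimal sketch of that decode, transcribe, and clean sequence follows. The binary and model paths are the ones this README lists for Termux. The helper names are hypothetical, and treating `debugfs` and `MEMPROF` as whole diagnostic lines is an assumption. `-m`, `-f`, and `-nt` are believed to be current `whisper-cli` options; confirm against `whisper-cli --help` on the device.

```python
import base64
import re
import subprocess
import tempfile
from pathlib import Path

# Paths taken from the "Expected paths on Termux" list in this README.
WHISPER_CLI = Path.home() / "whisper.cpp/build/bin/whisper-cli"
WHISPER_MODEL = Path.home() / "whisper.cpp/models/ggml-base.bin"

# Assumption: these two markers show up as whole diagnostic lines.
LINE_NOISE = ("debugfs", "MEMPROF")
TOKEN_NOISE = "[BLANK_AUDIO]"


def write_temp_wav(b64_wav: str) -> Path:
    """Decode the APK's base64 payload into a temporary .wav file."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(base64.b64decode(b64_wav))
        return Path(tmp.name)


def clean_transcript(raw: str) -> str:
    """Drop diagnostic lines and blank-audio tokens, then collapse whitespace."""
    kept = []
    for line in raw.splitlines():
        if any(marker in line for marker in LINE_NOISE):
            continue
        kept.append(line.replace(TOKEN_NOISE, " "))
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def transcribe_b64_wav(b64_wav: str, timeout_s: float = 60.0) -> str:
    """Run whisper-cli on the decoded audio and return cleaned text."""
    wav_path = write_temp_wav(b64_wav)
    try:
        result = subprocess.run(
            [str(WHISPER_CLI), "-m", str(WHISPER_MODEL),
             "-f", str(wav_path), "-nt"],  # -nt: no timestamps in the output
            capture_output=True, text=True, timeout=timeout_s, check=False,
        )
        return clean_transcript(result.stdout)
    finally:
        wav_path.unlink(missing_ok=True)  # always remove the temp file
```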
+
+ ### 🧭 ARIA Agent Routing
+ Four specialist agents are dispatched by keyword matching, with no LLM routing call:
+
+ | Agent | Trigger keywords | Purpose |
+ |---|---|---|
+ | **VITA** | `giving up`, `want to die`, `hopeless`, `hurt myself` … | Crisis support |
+ | **SENTINEL** | `scam`, `bank account`, `transfer money`, `spf` … | Scam detection |
+ | **KRONOS** | `meeting`, `calendar`, `schedule`, `tomorrow` … | Calendar assistance |
+ | **MINA** | *(default)* | Stress / general emotional support |
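
Keyword dispatch of this kind can be sketched in a few lines. The keyword lists below contain only the examples shown in the table (the full lists are elided there). Checking VITA first, so that crisis phrases win over other matches, is an assumption about priority rather than documented behaviour.

```python
# Keywords copied from the table above; the real lists are longer.
# Order matters: earlier agents win when several keywords match.
AGENT_KEYWORDS = [
    ("VITA", ("giving up", "want to die", "hopeless", "hurt myself")),
    ("SENTINEL", ("scam", "bank account", "transfer money", "spf")),
    ("KRONOS", ("meeting", "calendar", "schedule", "tomorrow")),
]

DEFAULT_AGENT = "MINA"


def route_agent(transcript: str) -> str:
    """Return the first agent whose keyword occurs in the transcript."""
    text = transcript.lower()
    for agent, keywords in AGENT_KEYWORDS:
        # Plain substring matching: short keywords such as "spf" can also
        # match inside longer words, so a word-boundary check may be safer.
        if any(keyword in text for keyword in keywords):
            return agent
    return DEFAULT_AGENT
```

Because this is a pure function of the transcript, the same input always yields the same agent.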
+
+ ### 🧠 Knowledge Base Integration
+ Reads `mina_knowledge.json` at runtime for:
+ - Crisis hotline numbers (SOS Lifeline, IMH), with phone and WhatsApp links
+ - Capability flags (`make_phone_call`, `send_whatsapp`, `check_calendar`, …)
+
+ Resources appended to VITA and SENTINEL replies are driven by the knowledge file, not hardcoded strings. Update the JSON to update the response; no code change is needed.
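
One way to keep replies data-driven is sketched below. The JSON layout it reads (a `hotlines` list whose entries carry `name`, `phone`, `whatsapp`, and `agents`) is a hypothetical schema chosen for illustration. The actual structure of `mina_knowledge.json` is not documented here.

```python
import json
from pathlib import Path

# Path taken from the "Expected paths on Termux" list in this README.
KNOWLEDGE_PATH = Path.home() / "meralion/mina_knowledge.json"


def load_knowledge(path=KNOWLEDGE_PATH) -> dict:
    """Read the knowledge file on each call so edits apply without a restart."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def append_resources(reply: str, agent: str, knowledge: dict) -> str:
    """Append every hotline tagged for this agent (hypothetical schema)."""
    lines = []
    for entry in knowledge.get("hotlines", []):
        if agent not in entry.get("agents", []):
            continue
        line = f"{entry['name']}: {entry['phone']}"
        if entry.get("whatsapp"):
            line += f" (WhatsApp: {entry['whatsapp']})"
        lines.append(line)
    if not lines:
        return reply  # agents without resources get the reply unchanged
    return reply + "\n\n" + "\n".join(lines)
```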
+
+ ### 📋 Gap Logging & Autonomous Learning
+ Every time a user requests a capability MINA doesn't yet have, `log_gap()`:
+ 1. Appends a structured entry to `gaps/gap_log.jsonl` (local, persistent)
+ 2. POSTs to `ntfy.sh/{NTFY_TOPIC}` for real-time cloud sync
+
+ ```json
+ {
+   "timestamp": "2026-05-02T14:23:01",
+   "gap_type": "make_phone_call",
+   "user_request": "can you call SOS for me",
+   "context": "User requested phone call to SOS",
+   "status": "pending"
+ }
+ ```
+
+ The `NTFY_TOPIC` env var controls the notification channel (default: `roar-imda-demo`). Gap notifications appear in the ntfy app with tag `brain` for triage. Network failures are caught silently; the gap is always written locally first.
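
The write-locally-first behaviour can be sketched as follows. The function signatures are assumptions, as is sending the whole entry as the ntfy message body. `Title` and `Tags` are standard ntfy publish headers. For brevity this version posts synchronously with a short timeout; a background thread would be needed to keep the sync fully off the request path.

```python
import json
import os
import urllib.request
from datetime import datetime
from pathlib import Path

GAP_LOG = Path.home() / "meralion/gaps/gap_log.jsonl"
NTFY_TOPIC = os.environ.get("NTFY_TOPIC", "roar-imda-demo")


def post_to_ntfy(entry: dict, topic: str = NTFY_TOPIC) -> None:
    """Publish the gap entry to ntfy.sh, tagged 'brain' for triage."""
    request = urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=json.dumps(entry).encode("utf-8"),
        headers={"Title": f"MINA gap: {entry['gap_type']}", "Tags": "brain"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5):
        pass


def log_gap(gap_type, user_request, context,
            log_path=GAP_LOG, notify=post_to_ntfy) -> dict:
    """Append the gap locally, then attempt the optional cloud sync."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "gap_type": gap_type,
        "user_request": user_request,
        "context": context,
        "status": "pending",
    }
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)  # auto-create gaps/
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    try:
        notify(entry)
    except Exception:
        pass  # the local record already exists, so sync failures are ignored
    return entry
```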
+
+ ### 🔒 Sovereign & Offline-First
+ All inference runs on-device. The only outbound network call is the optional ntfy gap sync (non-blocking, non-critical path). No user speech or transcript data leaves the device during inference.
+
+ ---
+
+ ## Endpoints
+
+ ### `GET /health`
+ Liveness probe. Android APK polls this at startup every 3 s.
+ ```json
+ {"status": "ok", "llama": true, "bridge": "v2"}
+ ```
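
For testing the bridge from a desktop, the startup poll can be imitated in Python. This is a sketch of a client, not the APK's actual code. `wait_for_bridge()` and its parameters are hypothetical, and treating `"llama": true` as "llama-server is reachable" is an assumption.

```python
import json
import time
import urllib.request


def fetch_health(base_url: str = "http://localhost:8081",
                 timeout_s: float = 2.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as response:
        return json.load(response)


def wait_for_bridge(fetch=fetch_health, interval_s: float = 3.0,
                    max_attempts: int = 20, sleep=time.sleep) -> bool:
    """Poll until the bridge and the model both report ready, or give up."""
    for attempt in range(max_attempts):
        try:
            health = fetch()
            if health.get("status") == "ok" and health.get("llama"):
                return True
        except (OSError, ValueError):
            pass  # connection refused or a non-JSON body: not up yet
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```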
+
+ ### `POST /completion`
+ Main inference endpoint. Accepts two input modes:
+
+ **Mode A: Pre-transcribed text** (fast path):
+ ```json
+ {"transcript": "I have a meeting tomorrow morning"}
+ ```
+
+ **Mode B: Raw WAV audio** (whisper path):
+ ```json
+ {
+   "prompt": [{
+     "prompt_string": "...",
+     "multimodal_data": ["<base64-WAV>"]
+   }]
+ }
+ ```
+
+ **Response**:
+ ```json
+ {
+   "reply": "Sure lah, let me check your calendar!",
+   "content": "Sure lah, let me check your calendar!",
+   "transcript": "I have a meeting tomorrow morning",
+   "emotion": "neutral",
+   "valence": 0.50,
+   "arousal": 0.38,
+   "dominance": 0.50,
+   "agent": "KRONOS",
+   "risk": "none",
+   "elapsed": 1.84
+ }
+ ```
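
A small Python client for both input modes might look like the sketch below. The helper names are hypothetical. `prompt_string` is left as a required argument because its expected value is not documented above.

```python
import base64
import json
import urllib.request


def text_payload(transcript: str) -> dict:
    """Mode A: send text that was already transcribed."""
    return {"transcript": transcript}


def audio_payload(wav_bytes: bytes, prompt_string: str) -> dict:
    """Mode B: wrap raw WAV bytes as base64 in the multimodal envelope."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {"prompt": [{"prompt_string": prompt_string,
                        "multimodal_data": [encoded]}]}


def post_completion(payload: dict, base_url: str = "http://localhost:8081",
                    timeout_s: float = 60.0) -> dict:
    """POST the payload to the bridge and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

With the bridge running, `post_completion(text_payload("I have a meeting tomorrow morning"))` should return a dictionary shaped like the response above.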
+
+ ---
+
+ ## Configuration
+
+ | Env var | Default | Description |
+ |---|---|---|
+ | `LLAMA_URL` | `http://localhost:8080` | llama-server endpoint |
+ | `BRIDGE_PORT` | `8081` | Flask listen port |
+ | `MAX_TOKENS` | `256` | Max tokens for transcription call |
+ | `NTFY_TOPIC` | `roar-imda-demo` | ntfy.sh topic for gap sync |
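
Reading these variables with the defaults from the table could be done as in this sketch. `BridgeConfig` and `load_config()` are hypothetical names used for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeConfig:
    llama_url: str
    bridge_port: int
    max_tokens: int
    ntfy_topic: str


def load_config(env=None) -> BridgeConfig:
    """Build the config from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return BridgeConfig(
        llama_url=env.get("LLAMA_URL", "http://localhost:8080"),
        bridge_port=int(env.get("BRIDGE_PORT", "8081")),
        max_tokens=int(env.get("MAX_TOKENS", "256")),
        ntfy_topic=env.get("NTFY_TOPIC", "roar-imda-demo"),
    )
```

For example, starting the bridge with `BRIDGE_PORT=9000` in the environment would change only the listen port and leave the other defaults in place.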
+
+ ---
+
+ ## Deployment (Termux)
+
+ ```bash
+ # Prerequisites on device
+ pkg install python whisper-cpp llama-cpp
+
+ # Clone and deploy
+ git clone https://huggingface.co/munyew/mina-bridge
+ cd mina-bridge
+
+ # Start bridge (watchdog via start_mina.sh)
+ nohup python3 bridge.py >> bridge.log 2>&1 &
+
+ # Or restart after update (';' so the restart still runs if no bridge was running)
+ pkill -f bridge.py; sleep 3; nohup python3 bridge.py >> bridge.log 2>&1 &
+ ```
+
+ Expected paths on Termux:
+ ```
+ ~/whisper.cpp/build/bin/whisper-cli
+ ~/whisper.cpp/models/ggml-base.bin
+ ~/meralion/meralion-3b-decoder-q8_0.gguf
+ ~/meralion/mina_knowledge.json
+ ~/meralion/gaps/gap_log.jsonl   ← auto-created
+ ```
+
+ ---
+
+ ## Roadmap
+
+ | Priority | Gap | Solution |
+ |---|---|---|
+ | 🔴 Critical | Emotion detection upgrade | Replace VAD lookup table with [MERaLiON-SER-v1](https://huggingface.co/MERaLiON/MERaLiON-SER-v1) |
+ | 🟠 High | Singlish Mental Health ASR | Fine-tune MERaLiON-2-3B on v5 dataset (3,240 audio files) |
+ | 🟠 High | Singapore Legal Domain ASR | Generate + fine-tune on CPF/HDB/PDPA domain |
+ | 🟡 Medium | Edge-optimised SER | Quantize MERaLiON-SER-v1 to INT8/TFLite < 200 MB |
+ | 🟡 Medium | Code-switched Singlish-Mandarin | Pending MNSC dataset from IMDA/NUS |
+
+ ---
+
+ ## Citation
+
+ ```bibtex
+ @software{mina_bridge_2026,
+   title = {MINA Bridge: Sovereign Edge AI Gateway for Singapore},
+   author = {Loh, Mun Yew (Darren)},
+   year = {2026},
+   url = {https://huggingface.co/munyew/mina-bridge},
+   note = {IMDA NMLP -- ATxSG 2026}
+ }
+ ```
+
+ ---
+
+ ## Acknowledgements
+
+ Built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA's National Multimodal LLM Programme.
+ Speech transcription via [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
+ On-device inference via [llama.cpp](https://github.com/ggerganov/llama.cpp).