---
language:
- en
- zh
- ms
- ta
license: apache-2.0
tags:
- singapore
- sovereign-ai
- edge-ai
- meralion
- singlish
- agentic-innovation
- android
- flask
- whisper
- on-device
pipeline_tag: text-generation
library_name: custom
---
# MINA Bridge v4: Sovereign Edge AI Gateway
**MINA** (My Intelligent National Assistant) is Singapore's sovereign edge AI companion, built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA.
`mina-bridge` is the intelligence gateway between the MINA Android APK and the on-device MERaLiON model: a lightweight Flask server that handles speech transcription, rule-based agent routing, response generation, and autonomous gap logging. Everything runs locally in a Termux environment, with no cloud dependency for inference.
---
## Architecture
```
Android APK
    │  base64 WAV / pre-transcribed text
    ▼
mina-bridge  (Flask :8081)
    ├── whisper-cli          → speech-to-text (offline)
    ├── route_agent()        → rule-based ARIA routing (no LLM call)
    ├── build_prompt()       → agent-specific focused prompt
    ├── llama-server :8080   → MERaLiON-2-3B GGUF inference
    ├── append_resources()   → hotlines from mina_knowledge.json
    └── log_gap() + ntfy     → autonomous cloud sync
```
**Option 3 architecture**: routing is pure Python, so it is deterministic, adds negligible latency, and carries no hallucination risk. The LLM is called exactly once per turn, only to generate the response text.
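
The single-LLM-call flow described above can be sketched roughly as follows. This is an illustration, not the actual `bridge.py` code: `handle_turn`, the injected `route` and `generate` callables, and the prompt format are all hypothetical stand-ins for `route_agent()`, the llama-server call, and `build_prompt()`.

```python
from typing import Callable


def handle_turn(
    transcript: str,
    route: Callable[[str], str],
    generate: Callable[[str], str],
) -> dict:
    """Run one turn: route in plain Python, then call the LLM exactly once."""
    # Deterministic routing step: no model call happens here.
    agent = route(transcript)
    # Hypothetical prompt format; the real build_prompt() output is not shown in this README.
    prompt = f"[{agent}] User said: {transcript}\nReply:"
    # The only LLM call in the turn.
    reply = generate(prompt)
    return {"agent": agent, "reply": reply.strip(), "transcript": transcript}
```

Passing the router and generator in as callables keeps the sketch testable without a running llama-server.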
---
## Features
### 🎙️ Whisper.cpp STT Integration
Offline speech-to-text via a `whisper-cli` subprocess. The bridge accepts a base64-encoded WAV from the Android APK, decodes it to a temp file, runs `whisper-cli` with the `ggml-base.bin` model, strips noise tokens (`[BLANK_AUDIO]`, `debugfs`, `MEMPROF`), and returns clean transcript text. There is no cloud STT dependency.
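
A minimal sketch of that decode, transcribe, and clean sequence is shown below. The function names are hypothetical. It assumes that lines containing `debugfs` or `MEMPROF` are dropped whole and that the installed `whisper-cli` build supports the `-nt` (no timestamps) and `-np` (no prints) flags. The paths come from the expected Termux layout listed under Deployment.

```python
import base64
import os
import re
import subprocess
import tempfile

# Paths follow the expected Termux layout; adjust for your device.
WHISPER_CLI = os.path.expanduser("~/whisper.cpp/build/bin/whisper-cli")
WHISPER_MODEL = os.path.expanduser("~/whisper.cpp/models/ggml-base.bin")

# Assumption: any output line containing one of these markers is noise.
NOISE_LINE_MARKERS = ("debugfs", "MEMPROF")


def clean_transcript(raw: str) -> str:
    """Drop noise lines and the [BLANK_AUDIO] token, then collapse whitespace."""
    kept = [
        line for line in raw.splitlines()
        if not any(marker in line for marker in NOISE_LINE_MARKERS)
    ]
    text = " ".join(kept).replace("[BLANK_AUDIO]", " ")
    return re.sub(r"\s+", " ", text).strip()


def transcribe_b64_wav(b64_wav: str, timeout: float = 60.0) -> str:
    """Decode base64 WAV to a temp file, run whisper-cli, return cleaned text."""
    wav_bytes = base64.b64decode(b64_wav)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(wav_bytes)
        wav_path = tmp.name
    try:
        proc = subprocess.run(
            [WHISPER_CLI, "-m", WHISPER_MODEL, "-f", wav_path, "-nt", "-np"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return clean_transcript(proc.stdout)
    finally:
        # Always remove the temp file, even if whisper-cli fails or times out.
        os.remove(wav_path)
```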
### 🧠 ARIA Agent Routing
Four specialist agents dispatched by keyword matching, with no LLM routing call:
| Agent | Trigger keywords | Purpose |
|---|---|---|
| **VITA** | `giving up`, `want to die`, `hopeless`, `hurt myself` … | Crisis support |
| **SENTINEL** | `scam`, `bank account`, `transfer money`, `spf` … | Scam detection |
| **KRONOS** | `meeting`, `calendar`, `schedule`, `tomorrow` … | Calendar assistance |
| **MINA** | *(default)* | Stress / general emotional support |
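
Keyword dispatch of this kind might look like the sketch below. It uses only the keywords shown in the table (the full lists are elided there). It also assumes case-insensitive substring matching and a fixed priority order of VITA, then SENTINEL, then KRONOS; the actual `route_agent()` may differ on both points.

```python
# Only the keywords listed in the table above; the real lists are longer.
# Order encodes priority (an assumption): crisis first, then scam, then calendar.
AGENT_KEYWORDS = [
    ("VITA", ("giving up", "want to die", "hopeless", "hurt myself")),
    ("SENTINEL", ("scam", "bank account", "transfer money", "spf")),
    ("KRONOS", ("meeting", "calendar", "schedule", "tomorrow")),
]


def route_agent(transcript: str) -> str:
    """Return the first agent whose keyword appears in the transcript."""
    text = transcript.lower()
    for agent, keywords in AGENT_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return agent
    # No keyword matched: fall back to the default agent.
    return "MINA"
```

Checking the crisis keywords first means a message that mentions both a meeting and self-harm is routed to VITA rather than KRONOS.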
### Knowledge Base Integration
Reads `mina_knowledge.json` at runtime for:
- Crisis hotline numbers (SOS Lifeline, IMH), with phone and WhatsApp links
- Capability flags (`make_phone_call`, `send_whatsapp`, `check_calendar`, …)
Resources appended to VITA and SENTINEL replies are driven by the knowledge file, not hardcoded strings. Update the JSON to update the response; no code change is needed.
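
As an illustration of the data-driven approach, here is a sketch of how `append_resources()` might work. The schema of `mina_knowledge.json` is not documented in this README, so the `hotlines` key and the `name`, `phone`, and `whatsapp` fields below are hypothetical, as is the `load_knowledge` helper.

```python
import json


def load_knowledge(path: str) -> dict:
    """Read the knowledge file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def append_resources(reply: str, agent: str, knowledge: dict) -> str:
    """Append one line per hotline configured for this agent."""
    # Hypothetical schema: {"hotlines": {"VITA": [{"name", "phone", "whatsapp"}, ...]}}
    hotlines = knowledge.get("hotlines", {}).get(agent, [])
    lines = [reply]
    for hotline in hotlines:
        entry = f"{hotline['name']}: {hotline['phone']}"
        if hotline.get("whatsapp"):
            entry += f" (WhatsApp: {hotline['whatsapp']})"
        lines.append(entry)
    return "\n".join(lines)
```

Agents with no entry in the file get their reply back unchanged, so adding resources for a new agent only requires editing the JSON.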
### Gap Logging & Autonomous Learning
Every time a user requests a capability MINA doesn't yet have, `log_gap()`:
1. Appends a structured entry to `gaps/gap_log.jsonl` (local, persistent)
2. POSTs to `ntfy.sh/{NTFY_TOPIC}` for real-time cloud sync
```json
{
"timestamp": "2026-05-02T14:23:01",
"gap_type": "make_phone_call",
"user_request": "can you call SOS for me",
"context": "User requested phone call to SOS",
"status": "pending"
}
```
The `NTFY_TOPIC` env var controls the notification channel (default: `roar-imda-demo`). Gap notifications appear in the ntfy app with tag `brain` for triage. Network failures are caught silently; the gap is always written locally first.
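
The write-locally-first behaviour can be sketched as below, using only the standard library. This is a guess at the shape of `log_gap()`, not the real implementation: the `notify_ntfy` helper, the injectable `notify` parameter, the `Title` header text, and the choice to send only the `context` field as the message body are all assumptions.

```python
import json
import os
import urllib.request
from datetime import datetime

NTFY_TOPIC = os.environ.get("NTFY_TOPIC", "roar-imda-demo")


def notify_ntfy(entry: dict, topic: str = NTFY_TOPIC, timeout: float = 5.0) -> None:
    """POST a short message to ntfy.sh, tagged `brain` for triage."""
    req = urllib.request.Request(
        f"https://ntfy.sh/{topic}",
        data=entry["context"].encode("utf-8"),
        headers={"Title": f"Gap: {entry['gap_type']}", "Tags": "brain"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=timeout).close()


def log_gap(gap_type, user_request, context, log_dir="gaps", notify=notify_ntfy) -> dict:
    """Append the gap to the local JSONL log, then try to sync it."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "gap_type": gap_type,
        "user_request": user_request,
        "context": context,
        "status": "pending",
    }
    # Step 1: the local write always happens first.
    os.makedirs(log_dir, exist_ok=True)
    with open(os.path.join(log_dir, "gap_log.jsonl"), "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    # Step 2: best-effort cloud sync; failures are swallowed on purpose.
    try:
        notify(entry)
    except Exception:
        pass
    return entry
```

A real bridge would probably run the notification on a background thread so that a slow network cannot delay the reply.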
### Sovereign & Offline-First
All inference runs on-device. The only outbound network call is the optional ntfy gap sync (non-blocking, non-critical path). No user speech or transcript data leaves the device during inference.
---
## Endpoints
### `GET /health`
Liveness probe. The Android APK polls this every 3 s at startup.
```json
{"status": "ok", "llama": true, "bridge": "v2"}
```
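
For illustration, the startup polling could be written like this in Python (the APK itself is an Android app, so this is only a stand-in for its logic). The function names, the retry limit, and the rule that the bridge counts as ready only when both `status` is `ok` and `llama` is `true` are assumptions.

```python
import json
import time
import urllib.request


def fetch_health(base_url: str = "http://localhost:8081", timeout: float = 2.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def wait_for_bridge(fetch=fetch_health, interval=3.0, max_attempts=20, sleep=time.sleep) -> bool:
    """Poll until the bridge reports ready, or give up after max_attempts."""
    for attempt in range(max_attempts):
        try:
            payload = fetch()
            if payload.get("status") == "ok" and payload.get("llama") is True:
                return True
        except (OSError, ValueError):
            # Connection refused or malformed JSON: the bridge is not up yet.
            pass
        if attempt < max_attempts - 1:
            sleep(interval)
    return False
```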
### `POST /completion`
Main inference endpoint. Accepts two input modes:
**Mode A: Pre-transcribed text** (fast path):
```json
{"transcript": "I have a meeting tomorrow morning"}
```
**Mode B: Raw WAV audio** (whisper path):
```json
{
"prompt": [{
"prompt_string": "...",
"multimodal_data": ["<base64-WAV>"]
}]
}
```
**Response**:
```json
{
"reply": "Sure lah, let me check your calendar!",
"content": "Sure lah, let me check your calendar!",
"transcript": "I have a meeting tomorrow morning",
"emotion": "neutral",
"valence": 0.50,
"arousal": 0.38,
"dominance": 0.50,
"agent": "KRONOS",
"risk": "none",
"elapsed": 1.84
}
```
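
A small client sketch for the two input modes is given below, using only the standard library. The helper names and the timeout are hypothetical; the request shapes mirror the JSON examples above, and `prompt_string` is left as a caller-supplied value because its content is not specified here.

```python
import base64
import json
import urllib.request


def text_request(transcript: str) -> dict:
    """Mode A body: pre-transcribed text."""
    return {"transcript": transcript}


def audio_request(wav_bytes: bytes, prompt_string: str) -> dict:
    """Mode B body: raw WAV audio, base64-encoded."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {"prompt": [{"prompt_string": prompt_string, "multimodal_data": [encoded]}]}


def post_completion(body: dict, base_url: str = "http://localhost:8081", timeout: float = 120.0) -> dict:
    """POST a request body to /completion and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the bridge running, `post_completion(text_request("I have a meeting tomorrow morning"))` should return a dict with the fields shown in the response example.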
---
## Configuration
| Env var | Default | Description |
|---|---|---|
| `LLAMA_URL` | `http://localhost:8080` | llama-server endpoint |
| `BRIDGE_PORT` | `8081` | Flask listen port |
| `MAX_TOKENS` | `256` | Max tokens for the llama-server generation call |
| `NTFY_TOPIC` | `roar-imda-demo` | ntfy.sh topic for gap sync |
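
One way to read these settings with their documented defaults is sketched below. `BridgeConfig` and `load_config` are hypothetical names; `bridge.py` may simply read `os.environ` inline.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeConfig:
    llama_url: str
    bridge_port: int
    max_tokens: int
    ntfy_topic: str


def load_config(env=None) -> BridgeConfig:
    """Build the config from environment variables, falling back to the table defaults."""
    env = os.environ if env is None else env
    return BridgeConfig(
        llama_url=env.get("LLAMA_URL", "http://localhost:8080").rstrip("/"),
        bridge_port=int(env.get("BRIDGE_PORT", "8081")),
        max_tokens=int(env.get("MAX_TOKENS", "256")),
        ntfy_topic=env.get("NTFY_TOPIC", "roar-imda-demo"),
    )
```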
---
## Deployment (Termux)
```bash
# Prerequisites on device
pkg install python whisper-cpp llama-cpp
# Clone and deploy
git clone https://huggingface.co/munyew/mina-bridge
cd mina-bridge
# Start bridge (watchdog via start_mina.sh)
nohup python3 bridge.py >> bridge.log 2>&1 &
# Or restart after update
pkill -f bridge.py && sleep 3 && nohup python3 bridge.py >> bridge.log 2>&1 &
```
Expected paths on Termux:
```
~/whisper.cpp/build/bin/whisper-cli
~/whisper.cpp/models/ggml-base.bin
~/meralion/meralion-3b-decoder-q8_0.gguf
~/meralion/mina_knowledge.json
~/meralion/gaps/gap_log.jsonl   ← auto-created
```
---
## Roadmap
| Priority | Gap | Solution |
|---|---|---|
| 🔴 Critical | Emotion detection upgrade | Replace VAD lookup table with [MERaLiON-SER-v1](https://huggingface.co/MERaLiON/MERaLiON-SER-v1) |
| 🟠 High | Singlish Mental Health ASR | Fine-tune MERaLiON-2-3B on v5 dataset (3,240 audio files) |
| 🟠 High | Singapore Legal Domain ASR | Generate + fine-tune on CPF/HDB/PDPA domain |
| 🟡 Medium | Edge-optimised SER | Quantize MERaLiON-SER-v1 to INT8/TFLite < 200 MB |
| 🟡 Medium | Code-switched Singlish-Mandarin | Pending MNSC dataset from NUS |
---
## Citation
```bibtex
@software{mina_bridge_2026,
title = {MINA Bridge: Sovereign Edge AI Gateway for Singapore},
author = {Loh, Mun Yew (Darren)},
year = {2026},
url = {https://huggingface.co/munyew/mina-bridge},
note = {Singapore AI Research, ATxSG 2026}
}
```
---
## Acknowledgements
Built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA National Multimodal LLM Programme.
Speech transcription via [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
On-device inference via [llama.cpp](https://github.com/ggerganov/llama.cpp).