---
language:
- en
- zh
- ms
- ta
license: apache-2.0
tags:
- singapore
- sovereign-ai
- edge-ai
- meralion
- singlish
- agentic-innovation
- android
- flask
- whisper
- on-device
pipeline_tag: text-generation
library_name: custom
---
# MINA Bridge v4: Sovereign Edge AI Gateway
**MINA** (My Intelligent National Assistant) is Singapore's sovereign edge AI companion, built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA.
`mina-bridge` is the intelligence gateway between the MINA Android APK and the on-device MERaLiON model. It is a lightweight Flask server that handles speech transcription, rule-based agent routing, response generation, and autonomous gap logging, all running locally in a Termux environment with no cloud dependency for inference.

---
## Architecture
```
Android APK
   │  base64 WAV / pre-transcribed text
   ▼
mina-bridge (Flask :8081)
   ├── whisper-cli          ← speech-to-text (offline)
   ├── route_agent()        ← rule-based ARIA routing (no LLM call)
   ├── build_prompt()       ← agent-specific focused prompt
   ├── llama-server :8080   ← MERaLiON-2-3B GGUF inference
   ├── append_resources()   ← hotlines from mina_knowledge.json
   └── log_gap() + ntfy     ← autonomous cloud sync
```
**Option 3 architecture**: routing is pure Python, so it is deterministic, adds negligible latency, and cannot hallucinate a route. The LLM is called exactly once per turn, only to generate the response text.

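
A minimal sketch of the per-turn flow, assuming each stage in the diagram is a plain Python callable. `handle_turn` and its parameters are hypothetical stand-ins, not the actual `bridge.py` code; the point is that routing happens before, and independently of, the single LLM call. The usage lines at the bottom use a stub LLM that counts how often it is called.

```python
from typing import Callable


def handle_turn(
    transcript: str,
    route: Callable[[str], str],
    build_prompt: Callable[[str, str], str],
    generate: Callable[[str], str],
    append_resources: Callable[[str, str], str],
) -> dict:
    """Hypothetical per-turn pipeline: deterministic routing, then exactly one LLM call."""
    agent = route(transcript)                 # pure Python, no model involved
    prompt = build_prompt(agent, transcript)  # agent-specific focused prompt
    reply = generate(prompt)                  # the only LLM call in the turn
    reply = append_resources(agent, reply)    # e.g. hotlines for VITA / SENTINEL
    return {"agent": agent, "reply": reply, "transcript": transcript}


# Usage with a stub LLM that records every call it receives.
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return "Sure lah, let me check your calendar!"


result = handle_turn(
    "I have a meeting tomorrow morning",
    route=lambda text: "KRONOS" if "meeting" in text else "MINA",
    build_prompt=lambda agent, text: f"[{agent}] {text}",
    generate=fake_llm,
    append_resources=lambda agent, reply: reply,
)
```
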
---
## Features
### 🎙️ Whisper.cpp STT Integration
Offline speech-to-text via a `whisper-cli` subprocess. The bridge accepts base64-encoded WAV from the Android APK, decodes it to a temp file, transcribes it with the `ggml-base.bin` model, strips noise tokens (`[BLANK_AUDIO]`, `debugfs`, `MEMPROF`), and returns clean transcript text. There is no cloud STT dependency.
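
A minimal sketch of this path, assuming `whisper-cli` is invoked with its `-m` (model), `-f` (input file), and `-nt` (no timestamps) options and writes the transcript to stdout. The function names, the 60 s timeout, and the rule that any output line containing a noise token is dropped whole are assumptions for illustration, not the bridge's actual code.

```python
import base64
import os
import re
import subprocess
import tempfile

# Assumed locations, taken from the expected Termux layout; adjust to your install.
WHISPER_CLI = os.path.expanduser("~/whisper.cpp/build/bin/whisper-cli")
WHISPER_MODEL = os.path.expanduser("~/whisper.cpp/models/ggml-base.bin")

# Noise tokens named in this README.
NOISE_TOKENS = ("[BLANK_AUDIO]", "debugfs", "MEMPROF")


def clean_transcript(raw: str) -> str:
    """Drop lines containing a noise token (assumption), then join and normalise whitespace."""
    kept = [
        line.strip()
        for line in raw.splitlines()
        if line.strip() and not any(token in line for token in NOISE_TOKENS)
    ]
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def transcribe_b64_wav(b64_wav: str) -> str:
    """Decode base64 WAV to a temp file, run whisper-cli on it, and return clean text."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(base64.b64decode(b64_wav))
        proc = subprocess.run(
            [WHISPER_CLI, "-m", WHISPER_MODEL, "-f", wav_path, "-nt"],
            capture_output=True,
            text=True,
            timeout=60,
        )
        return clean_transcript(proc.stdout)
    finally:
        os.remove(wav_path)  # never leave user audio on disk
```

Deleting the temp WAV in `finally` keeps the audio from lingering on the device even when whisper-cli fails or times out.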
### 🧭 ARIA Agent Routing
Each request is dispatched to one of four specialist agents by keyword matching, with no LLM routing call:
| Agent | Trigger keywords | Purpose |
|---|---|---|
| **VITA** | `giving up`, `want to die`, `hopeless`, `hurt myself` … | Crisis support |
| **SENTINEL** | `scam`, `bank account`, `transfer money`, `spf` … | Scam detection |
| **KRONOS** | `meeting`, `calendar`, `schedule`, `tomorrow` … | Calendar assistance |
| **MINA** | *(default)* | Stress / general emotional support |
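
A minimal keyword-routing sketch using only the keywords listed in the table (the shipped lists are longer, as the `…` indicates). `route_by_keywords` is a hypothetical name, not the real `route_agent()`, and the priority order VITA, then SENTINEL, then KRONOS is an assumption chosen so that crisis language always wins when a message matches several agents.

```python
# Keyword lists copied from the table above; the real lists are longer.
# Assumed priority: crisis first, then scams, then calendar, else MINA.
ROUTES = [
    ("VITA", ("giving up", "want to die", "hopeless", "hurt myself")),
    ("SENTINEL", ("scam", "bank account", "transfer money", "spf")),
    ("KRONOS", ("meeting", "calendar", "schedule", "tomorrow")),
]


def route_by_keywords(text: str) -> str:
    """Return the first agent whose keywords appear in the lowercased text."""
    lowered = text.lower()
    for agent, keywords in ROUTES:
        if any(keyword in lowered for keyword in keywords):
            return agent
    return "MINA"  # default: stress / general emotional support


print(route_by_keywords("I have a meeting tomorrow morning"))  # KRONOS
```

Plain substring matching is naive: a short keyword such as `spf` also matches inside longer words, so a production router may want word-boundary matching.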
### 🧠 Knowledge Base Integration
Reads `mina_knowledge.json` at runtime for:
- Crisis hotline numbers (SOS Lifeline, IMH), with phone and WhatsApp links
- Capability flags (`make_phone_call`, `send_whatsapp`, `check_calendar`, …)
Resources appended to VITA and SENTINEL replies come from the knowledge file, not hardcoded strings: edit the JSON and the responses change, with no code change needed.
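
The README does not document the schema of `mina_knowledge.json`, so the layout in the comment below (a per-agent `resources` map and a `capabilities` map) is a hypothetical example, as are the helper names. The sketch shows the idea: reading the file at call time and appending whatever entries exist for the routed agent. The `<number>` and `<link>` values are placeholders, not real hotline details.

```python
import json
from pathlib import Path

# Hypothetical layout of mina_knowledge.json (the real schema is not documented here):
# {
#   "resources": {
#     "VITA": [{"name": "SOS Lifeline", "phone": "<number>", "whatsapp": "<link>"}],
#     "SENTINEL": [...]
#   },
#   "capabilities": {"make_phone_call": false, "send_whatsapp": false, "check_calendar": true}
# }


def load_knowledge(path: Path) -> dict:
    """Read the knowledge file at call time so JSON edits take effect without a code change."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def append_resources(agent: str, reply: str, knowledge: dict) -> str:
    """Append the agent's resource lines (e.g. hotlines) to a reply, if any are configured."""
    lines = []
    for entry in knowledge.get("resources", {}).get(agent, []):
        line = f"{entry['name']}: {entry['phone']}"
        if entry.get("whatsapp"):
            line += f" (WhatsApp: {entry['whatsapp']})"
        lines.append(line)
    return reply if not lines else reply + "\n\n" + "\n".join(lines)


def has_capability(knowledge: dict, name: str) -> bool:
    """True only if the capability flag is present and enabled."""
    return bool(knowledge.get("capabilities", {}).get(name, False))
```

Agents without configured resources, such as KRONOS in this example layout, get their reply back unchanged.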
### 📋 Gap Logging & Autonomous Learning
Every time a user requests a capability MINA doesn't yet have, `log_gap()`:
1. Appends a structured entry to `gaps/gap_log.jsonl` (local, persistent)
2. POSTs to `ntfy.sh/{NTFY_TOPIC}` for real-time cloud sync
```json
{
"timestamp": "2026-05-02T14:23:01",
"gap_type": "make_phone_call",
"user_request": "can you call SOS for me",
"context": "User requested phone call to SOS",
"status": "pending"
}
```
The `NTFY_TOPIC` env var controls the notification channel (default: `roar-imda-demo`). Gap notifications appear in the ntfy app with the tag `brain` for triage. Network failures are caught silently; the gap is always written locally first, so a failed sync loses nothing.
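
A minimal standard-library sketch of the local-first pattern described above. It writes the JSONL entry, then publishes to ntfy from a background thread and swallows any network error. `Title` and `Tags` are standard ntfy publish headers. The name `log_gap_sketch`, the `ntfy_base` parameter (pass `None` to disable sync), and the choice to send only the `context` string as the message body are assumptions; what the bridge actually posts is not specified here.

```python
import json
import os
import threading
import urllib.request
from datetime import datetime
from pathlib import Path
from typing import Optional

NTFY_TOPIC = os.environ.get("NTFY_TOPIC", "roar-imda-demo")


def _post_to_ntfy(url: str, entry: dict) -> None:
    """Best-effort publish; any network error is swallowed."""
    request = urllib.request.Request(
        url,
        data=entry["context"].encode("utf-8"),  # assumption: send only the context string
        headers={"Title": f"MINA gap: {entry['gap_type']}", "Tags": "brain"},
        method="POST",
    )
    try:
        urllib.request.urlopen(request, timeout=5).close()
    except Exception:
        pass  # the gap is already on disk, so a failed sync is non-fatal


def log_gap_sketch(
    gap_type: str,
    user_request: str,
    context: str,
    log_path: Path,
    ntfy_base: Optional[str] = "https://ntfy.sh",
) -> dict:
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "gap_type": gap_type,
        "user_request": user_request,
        "context": context,
        "status": "pending",
    }
    # 1. The local, persistent write always happens first.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    # 2. Optional cloud sync on a daemon thread so the reply is not delayed.
    if ntfy_base:
        url = f"{ntfy_base}/{NTFY_TOPIC}"
        threading.Thread(target=_post_to_ntfy, args=(url, entry), daemon=True).start()
    return entry
```

Because the file is opened in append mode and each entry is a single JSON line, the log can be tailed or replayed later without parsing the whole file.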
### 🔒 Sovereign & Offline-First
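All inference runs on-device. The only outbound network call is the optional ntfy gap sync, which is non-blocking and off the critical path. No user speech or transcript data leaves the device during inference, although gap entries record the text of the user's request (the `user_request` field above), so the optional sync may carry it off-device.
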
---
## Endpoints
### `GET /health`
Liveness probe. At startup, the Android APK polls this endpoint every 3 s.
```json
{"status": "ok", "llama": true, "bridge": "v2"}
```
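
For scripts or tests that need to wait for the bridge the way the APK does, here is a minimal polling sketch. It assumes the bridge listens on `localhost:8081` and treats it as ready only when `status` is `"ok"` and `llama` is `true`. `fetch_health`, `wait_for_bridge`, and the 60 s default timeout are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(url: str = "http://localhost:8081/health") -> Optional[dict]:
    """Return the parsed /health JSON, or None if the bridge is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.load(resp)
    except Exception:
        return None  # not up yet, or the body was not valid JSON


def wait_for_bridge(
    fetch: Callable[[], Optional[dict]] = fetch_health,
    interval: float = 3.0,
    timeout: float = 60.0,
) -> bool:
    """Poll every `interval` seconds until the bridge and llama-server are both up."""
    deadline = time.monotonic() + timeout
    while True:
        health = fetch()
        if health and health.get("status") == "ok" and health.get("llama") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Checking `llama` as well as `status` avoids sending the first request while the bridge is up but llama-server is still loading the model.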
### `POST /completion`
Main inference endpoint. Accepts two input modes:
**Mode A: pre-transcribed text** (fast path):
```json
{"transcript": "I have a meeting tomorrow morning"}
```
**Mode B: raw WAV audio** (whisper path):
```json
{
"prompt": [{
"prompt_string": "...",
"multimodal_data": ["<base64-WAV>"]
}]
}
```
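
A minimal sketch of how a handler could tell the two input modes apart. It assumes Mode A is detected by a non-empty `transcript` field and Mode B by the first `multimodal_data` entry of the first `prompt` item. The function name and the `ValueError` on unrecognised input are assumptions; in a Flask view it would be called on `request.get_json()`.

```python
from typing import Tuple


def parse_completion_request(payload: dict) -> Tuple[str, str]:
    """Return ("text", transcript) for Mode A or ("audio", base64_wav) for Mode B."""
    transcript = (payload.get("transcript") or "").strip()
    if transcript:
        return "text", transcript  # fast path: skip whisper entirely

    prompts = payload.get("prompt") or []
    if prompts and isinstance(prompts[0], dict):
        audio = prompts[0].get("multimodal_data") or []
        if audio:
            return "audio", audio[0]  # whisper path: decode and transcribe

    raise ValueError("expected 'transcript' or prompt[0].multimodal_data")
```
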
**Response**:
```json
{
"reply": "Sure lah, let me check your calendar!",
"content": "Sure lah, let me check your calendar!",
"transcript": "I have a meeting tomorrow morning",
"emotion": "neutral",
"valence": 0.50,
"arousal": 0.38,
"dominance": 0.50,
"agent": "KRONOS",
"risk": "none",
"elapsed": 1.84
}
```
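
A minimal Mode A client using only the standard library, assuming the bridge is reachable at `http://localhost:8081`. `build_completion_request` and `ask_mina` are hypothetical helpers; the response is parsed as the JSON object shown above, so a caller would read fields such as `reply` and `agent` from it.

```python
import json
import urllib.request

BRIDGE_URL = "http://localhost:8081"  # assumed: default BRIDGE_PORT on the same device


def build_completion_request(transcript: str, base_url: str = BRIDGE_URL) -> urllib.request.Request:
    """Build a Mode A (pre-transcribed text) POST /completion request."""
    body = json.dumps({"transcript": transcript}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_mina(transcript: str, timeout: float = 60.0) -> dict:
    """Send a transcript to the bridge and return the parsed JSON response."""
    with urllib.request.urlopen(build_completion_request(transcript), timeout=timeout) as resp:
        return json.load(resp)
```
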
---
## Configuration
| Env var | Default | Description |
|---|---|---|
| `LLAMA_URL` | `http://localhost:8080` | llama-server endpoint |
| `BRIDGE_PORT` | `8081` | Flask listen port |
| `MAX_TOKENS` | `256` | Max tokens for transcription call |
| `NTFY_TOPIC` | `roar-imda-demo` | ntfy.sh topic for gap sync |
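
A minimal sketch of reading these variables with the defaults from the table. The `BridgeConfig` dataclass and `load_config` function are hypothetical; the bridge may read its settings differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BridgeConfig:
    llama_url: str
    bridge_port: int
    max_tokens: int
    ntfy_topic: str


def load_config(env: Mapping[str, str] = os.environ) -> BridgeConfig:
    """Read bridge settings from the environment, falling back to the documented defaults."""
    return BridgeConfig(
        llama_url=env.get("LLAMA_URL", "http://localhost:8080").rstrip("/"),
        bridge_port=int(env.get("BRIDGE_PORT", "8081")),
        max_tokens=int(env.get("MAX_TOKENS", "256")),
        ntfy_topic=env.get("NTFY_TOPIC", "roar-imda-demo"),
    )
```
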
---
## Deployment (Termux)
```bash
# Prerequisites on device
pkg install python whisper-cpp llama-cpp
pip install flask
# Clone and deploy
git clone https://huggingface.co/munyew/mina-bridge
cd mina-bridge
# Start bridge in the background (watchdog via start_mina.sh)
nohup python3 bridge.py >> bridge.log 2>&1 &
# Or restart after an update (';' instead of '&&' so the bridge still starts if pkill finds no process)
pkill -f bridge.py; sleep 3; nohup python3 bridge.py >> bridge.log 2>&1 &
```
Expected paths on Termux:
```
~/whisper.cpp/build/bin/whisper-cli
~/whisper.cpp/models/ggml-base.bin
~/meralion/meralion-3b-decoder-q8_0.gguf
~/meralion/mina_knowledge.json
~/meralion/gaps/gap_log.jsonl ← auto-created
```
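
A small pre-flight check, sketched under the assumption that the layout above is used unchanged. `check_paths` and `REQUIRED` are hypothetical names; `gap_log.jsonl` is left out because it is created automatically.

```python
from pathlib import Path
from typing import Iterable, List

# Required files from the expected Termux layout (gap_log.jsonl is auto-created).
REQUIRED = [
    "whisper.cpp/build/bin/whisper-cli",
    "whisper.cpp/models/ggml-base.bin",
    "meralion/meralion-3b-decoder-q8_0.gguf",
    "meralion/mina_knowledge.json",
]


def check_paths(home: Path = Path.home(), required: Iterable[str] = REQUIRED) -> List[Path]:
    """Return the required paths that are missing under `home`."""
    return [home / rel for rel in required if not (home / rel).exists()]


if __name__ == "__main__":
    missing = check_paths()
    for path in missing:
        print(f"missing: {path}")
    print("all expected files present" if not missing else f"{len(missing)} file(s) missing")
```
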
---
## Roadmap
| Priority | Gap | Solution |
|---|---|---|
| 🔴 Critical | Emotion detection upgrade | Replace VAD lookup table with [MERaLiON-SER-v1](https://huggingface.co/MERaLiON/MERaLiON-SER-v1) |
| 🟠 High | Singlish Mental Health ASR | Fine-tune MERaLiON-2-3B on v5 dataset (3240 audio files) |
| 🟠 High | Singapore Legal Domain ASR | Generate + fine-tune on CPF/HDB/PDPA domain |
| 🟡 Medium | Edge-optimised SER | Quantize MERaLiON-SER-v1 to INT8/TFLite < 200 MB |
| 🟡 Medium | Code-switched Singlish-Mandarin | Pending MNSC dataset from NUS |
---
## Citation
```bibtex
@software{mina_bridge_2026,
title = {MINA Bridge: Sovereign Edge AI Gateway for Singapore},
author = {Loh, Mun Yew (Darren)},
year = {2026},
url = {https://huggingface.co/munyew/mina-bridge},
note = {Singapore AI Research, ATxSG 2026}
}
```
---
## Acknowledgements
Built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA National Multimodal LLM Programme.
Speech transcription via [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
On-device inference via [llama.cpp](https://github.com/ggerganov/llama.cpp).