	---
	language:
	- en
	- zh
	- ms
	- ta
	license: apache-2.0
	tags:
	- singapore
	- sovereign-ai
	- edge-ai
	- meralion
	- singlish
	- agentic-innovation
	- android
	- flask
	- whisper
	- on-device
	pipeline_tag: text-generation
	library_name: custom
	---

	# MINA Bridge v4 — Sovereign Edge AI Gateway

	MINA (My Intelligent National Assistant) is Singapore's sovereign edge AI companion, built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA.

	`mina-bridge` is the intelligence gateway between the MINA Android APK and the on-device MERaLiON model. It is a lightweight Flask server that handles speech transcription, rule-based agent routing, response generation, and autonomous gap logging, all running locally in a Termux environment with no cloud dependency for inference.

	---

	## Architecture

	```
	Android APK
	│ base64 WAV / pre-transcribed text
	▼
	mina-bridge (Flask :8081)
	├── whisper-cli ← speech-to-text (offline)
	├── route_agent() ← rule-based ARIA routing (no LLM call)
	├── build_prompt() ← agent-specific focused prompt
	├── llama-server :8080 ← MERaLiON-2-3B GGUF inference
	├── append_resources() ← hotlines from mina_knowledge.json
	└── log_gap() + ntfy ← autonomous cloud sync
	```

	Option 3 architecture: routing is pure Python, so it is deterministic, adds negligible latency, and cannot hallucinate an agent choice. The LLM is called exactly once per turn, only to generate the response text.
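
The single LLM call per turn could look roughly like the sketch below. It assumes llama-server's `/completion` endpoint with its `prompt` and `n_predict` request fields and `content` response field. The instruction strings, the prompt template, and the `generate_reply()` helper are hypothetical, since the real prompts are not shown in this README.

```python
import json
import urllib.request

LLAMA_URL = "http://localhost:8080"  # default from the Configuration table

# Hypothetical per-agent instructions; the real prompts are not documented here.
AGENT_INSTRUCTIONS = {
    "VITA": "Respond with calm, supportive crisis care.",
    "SENTINEL": "Help the user judge whether this is a scam.",
    "KRONOS": "Help the user with their calendar.",
    "MINA": "Offer general emotional support.",
}


def build_prompt(agent: str, transcript: str) -> str:
    """Assemble a focused prompt for the agent already chosen by the router."""
    instruction = AGENT_INSTRUCTIONS.get(agent, AGENT_INSTRUCTIONS["MINA"])
    return f"{instruction}\nUser: {transcript}\nAssistant:"


def generate_reply(agent: str, transcript: str, max_tokens: int = 256) -> str:
    """Make the one LLM call of the turn against llama-server."""
    body = json.dumps({
        "prompt": build_prompt(agent, transcript),
        "n_predict": max_tokens,
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{LLAMA_URL}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["content"].strip()
```

Because the agent is picked before `build_prompt()` runs, the model never sees a routing question, only the prompt for the agent that was already selected.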

	---

	## Features

	### 🎙️ Whisper.cpp STT Integration
	Offline speech-to-text via `whisper-cli` subprocess. Accepts base64-encoded WAV from the Android APK, decodes to a temp file, runs `ggml-base.bin`, strips noise tokens (`[BLANK_AUDIO]`, `debugfs`, `MEMPROF`), and returns clean transcript text. No cloud STT dependency.
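
The steps above can be sketched as follows. This is a minimal illustration, not the actual `bridge.py` code: the function names are hypothetical, the paths are taken from the Termux layout listed under Deployment, and it assumes that `debugfs` and `MEMPROF` mark whole diagnostic lines while `[BLANK_AUDIO]` can appear inline. The `-m`, `-f`, and `-nt` (no timestamps) flags are standard `whisper-cli` options.

```python
import base64
import subprocess
import tempfile
from pathlib import Path

WHISPER_CLI = Path.home() / "whisper.cpp/build/bin/whisper-cli"
WHISPER_MODEL = Path.home() / "whisper.cpp/models/ggml-base.bin"


def clean_transcript(raw: str) -> str:
    """Remove noise tokens from whisper-cli output and join the remaining text."""
    kept = []
    for line in raw.splitlines():
        # Assumption: these markers identify whole diagnostic lines.
        if "debugfs" in line or "MEMPROF" in line:
            continue
        line = line.replace("[BLANK_AUDIO]", "").strip()
        if line:
            kept.append(line)
    return " ".join(kept)


def transcribe(wav_b64: str) -> str:
    """Decode base64 WAV to a temp file, run whisper-cli, return clean text."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as handle:
        handle.write(base64.b64decode(wav_b64))
        wav_path = Path(handle.name)
    try:
        result = subprocess.run(
            [str(WHISPER_CLI), "-m", str(WHISPER_MODEL), "-f", str(wav_path), "-nt"],
            capture_output=True,
            text=True,
            timeout=120,
            check=True,
        )
        return clean_transcript(result.stdout)
    finally:
        # Always remove the temp file, even if whisper-cli fails.
        wav_path.unlink(missing_ok=True)
```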

	### 🧭 ARIA Agent Routing
	Four specialist agents dispatched by keyword matching — no LLM routing call:

	| Agent | Trigger keywords | Purpose |
	|---|---|---|
	| VITA | `giving up`, `want to die`, `hopeless`, `hurt myself` … | Crisis support |
	| SENTINEL | `scam`, `bank account`, `transfer money`, `spf` … | Scam detection |
	| KRONOS | `meeting`, `calendar`, `schedule`, `tomorrow` … | Calendar assistance |
	| MINA | (default) | Stress / general emotional support |
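
A keyword router of this kind can be sketched in a few lines. The keyword lists below contain only the triggers shown in the table (the full lists are elided there), and the priority order, with VITA checked first so that crisis phrases win over other matches, is an assumption rather than documented behaviour.

```python
# Checked in order: the first agent with a matching keyword wins.
# Assumption: crisis (VITA) outranks scam (SENTINEL), which outranks calendar (KRONOS).
AGENT_KEYWORDS = [
    ("VITA", ("giving up", "want to die", "hopeless", "hurt myself")),
    ("SENTINEL", ("scam", "bank account", "transfer money", "spf")),
    ("KRONOS", ("meeting", "calendar", "schedule", "tomorrow")),
]


def route_agent(transcript: str) -> str:
    """Return the agent name for a transcript using case-insensitive substring matching."""
    text = transcript.lower()
    for agent, keywords in AGENT_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return agent
    return "MINA"  # default agent when nothing matches
```

With this ordering, `route_agent("I feel hopeless about tomorrow")` returns `"VITA"` even though `tomorrow` is also a KRONOS trigger.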

	### 🧠 Knowledge Base Integration
	Reads `mina_knowledge.json` at runtime for:
	- Crisis hotline numbers (SOS Lifeline, IMH) — phone + WhatsApp links
	- Capability flags (`make_phone_call`, `send_whatsapp`, `check_calendar`, …)

	Resources appended to VITA and SENTINEL replies are driven by the knowledge file, not hardcoded strings. Update the JSON to update the response — no code change needed.
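
As an illustration of the data-driven approach, the sketch below appends hotline entries from a loaded knowledge file. The JSON schema used here (a `hotlines` object keyed by agent, with `name`, `phone`, and optional `whatsapp` fields) is hypothetical; the real layout of `mina_knowledge.json` is not shown in this README.

```python
import json
from pathlib import Path

KNOWLEDGE_PATH = Path.home() / "meralion/mina_knowledge.json"
RESOURCE_AGENTS = {"VITA", "SENTINEL"}  # only these replies get resources appended


def load_knowledge(path: Path = KNOWLEDGE_PATH) -> dict:
    """Read the knowledge file at runtime so edits apply without a code change."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def append_resources(reply: str, agent: str, knowledge: dict) -> str:
    """Append hotline lines for the agent, using the hypothetical schema above."""
    if agent not in RESOURCE_AGENTS:
        return reply
    lines = []
    for entry in knowledge.get("hotlines", {}).get(agent, []):
        parts = [entry["name"], entry["phone"]]
        if entry.get("whatsapp"):
            parts.append(entry["whatsapp"])
        lines.append(" | ".join(parts))
    if not lines:
        return reply
    return reply + "\n\n" + "\n".join(lines)
```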

	### 📋 Gap Logging & Autonomous Learning
	Every time a user requests a capability MINA doesn't yet have, `log_gap()`:
	1. Appends a structured entry to `gaps/gap_log.jsonl` (local, persistent)
	2. POSTs to `ntfy.sh/{NTFY_TOPIC}` for real-time cloud sync

	```json
	{
	"timestamp": "2026-05-02T14:23:01",
	"gap_type": "make_phone_call",
	"user_request": "can you call SOS for me",
	"context": "User requested phone call to SOS",
	"status": "pending"
	}
	```

	The `NTFY_TOPIC` env var controls the notification channel (default: `roar-imda-demo`). Gap notifications appear in the ntfy app with tag `brain` for triage. Network failures are caught silently, and the gap is always written locally first, so an offline device never loses an entry.
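
The write-locally-then-notify order can be sketched as below. The function signatures, the notification title, and the choice to send the `context` field as the message body are assumptions; the `Title` and `Tags` headers are part of ntfy's publish API. For simplicity this sketch sends the notification inline with a short timeout, whereas a non-blocking version would hand `notify()` to a background thread.

```python
import json
import os
import urllib.request
from datetime import datetime
from pathlib import Path

NTFY_TOPIC = os.environ.get("NTFY_TOPIC", "roar-imda-demo")
GAP_LOG = Path.home() / "meralion/gaps/gap_log.jsonl"


def notify_ntfy(entry: dict) -> None:
    """POST the gap to ntfy.sh, tagged `brain` for triage."""
    request = urllib.request.Request(
        f"https://ntfy.sh/{NTFY_TOPIC}",
        data=entry["context"].encode("utf-8"),  # a request with data is sent as POST
        headers={"Title": f"Gap: {entry['gap_type']}", "Tags": "brain"},
    )
    urllib.request.urlopen(request, timeout=5).close()


def log_gap(gap_type, user_request, context, log_path=GAP_LOG, notify=notify_ntfy):
    """Append the gap to the local JSONL log first, then try the cloud sync."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "gap_type": gap_type,
        "user_request": user_request,
        "context": context,
        "status": "pending",
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    try:
        notify(entry)
    except Exception:
        pass  # network failure is non-critical; the local entry already exists
    return entry
```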

	### 🔒 Sovereign & Offline-First
	All inference runs on-device. The only outbound network call is the optional ntfy gap sync (non-blocking, non-critical path). No user speech or transcript data leaves the device during inference.

	---

	## Endpoints

	### `GET /health`
	Liveness probe. The Android APK polls this endpoint every 3 s at startup.
	```json
	{"status": "ok", "llama": true, "bridge": "v2"}
	```
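
For testing the bridge from a desktop, the APK's startup polling can be imitated with a small Python loop. This is a sketch only: the helper names and the `max_attempts` limit are hypothetical, and treating `"llama": true` as a readiness requirement is an assumption about how the APK interprets the response.

```python
import json
import time
import urllib.request


def probe_health(base_url: str = "http://localhost:8081") -> bool:
    """Return True when /health reports both the bridge and llama-server as up."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as response:
            payload = json.loads(response.read())
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON body all count as "not ready".
        return False
    return payload.get("status") == "ok" and payload.get("llama") is True


def wait_until_healthy(probe=probe_health, interval=3.0, max_attempts=20,
                       sleep=time.sleep) -> bool:
    """Poll every `interval` seconds until the probe succeeds or attempts run out."""
    for attempt in range(max_attempts):
        if probe():
            return True
        if attempt < max_attempts - 1:
            sleep(interval)
    return False
```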

	### `POST /completion`
	Main inference endpoint. Accepts two input modes:

	Mode A — Pre-transcribed text (fast path):
	```json
	{"transcript": "I have a meeting tomorrow morning"}
	```

	Mode B — Raw WAV audio (whisper path):
	```json
	{
	"prompt": [{
	"prompt_string": "...",
	"multimodal_data": ["<base64-WAV>"]
	}]
	}
	```

	Response:
	```json
	{
	"reply": "Sure lah, let me check your calendar!",
	"content": "Sure lah, let me check your calendar!",
	"transcript": "I have a meeting tomorrow morning",
	"emotion": "neutral",
	"valence": 0.50,
	"arousal": 0.38,
	"dominance": 0.50,
	"agent": "KRONOS",
	"risk": "none",
	"elapsed": 1.84
	}
	```
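
A minimal Python client for both input modes might look like this. The helper names are hypothetical, and the `prompt_string` value for Mode B is left to the caller because this README does not show what the bridge expects there.

```python
import base64
import json
import urllib.request


def build_payload(transcript=None, wav_bytes=None, prompt_string="") -> dict:
    """Build a Mode A (text) or Mode B (base64 WAV) request body."""
    if transcript is not None:
        return {"transcript": transcript}
    if wav_bytes is not None:
        return {
            "prompt": [{
                "prompt_string": prompt_string,  # caller-supplied; empty default is an assumption
                "multimodal_data": [base64.b64encode(wav_bytes).decode("ascii")],
            }]
        }
    raise ValueError("provide either transcript or wav_bytes")


def complete(payload: dict, base_url: str = "http://localhost:8081") -> dict:
    """POST the payload to /completion and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())
```

With the bridge running, `complete(build_payload(transcript="I have a meeting tomorrow morning"))` should return a dictionary with the fields shown in the response above.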

	---

	## Configuration

	| Env var | Default | Description |
	|---|---|---|
	| `LLAMA_URL` | `http://localhost:8080` | llama-server endpoint |
	| `BRIDGE_PORT` | `8081` | Flask listen port |
	| `MAX_TOKENS` | `256` | Max tokens for transcription call |
	| `NTFY_TOPIC` | `roar-imda-demo` | ntfy.sh topic for gap sync |
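
Reading these variables with their defaults could be done as in the sketch below. The `load_config()` helper and its dictionary keys are hypothetical; only the variable names and default values come from the table.

```python
import os


def load_config(env=None) -> dict:
    """Read bridge settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "llama_url": env.get("LLAMA_URL", "http://localhost:8080"),
        "bridge_port": int(env.get("BRIDGE_PORT", "8081")),
        "max_tokens": int(env.get("MAX_TOKENS", "256")),
        "ntfy_topic": env.get("NTFY_TOPIC", "roar-imda-demo"),
    }
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment, for example `load_config({"BRIDGE_PORT": "9000"})`.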

	---

	## Deployment (Termux)

	```bash
	# Prerequisites on device
	pkg install python whisper-cpp llama-cpp

	# Clone and deploy
	git clone https://huggingface.co/munyew/mina-bridge
	cd mina-bridge

	# Start bridge (watchdog via start_mina.sh)
	nohup python3 bridge.py >> bridge.log 2>&1 &

	# Or restart after update (";" so the restart still runs when no bridge process exists)
	pkill -f bridge.py; sleep 3; nohup python3 bridge.py >> bridge.log 2>&1 &
	```

	Expected paths on Termux:
	```
	~/whisper.cpp/build/bin/whisper-cli
	~/whisper.cpp/models/ggml-base.bin
	~/meralion/meralion-3b-decoder-q8_0.gguf
	~/meralion/mina_knowledge.json
	~/meralion/gaps/gap_log.jsonl ← auto-created
	```

	---

	## Roadmap

	| Priority | Gap | Solution |
	|---|---|---|
	| 🔴 Critical | Emotion detection upgrade | Replace VAD lookup table with [MERaLiON-SER-v1](https://huggingface.co/MERaLiON/MERaLiON-SER-v1) |
	| 🟠 High | Singlish Mental Health ASR | Fine-tune MERaLiON-2-3B on v5 dataset (3240 audio files) |
	| 🟠 High | Singapore Legal Domain ASR | Generate + fine-tune on CPF/HDB/PDPA domain |
	| 🟡 Medium | Edge-optimised SER | Quantize MERaLiON-SER-v1 to INT8/TFLite < 200 MB |
	| 🟡 Medium | Code-switched Singlish-Mandarin | Pending MNSC dataset from NUS |

	---

	## Citation

	```bibtex
	@software{mina_bridge_2026,
	title = {MINA Bridge: Sovereign Edge AI Gateway for Singapore},
	author = {Loh, Mun Yew (Darren)},
	year = {2026},
	url = {https://huggingface.co/munyew/mina-bridge},
	note = {Singapore AI Research — ATxSG 2026}
	}
	```

	---

	## Acknowledgements

	Built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA National Multimodal LLM Programme.
	Speech transcription via [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
	On-device inference via [llama.cpp](https://github.com/ggerganov/llama.cpp).