| ---
|
| language:
|
| - en
|
| - zh
|
| - ms
|
| - ta
|
| license: apache-2.0
|
| tags:
|
| - singapore
|
| - sovereign-ai
|
| - edge-ai
|
| - meralion
|
| - singlish
|
| - agentic-innovation
|
| - android
|
| - flask
|
| - whisper
|
| - on-device
|
| pipeline_tag: text-generation
|
| library_name: custom
|
| ---
|
|
|
| # MINA Bridge v4: Sovereign Edge AI Gateway
|
|
|
| **MINA** (My Intelligent National Assistant) is Singapore's sovereign edge AI companion, built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by IMDA.
|
|
|
| `mina-bridge` is the intelligence gateway between the MINA Android APK and the on-device MERaLiON model. It is a lightweight Flask server that handles speech transcription, rule-based agent routing, response generation, and autonomous gap logging, all running locally in a Termux environment with no cloud dependency for inference.
|
|
|
| ---
|
|
|
| ## Architecture
|
|
|
| ```
|
| Android APK
|
|     │  base64 WAV / pre-transcribed text
|
|     ▼
|
| mina-bridge (Flask :8081)
|
|     ├── whisper-cli         → speech-to-text (offline)
|
|     ├── route_agent()       → rule-based ARIA routing (no LLM call)
|
|     ├── build_prompt()      → agent-specific focused prompt
|
|     ├── llama-server :8080  → MERaLiON-2-3B GGUF inference
|
|     ├── append_resources()  → hotlines from mina_knowledge.json
|
|     └── log_gap() + ntfy    → autonomous cloud sync
|
| ```
|
|
|
| **Option 3 architecture**: routing is pure Python, so it is deterministic, adds negligible latency, and carries no hallucination risk at the routing step. The LLM is called exactly once per turn, only to generate the response text.
|
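A minimal sketch of that "route in Python, call the model once" flow is shown below. It assumes llama-server exposes llama.cpp's `POST /completion` endpoint (with `n_predict` in the request and `content` in the response). The per-agent instruction strings, the prompt layout, and the `handle_turn` helper are hypothetical illustrations, not the contents of `bridge.py`.

```python
import json
import urllib.request

LLAMA_URL = "http://localhost:8080"  # default from the Configuration table

# Hypothetical per-agent instructions; the real prompts are not shown in this README.
AGENT_INSTRUCTIONS = {
    "VITA": "You are a calm crisis-support companion.",
    "SENTINEL": "You help the user judge whether something is a scam.",
    "KRONOS": "You help the user with calendar questions.",
    "MINA": "You are a supportive companion for everyday stress.",
}


def build_prompt(agent: str, transcript: str) -> str:
    """Assemble a short agent-specific prompt (layout is an assumption)."""
    instruction = AGENT_INSTRUCTIONS.get(agent, AGENT_INSTRUCTIONS["MINA"])
    return f"{instruction}\nUser: {transcript}\nAssistant:"


def generate(prompt: str, max_tokens: int = 256, timeout_s: float = 120.0) -> str:
    """Send one request to llama-server's /completion endpoint."""
    body = json.dumps({"prompt": prompt, "n_predict": max_tokens}).encode("utf-8")
    req = urllib.request.Request(
        LLAMA_URL + "/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"].strip()


def handle_turn(transcript: str, route, llm=generate) -> dict:
    """Route without the model, then make the single model call for the reply."""
    agent = route(transcript)                     # pure Python, no LLM involved
    reply = llm(build_prompt(agent, transcript))  # the only LLM call in the turn
    return {"agent": agent, "reply": reply}
```

Passing `route` and `llm` as arguments keeps the sketch testable without a running model server.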
|
|
| ---
|
|
|
| ## Features
|
|
|
| ### Whisper.cpp STT Integration
|
| Offline speech-to-text via `whisper-cli` subprocess. Accepts base64-encoded WAV from the Android APK, decodes to a temp file, runs `ggml-base.bin`, strips noise tokens (`[BLANK_AUDIO]`, `debugfs`, `MEMPROF`), and returns clean transcript text. No cloud STT dependency.
|
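The transcription step described above could look roughly like the sketch below. The binary and model paths come from the "Expected paths on Termux" listing, and `-m`, `-f`, and `-nt` (no timestamps) are standard whisper.cpp CLI flags. The function names and the exact noise-stripping rules are assumptions rather than the code in `bridge.py`.

```python
import base64
import os
import re
import subprocess
import tempfile

WHISPER_CLI = os.path.expanduser("~/whisper.cpp/build/bin/whisper-cli")
WHISPER_MODEL = os.path.expanduser("~/whisper.cpp/models/ggml-base.bin")

BLANK_TOKEN = "[BLANK_AUDIO]"
# Assumption: lines containing these markers are stray log output, not speech.
LOG_MARKERS = ("debugfs", "MEMPROF")


def clean_transcript(raw: str) -> str:
    """Drop log lines, remove the blank-audio token, and collapse whitespace."""
    kept = []
    for line in raw.splitlines():
        if any(marker in line for marker in LOG_MARKERS):
            continue
        kept.append(line.replace(BLANK_TOKEN, " "))
    return re.sub(r"\s+", " ", " ".join(kept)).strip()


def transcribe_b64_wav(b64_wav: str, timeout_s: float = 60.0) -> str:
    """Decode base64 WAV to a temp file, run whisper-cli, return clean text."""
    wav_bytes = base64.b64decode(b64_wav)
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(wav_bytes)
        wav_path = tmp.name
    try:
        result = subprocess.run(
            [WHISPER_CLI, "-m", WHISPER_MODEL, "-f", wav_path, "-nt"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
        return clean_transcript(result.stdout)
    finally:
        os.unlink(wav_path)  # always remove the temp file
```

Keeping `clean_transcript` separate from the subprocess call means the filtering rules can be checked without audio or a whisper.cpp build.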
|
|
| ### ARIA Agent Routing
|
| Four specialist agents dispatched by keyword matching, with no LLM routing call:
|
|
|
| | Agent | Trigger keywords | Purpose |
|
| |---|---|---|
|
| | **VITA** | `giving up`, `want to die`, `hopeless`, `hurt myself` … | Crisis support |
|
| | **SENTINEL** | `scam`, `bank account`, `transfer money`, `spf` … | Scam detection |
|
| | **KRONOS** | `meeting`, `calendar`, `schedule`, `tomorrow` … | Calendar assistance |
|
| | **MINA** | *(default)* | Stress / general emotional support |
|
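The routing table above can be sketched as a plain keyword lookup. Only the keywords listed in the table are used here (the real lists are longer, as the ellipses indicate). Case-insensitive substring matching and the priority order VITA, then SENTINEL, then KRONOS are assumptions about how `route_agent()` resolves overlaps.

```python
# Keyword lists copied from the table; the full lists in bridge.py are longer.
AGENT_KEYWORDS = {
    "VITA": ("giving up", "want to die", "hopeless", "hurt myself"),
    "SENTINEL": ("scam", "bank account", "transfer money", "spf"),
    "KRONOS": ("meeting", "calendar", "schedule", "tomorrow"),
}


def route_agent(transcript: str) -> str:
    """Return the first agent whose keyword appears in the transcript.

    Dict insertion order sets the priority (assumed: crisis first),
    and MINA is the default when nothing matches.
    """
    text = transcript.lower()
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return agent
    return "MINA"
```

Checking VITA first means a message that mentions both distress and a calendar term is still handled as a crisis. Plain substring matching is simple but can over-trigger on short keywords such as `spf`.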
|
|
| ### Knowledge Base Integration
|
| Reads `mina_knowledge.json` at runtime for:
|
| - Crisis hotline numbers (SOS Lifeline, IMH): phone and WhatsApp links
|
| - Capability flags (`make_phone_call`, `send_whatsapp`, `check_calendar`, …)
|
|
|
| Resources appended to VITA and SENTINEL replies are driven by the knowledge file, not hardcoded strings. Update the JSON to change the response; no code change is needed.
|
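As an illustration of that data-driven approach, the sketch below appends hotline lines to a reply. The schema of `mina_knowledge.json` is not shown in this README, so the `hotlines` list with `name`, `phone`, and `whatsapp` fields is a hypothetical layout, and the output format is likewise an assumption.

```python
import json
from pathlib import Path

# Agents whose replies get resources appended, per the text above.
RESOURCE_AGENTS = {"VITA", "SENTINEL"}


def load_knowledge(path: str) -> dict:
    """Read the knowledge file at call time so edits apply without a code change."""
    return json.loads(Path(path).expanduser().read_text(encoding="utf-8"))


def append_resources(reply: str, agent: str, knowledge: dict) -> str:
    """Append one line per hotline entry (hypothetical schema) to the reply."""
    if agent not in RESOURCE_AGENTS:
        return reply
    lines = []
    for entry in knowledge.get("hotlines", []):
        line = f"{entry['name']}: {entry['phone']}"
        if entry.get("whatsapp"):
            line += f" (WhatsApp: {entry['whatsapp']})"
        lines.append(line)
    if not lines:
        return reply
    return reply + "\n\n" + "\n".join(lines)
```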
|
|
| ### Gap Logging & Autonomous Learning
|
| Every time a user requests a capability MINA doesn't yet have, `log_gap()`:
|
| 1. Appends a structured entry to `gaps/gap_log.jsonl` (local, persistent)
|
| 2. POSTs to `ntfy.sh/{NTFY_TOPIC}` for real-time cloud sync
|
|
|
| ```json
|
| {
|
| "timestamp": "2026-05-02T14:23:01",
|
| "gap_type": "make_phone_call",
|
| "user_request": "can you call SOS for me",
|
| "context": "User requested phone call to SOS",
|
| "status": "pending"
|
| }
|
| ```
|
|
|
| The `NTFY_TOPIC` env var controls the notification channel (default: `roar-imda-demo`). Gap notifications appear in the ntfy app with the tag `brain` for triage. Network failures are caught silently; the gap is always written locally first.
|
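A minimal sketch of the "write locally first, then notify" behaviour follows. The entry fields mirror the JSON example above, and ntfy accepts a plain `POST` to `https://ntfy.sh/<topic>` with optional `Title` and `Tags` headers. Which fields are actually sent to ntfy is an assumption (here only the `context` string), and this version uses a short blocking timeout where the real bridge may notify in the background.

```python
import json
import os
import urllib.request
from datetime import datetime
from pathlib import Path

GAP_LOG = Path(os.path.expanduser("~/meralion/gaps/gap_log.jsonl"))
NTFY_TOPIC = os.environ.get("NTFY_TOPIC", "roar-imda-demo")


def notify_ntfy(entry: dict, timeout_s: float = 5.0) -> None:
    """POST a short message to the ntfy topic, tagged `brain`."""
    req = urllib.request.Request(
        f"https://ntfy.sh/{NTFY_TOPIC}",
        data=entry["context"].encode("utf-8"),  # assumption: context only
        headers={"Title": f"Gap: {entry['gap_type']}", "Tags": "brain"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=timeout_s).close()


def log_gap(gap_type: str, user_request: str, context: str,
            log_path: Path = GAP_LOG, notify=notify_ntfy) -> dict:
    """Append the gap to the local JSONL file, then attempt the cloud sync."""
    entry = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "gap_type": gap_type,
        "user_request": user_request,
        "context": context,
        "status": "pending",
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)  # auto-create gaps/
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    try:
        notify(entry)
    except Exception:
        pass  # offline or ntfy unreachable: the local record already exists
    return entry
```

Because the file write happens before the network call, a failed or slow sync can never lose a gap record.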
|
|
| ### Sovereign & Offline-First
|
| All inference runs on-device. The only outbound network call is the optional ntfy gap sync (non-blocking, non-critical path). No user speech or transcript data leaves the device during inference.
|
|
|
| ---
|
|
|
| ## Endpoints
|
|
|
| ### `GET /health`
|
| Liveness probe. At startup, the Android APK polls this endpoint every 3 s until the bridge responds.
|
| ```json
|
| {"status": "ok", "llama": true, "bridge": "v2"}
|
| ```
|
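The polling behaviour can be sketched from the client side as below. The real client is the Android APK, so this Python version only illustrates the loop: the 3 s interval comes from the text above, while the attempt limit and the choice to treat `"llama": true` as part of readiness are assumptions.

```python
import json
import time
import urllib.request


def probe_health(base_url: str = "http://localhost:8081", timeout_s: float = 2.0) -> bool:
    """Return True when /health reports the bridge and llama-server as up."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout_s) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        return False  # connection refused, timeout, or invalid JSON
    return isinstance(body, dict) and body.get("status") == "ok" and body.get("llama") is True


def wait_until_ready(probe=probe_health, interval_s: float = 3.0,
                     max_attempts: int = 20, sleep=time.sleep) -> bool:
    """Poll the probe every `interval_s` seconds, up to `max_attempts` times."""
    for attempt in range(max_attempts):
        if probe():
            return True
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return False
```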
|
|
| ### `POST /completion`
|
| Main inference endpoint. Accepts two input modes:
|
|
|
| **Mode A: Pre-transcribed text** (fast path):
|
| ```json
|
| {"transcript": "I have a meeting tomorrow morning"}
|
| ```
|
|
|
| **Mode B: Raw WAV audio** (whisper path):
|
| ```json
|
| {
|
| "prompt": [{
|
| "prompt_string": "...",
|
| "multimodal_data": ["<base64-WAV>"]
|
| }]
|
| }
|
| ```
|
|
|
| **Response**:
|
| ```json
|
| {
|
| "reply": "Sure lah, let me check your calendar!",
|
| "content": "Sure lah, let me check your calendar!",
|
| "transcript": "I have a meeting tomorrow morning",
|
| "emotion": "neutral",
|
| "valence": 0.50,
|
| "arousal": 0.38,
|
| "dominance": 0.50,
|
| "agent": "KRONOS",
|
| "risk": "none",
|
| "elapsed": 1.84
|
| }
|
| ```
|
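For reference, a small client sketch that builds both request shapes and posts them to the bridge. The payload layouts follow the JSON examples above. The helper names are hypothetical, and `prompt_string` is left to the caller because its expected content is not specified here.

```python
import base64
import json
import urllib.request

BRIDGE_URL = "http://localhost:8081"  # default BRIDGE_PORT


def text_payload(transcript: str) -> dict:
    """Mode A: pre-transcribed text."""
    return {"transcript": transcript}


def audio_payload(wav_bytes: bytes, prompt_string: str) -> dict:
    """Mode B: raw WAV bytes, base64-encoded into multimodal_data."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return {"prompt": [{"prompt_string": prompt_string, "multimodal_data": [encoded]}]}


def post_completion(payload: dict, timeout_s: float = 60.0) -> dict:
    """POST a payload to /completion and return the parsed JSON response."""
    req = urllib.request.Request(
        BRIDGE_URL + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the bridge running, `post_completion(text_payload("I have a meeting tomorrow morning"))` should return a dictionary with the fields shown in the response example, including `agent` and `reply`.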
|
|
| ---
|
|
|
| ## Configuration
|
|
|
| | Env var | Default | Description |
|
| |---|---|---|
|
| | `LLAMA_URL` | `http://localhost:8080` | llama-server endpoint |
|
| | `BRIDGE_PORT` | `8081` | Flask listen port |
|
| | `MAX_TOKENS` | `256` | Max tokens for transcription call |
|
| | `NTFY_TOPIC` | `roar-imda-demo` | ntfy.sh topic for gap sync |
|
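One way to read these settings with their documented defaults is sketched below. The `load_config` helper and the integer conversion of the numeric values are assumptions about how `bridge.py` consumes them.

```python
import os


def load_config(env=os.environ) -> dict:
    """Read bridge settings from the environment, falling back to the defaults."""
    return {
        "LLAMA_URL": env.get("LLAMA_URL", "http://localhost:8080"),
        "BRIDGE_PORT": int(env.get("BRIDGE_PORT", "8081")),
        "MAX_TOKENS": int(env.get("MAX_TOKENS", "256")),
        "NTFY_TOPIC": env.get("NTFY_TOPIC", "roar-imda-demo"),
    }
```

For example, starting the bridge with `BRIDGE_PORT=9000 python3 bridge.py` would then override only the listen port.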
|
|
| ---
|
|
|
| ## Deployment (Termux)
|
|
|
| ```bash
|
| # Prerequisites on device
|
| pkg install python whisper-cpp llama-cpp
|
|
|
| # Clone and deploy
|
| git clone https://huggingface.co/munyew/mina-bridge
|
| cd mina-bridge
|
|
|
| # Start bridge (watchdog via start_mina.sh)
|
| nohup python3 bridge.py >> bridge.log 2>&1 &
|
|
|
| # Or restart after update
|
| pkill -f bridge.py && sleep 3 && nohup python3 bridge.py >> bridge.log 2>&1 &
|
| ```
|
|
|
| Expected paths on Termux:
|
| ```
|
| ~/whisper.cpp/build/bin/whisper-cli
|
| ~/whisper.cpp/models/ggml-base.bin
|
| ~/meralion/meralion-3b-decoder-q8_0.gguf
|
| ~/meralion/mina_knowledge.json
|
| ~/meralion/gaps/gap_log.jsonl   ← auto-created
|
| ```
|
|
|
| ---
|
|
|
| ## Roadmap
|
|
|
| | Priority | Gap | Solution |
|
| |---|---|---|
|
| | Critical | Emotion detection upgrade | Replace VAD lookup table with [MERaLiON-SER-v1](https://huggingface.co/MERaLiON/MERaLiON-SER-v1) |
|
| | High | Singlish Mental Health ASR | Fine-tune MERaLiON-2-3B on v5 dataset (3240 audio files) |
|
| | High | Singapore Legal Domain ASR | Generate + fine-tune on CPF/HDB/PDPA domain |
|
| | Medium | Edge-optimised SER | Quantize MERaLiON-SER-v1 to INT8/TFLite < 200 MB |
|
| | Medium | Code-switched Singlish-Mandarin | Pending MNSC dataset from NUS |
|
|
|
| ---
|
|
|
| ## Citation
|
|
|
| ```bibtex
|
| @software{mina_bridge_2026,
|
| title = {MINA Bridge: Sovereign Edge AI Gateway for Singapore},
|
| author = {Loh, Mun Yew (Darren)},
|
| year = {2026},
|
| url = {https://huggingface.co/munyew/mina-bridge},
|
| note = {Singapore AI Research, ATxSG 2026}
|
| }
|
| ```
|
|
|
| ---
|
|
|
| ## Acknowledgements
|
|
|
| Built on [MERaLiON-2-3B](https://huggingface.co/MERaLiON/MERaLiON-2-3B) by the IMDA National Multimodal LLM Programme.
|
| Speech transcription via [whisper.cpp](https://github.com/ggerganov/whisper.cpp).
|
| On-device inference via [llama.cpp](https://github.com/ggerganov/llama.cpp).
|
|
|