| --- |
| title: Android Skill Router |
| emoji: 📱 |
| colorFrom: indigo |
| colorTo: blue |
| sdk: gradio |
| sdk_version: "5.34.2" |
| python_version: '3.13' |
| app_file: app.py |
| pinned: false |
| license: apache-2.0 |
| short_description: Natural language → Android automation skill → UI trajectory |
| tags: |
| - build-small-hackathon |
| - track:backyard |
| - track:wood |
| - sponsor:modal |
| - achievement:offbrand |
| - achievement:fieldnotes |
| --- |
| |
| # Android Skill Router |
|
|
| **Build Small Hackathon: Backyard AI track · Modal sponsor** |
|
|
| You say *"text mom on whatsapp i'm on my way"* and a voice assistant might web-search or shrug. Android Skill Router closes that gap with a **3B-parameter intent classifier** that maps messy phone language to structured `{skill, parameters}` JSON, then loads a **pre-recorded UI trajectory** captured on a real Android device. It is the classifier layer of the **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** stack: record a flow once on your phone, then route to it with a small model every time after that. |
|
|
| ``` |
| "play my workout playlist" → spotify_play_playlist → trajectories/spotify_play_playlist.json |
| ``` |
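The last hop in the example above, from skill label to trajectory file, can be sketched in a few lines. This is only an illustration: `trajectory_path` is a hypothetical helper, not the code in `src/skill_router.py`, and the label check is an added assumption meant to keep a malformed model output from pointing outside `trajectories/`.

```python
from pathlib import Path

# Directory name taken from the example above.
TRAJECTORY_DIR = Path("trajectories")


def trajectory_path(skill: str, root: Path = TRAJECTORY_DIR) -> Path:
    """Map a skill label to the trajectory file that would be loaded (hypothetical helper)."""
    # Assumption: skill labels are plain snake_case, e.g. "spotify_play_playlist".
    if not skill or not all(c.islower() or c.isdigit() or c == "_" for c in skill):
        raise ValueError(f"unexpected skill label: {skill!r}")
    return root / f"{skill}.json"


print(trajectory_path("spotify_play_playlist").as_posix())
```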
|
|
| **Tech:** fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) via 4-bit QLoRA + SFT ([Unsloth](https://github.com/unslothai/unsloth) on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio. |
|
|
| ``` |
| Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps |
| ``` |
|
|
| **Submission links** |
|
|
| - **Blog post:** [Hugging Face Blog: Android Skill Router](https://huggingface.co/blog/build-small-hackathon/android-skill-router) |
| - **Demo video:** [YouTube Short](https://youtube.com/shorts/IQRHf7HfTDA) |
| - **Social post:** [Twitter/X](https://x.com/kriyanshii/status/2066587828839141634) |
| - **Live Space:** [android-skill-router](https://huggingface.co/spaces/build-small-hackathon/android-skill-router) |
| - **Android recorder:** [Pocket Automator](https://github.com/kriyanshii/pocket-automator) |
|
|
| ## Related repos |
|
|
| | Repo | Role | |
| | --- | --- | |
| | **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** | Android recorder, parameter dialog, `ParameterBinder`, and on-device replay | |
| | **[android-dataset](https://github.com/kriyanshii/android-dataset)** | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) | |
| | **[Live Space](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)** | Hosted demo: natural language in, skill + parameterized trajectory out | |
| | **[Blog post](https://huggingface.co/blog/build-small-hackathon/android-skill-router)** | Full write-up of the classify → bind → replay architecture | |
|
|
| ## Hackathon tags |
|
|
| | Tag | Why | |
| | --- | --- | |
| | `track:backyard` | Personal automation on hardware you own | |
| | `sponsor:modal` | Training, evaluation, and inference on Modal | |
| | `achievement:tinytitan` | Full stack on Qwen2.5-3B (≤4B params) | |
| | `achievement:agent` | Classify → route → load multi-step UI plan | |
|
|
| ## Recording trajectories |
|
|
| UI traces in `trajectories/` were captured with **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)**, an Android accessibility recorder that exports JSON for training and replay. Record a flow on device → export → map to a skill via `scripts/generate_skill_dataset.py`. |
|
|
| ## Tech stack |
|
|
| | Piece | What | |
| | --- | --- | |
| | **Base model** | [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) | |
| | **Fine-tune** | 4-bit QLoRA + SFT with [Unsloth](https://github.com/unslothai/unsloth) on Modal (`modal_apps/train_modal.py`) | |
| | **Inference** | Modal GPU API (`modal_apps/predict_api.py`); returns skill + parameters | |
| | **Parameter binding** | `src/parameter_binder.py` + `data/skill_schemas.json` bindings; substitutes runtime values into trajectory steps | |
| | **Demo UI** | Gradio (`app.py`); shows parameterized trajectory preview | |
| | **Recorder / replay** | [Pocket Automator](https://github.com/kriyanshii/pocket-automator): accessibility capture, parameter dialog, `ParameterBinder`, replay | |
| | **Data** | 15 Android trajectories → `data/skills.jsonl` → ~510 prompt variations in `data/train.jsonl` | |
|
|
| ## Quick start (local dev) |
|
|
| ```bash |
| # 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume) |
| pip install modal |
| modal setup |
| python scripts/generate_intent_dataset.py |
| modal run modal_apps/train_modal.py --dataset train_intent.jsonl |
| |
| # 2. Deploy inference API |
| modal deploy modal_apps/predict_api.py |
| # Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run |
| |
| # 3. Run the Gradio demo |
| pip install -r requirements.txt |
| export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run" |
| python app.py |
| ``` |
|
|
| The `/predict` endpoint returns structured intents: |
|
|
| ```json |
| {"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}} |
| ``` |
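Before routing on a response like the one above, a client may want to check its shape. The following is a minimal sketch, assuming only the two keys shown in the example (`skill` and `parameters`); `parse_intent` is a hypothetical name and is not taken from `src/skill_utils.py`.

```python
import json


def parse_intent(raw: str) -> tuple[str, dict[str, str]]:
    """Parse a /predict-style JSON string into (skill, parameters)."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    skill = payload.get("skill")
    # Assumption: a skill with no slots may omit "parameters" entirely.
    parameters = payload.get("parameters", {})
    if not isinstance(skill, str) or not skill:
        raise ValueError("missing or empty 'skill'")
    if not isinstance(parameters, dict):
        raise ValueError("'parameters' must be an object")
    # Normalize keys and values to strings so later text substitution is uniform.
    return skill, {str(k): str(v) for k, v in parameters.items()}


skill, params = parse_intent(
    '{"skill": "whatsapp_send_message", '
    '"parameters": {"contact": "ri", "message": "see you soon"}}'
)
print(skill, params)
```

Failing loudly on a malformed payload keeps a bad model output from reaching the replay step.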
|
|
| ### Hugging Face Space setup |
|
|
| 1. Create a **Gradio Space** inside the [build-small-hackathon](https://huggingface.co/build-small-hackathon) org. |
| 2. Upload this repo (exclude `trained_model/`; inference stays on Modal). |
| 3. Add a Space secret: `MODAL_PREDICT_URL` = your deployed Modal `/predict` base URL. |
| 4. Link the demo video and social post in the README (see **Submission links** above). |
|
|
| ## Project layout |
|
|
| ``` |
| app.py                      # Gradio demo (hackathon submission UI) |
| requirements.txt            # Space dependencies |
| data/ |
|   train.jsonl               # SFT training data (~510 examples) |
|   eval_prompts.json         # 50 held-out evaluation prompts |
|   skills.jsonl              # Canonical skill → task mapping |
| src/ |
|   skill_router.py           # Skill name → trajectory JSON |
|   parameter_binder.py       # Runtime parameter → trajectory step substitution |
|   skill_utils.py            # Shared JSON parsing helpers |
|   evaluate.py               # Local CPU/MPS evaluation |
| modal_apps/                 # Modal training + inference (not named "modal", to avoid an import clash) |
|   train_modal.py |
|   predict_api.py |
|   infer_modal.py |
|   evaluate_modal.py |
|   run_modal.py |
|   requirements-modal.txt |
| scripts/ |
|   generate_skill_dataset.py |
|   generate_training_data.py |
|   train.py                  # Local GPU training (optional) |
| trajectories/               # Pocket Automator exports (Android UI automation traces) |
| trained_model/              # Local model weights (gitignored) |
| ``` |
|
|
| ## Evaluation |
|
|
| ```bash |
| # On Modal GPU |
| modal run modal_apps/evaluate_modal.py |
| |
| # Locally (needs adapter in trained_model/adapter or merged weights) |
| python -m src.evaluate |
| ``` |
|
|
| ## Regenerating data |
|
|
| ```bash |
| python scripts/generate_skill_dataset.py # trajectories → data/skills.jsonl |
| python scripts/generate_training_data.py # data/skills.jsonl → data/train.jsonl |
| ``` |
|
|
| ## V2: Intent extraction |
|
|
| V1 maps prompts to a skill label only. V2 extracts structured intents: |
|
|
| "text mom on whatsapp i'm on my way" |
| → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}} |
| |
| The Gradio demo and Modal `/predict` API both return skill + parameters. |
|
|
| ### Parameterized replay |
|
|
| V2 extracts `{skill, parameters}` at inference time. **Slot-filling at replay** substitutes those values into recorded `set_text` / post-search click steps before replay. |
|
|
| **End-to-end flow (validated on WhatsApp):** |
|
|
| ``` |
| "text mom on whatsapp i'm on my way" |
| → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}} |
| → ParameterBinder (Gradio preview + Pocket Automator on device) |
| → replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi" |
| ``` |
|
|
| Bindings live in `data/skill_schemas.json` per skill. Supported in preview today: **WhatsApp**, **Gmail**, **YouTube**. Pocket Automator mirrors the same binding rules at replay time via its parameter dialog. |
|
|
| ```bash |
| python -m src.parameter_binder # self-test bindings |
| ``` |
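The slot-filling step described in this section can be illustrated with a small sketch. Everything about the data shapes here is an assumption for illustration: the step dictionaries, the `text` field, and the index-to-parameter `bindings` map are hypothetical and do not reproduce the actual schema in `data/skill_schemas.json` or the logic in `src/parameter_binder.py`.

```python
from copy import deepcopy


def bind_parameters(steps: list[dict], bindings: dict[int, str],
                    parameters: dict[str, str]) -> list[dict]:
    """Return a copy of the recorded steps with bound text fields replaced."""
    bound = deepcopy(steps)  # never mutate the recorded trajectory
    for index, name in bindings.items():
        if name not in parameters:
            raise KeyError(f"missing parameter: {name}")
        bound[index]["text"] = parameters[name]
    return bound


# Hypothetical recording: the values typed while recording were "Biraj" and "Hi".
recorded = [
    {"action": "click", "target": "search"},
    {"action": "set_text", "text": "Biraj"},
    {"action": "click", "target": "first_result"},
    {"action": "set_text", "text": "Hi"},
]
bindings = {1: "contact", 3: "message"}  # step index -> parameter name (assumed shape)

bound = bind_parameters(
    recorded, bindings, {"contact": "mom", "message": "i'm on my way"}
)
print([step.get("text") for step in bound])
```

Working on a deep copy means the same recording can be replayed repeatedly with different parameters.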
|
|
| ### Data |
| - `data/skill_schemas.json`: parameter definitions and trajectory bindings per skill |
| - `data/train_intent.jsonl`: ~15k synthetic SFT examples (generated locally via script; gitignored, so upload to Modal for training) |
| - `data/eval_intent_prompts.json`: held-out intent eval set |
| - `data/pocket_benchmark_prompts.json`: 200 real-world messy prompts |
|
|
| ### Train & evaluate |
| ```bash |
| python scripts/generate_intent_dataset.py |
| modal run modal_apps/train_modal.py --dataset train_intent.jsonl |
| modal run modal_apps/evaluate_intent_modal.py |
| modal run modal_apps/evaluate_pocket_benchmark_modal.py |
| ``` |
|
|
| ### Benchmark results (Pocket Automator, 200 prompts) |
|
|
| | Metric | Score | |
| |--------|-------| |
| | Skill accuracy | 99.0% | |
| | Parameter accuracy | 86.0% | |
| | Exact JSON match | 85.5% | |
|
|
| **Next:** self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills. |
|
|
| ## License |
|
|
| Apache 2.0. Base model weights are subject to the [Qwen license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct). |
|
|