---
title: Android Skill Router
emoji: 📱
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "5.34.2"
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: Natural language → Android automation skill → UI trajectory
tags:
- build-small-hackathon
- track:backyard
- track:wood
- sponsor:modal
- achievement:offbrand
- achievement:fieldnotes
---

# Android Skill Router

**Build Small Hackathon: Backyard AI track · Modal sponsor**

Say *"text mom on whatsapp i'm on my way"* to a typical voice assistant and it might run a web search or just shrug. Android Skill Router closes that gap with a **3B-parameter intent classifier** that maps messy phone language to structured `{skill, parameters}` JSON, then loads a **pre-recorded UI trajectory** captured on a real Android device.

It is the classifier layer of the **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** stack: record a flow once on your phone, then route to it forever with a tiny model.

```
"play my workout playlist" → spotify_play_playlist → trajectories/spotify_play_playlist.json
```

**Tech:** fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) via 4-bit QLoRA + SFT ([Unsloth](https://github.com/unslothai/unsloth) on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal and the demo UI on Gradio.
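The classifier answers in text, so something has to turn that text into the `{skill, parameters}` dict the rest of the pipeline consumes. A minimal sketch, assuming the model may wrap its JSON in extra prose; `parse_intent` is a hypothetical helper written for illustration, not the actual contents of `src/skill_utils.py`:

```python
import json
from typing import Any


def parse_intent(raw: str) -> dict[str, Any]:
    """Extract the first {"skill": ..., "parameters": {...}} object from model output.

    Hypothetical helper: assumes the model emits one intent object,
    possibly surrounded by extra text.
    """
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            # Not valid JSON here; try the next opening brace.
            start = raw.find("{", start + 1)
            continue
        if isinstance(obj, dict) and isinstance(obj.get("skill"), str):
            params = obj.get("parameters") or {}
            if not isinstance(params, dict):
                raise ValueError("'parameters' must be a JSON object")
            return {"skill": obj["skill"], "parameters": params}
        start = raw.find("{", start + 1)
    raise ValueError("no {skill, parameters} object found in model output")
```

Scanning with `raw_decode` instead of a regex keeps nested objects such as `parameters` intact, and a missing `parameters` key degrades to an empty dict rather than an error.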
On the device side, Pocket Automator turns a prediction into taps:

```
Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
```

**Submission links**

- **Blog post:** [Hugging Face Blog: Android Skill Router](https://huggingface.co/blog/build-small-hackathon/android-skill-router)
- **Demo video:** [YouTube Short](https://youtube.com/shorts/IQRHf7HfTDA)
- **Social post:** [Twitter/X](https://x.com/kriyanshii/status/2066587828839141634)
- **Live Space:** [android-skill-router](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)
- **Android recorder:** [Pocket Automator](https://github.com/kriyanshii/pocket-automator)

## Related repos

| Repo | Role |
| --- | --- |
| **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** | Android recorder, parameter dialog, `ParameterBinder`, and on-device replay |
| **[android-dataset](https://github.com/kriyanshii/android-dataset)** | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
| **[Live Space](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)** | Hosted demo: natural language in, skill + parameterized trajectory out |
| **[Blog post](https://huggingface.co/blog/build-small-hackathon/android-skill-router)** | Full write-up of the classify → bind → replay architecture |

## Hackathon tags

| Tag | Why |
| --- | --- |
| `track:backyard` | Personal automation on hardware you own |
| `sponsor:modal` | Training, evaluation, and inference on Modal |
| `achievement:tinytitan` | Full stack on Qwen2.5-3B (≤4B params) |
| `achievement:agent` | Classify → route → load a multi-step UI plan |

## Recording trajectories

UI traces in `trajectories/` were captured with **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)**, an Android accessibility recorder that exports JSON for training and replay. To add a flow, record it on device, export it, then map it to a skill with `scripts/generate_skill_dataset.py`.
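Once a flow is mapped to a skill, routing reduces to a lookup from skill name to trajectory file. A minimal sketch, assuming the `trajectories/<skill>.json` naming from the intro example (`spotify_play_playlist` → `trajectories/spotify_play_playlist.json`); `load_trajectory` and its name check are illustrative, not the actual `src/skill_router.py` code:

```python
import json
import re
from pathlib import Path
from typing import Any

# Assumption: skill names are snake_case identifiers like "spotify_play_playlist".
# Rejecting anything else keeps a hallucinated name such as "../x" inside the root.
SKILL_NAME = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def load_trajectory(skill: str, root: Path = Path("trajectories")) -> Any:
    """Resolve a predicted skill name to its recorded trajectory JSON (hypothetical helper)."""
    if not SKILL_NAME.fullmatch(skill):
        raise ValueError(f"invalid skill name: {skill!r}")
    path = root / f"{skill}.json"
    if not path.is_file():
        # The classifier predicted a skill that has no recording yet.
        raise KeyError(f"no trajectory recorded for skill {skill!r}")
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Validating the name before touching the filesystem matters because the skill string comes straight from model output.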
## Tech stack

| Piece | What |
| --- | --- |
| **Base model** | [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) |
| **Fine-tune** | 4-bit QLoRA + SFT with [Unsloth](https://github.com/unslothai/unsloth) on Modal (`modal_apps/train_modal.py`) |
| **Inference** | Modal GPU API (`modal_apps/predict_api.py`); returns skill + parameters |
| **Parameter binding** | `src/parameter_binder.py` + bindings in `data/skill_schemas.json`; substitutes runtime values into trajectory steps |
| **Demo UI** | Gradio (`app.py`); shows a parameterized trajectory preview |
| **Recorder / replay** | [Pocket Automator](https://github.com/kriyanshii/pocket-automator): accessibility capture, parameter dialog, `ParameterBinder`, replay |
| **Data** | 15 Android trajectories → `data/skills.jsonl` → ~510 prompt variations in `data/train.jsonl` (V1 skill labels); ~15k synthetic intent examples in `data/train_intent.jsonl` (V2) |

## Quick start (local dev)

```bash
# 1. Generate the intent dataset and train on Modal
#    (uploads data/train_intent.jsonl, saves the adapter to a Modal volume)
pip install modal
modal setup
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl

# 2. Deploy the inference API
modal deploy modal_apps/predict_api.py
# Copy the printed URL, e.g. https://--android-skill-predict-api-skillpredictor-web.modal.run

# 3. Run the Gradio demo
pip install -r requirements.txt
export MODAL_PREDICT_URL="https://--android-skill-predict-api-skillpredictor-web.modal.run"
python app.py
```

The `/predict` endpoint returns structured intents:

```json
{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
```

### Hugging Face Space setup

1. Create a **Gradio Space** inside the [build-small-hackathon](https://huggingface.co/build-small-hackathon) org.
2. Upload this repo (exclude `trained_model/`; inference stays on Modal).
3. Add a Space secret: `MODAL_PREDICT_URL` = your deployed Modal `/predict` base URL.
4. Link the demo video and social post in the README (see **Submission links** above).

## Project layout

```
app.py                     # Gradio demo (hackathon submission UI)
requirements.txt           # Space dependencies
data/
  train.jsonl              # SFT training data (~510 examples)
  eval_prompts.json        # 50 held-out evaluation prompts
  skills.jsonl             # Canonical skill ↔ task mapping
src/
  skill_router.py          # Skill name → trajectory JSON
  parameter_binder.py      # Runtime parameter → trajectory step substitution
  skill_utils.py           # Shared JSON parsing helpers
  evaluate.py              # Local CPU/MPS evaluation
modal_apps/                # Modal training + inference (not named "modal", to avoid an import clash with the modal package)
  train_modal.py
  predict_api.py
  infer_modal.py
  evaluate_modal.py
  run_modal.py
  requirements-modal.txt
scripts/
  generate_skill_dataset.py
  generate_training_data.py
  train.py                 # Local GPU training (optional)
trajectories/              # Pocket Automator exports (Android UI automation traces)
trained_model/             # Local model weights (gitignored)
```

## Evaluation

```bash
# On a Modal GPU
modal run modal_apps/evaluate_modal.py

# Locally (needs an adapter in trained_model/adapter or merged weights)
python -m src.evaluate
```

## Regenerating data

```bash
python scripts/generate_skill_dataset.py     # trajectories → data/skills.jsonl
python scripts/generate_training_data.py     # data/skills.jsonl → data/train.jsonl
```

## V2: Intent extraction

V1 maps prompts to a skill label only. V2 extracts structured intents:

`"text mom on whatsapp i'm on my way"` → `{"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}`

The Gradio demo and the Modal `/predict` API both return skill + parameters.

### Parameterized replay

V2 extracts `{skill, parameters}` at inference time. **Slot-filling at replay** then substitutes those values into the recorded `set_text` steps and post-search click steps before the trajectory is replayed.
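A minimal sketch of slot-filling, assuming a made-up step shape (`action`, `text`, `target_text`) and a made-up binding shape (parameter name, step index, field to overwrite); the real format in `data/skill_schemas.json` and the logic in `src/parameter_binder.py` may differ. The example data mirrors the WhatsApp flow, where the recorded `"Biraj"` / `"Hi"` become `"mom"` / `"i'm on my way"`:

```python
import copy
from typing import Any

# Hypothetical binding format: {"param": str, "step": int, "field": str}
Binding = dict[str, Any]

# Hypothetical recorded WhatsApp trajectory (step fields are assumptions).
RECORDED = [
    {"action": "click", "target_text": "Search"},
    {"action": "set_text", "text": "Biraj"},      # search box
    {"action": "click", "target_text": "Biraj"},  # post-search click on the contact
    {"action": "set_text", "text": "Hi"},         # message box
    {"action": "click", "target_text": "Send"},
]
BINDINGS: list[Binding] = [
    {"param": "contact", "step": 1, "field": "text"},
    {"param": "contact", "step": 2, "field": "target_text"},
    {"param": "message", "step": 3, "field": "text"},
]


def bind_parameters(
    steps: list[dict[str, Any]],
    bindings: list[Binding],
    parameters: dict[str, str],
) -> list[dict[str, Any]]:
    """Return a copy of the recorded steps with runtime values substituted."""
    bound = copy.deepcopy(steps)  # never mutate the recorded trajectory
    for b in bindings:
        name = b["param"]
        if name not in parameters:
            raise KeyError(f"missing value for parameter {name!r}")
        bound[b["step"]][b["field"]] = parameters[name]
    return bound


if __name__ == "__main__":
    bound = bind_parameters(RECORDED, BINDINGS, {"contact": "mom", "message": "i'm on my way"})
    print([s.get("text") or s.get("target_text") for s in bound])
    # ['Search', 'mom', 'mom', "i'm on my way", 'Send']
```

Failing loudly on a missing parameter is one reasonable choice; a parameter dialog like Pocket Automator's could instead prompt the user for the missing value before replay.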
**End-to-end flow (validated on WhatsApp):**

```
"text mom on whatsapp i'm on my way"
  → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
  → ParameterBinder (Gradio preview + Pocket Automator on device)
  → replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"
```

Bindings live in `data/skill_schemas.json`, one entry per skill. The Gradio preview currently supports **WhatsApp**, **Gmail**, and **YouTube**. Pocket Automator applies the same binding rules at replay time through its parameter dialog.

```bash
python -m src.parameter_binder   # self-test bindings
```

### Data

- `data/skill_schemas.json`: parameter definitions and trajectory bindings per skill
- `data/train_intent.jsonl`: ~15k synthetic SFT examples (generated locally with `scripts/generate_intent_dataset.py`; gitignored, so upload it to Modal for training)
- `data/eval_intent_prompts.json`: held-out intent eval set
- `data/pocket_benchmark_prompts.json`: 200 real-world messy prompts

### Train & evaluate

```bash
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
modal run modal_apps/evaluate_intent_modal.py
modal run modal_apps/evaluate_pocket_benchmark_modal.py
```

### Benchmark results (Pocket Automator, 200 prompts)

| Metric | Score |
| --- | --- |
| Skill accuracy | 99.0% |
| Parameter accuracy | 86.0% |
| Exact JSON match | 85.5% |

**Next:** self-contained trajectory exports (bindings embedded in the export JSON) and bindings for the remaining skills.

## License

Apache 2.0. Base model weights are subject to the [Qwen license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).