	---
	title: Android Skill Router
	emoji: 📱
	colorFrom: indigo
	colorTo: blue
	sdk: gradio
	sdk_version: "5.34.2"
	python_version: '3.13'
	app_file: app.py
	pinned: false
	license: apache-2.0
	short_description: Natural language → Android automation skill → UI trajectory
	tags:
	- build-small-hackathon
	- track:backyard
	- track:wood
	- sponsor:modal
	- achievement:offbrand
	- achievement:fieldnotes
	---

	# Android Skill Router

	Build Small Hackathon — Backyard AI track · Modal sponsor

	You say "text mom on whatsapp i'm on my way" — a voice assistant might web-search or shrug. Android Skill Router closes that gap with a 3B-parameter intent classifier that maps messy phone language to structured `{skill, parameters}` JSON, then loads a pre-recorded UI trajectory captured on a real Android device. It is the classifier layer of the [Pocket Automator](https://github.com/kriyanshii/pocket-automator) stack: record a flow once on your phone, route to it forever with a tiny model.

	```
	"play my workout playlist" → spotify_play_playlist → trajectories/spotify_play_playlist.json
	```
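
The skill-to-file mapping shown above can be sketched in a few lines. This is a minimal illustration, not the contents of `src/skill_router.py`: the function names, the name check, and the assumption that each trajectory lives at `trajectories/<skill>.json` are inferred from the example only.

```python
import json
from pathlib import Path
from typing import Any


def trajectory_path(skill: str, root: Path) -> Path:
    """Map a skill label to its trajectory file, assumed to be <root>/<skill>.json."""
    # Accept only letters, digits, and underscores so that a malformed model
    # output such as "../secrets" cannot point outside the trajectories folder.
    if not skill.replace("_", "").isalnum():
        raise ValueError(f"invalid skill name: {skill!r}")
    return root / f"{skill}.json"


def load_trajectory(skill: str, root: Path) -> Any:
    """Return the parsed trajectory JSON, or raise KeyError if none is recorded."""
    path = trajectory_path(skill, root)
    if not path.is_file():
        raise KeyError(f"no trajectory recorded for skill {skill!r}")
    return json.loads(path.read_text(encoding="utf-8"))
```

Validating the label before building the path matters here because the skill name comes from a language model rather than from a fixed enum.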

	Tech: fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) via 4-bit QLoRA + SFT ([Unsloth](https://github.com/unslothai/unsloth) on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio.

	```
	Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
	```

	Submission links

	- Blog post: [Hugging Face Blog — Android Skill Router](https://huggingface.co/blog/build-small-hackathon/android-skill-router)
	- Demo video: [YouTube Short](https://youtube.com/shorts/IQRHf7HfTDA)
	- Social post: [Twitter/X](https://x.com/kriyanshii/status/2066587828839141634)
	- Live Space: [android-skill-router](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)
	- Android recorder: [Pocket Automator](https://github.com/kriyanshii/pocket-automator)

	## Related repos

	| Repo | Role |
	| --- | --- |
	| [Pocket Automator](https://github.com/kriyanshii/pocket-automator) | Android recorder, parameter dialog, `ParameterBinder`, and on-device replay |
	| [android-dataset](https://github.com/kriyanshii/android-dataset) | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
	| [Live Space](https://huggingface.co/spaces/build-small-hackathon/android-skill-router) | Hosted demo: natural language in, skill + parameterized trajectory out |
	| [Blog post](https://huggingface.co/blog/build-small-hackathon/android-skill-router) | Full write-up of the classify → bind → replay architecture |

	## Hackathon tags

	| Tag | Why |
	| --- | --- |
	| `track:backyard` | Personal automation on hardware you own |
	| `sponsor:modal` | Training, evaluation, and inference on Modal |
	| `achievement:tinytitan` | Full stack on Qwen2.5-3B (≤4B params) |
	| `achievement:agent` | Classify → route → load multi-step UI plan |

	## Recording trajectories

	UI traces in `trajectories/` were captured with [Pocket Automator](https://github.com/kriyanshii/pocket-automator) — an Android accessibility recorder that exports JSON for training and replay. Record a flow on device → export → map to a skill via `scripts/generate_skill_dataset.py`.

	## Tech stack

	| Piece | What |
	| --- | --- |
	| Base model | [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) |
	| Fine-tune | 4-bit QLoRA + SFT with [Unsloth](https://github.com/unslothai/unsloth) on Modal (`modal_apps/train_modal.py`) |
	| Inference | Modal GPU API (`modal_apps/predict_api.py`); returns skill + parameters |
	| Parameter binding | `src/parameter_binder.py` + `data/skill_schemas.json` bindings; substitutes runtime values into trajectory steps |
	| Demo UI | Gradio (`app.py`); shows parameterized trajectory preview |
	| Recorder / replay | [Pocket Automator](https://github.com/kriyanshii/pocket-automator): accessibility capture, parameter dialog, `ParameterBinder`, replay |
	| Data | 15 Android trajectories → `data/skills.jsonl` → ~510 prompt variations in `data/train.jsonl` (V1); ~15k synthetic intent examples in `data/train_intent.jsonl` (V2) |

	## Quick start (local dev)

	```bash
	# 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume)
	pip install modal
	modal setup
	python scripts/generate_intent_dataset.py
	modal run modal_apps/train_modal.py --dataset train_intent.jsonl

	# 2. Deploy inference API
	modal deploy modal_apps/predict_api.py
	# Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run

	# 3. Run the Gradio demo
	pip install -r requirements.txt
	export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run"
	python app.py
	```

	The `/predict` endpoint returns structured intents:

	```json
	{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
	```
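
A client usually has to defend against extra text around that JSON. The helper below is a hedged sketch of one way to pull the first `{skill, parameters}` object out of raw model output; `parse_intent` is a hypothetical name and is not taken from `src/skill_utils.py`.

```python
import json


def parse_intent(raw: str) -> dict:
    """Return the first JSON object in `raw` that has a string "skill" field."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("skill"), str):
            params = obj.get("parameters", {})
            if not isinstance(params, dict):
                raise ValueError("parameters must be a JSON object")
            return {"skill": obj["skill"], "parameters": params}
        # Not an intent object: try the next opening brace.
        start = raw.find("{", start + 1)
    raise ValueError("no intent object found in model output")
```

Treating a missing `parameters` key as an empty object is a design choice of this sketch, made so that parameterless skills still parse.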

	### Hugging Face Space setup

	1. Create a Gradio Space inside the [build-small-hackathon](https://huggingface.co/build-small-hackathon) org.
	2. Upload this repo (exclude `trained_model/` — inference stays on Modal).
	3. Add a Space secret: `MODAL_PREDICT_URL` = your deployed Modal `/predict` base URL.
	4. Link the demo video and social post in the README (see Submission links above).
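
For orientation, here is a rough sketch of how a client could read that secret and call the API using only the standard library. The `{"prompt": ...}` request body and the choice to append `/predict` to the base URL are assumptions; check `modal_apps/predict_api.py` for the real contract.

```python
import json
import os
import urllib.request
from typing import Optional


def build_predict_request(prompt: str, base_url: Optional[str] = None) -> urllib.request.Request:
    """Build a POST request for the /predict endpoint (payload shape is assumed)."""
    # On a Space, secrets are exposed to the app as environment variables.
    base = (base_url or os.environ["MODAL_PREDICT_URL"]).rstrip("/")
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        f"{base}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(prompt: str, timeout: float = 30.0) -> dict:
    """Send the prompt and return the decoded JSON response."""
    with urllib.request.urlopen(build_predict_request(prompt), timeout=timeout) as resp:
        return json.load(resp)
```

A generous timeout is worth keeping, since a serverless GPU container may need to cold-start before it answers the first request.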

	## Project layout

	```
	app.py                      # Gradio demo (hackathon submission UI)
	requirements.txt            # Space dependencies
	data/
	  train.jsonl               # SFT training data (~510 examples)
	  eval_prompts.json         # 50 held-out evaluation prompts
	  skills.jsonl              # Canonical skill ↔ task mapping
	src/
	  skill_router.py           # Skill name → trajectory JSON
	  parameter_binder.py       # Runtime parameter → trajectory step substitution
	  skill_utils.py            # Shared JSON parsing helpers
	  evaluate.py               # Local CPU/MPS evaluation
	modal_apps/                 # Modal training + inference (not named "modal", to avoid an import clash)
	  train_modal.py
	  predict_api.py
	  infer_modal.py
	  evaluate_modal.py
	  run_modal.py
	  requirements-modal.txt
	scripts/
	  generate_skill_dataset.py
	  generate_training_data.py
	  train.py                  # Local GPU training (optional)
	trajectories/               # Pocket Automator exports (Android UI automation traces)
	trained_model/              # Local model weights (gitignored)
	```

	## Evaluation

	```bash
	# On Modal GPU
	modal run modal_apps/evaluate_modal.py

	# Locally (needs adapter in trained_model/adapter or merged weights)
	python -m src.evaluate
	```

	## Regenerating data

	```bash
	python scripts/generate_skill_dataset.py # trajectories → data/skills.jsonl
	python scripts/generate_training_data.py # data/skills.jsonl → data/train.jsonl
	```
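
To give a feel for how a handful of skills can fan out into hundreds of prompts, here is a toy template expander. It does not reproduce `scripts/generate_training_data.py`; the row format (`prompt`, `skill`, `parameters`) and both function names are assumptions made for illustration.

```python
import itertools
import json


def expand_prompts(skill: str, templates: list[str], slots: dict[str, list[str]]) -> list[dict]:
    """Fill every template with every combination of slot values."""
    names = sorted(slots)  # fixed order keeps the output deterministic
    rows = []
    for template in templates:
        for values in itertools.product(*(slots[name] for name in names)):
            params = dict(zip(names, values))
            rows.append(
                {"prompt": template.format(**params), "skill": skill, "parameters": params}
            )
    return rows


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)
```

For scale, ~510 examples over 15 skills works out to roughly 34 variations per skill, a count that a few templates and slot values reach quickly.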

	## V2: Intent extraction

	V1 maps prompts to a skill label only. V2 extracts structured intents:

	"text mom on whatsapp i'm on my way"
	→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}

	The Gradio demo and Modal `/predict` API both return skill + parameters.

	### Parameterized replay

	V2 extracts `{skill, parameters}` at inference time. Slot filling then substitutes those values into the recorded `set_text` and post-search click steps before the trajectory is replayed.

	End-to-end flow (validated on WhatsApp):

	```
	"text mom on whatsapp i'm on my way"
	→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
	→ ParameterBinder (Gradio preview + Pocket Automator on device)
	→ replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"
	```

	Bindings live in `data/skill_schemas.json` per skill. Supported in preview today: WhatsApp, Gmail, YouTube. Pocket Automator mirrors the same binding rules at replay time via its parameter dialog.

	```bash
	python -m src.parameter_binder # self-test bindings
	```
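
The binding step can be pictured as follows. This is a simplified sketch under stated assumptions: the trajectory is taken to be a dict with a `steps` list, and each binding names a parameter, a step index, and a field to overwrite. The actual schema in `data/skill_schemas.json` and the logic in `src/parameter_binder.py` may differ.

```python
import copy


def bind_parameters(trajectory: dict, bindings: list[dict], parameters: dict) -> dict:
    """Return a copy of the trajectory with runtime values substituted in."""
    bound = copy.deepcopy(trajectory)  # never mutate the recorded trace
    for binding in bindings:
        name = binding["parameter"]
        if name not in parameters:
            raise KeyError(f"missing parameter: {name}")
        step = bound["steps"][binding["step"]]
        step[binding["field"]] = parameters[name]
    return bound


# Hypothetical recorded trace, using the recorded values mentioned above.
recorded = {
    "steps": [
        {"action": "set_text", "text": "Biraj"},
        {"action": "click", "target": "Biraj"},
        {"action": "set_text", "text": "Hi"},
    ]
}
bindings = [
    {"parameter": "contact", "step": 0, "field": "text"},
    {"parameter": "contact", "step": 1, "field": "target"},
    {"parameter": "message", "step": 2, "field": "text"},
]
bound = bind_parameters(recorded, bindings, {"contact": "mom", "message": "i'm on my way"})
```

Working on a deep copy keeps the recorded trace reusable, so the same export can be replayed with different parameters on every call.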

	### Data

	- `data/skill_schemas.json` — parameter definitions and trajectory bindings per skill
	- `data/train_intent.jsonl` — ~15k synthetic SFT examples (generated locally via script; gitignored — upload to Modal for training)
	- `data/eval_intent_prompts.json` — held-out intent eval set
	- `data/pocket_benchmark_prompts.json` — 200 real-world messy prompts

	### Train & evaluate

	```bash
	python scripts/generate_intent_dataset.py
	modal run modal_apps/train_modal.py --dataset train_intent.jsonl
	modal run modal_apps/evaluate_intent_modal.py
	modal run modal_apps/evaluate_pocket_benchmark_modal.py
	```

	### Benchmark results (Pocket Automator, 200 prompts)

	| Metric | Score |
	|--------|-------|
	| Skill accuracy | 99.0% |
	| Parameter accuracy | 86.0% |
	| Exact JSON match | 85.5% |
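
The README does not define these metrics, so the sketch below encodes one plausible reading: skill accuracy compares the skill label, parameter accuracy compares the whole `parameters` object, and exact match requires both. The function name and these definitions are assumptions, not the logic of `modal_apps/evaluate_pocket_benchmark_modal.py`.

```python
def score(predictions: list[dict], gold: list[dict]) -> dict:
    """Compute percentage scores over aligned prediction and gold intents."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and the same length")
    skill_hits = param_hits = exact_hits = 0
    for pred, ref in zip(predictions, gold):
        skill_ok = pred.get("skill") == ref["skill"]
        params_ok = pred.get("parameters") == ref["parameters"]
        skill_hits += skill_ok
        param_hits += params_ok
        exact_hits += skill_ok and params_ok
    n = len(gold)
    return {
        "skill_accuracy": 100.0 * skill_hits / n,
        "parameter_accuracy": 100.0 * param_hits / n,
        "exact_match": 100.0 * exact_hits / n,
    }
```

Under this reading, exact match can never exceed either of the other two scores, which is consistent with the table. With 200 prompts each prompt is worth 0.5 points, so the reported figures would correspond to 198, 172, and 171 prompts respectively.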

	Next: self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills.

	## License

	Apache 2.0. Base model weights are subject to the [Qwen license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).