---
title: Android Skill Router
emoji: 📱
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "5.34.2"
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: Natural language → Android automation skill → UI trajectory
tags:
- build-small-hackathon
- track:backyard
- track:wood
- sponsor:modal
- achievement:offbrand
- achievement:fieldnotes
---
# Android Skill Router
**Build Small Hackathon: Backyard AI track · Modal sponsor**
You say *"text mom on whatsapp i'm on my way"*, and a voice assistant might web-search or shrug. Android Skill Router closes that gap with a **3B-parameter intent classifier** that maps messy phone language to structured `{skill, parameters}` JSON, then loads a **pre-recorded UI trajectory** captured on a real Android device. It is the classifier layer of the **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** stack: record a flow once on your phone, route to it forever with a tiny model.
```
"play my workout playlist" → spotify_play_playlist → trajectories/spotify_play_playlist.json
```
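The last hop in that example, from skill name to trajectory file, could be sketched as below. This is a minimal illustration, not the code in `src/skill_router.py`: it assumes one export per skill named `<skill>.json`, and both the `load_trajectory` helper and the `steps` field in the demo file are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_trajectory(skill: str, trajectories_dir: Path) -> dict:
    """Return the recorded trajectory for a skill, or raise if none exists."""
    # Assumed naming convention: one export per skill, named <skill>.json.
    path = trajectories_dir / f"{skill}.json"
    if not path.is_file():
        raise KeyError(f"no trajectory recorded for skill {skill!r}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    # Hypothetical trajectory content; real exports come from Pocket Automator.
    (demo_dir / "spotify_play_playlist.json").write_text(
        json.dumps({"steps": [{"action": "click", "target": "Search"}]}),
        encoding="utf-8",
    )
    trajectory = load_trajectory("spotify_play_playlist", demo_dir)
    print(len(trajectory["steps"]))
```

Raising on an unknown skill, rather than returning an empty plan, keeps a misclassification from silently replaying nothing.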
**Tech:** fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) via 4-bit QLoRA + SFT ([Unsloth](https://github.com/unslothai/unsloth) on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio.
```
Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
```
**Submission links**
- **Blog post:** [Hugging Face Blog: Android Skill Router](https://huggingface.co/blog/build-small-hackathon/android-skill-router)
- **Demo video:** [YouTube Short](https://youtube.com/shorts/IQRHf7HfTDA)
- **Social post:** [Twitter/X](https://x.com/kriyanshii/status/2066587828839141634)
- **Live Space:** [android-skill-router](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)
- **Android recorder:** [Pocket Automator](https://github.com/kriyanshii/pocket-automator)
## Related repos
| Repo | Role |
| --- | --- |
| **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** | Android recorder, parameter dialog, `ParameterBinder`, and on-device replay |
| **[android-dataset](https://github.com/kriyanshii/android-dataset)** | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
| **[Live Space](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)** | Hosted demo: natural language in, skill + parameterized trajectory out |
| **[Blog post](https://huggingface.co/blog/build-small-hackathon/android-skill-router)** | Full write-up of the classify → bind → replay architecture |
## Hackathon tags
| Tag | Why |
| --- | --- |
| `track:backyard` | Personal automation on hardware you own |
| `sponsor:modal` | Training, evaluation, and inference on Modal |
| `achievement:tinytitan` | Full stack on Qwen2.5-3B (≤4B params) |
| `achievement:agent` | Classify → route → load multi-step UI plan |
## Recording trajectories
UI traces in `trajectories/` were captured with **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)**, an Android accessibility recorder that exports JSON for training and replay. To add a skill: record a flow on device → export → map it to a skill via `scripts/generate_skill_dataset.py`.
## Tech stack
| Piece | What |
| --- | --- |
| **Base model** | [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) |
| **Fine-tune** | 4-bit QLoRA + SFT with [Unsloth](https://github.com/unslothai/unsloth) on Modal (`modal_apps/train_modal.py`) |
| **Inference** | Modal GPU API (`modal_apps/predict_api.py`); returns skill + parameters |
| **Parameter binding** | `src/parameter_binder.py` + `data/skill_schemas.json` bindings; substitutes runtime values into trajectory steps |
| **Demo UI** | Gradio (`app.py`); shows parameterized trajectory preview |
| **Recorder / replay** | [Pocket Automator](https://github.com/kriyanshii/pocket-automator): accessibility capture, parameter dialog, `ParameterBinder`, replay |
| **Data** | 15 Android trajectories → `data/skills.jsonl` → ~510 prompt variations in `data/train.jsonl` |
## Quick start (local dev)
```bash
# 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume)
pip install modal
modal setup
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
# 2. Deploy inference API
modal deploy modal_apps/predict_api.py
# Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run
# 3. Run the Gradio demo
pip install -r requirements.txt
export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run"
python app.py
```
The `/predict` endpoint returns structured intents:
```json
{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
```
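A client for this endpoint might look like the sketch below. Only the response shape above comes from this README; the request body (`{"prompt": ...}`), the `/predict` path suffix, and the helper names `parse_intent` and `predict` are assumptions, so check them against `modal_apps/predict_api.py` before relying on them.

```python
import json
import os
import urllib.request


def parse_intent(body: str) -> tuple[str, dict]:
    """Validate a /predict response body and return (skill, parameters)."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        raise ValueError(f"unexpected response shape: {payload!r}")
    skill = payload.get("skill")
    parameters = payload.get("parameters", {})
    if not isinstance(skill, str) or not isinstance(parameters, dict):
        raise ValueError(f"unexpected response shape: {payload!r}")
    return skill, parameters


def predict(prompt: str, base_url: str, timeout: float = 30.0) -> tuple[str, dict]:
    """POST a prompt to the deployed endpoint (request shape is an assumption)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_intent(response.read().decode("utf-8"))


# Offline check against the documented response shape.
sample = '{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}'
print(parse_intent(sample))

# Only hit the network when a deployment URL is configured.
base_url = os.environ.get("MODAL_PREDICT_URL")
if base_url:
    print(predict("text ri on whatsapp see you soon", base_url))
```

Keeping response validation separate from the HTTP call means the same check can run on JSON pasted into the demo.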
### Hugging Face Space setup
1. Create a **Gradio Space** inside the [build-small-hackathon](https://huggingface.co/build-small-hackathon) org.
2. Upload this repo (exclude `trained_model/`; inference stays on Modal).
3. Add a Space secret: `MODAL_PREDICT_URL` = your deployed Modal `/predict` base URL.
4. Link the demo video and social post in the README (see **Submission links** above).
## Project layout
```
app.py                        # Gradio demo (hackathon submission UI)
requirements.txt              # Space dependencies
data/
  train.jsonl                 # SFT training data (~510 examples)
  eval_prompts.json           # 50 held-out evaluation prompts
  skills.jsonl                # Canonical skill ↔ task mapping
src/
  skill_router.py             # Skill name → trajectory JSON
  parameter_binder.py         # Runtime parameter → trajectory step substitution
  skill_utils.py              # Shared JSON parsing helpers
  evaluate.py                 # Local CPU/MPS evaluation
modal_apps/                   # Modal training + inference (not named "modal", to avoid an import clash)
  train_modal.py
  predict_api.py
  infer_modal.py
  evaluate_modal.py
  run_modal.py
  requirements-modal.txt
scripts/
  generate_skill_dataset.py
  generate_training_data.py
  train.py                    # Local GPU training (optional)
trajectories/                 # Pocket Automator exports (Android UI automation traces)
trained_model/                # Local model weights (gitignored)
```
## Evaluation
```bash
# On Modal GPU
modal run modal_apps/evaluate_modal.py
# Locally (needs adapter in trained_model/adapter or merged weights)
python -m src.evaluate
```
## Regenerating data
```bash
python scripts/generate_skill_dataset.py   # trajectories → data/skills.jsonl
python scripts/generate_training_data.py   # data/skills.jsonl → data/train.jsonl
```
## V2: Intent extraction
V1 maps prompts to a skill label only. V2 extracts structured intents:
`"text mom on whatsapp i'm on my way"`
→ `{"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}`
The Gradio demo and Modal `/predict` API both return skill + parameters.
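Turning raw model text into that structure usually needs a tolerant parser, since a generative model may wrap the JSON in extra words. The sketch below is one way to do it and is not taken from `src/skill_utils.py`; `extract_intent` is a hypothetical helper name.

```python
import json


def extract_intent(raw: str) -> dict:
    """Pull the first JSON object with a string "skill" out of raw model text."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            obj, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("skill"), str):
            params = obj.get("parameters")
            # Treat a missing or malformed parameters field as "no parameters".
            return {
                "skill": obj["skill"],
                "parameters": params if isinstance(params, dict) else {},
            }
        start = raw.find("{", start + 1)
    raise ValueError("no intent object found in model output")


raw_output = (
    'Sure! {"skill": "whatsapp_send_message", '
    '"parameters": {"contact": "mom", "message": "i\'m on my way"}} Done.'
)
intent = extract_intent(raw_output)
print(intent["skill"], intent["parameters"])
```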
### Parameterized replay
V2 extracts `{skill, parameters}` at inference time. **Slot-filling** then substitutes those values into the recorded `set_text` and post-search click steps before replay.
**End-to-end flow (validated on WhatsApp):**
```
"text mom on whatsapp i'm on my way"
→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
→ ParameterBinder (Gradio preview + Pocket Automator on device)
→ replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"
```
Bindings live in `data/skill_schemas.json` per skill. Supported in preview today: **WhatsApp**, **Gmail**, **YouTube**. Pocket Automator mirrors the same binding rules at replay time via its parameter dialog.
```bash
python -m src.parameter_binder # self-test bindings
```
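The substitution idea can be illustrated with a small sketch. The trajectory layout, the binding format (parameter name mapped to step index and field), and the `bind` function are all hypothetical stand-ins; the real rules live in `data/skill_schemas.json` and `src/parameter_binder.py` and may be shaped differently.

```python
import copy

# Hypothetical recorded trajectory; the real export format may differ.
RECORDED = {
    "skill": "whatsapp_send_message",
    "steps": [
        {"action": "click", "target": "Search"},
        {"action": "set_text", "text": "Biraj"},
        {"action": "click", "target": "Biraj"},  # post-search click
        {"action": "set_text", "text": "Hi"},
        {"action": "click", "target": "Send"},
    ],
}

# Hypothetical bindings: parameter name -> list of (step index, field) slots.
BINDINGS = {
    "contact": [(1, "text"), (2, "target")],
    "message": [(3, "text")],
}


def bind(trajectory: dict, bindings: dict, parameters: dict) -> dict:
    """Return a copy of the trajectory with runtime values written into bound slots."""
    bound = copy.deepcopy(trajectory)  # never mutate the recorded original
    for name, slots in bindings.items():
        if name not in parameters:
            raise KeyError(f"missing parameter: {name}")
        for index, field in slots:
            bound["steps"][index][field] = parameters[name]
    return bound


bound = bind(RECORDED, BINDINGS, {"contact": "mom", "message": "i'm on my way"})
for step in bound["steps"]:
    print(step)
```

Working on a deep copy leaves the recorded values ("Biraj" / "Hi") intact, so the same trajectory can be rebound for the next request.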
### Data
- `data/skill_schemas.json`: parameter definitions and trajectory bindings per skill
- `data/train_intent.jsonl`: ~15k synthetic SFT examples (generated locally via script; gitignored, so upload it to Modal for training)
- `data/eval_intent_prompts.json`: held-out intent eval set
- `data/pocket_benchmark_prompts.json`: 200 real-world messy prompts
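One common way to grow a handful of skills into thousands of synthetic examples is template expansion, sketched below. This is an illustration of the general technique, not the logic of `scripts/generate_intent_dataset.py`; the templates, slot values, and the `prompt` / `completion` field names are all hypothetical.

```python
import itertools
import json

# Hypothetical phrasing templates and slot values for one skill.
TEMPLATES = [
    "text {contact} on whatsapp {message}",
    "whatsapp {contact} saying {message}",
    "send {contact} a whatsapp: {message}",
]
CONTACTS = ["mom", "ri"]
MESSAGES = ["i'm on my way", "see you soon"]


def expand(skill, templates, contacts, messages):
    """Yield one SFT example per (template, contact, message) combination."""
    for template, contact, message in itertools.product(templates, contacts, messages):
        target = {"skill": skill, "parameters": {"contact": contact, "message": message}}
        yield {
            "prompt": template.format(contact=contact, message=message),
            "completion": json.dumps(target),
        }


examples = list(expand("whatsapp_send_message", TEMPLATES, CONTACTS, MESSAGES))
print(len(examples))  # 3 templates * 2 contacts * 2 messages = 12
print(json.dumps(examples[0]))  # one JSONL line
```

Because the count is a product of the list sizes, modest lists of templates and slot values per skill are enough to reach dataset sizes in the thousands.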
### Train & evaluate
```bash
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
modal run modal_apps/evaluate_intent_modal.py
modal run modal_apps/evaluate_pocket_benchmark_modal.py
```
### Benchmark results (Pocket Automator, 200 prompts)
| Metric | Score |
|--------|-------|
| Skill accuracy | 99.0% |
| Parameter accuracy | 86.0% |
| Exact JSON match | 85.5% |
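On 200 prompts these percentages correspond to 198, 172, and 171 prompts respectively. The sketch below shows one plausible way to compute the three metrics; the definitions (parameter accuracy as exact equality of the parameter dict, exact match as skill and parameters both correct) are assumptions and may differ from `modal_apps/evaluate_pocket_benchmark_modal.py`. The `playlist` parameter name is also hypothetical.

```python
def score(predictions: list[dict], references: list[dict]) -> dict[str, float]:
    """Compute skill, parameter, and exact-match accuracy as percentages."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty prediction and reference lists")
    skill_hits = param_hits = exact_hits = 0
    for pred, ref in zip(predictions, references):
        skill_ok = pred.get("skill") == ref["skill"]
        params_ok = pred.get("parameters") == ref["parameters"]
        skill_hits += skill_ok
        param_hits += params_ok
        exact_hits += skill_ok and params_ok
    total = len(references)
    return {
        "skill_accuracy": 100 * skill_hits / total,
        "parameter_accuracy": 100 * param_hits / total,
        "exact_match": 100 * exact_hits / total,
    }


references = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "workout"}},
    {"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "focus"}},
]
predictions = [
    dict(references[0]),                                                   # fully correct
    dict(references[1]),                                                   # fully correct
    {"skill": "whatsapp_send_message",
     "parameters": {"contact": "ri", "message": "see you"}},               # wrong parameters
    {"skill": "whatsapp_send_message", "parameters": {"playlist": "focus"}},  # wrong skill
]
results = score(predictions, references)
print(results)
```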
**Next:** self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills.
## License
Apache 2.0. Base model weights are subject to the [Qwen license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).