title: Android Skill Router
emoji: 📱
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.34.2
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: Natural language → Android automation skill → UI trajectory
tags:
- build-small-hackathon
- track:backyard
- track:wood
- sponsor:modal
- achievement:offbrand
- achievement:fieldnotes
Android Skill Router
Build Small Hackathon: Backyard AI track · Modal sponsor
You say "text mom on whatsapp i'm on my way" and a voice assistant might web-search or shrug. Android Skill Router closes that gap with a 3B-parameter intent classifier that maps messy phone language to structured {skill, parameters} JSON, then loads a pre-recorded UI trajectory captured on a real Android device. It is the classifier layer of the Pocket Automator stack: record a flow once on your phone, route to it forever with a tiny model.
"play my workout playlist" → spotify_play_playlist → trajectories/spotify_play_playlist.json
Tech: fine-tuned Qwen2.5-3B-Instruct via 4-bit QLoRA + SFT (Unsloth on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio.
Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
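The routing step in the example above can be sketched as a lookup from skill name to trajectory file. This is a minimal illustration, not the code in src/skill_router.py: the function name `trajectory_path`, the `TRAJECTORY_DIR` constant, and the name check are assumptions. The only thing taken from this README is the layout of one JSON file per skill under trajectories/.

```python
import re
from pathlib import Path

# Assumed layout: one exported trajectory per skill, named <skill>.json.
TRAJECTORY_DIR = Path("trajectories")

# Hypothetical guard: skill names are lowercase snake_case identifiers.
_SKILL_NAME = re.compile(r"[a-z0-9_]+")


def trajectory_path(skill: str, root: Path = TRAJECTORY_DIR) -> Path:
    """Map a predicted skill name to the trajectory file expected to hold it."""
    if not _SKILL_NAME.fullmatch(skill):
        # Reject anything that could escape the trajectories directory.
        raise ValueError(f"unexpected skill name: {skill!r}")
    return root / f"{skill}.json"


print(trajectory_path("spotify_play_playlist"))
```

Validating the name before building the path keeps a malformed model output from pointing the router at an arbitrary file.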
Submission links
- Blog post: Hugging Face Blog - Android Skill Router
- Demo video: YouTube Short
- Social post: Twitter/X
- Live Space: android-skill-router
- Android recorder: Pocket Automator
Related repos
| Repo | Role |
|---|---|
| Pocket Automator | Android recorder, parameter dialog, ParameterBinder, and on-device replay |
| android-dataset | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
| Live Space | Hosted demo: natural language in, skill + parameterized trajectory out |
| Blog post | Full write-up of the classify → bind → replay architecture |
Hackathon tags
| Tag | Why |
|---|---|
| track:backyard | Personal automation on hardware you own |
| sponsor:modal | Training, evaluation, and inference on Modal |
| achievement:tinytitan | Full stack on Qwen2.5-3B (≤4B params) |
| achievement:agent | Classify → route → load multi-step UI plan |
Recording trajectories
UI traces in trajectories/ were captured with Pocket Automator, an Android accessibility recorder that exports JSON for training and replay. Record a flow on device → export → map to a skill via scripts/generate_skill_dataset.py.
Tech stack
| Piece | What |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tune | 4-bit QLoRA + SFT with Unsloth on Modal (modal_apps/train_modal.py) |
| Inference | Modal GPU API (modal_apps/predict_api.py); returns skill + parameters |
| Parameter binding | src/parameter_binder.py + data/skill_schemas.json bindings; substitutes runtime values into trajectory steps |
| Demo UI | Gradio (app.py); shows parameterized trajectory preview |
| Recorder / replay | Pocket Automator: accessibility capture, parameter dialog, ParameterBinder, replay |
| Data | 15 Android trajectories → data/skills.jsonl → ~510 prompt variations in data/train.jsonl |
Quick start (local dev)
# 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume)
pip install modal
modal setup
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
# 2. Deploy inference API
modal deploy modal_apps/predict_api.py
# Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run
# 3. Run the Gradio demo
pip install -r requirements.txt
export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run"
python app.py
The /predict endpoint returns structured intents:
{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
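A client would typically validate that shape before acting on it. The sketch below is illustrative only: `parse_intent` is a hypothetical helper, not the contents of src/skill_utils.py, and the strictness of the checks (including treating a missing `parameters` key as an empty object) is an assumption.

```python
import json
from typing import Any


def parse_intent(raw: str) -> tuple[str, dict[str, Any]]:
    """Parse a /predict response body into (skill, parameters).

    Raises ValueError when the payload does not have the documented shape.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")

    skill = payload.get("skill")
    if not isinstance(skill, str) or not skill:
        raise ValueError("missing or empty 'skill'")

    # Assumption: a skill with no slots may omit 'parameters' entirely.
    parameters = payload.get("parameters", {})
    if not isinstance(parameters, dict):
        raise ValueError("'parameters' must be an object")

    return skill, parameters


skill, parameters = parse_intent(
    '{"skill": "whatsapp_send_message", '
    '"parameters": {"contact": "ri", "message": "see you soon"}}'
)
print(skill, parameters)
```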
Hugging Face Space setup
- Create a Gradio Space inside the build-small-hackathon org.
- Upload this repo (exclude trained_model/; inference stays on Modal).
- Add a Space secret: MODAL_PREDICT_URL = your deployed Modal /predict base URL.
- Link the demo video and social post in the README (see Submission links above).
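Space secrets are exposed to the running app as environment variables. One way app.py could resolve the endpoint is sketched below. The helper name `predict_endpoint` and the trailing-slash handling are assumptions; the only facts taken from this README are the variable name MODAL_PREDICT_URL and the /predict route.

```python
import os


def predict_endpoint(env_var: str = "MODAL_PREDICT_URL") -> str:
    """Build the full /predict URL from the base URL stored in the environment."""
    base = os.environ.get(env_var, "").strip()
    if not base:
        raise RuntimeError(
            f"{env_var} is not set; add it as a Space secret or export it locally"
        )
    # Tolerate a trailing slash so the result never contains '//predict'.
    return base.rstrip("/") + "/predict"
```

Failing fast with a clear message makes a missing secret obvious at startup instead of surfacing later as a confusing request error.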
Project layout
app.py # Gradio demo (hackathon submission UI)
requirements.txt # Space dependencies
data/
train.jsonl # SFT training data (~510 examples)
eval_prompts.json # 50 held-out evaluation prompts
skills.jsonl # Canonical skill → task mapping
src/
skill_router.py # Skill name → trajectory JSON
parameter_binder.py # Runtime parameter → trajectory step substitution
skill_utils.py # Shared JSON parsing helpers
evaluate.py # Local CPU/MPS evaluation
modal_apps/ # Modal training + inference (not named "modal", to avoid an import clash)
train_modal.py
predict_api.py
infer_modal.py
evaluate_modal.py
run_modal.py
requirements-modal.txt
scripts/
generate_skill_dataset.py
generate_training_data.py
train.py # Local GPU training (optional)
trajectories/ # Pocket Automator exports (Android UI automation traces)
trained_model/ # Local model weights (gitignored)
Evaluation
# On Modal GPU
modal run modal_apps/evaluate_modal.py
# Locally (needs adapter in trained_model/adapter or merged weights)
python -m src.evaluate
Regenerating data
python scripts/generate_skill_dataset.py # trajectories → data/skills.jsonl
python scripts/generate_training_data.py # data/skills.jsonl → data/train.jsonl
V2: Intent extraction
V1 maps prompts to a skill label only. V2 extracts structured intents:
"text mom on whatsapp i'm on my way"
→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
The Gradio demo and Modal /predict API both return skill + parameters.
Parameterized replay
V2 extracts {skill, parameters} at inference time. Slot-filling then substitutes those values into the recorded set_text and post-search click steps before replay.
End-to-end flow (validated on WhatsApp):
"text mom on whatsapp i'm on my way"
→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
→ ParameterBinder (Gradio preview + Pocket Automator on device)
→ replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"
Bindings live in data/skill_schemas.json per skill. Supported in preview today: WhatsApp, Gmail, YouTube. Pocket Automator mirrors the same binding rules at replay time via its parameter dialog.
python -m src.parameter_binder # self-test bindings
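The substitution itself can be sketched in a few lines. The step and binding shapes below are invented for illustration and are not the actual format of data/skill_schemas.json or the Pocket Automator export: `bind_parameters`, the `step` and `field` keys, and the sample trajectory are all hypothetical.

```python
from copy import deepcopy
from typing import Any

Step = dict[str, Any]
Target = dict[str, Any]


def bind_parameters(
    steps: list[Step],
    bindings: dict[str, list[Target]],
    parameters: dict[str, str],
) -> list[Step]:
    """Return a copy of `steps` with recorded values replaced by runtime ones.

    `bindings` maps a parameter name to every (step index, field) it overwrites.
    """
    bound = deepcopy(steps)  # never mutate the recorded trajectory
    for name, targets in bindings.items():
        if name not in parameters:
            raise KeyError(f"missing parameter: {name}")
        for target in targets:
            bound[target["step"]][target["field"]] = parameters[name]
    return bound


# Hypothetical recording: search for a contact, tap the result, type a message.
recorded = [
    {"action": "set_text", "text": "Biraj"},
    {"action": "click", "target": "Biraj"},
    {"action": "set_text", "text": "Hi"},
]
bindings = {
    "contact": [{"step": 0, "field": "text"}, {"step": 1, "field": "target"}],
    "message": [{"step": 2, "field": "text"}],
}
bound = bind_parameters(
    recorded, bindings, {"contact": "mom", "message": "i'm on my way"}
)
print(bound)
```

Letting one parameter bind to several steps covers the case where the contact name appears both in the search box and in the result that gets tapped afterwards.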
Data
- data/skill_schemas.json: parameter definitions and trajectory bindings per skill
- data/train_intent.jsonl: ~15k synthetic SFT examples (generated locally via script; gitignored, so upload to Modal for training)
- data/eval_intent_prompts.json: held-out intent eval set
- data/pocket_benchmark_prompts.json: 200 real-world messy prompts
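How a handful of skills could expand into thousands of synthetic examples is easiest to see with a toy generator. The sketch below is hypothetical and is not scripts/generate_intent_dataset.py: the templates, slot values, and the `prompt` / `completion` record keys are all assumptions chosen for illustration.

```python
import itertools
import json

# Hypothetical phrasing templates and slot values for one skill.
TEMPLATES = [
    "text {contact} on whatsapp {message}",
    "whatsapp {contact} saying {message}",
    "send {contact} a whatsapp: {message}",
]
SLOTS = {
    "contact": ["mom", "dad", "alex"],
    "message": ["i'm on my way", "running late"],
}


def expand(
    skill: str, templates: list[str], slots: dict[str, list[str]]
) -> list[dict[str, str]]:
    """Cross every template with every combination of slot values."""
    names = list(slots)
    examples = []
    for template in templates:
        for values in itertools.product(*(slots[n] for n in names)):
            parameters = dict(zip(names, values))
            examples.append({
                "prompt": template.format(**parameters),
                "completion": json.dumps(
                    {"skill": skill, "parameters": parameters}
                ),
            })
    return examples


examples = expand("whatsapp_send_message", TEMPLATES, SLOTS)
print(len(examples))  # 3 templates x 3 contacts x 2 messages = 18
print(json.dumps(examples[0]))
```

Because the count is a product of templates and slot values, a modest number of phrasings per skill grows quickly once it is multiplied across many skills.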
Train & evaluate
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
modal run modal_apps/evaluate_intent_modal.py
modal run modal_apps/evaluate_pocket_benchmark_modal.py
Benchmark results (Pocket Automator, 200 prompts)
| Metric | Score |
|---|---|
| Skill accuracy | 99.0% |
| Parameter accuracy | 86.0% |
| Exact JSON match | 85.5% |
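For reference, three metrics of this kind can be computed as below. This is a sketch under assumed definitions, not the logic of modal_apps/evaluate_pocket_benchmark_modal.py: here parameter accuracy means the predicted parameters object equals the expected one, and exact JSON match means both skill and parameters agree. The sample rows are made up.

```python
from typing import Any

Intent = dict[str, Any]


def score(predictions: list[Intent], expected: list[Intent]) -> dict[str, float]:
    """Compute skill, parameter, and exact-match accuracy as percentages."""
    if len(predictions) != len(expected) or not expected:
        raise ValueError("need two non-empty lists of equal length")

    skill_hits = param_hits = exact_hits = 0
    for pred, gold in zip(predictions, expected):
        skill_ok = pred.get("skill") == gold.get("skill")
        params_ok = pred.get("parameters") == gold.get("parameters")
        skill_hits += skill_ok
        param_hits += params_ok
        exact_hits += skill_ok and params_ok

    n = len(expected)
    return {
        "skill_accuracy": 100 * skill_hits / n,
        "parameter_accuracy": 100 * param_hits / n,
        "exact_match": 100 * exact_hits / n,
    }


# Made-up rows: the second prediction has the right skill but a wrong slot.
gold = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "hi"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "workout"}},
]
pred = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "hi"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "work"}},
]
results = score(pred, gold)
print(results)
```

Under these definitions, exact match can never exceed either of the other two scores.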
Next: self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills.
License
Apache 2.0. Base model weights subject to Qwen license.