kriyanshi

Add Related repos section linking Pocket Automator and android-dataset.

e52eee4 19 days ago

metadata

title: Android Skill Router
emoji: 📱
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.34.2
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: Natural language → Android automation skill → UI trajectory
tags:
  - build-small-hackathon
  - track:backyard
  - track:wood
  - sponsor:modal
  - achievement:offbrand
  - achievement:fieldnotes

Android Skill Router

Build Small Hackathon — Backyard AI track · Modal sponsor

You say "text mom on whatsapp i'm on my way" — a voice assistant might web-search or shrug. Android Skill Router closes that gap with a 3B-parameter intent classifier that maps messy phone language to structured {skill, parameters} JSON, then loads a pre-recorded UI trajectory captured on a real Android device. It is the classifier layer of the Pocket Automator stack: record a flow once on your phone, route to it forever with a tiny model.

"play my workout playlist"  →  spotify_play_playlist  →  trajectories/spotify_play_playlist.json

Tech: fine-tuned Qwen2.5-3B-Instruct via 4-bit QLoRA + SFT (Unsloth on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio.

Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
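The routing step in the example above ("play my workout playlist" resolving to trajectories/spotify_play_playlist.json) can be pictured as a plain lookup from skill name to file path. This is a minimal sketch, not the contents of src/skill_router.py: the function name `trajectory_path_for` and the snake_case validation rule are assumptions made for illustration; only the trajectories/<skill>.json layout comes from the example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical helper: not taken from src/skill_router.py.
TRAJECTORY_DIR = PurePosixPath("trajectories")
_SKILL_NAME = re.compile(r"[a-z0-9_]+")  # assumed naming rule: lowercase snake_case


def trajectory_path_for(skill: str) -> str:
    """Map a skill label to the path of its recorded trajectory."""
    if not _SKILL_NAME.fullmatch(skill):
        # Reject anything that could escape the trajectories/ directory.
        raise ValueError(f"unexpected skill name: {skill!r}")
    return str(TRAJECTORY_DIR / f"{skill}.json")


print(trajectory_path_for("spotify_play_playlist"))
```

Validating the label before building the path matters because the skill name comes from model output, which should be treated as untrusted input.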

Submission links

Blog post: Hugging Face Blog — Android Skill Router
Demo video: YouTube Short
Social post: Twitter/X
Live Space: android-skill-router
Android recorder: Pocket Automator

Related repos

| Repo | Role |
| --- | --- |
| Pocket Automator | Android recorder, parameter dialog, `ParameterBinder`, and on-device replay |
| android-dataset | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
| Live Space | Hosted demo: natural language in, skill + parameterized trajectory out |
| Blog post | Full write-up of the classify → bind → replay architecture |

Hackathon tags

| Tag | Why |
| --- | --- |
| `track:backyard` | Personal automation on hardware you own |
| `sponsor:modal` | Training, evaluation, and inference on Modal |
| `achievement:tinytitan` | Full stack on Qwen2.5-3B (≤4B params) |
| `achievement:agent` | Classify → route → load multi-step UI plan |

Recording trajectories

UI traces in trajectories/ were captured with Pocket Automator — an Android accessibility recorder that exports JSON for training and replay. Record a flow on device → export → map to a skill via scripts/generate_skill_dataset.py.

Tech stack

| Piece | What |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tune | 4-bit QLoRA + SFT with Unsloth on Modal (`modal_apps/train_modal.py`) |
| Inference | Modal GPU API (`modal_apps/predict_api.py`); returns skill + parameters |
| Parameter binding | `src/parameter_binder.py` + `data/skill_schemas.json` bindings; substitutes runtime values into trajectory steps |
| Demo UI | Gradio (`app.py`); shows parameterized trajectory preview |
| Recorder / replay | Pocket Automator: accessibility capture, parameter dialog, `ParameterBinder`, replay |
| Data | 15 Android trajectories → `data/skills.jsonl` → ~510 prompt variations in `data/train.jsonl`; the intent model trains on ~15k synthetic examples in `data/train_intent.jsonl` |

Quick start (local dev)

# 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume)
pip install modal
modal setup
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl

# 2. Deploy inference API
modal deploy modal_apps/predict_api.py
# Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run

# 3. Run the Gradio demo
pip install -r requirements.txt
export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run"
python app.py

The /predict endpoint returns structured intents:

{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
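A client consuming this response will usually want to check its shape before routing on it. The following is a hedged sketch of such a check; the `Intent` dataclass and `parse_intent` function are hypothetical names, and the rule that "parameters" may be omitted is an assumption, not documented API behaviour.

```python
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    skill: str
    parameters: dict = field(default_factory=dict)


def parse_intent(body: str) -> Intent:
    """Parse a /predict response body into an Intent, rejecting malformed shapes."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    skill = data.get("skill")
    if not isinstance(skill, str) or not skill:
        raise ValueError("response is missing a skill name")
    # Assumption: a skill with no slots may omit "parameters" entirely.
    params = data.get("parameters", {})
    if not isinstance(params, dict):
        raise ValueError("parameters must be a JSON object")
    return Intent(skill=skill, parameters=params)


intent = parse_intent(
    '{"skill": "whatsapp_send_message", '
    '"parameters": {"contact": "ri", "message": "see you soon"}}'
)
print(intent.skill, intent.parameters["message"])
```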

Hugging Face Space setup

1. Create a Gradio Space inside the build-small-hackathon org.
2. Upload this repo (exclude trained_model/, since inference stays on Modal).
3. Add a Space secret: MODAL_PREDICT_URL = your deployed Modal /predict base URL.
4. Link the demo video and social post in the README (see Submission links above).
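Space secrets are exposed to the app as environment variables, so the demo can read MODAL_PREDICT_URL at startup. A minimal sketch of that lookup follows; `resolve_predict_url` is a hypothetical helper and may not match how app.py actually reads the setting.

```python
import os


def resolve_predict_url(env=None):
    """Return the configured /predict base URL, or None if the secret is unset."""
    env = os.environ if env is None else env
    url = (env.get("MODAL_PREDICT_URL") or "").strip()
    # Trailing slashes are trimmed so callers can append paths consistently.
    return url.rstrip("/") or None


print(resolve_predict_url({"MODAL_PREDICT_URL": "https://example.invalid/"}))
```

Returning None rather than raising lets the UI degrade gracefully, for example by accepting pasted intent JSON when no endpoint is configured.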

Project layout

app.py                      # Gradio demo (hackathon submission UI)
requirements.txt            # Space dependencies
data/
  train.jsonl               # SFT training data (~510 examples)
  eval_prompts.json         # 50 held-out evaluation prompts
  skills.jsonl              # Canonical skill ↔ task mapping
src/
  skill_router.py           # Skill name → trajectory JSON
  parameter_binder.py       # Runtime parameter → trajectory step substitution
  skill_utils.py            # Shared JSON parsing helpers
  evaluate.py               # Local CPU/MPS evaluation
modal_apps/                 # Modal training + inference (not named "modal" — avoids import clash)
  train_modal.py
  predict_api.py
  infer_modal.py
  evaluate_modal.py
  run_modal.py
  requirements-modal.txt
scripts/
  generate_skill_dataset.py
  generate_training_data.py
  train.py                  # Local GPU training (optional)
trajectories/               # Pocket Automator exports (Android UI automation traces)
trained_model/              # Local model weights (gitignored)

Evaluation

# On Modal GPU
modal run modal_apps/evaluate_modal.py

# Locally (needs adapter in trained_model/adapter or merged weights)
python -m src.evaluate

Regenerating data

python scripts/generate_skill_dataset.py    # trajectories → data/skills.jsonl
python scripts/generate_training_data.py    # data/skills.jsonl → data/train.jsonl

V2: Intent extraction

V1 maps prompts to a skill label only. V2 extracts structured intents:

"text mom on whatsapp i'm on my way"
→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}

The Gradio demo and Modal /predict API both return skill + parameters.
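A generative model can wrap its JSON in extra text, so intent extraction typically needs a tolerant parser. The sketch below pulls the first JSON object out of raw model output using the standard library's `json.JSONDecoder.raw_decode`. It illustrates the kind of helper that src/skill_utils.py is described as providing; `first_json_object` is a hypothetical name, not that module's code.

```python
import json

_DECODER = json.JSONDecoder()


def first_json_object(text: str):
    """Return the first JSON object embedded in text, or None if there is none."""
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever trails it.
            value, _end = _DECODER.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)
    return None


raw = 'Sure! {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i\'m on my way"}} Done.'
print(first_json_object(raw))
```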

Parameterized replay

V2 extracts {skill, parameters} at inference time. Before replay, slot-filling substitutes those values into the recorded set_text and post-search click steps.

End-to-end flow (validated on WhatsApp):

"text mom on whatsapp i'm on my way"
  → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
  → ParameterBinder (Gradio preview + Pocket Automator on device)
  → replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"

Bindings are defined per skill in data/skill_schemas.json. The Gradio preview currently supports WhatsApp, Gmail, and YouTube. Pocket Automator applies the same binding rules at replay time via its parameter dialog.

python -m src.parameter_binder   # self-test bindings
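To make the binding step concrete, here is a minimal sketch of slot-filling over a recorded trajectory. The step and binding shapes are invented for illustration, since neither the Pocket Automator export format nor the data/skill_schemas.json layout is shown in this README, and `bind_parameters` is a hypothetical stand-in for `ParameterBinder`.

```python
import copy

# Hypothetical trajectory and binding shapes; the real export format and
# data/skill_schemas.json layout are not shown in this README.
recorded = [
    {"action": "click", "target": "search"},
    {"action": "set_text", "text": "Biraj"},
    {"action": "click", "target": "first_result"},
    {"action": "set_text", "text": "Hi"},
    {"action": "click", "target": "send"},
]
bindings = [
    {"step": 1, "field": "text", "param": "contact"},
    {"step": 3, "field": "text", "param": "message"},
]


def bind_parameters(steps, bindings, parameters):
    """Return a copy of steps with bound fields replaced by runtime values."""
    bound = copy.deepcopy(steps)  # never mutate the recorded trajectory
    for b in bindings:
        if b["param"] not in parameters:
            raise KeyError(f"missing parameter: {b['param']}")
        bound[b["step"]][b["field"]] = parameters[b["param"]]
    return bound


replay = bind_parameters(
    recorded, bindings, {"contact": "mom", "message": "i'm on my way"}
)
print([s["text"] for s in replay if s["action"] == "set_text"])
```

Copying before substitution keeps the recorded values ("Biraj" / "Hi") intact, so the same trajectory can be rebound for every new request.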

Data

data/skill_schemas.json — parameter definitions and trajectory bindings per skill
data/train_intent.jsonl — ~15k synthetic SFT examples (generated locally via script; gitignored — upload to Modal for training)
data/eval_intent_prompts.json — held-out intent eval set
data/pocket_benchmark_prompts.json — 200 real-world messy prompts
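One common way to grow a handful of skills into thousands of SFT rows is template expansion: cross each prompt template with every combination of slot values. The sketch below shows that idea only; the templates, slot values, and the prompt/completion field names are invented for illustration and are not the contents of scripts/generate_intent_dataset.py.

```python
import itertools
import json

# Hypothetical templates and slot values, invented for illustration only.
TEMPLATES = {
    "whatsapp_send_message": [
        "text {contact} on whatsapp {message}",
        "whatsapp {contact} saying {message}",
    ],
}
SLOT_VALUES = {
    "contact": ["mom", "ri"],
    "message": ["i'm on my way", "see you soon"],
}


def expand(skill):
    """Yield one {prompt, completion} row per template and slot-value combination."""
    slots = sorted(SLOT_VALUES)
    for template in TEMPLATES[skill]:
        for combo in itertools.product(*(SLOT_VALUES[s] for s in slots)):
            params = dict(zip(slots, combo))
            target = {"skill": skill, "parameters": params}
            yield {
                "prompt": template.format(**params),
                "completion": json.dumps(target, ensure_ascii=False),
            }


rows = list(expand("whatsapp_send_message"))
for row in rows[:2]:
    print(json.dumps(row, ensure_ascii=False))
```

Row counts multiply quickly (templates × values per slot), which is how a small set of recorded flows can plausibly reach a training set in the thousands.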

Train & evaluate

python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
modal run modal_apps/evaluate_intent_modal.py
modal run modal_apps/evaluate_pocket_benchmark_modal.py

Benchmark results (Pocket Automator, 200 prompts)

| Metric | Score |
| --- | --- |
| Skill accuracy | 99.0% |
| Parameter accuracy | 86.0% |
| Exact JSON match | 85.5% |
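The README does not define these three metrics, so the following sketch shows one plausible reading: skill accuracy compares labels, parameter accuracy requires an exact parameter-dict match, and exact JSON match requires both. The definitions, the `score` function, and the two toy examples (including the "playlist" parameter name) are assumptions, not the logic in modal_apps/evaluate_pocket_benchmark_modal.py.

```python
def score(predictions, references):
    """Compute skill, parameter, and exact-match accuracy over paired intents."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty prediction and reference lists")
    n = len(references)
    skill_hits = param_hits = exact_hits = 0
    for pred, ref in zip(predictions, references):
        skill_ok = pred.get("skill") == ref["skill"]
        # Assumed definition: parameters count as correct only on an exact dict match.
        params_ok = pred.get("parameters", {}) == ref["parameters"]
        skill_hits += skill_ok
        param_hits += params_ok
        exact_hits += skill_ok and params_ok
    return {
        "skill_accuracy": skill_hits / n,
        "parameter_accuracy": param_hits / n,
        "exact_match": exact_hits / n,
    }


# Toy inputs for illustration only; these are not benchmark data.
refs = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "workout"}},
]
preds = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "my workout"}},
]
print(score(preds, refs))
```

Under these definitions exact match can never exceed either of the other two scores, which is consistent with the ordering in the table above.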

Next: self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills.

License

Apache 2.0. Base model weights are subject to the Qwen license.