title: Android Skill Router
emoji: 📱
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.34.2
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: Natural language → Android automation skill → UI trajectory
tags:
  - build-small-hackathon
  - track:backyard
  - track:wood
  - sponsor:modal
  - achievement:offbrand
  - achievement:fieldnotes

Android Skill Router

Build Small Hackathon: Backyard AI track · Modal sponsor

You say "text mom on whatsapp i'm on my way", and a voice assistant might web-search or shrug. Android Skill Router closes that gap with a 3B-parameter intent classifier that maps messy phone language to structured {skill, parameters} JSON, then loads a pre-recorded UI trajectory captured on a real Android device. It is the classifier layer of the Pocket Automator stack: record a flow once on your phone, route to it forever with a tiny model.

"play my workout playlist"  →  spotify_play_playlist  →  trajectories/spotify_play_playlist.json

Tech: fine-tuned Qwen2.5-3B-Instruct via 4-bit QLoRA + SFT (Unsloth on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio.

Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
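The routing step in the flow above can be sketched in a few lines. This is a minimal illustration, not the code in src/skill_router.py: it assumes each skill's trajectory is stored as trajectories/<skill>.json, as the spotify_play_playlist example suggests, and the function names are hypothetical.

```python
import json
from pathlib import Path

TRAJECTORY_DIR = Path("trajectories")  # directory name taken from this README


def trajectory_path(skill: str, root: Path = TRAJECTORY_DIR) -> Path:
    """Map a skill name to the file its trajectory is assumed to live in."""
    # Accept only plain identifiers so a bad prediction cannot
    # point outside the trajectories directory.
    if not skill.replace("_", "").isalnum():
        raise ValueError(f"unexpected skill name: {skill!r}")
    return root / f"{skill}.json"


def load_trajectory(skill: str, root: Path = TRAJECTORY_DIR):
    """Load the recorded trajectory for a skill, or raise if none exists."""
    path = trajectory_path(skill, root)
    if not path.is_file():
        raise FileNotFoundError(f"no trajectory recorded for skill {skill!r}")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Validating the skill name before building the path matters because the name comes from model output rather than from a trusted source.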

Submission links

Related repos

| Repo | Role |
| --- | --- |
| Pocket Automator | Android recorder, parameter dialog, ParameterBinder, and on-device replay |
| android-dataset | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
| Live Space | Hosted demo: natural language in, skill + parameterized trajectory out |
| Blog post | Full write-up of the classify → bind → replay architecture |

Hackathon tags

| Tag | Why |
| --- | --- |
| track:backyard | Personal automation on hardware you own |
| sponsor:modal | Training, evaluation, and inference on Modal |
| achievement:tinytitan | Full stack on Qwen2.5-3B (≤4B params) |
| achievement:agent | Classify → route → load multi-step UI plan |

Recording trajectories

UI traces in trajectories/ were captured with Pocket Automator, an Android accessibility recorder that exports JSON for training and replay. Record a flow on device → export → map to a skill via scripts/generate_skill_dataset.py.

Tech stack

| Piece | What |
| --- | --- |
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Fine-tune | 4-bit QLoRA + SFT with Unsloth on Modal (modal_apps/train_modal.py) |
| Inference | Modal GPU API (modal_apps/predict_api.py); returns skill + parameters |
| Parameter binding | src/parameter_binder.py + data/skill_schemas.json bindings; substitutes runtime values into trajectory steps |
| Demo UI | Gradio (app.py); shows parameterized trajectory preview |
| Recorder / replay | Pocket Automator: accessibility capture, parameter dialog, ParameterBinder, replay |
| Data | 15 Android trajectories → data/skills.jsonl → ~510 prompt variations in data/train.jsonl (V1 skill labels); V2 trains on ~15k synthetic intent examples in data/train_intent.jsonl |

Quick start (local dev)

# 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume)
pip install modal
modal setup
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl

# 2. Deploy inference API
modal deploy modal_apps/predict_api.py
# Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run

# 3. Run the Gradio demo
pip install -r requirements.txt
export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run"
python app.py

The /predict endpoint returns structured intents:

{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
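A client for this endpoint might look like the sketch below. The response shape is taken from the example above. The request body ({"prompt": ...}) and the choice to append /predict to MODAL_PREDICT_URL are assumptions; check modal_apps/predict_api.py for the actual contract. All function names are hypothetical.

```python
import json
import os
import urllib.request


def build_predict_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for /predict (body shape is an assumption)."""
    url = base_url.rstrip("/") + "/predict"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_intent(raw: str) -> dict:
    """Check that a response has the {skill, parameters} shape."""
    intent = json.loads(raw)
    if not isinstance(intent, dict):
        raise ValueError("response is not a JSON object")
    if not isinstance(intent.get("skill"), str):
        raise ValueError("response has no string 'skill' field")
    if not isinstance(intent.get("parameters"), dict):
        raise ValueError("response has no object 'parameters' field")
    return intent


def predict(prompt: str, timeout: float = 60.0) -> dict:
    """Send a prompt to the API named by the MODAL_PREDICT_URL variable."""
    request = build_predict_request(os.environ["MODAL_PREDICT_URL"], prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_intent(response.read().decode("utf-8"))
```

Keeping request construction and response validation separate from the network call makes both testable without a deployed endpoint.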

Hugging Face Space setup

  1. Create a Gradio Space inside the build-small-hackathon org.
  2. Upload this repo (exclude trained_model/; inference stays on Modal).
  3. Add a Space secret: MODAL_PREDICT_URL = your deployed Modal /predict base URL.
  4. Link the demo video and social post in the README (see Submission links above).

Project layout

app.py                      # Gradio demo (hackathon submission UI)
requirements.txt            # Space dependencies
data/
  train.jsonl               # SFT training data (~510 examples)
  eval_prompts.json         # 50 held-out evaluation prompts
  skills.jsonl              # Canonical skill ↔ task mapping
src/
  skill_router.py           # Skill name → trajectory JSON
  parameter_binder.py       # Runtime parameter → trajectory step substitution
  skill_utils.py            # Shared JSON parsing helpers
  evaluate.py               # Local CPU/MPS evaluation
modal_apps/                 # Modal training + inference (not named "modal", to avoid an import clash)
  train_modal.py
  predict_api.py
  infer_modal.py
  evaluate_modal.py
  run_modal.py
  requirements-modal.txt
scripts/
  generate_skill_dataset.py
  generate_training_data.py
  train.py                  # Local GPU training (optional)
trajectories/               # Pocket Automator exports (Android UI automation traces)
trained_model/              # Local model weights (gitignored)

Evaluation

# On Modal GPU
modal run modal_apps/evaluate_modal.py

# Locally (needs adapter in trained_model/adapter or merged weights)
python -m src.evaluate

Regenerating data

python scripts/generate_skill_dataset.py    # trajectories → data/skills.jsonl
python scripts/generate_training_data.py    # data/skills.jsonl → data/train.jsonl

V2: Intent extraction

V1 maps prompts to a skill label only. V2 extracts structured intents:

"text mom on whatsapp i'm on my way"
→ {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}

The Gradio demo and Modal /predict API both return skill + parameters.
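A fine-tuned model may wrap its JSON in extra text, so a tolerant parser is useful on the receiving side. The sketch below is one way to pull the first intent object out of raw output. It is an illustration only, not the helper in src/skill_utils.py, and the function name is hypothetical.

```python
import json


def extract_intent(text: str) -> dict:
    """Return the first JSON object in text that carries a 'skill' key."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and "skill" in candidate:
            return candidate
        start = text.find("{", start + 1)
    raise ValueError("no intent object found in model output")


raw_output = 'Sure. {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i\'m on my way"}}'
intent = extract_intent(raw_output)
```

Scanning from each opening brace keeps the parser independent of any particular prompt template.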

Parameterized replay

V2 extracts {skill, parameters} at inference time. At replay time, slot-filling substitutes those values into the recorded set_text and post-search click steps before they run.

End-to-end flow (validated on WhatsApp):

"text mom on whatsapp i'm on my way"
  → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
  → ParameterBinder (Gradio preview + Pocket Automator on device)
  → replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"

Bindings live in data/skill_schemas.json per skill. Supported in preview today: WhatsApp, Gmail, YouTube. Pocket Automator mirrors the same binding rules at replay time via its parameter dialog.

python -m src.parameter_binder   # self-test bindings
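The substitution itself can be pictured with a small sketch. The step and binding formats below are hypothetical stand-ins, not the actual trajectory export or data/skill_schemas.json layout. They show only the idea of replacing recorded values with runtime parameters.

```python
import copy


def bind_parameters(steps, bindings, parameters):
    """Return a copy of steps with bound fields set to runtime values."""
    bound = copy.deepcopy(steps)  # never mutate the recorded trajectory
    for binding in bindings:
        name = binding["parameter"]
        if name not in parameters:
            raise KeyError(f"missing parameter: {name}")
        bound[binding["step"]][binding["field"]] = parameters[name]
    return bound


# Hypothetical recording of a WhatsApp send flow.
recorded = [
    {"action": "click", "target": "search"},
    {"action": "set_text", "text": "Biraj"},
    {"action": "click", "target_text": "Biraj"},  # post-search click
    {"action": "set_text", "text": "Hi"},
    {"action": "click", "target": "send"},
]
# Hypothetical bindings: which step field takes which parameter.
bindings = [
    {"step": 1, "field": "text", "parameter": "contact"},
    {"step": 2, "field": "target_text", "parameter": "contact"},
    {"step": 3, "field": "text", "parameter": "message"},
]
replay = bind_parameters(
    recorded, bindings, {"contact": "mom", "message": "i'm on my way"}
)
```

Working on a deep copy keeps the recorded "Biraj" / "Hi" values intact, so the same trajectory can be rebound for the next request.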

Data

  • data/skill_schemas.json: parameter definitions and trajectory bindings per skill
  • data/train_intent.jsonl: ~15k synthetic SFT examples (generated locally via script; gitignored, so upload it to Modal for training)
  • data/eval_intent_prompts.json: held-out intent eval set
  • data/pocket_benchmark_prompts.json: 200 real-world messy prompts
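One common way to grow a handful of skills into thousands of synthetic examples is template expansion, sketched below. This is an assumption about the general technique, not a description of scripts/generate_intent_dataset.py. The templates, slot values, and record field names are all hypothetical.

```python
import itertools
import json

# Hypothetical phrasings and slot values for a single skill.
TEMPLATES = [
    "text {contact} on whatsapp {message}",
    "whatsapp {contact}: {message}",
    "send {contact} a whatsapp saying {message}",
]
SLOT_VALUES = {
    "contact": ["mom", "dad"],
    "message": ["i'm on my way", "see you soon"],
}


def expand(skill, templates, slot_values):
    """Yield one prompt/completion record per template and slot combination."""
    names = sorted(slot_values)
    for template in templates:
        for combo in itertools.product(*(slot_values[n] for n in names)):
            parameters = dict(zip(names, combo))
            target = {"skill": skill, "parameters": parameters}
            yield {
                "prompt": template.format(**parameters),
                "completion": json.dumps(target, ensure_ascii=False),
            }


records = list(expand("whatsapp_send_message", TEMPLATES, SLOT_VALUES))
```

Here three templates and two values per slot give 12 records; the count grows multiplicatively as templates and slot values are added.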

Train & evaluate

python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
modal run modal_apps/evaluate_intent_modal.py
modal run modal_apps/evaluate_pocket_benchmark_modal.py

Benchmark results (Pocket Automator, 200 prompts)

| Metric | Score |
| --- | --- |
| Skill accuracy | 99.0% |
| Parameter accuracy | 86.0% |
| Exact JSON match | 85.5% |
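The README does not define these metrics, so the sketch below uses one plausible reading: skill accuracy compares the skill field, parameter accuracy compares the whole parameters object, and exact match requires the full intent to be identical. The scoring in modal_apps/evaluate_pocket_benchmark_modal.py may differ, and the function name is hypothetical.

```python
def score(predictions, references):
    """Compute three accuracy figures over paired intent dicts."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty lists")
    pairs = list(zip(predictions, references))
    total = len(pairs)
    return {
        "skill_accuracy": sum(p.get("skill") == r["skill"] for p, r in pairs) / total,
        "parameter_accuracy": sum(
            p.get("parameters") == r["parameters"] for p, r in pairs
        ) / total,
        "exact_match": sum(p == r for p, r in pairs) / total,
    }


references = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom"}},
    {"skill": "whatsapp_send_message", "parameters": {"contact": "dad"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "workout"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "focus"}},
]
predictions = [
    references[0],                                                        # fully correct
    {"skill": "whatsapp_send_message", "parameters": {"contact": "da"}},  # wrong parameter
    {"skill": "youtube_search", "parameters": {"playlist": "workout"}},   # wrong skill
    references[3],                                                        # fully correct
]
metrics = score(predictions, references)
```

Under these definitions exact match can never exceed either of the other two figures, since it requires both fields to agree.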

Next: self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills.

License

Apache 2.0. Base model weights subject to Qwen license.