---
title: Android Skill Router
emoji: 📱
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "5.34.2"
python_version: '3.13'
app_file: app.py
pinned: false
license: apache-2.0
short_description: Natural language → Android automation skill → UI trajectory
tags:
  - build-small-hackathon
  - track:backyard
  - track:wood
  - sponsor:modal
  - achievement:offbrand
  - achievement:fieldnotes
---

# Android Skill Router

**Build Small Hackathon — Backyard AI track · Modal sponsor**

You say *"text mom on whatsapp i'm on my way"* — a voice assistant might web-search or shrug. Android Skill Router closes that gap with a **3B-parameter intent classifier** that maps messy phone language to structured `{skill, parameters}` JSON, then loads a **pre-recorded UI trajectory** captured on a real Android device. It is the classifier layer of the **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** stack: record a flow once on your phone, route to it forever with a tiny model.

```
"play my workout playlist"  →  spotify_play_playlist  →  trajectories/spotify_play_playlist.json
```
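
The routing step in the line above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of `src/skill_router.py`: the function name `trajectory_path` and the `known_skills` allow-list are hypothetical, and only the `trajectories/<skill>.json` naming is taken from the example.

```python
from pathlib import Path

TRAJECTORY_DIR = Path("trajectories")  # directory name as shown in the example above


def trajectory_path(skill: str, known_skills: set[str]) -> Path:
    """Map a predicted skill name to its recorded trajectory file (hypothetical helper)."""
    if skill not in known_skills:
        # Refuse skills that were never recorded instead of guessing a file name.
        raise KeyError(f"no recorded trajectory for skill {skill!r}")
    return TRAJECTORY_DIR / f"{skill}.json"


# Illustrative allow-list; the real set would come from the recorded flows.
KNOWN = {"spotify_play_playlist", "whatsapp_send_message"}
path = trajectory_path("spotify_play_playlist", KNOWN)
print(path.as_posix())
```

Checking the skill against an allow-list first means a misclassified or malformed label fails loudly rather than being turned into an arbitrary file path.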

**Tech:** fine-tuned [Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) via 4-bit QLoRA + SFT ([Unsloth](https://github.com/unslothai/unsloth) on Modal) → skill router → parameterized trajectory → Pocket Automator replay on device. Fifteen real Android flows expand to ~15k synthetic intent examples for training; inference runs on Modal, demo UI on Gradio.

```
Modal /predict (or pasted JSON) → parameter dialog → ParameterBinder → replay → device taps
```

**Submission links**

- **Blog post:** [Hugging Face Blog — Android Skill Router](https://huggingface.co/blog/build-small-hackathon/android-skill-router)
- **Demo video:** [YouTube Short](https://youtube.com/shorts/IQRHf7HfTDA)
- **Social post:** [Twitter/X](https://x.com/kriyanshii/status/2066587828839141634)
- **Live Space:** [android-skill-router](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)
- **Android recorder:** [Pocket Automator](https://github.com/kriyanshii/pocket-automator)

## Related repos

| Repo | Role |
| --- | --- |
| **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** | Android recorder, parameter dialog, `ParameterBinder`, and on-device replay |
| **[android-dataset](https://github.com/kriyanshii/android-dataset)** | Classifier training, trajectory bindings, Modal API, and this Gradio demo (source) |
| **[Live Space](https://huggingface.co/spaces/build-small-hackathon/android-skill-router)** | Hosted demo — natural language in, skill + parameterized trajectory out |
| **[Blog post](https://huggingface.co/blog/build-small-hackathon/android-skill-router)** | Full write-up of the classify → bind → replay architecture |

## Hackathon tags

| Tag | Why |
| --- | --- |
| `track:backyard` | Personal automation on hardware you own |
| `sponsor:modal` | Training, evaluation, and inference on Modal |
| `achievement:tinytitan` | Full stack on Qwen2.5-3B (≤4B params) |
| `achievement:agent` | Classify → route → load multi-step UI plan |

## Recording trajectories

UI traces in `trajectories/` were captured with **[Pocket Automator](https://github.com/kriyanshii/pocket-automator)** — an Android accessibility recorder that exports JSON for training and replay. Record a flow on device → export → map to a skill via `scripts/generate_skill_dataset.py`.

## Tech stack

| Piece | What |
| --- | --- |
| **Base model** | [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) |
| **Fine-tune** | 4-bit QLoRA + SFT with [Unsloth](https://github.com/unslothai/unsloth) on Modal (`modal_apps/train_modal.py`) |
| **Inference** | Modal GPU API (`modal_apps/predict_api.py`) — returns skill + parameters |
| **Parameter binding** | `src/parameter_binder.py` + `data/skill_schemas.json` bindings — substitutes runtime values into trajectory steps |
| **Demo UI** | Gradio (`app.py`) — shows parameterized trajectory preview |
| **Recorder / replay** | [Pocket Automator](https://github.com/kriyanshii/pocket-automator) — accessibility capture, parameter dialog, `ParameterBinder`, replay |
| **Data** | 15 Android trajectories → `data/skills.jsonl` → ~510 prompt variations in `data/train.jsonl` (V1 skill labels); ~15k synthetic examples in `data/train_intent.jsonl` (V2 intents) |

## Quick start (local dev)

```bash
# 1. Train intent model on Modal (uploads data/train_intent.jsonl, saves adapter to volume)
pip install modal
modal setup
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl

# 2. Deploy inference API
modal deploy modal_apps/predict_api.py
# Copy the printed URL, e.g. https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run

# 3. Run the Gradio demo
pip install -r requirements.txt
export MODAL_PREDICT_URL="https://<workspace>--android-skill-predict-api-skillpredictor-web.modal.run"
python app.py
```

The `/predict` endpoint returns structured intents:

```json
{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}
```
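
A client for this endpoint might look like the sketch below. The response shape comes from the example above; everything about the request is an assumption, including the `prompt` field name, the POST method, and appending `/predict` to `MODAL_PREDICT_URL`. Check `modal_apps/predict_api.py` for the real contract.

```python
import json
import os
import urllib.request


def parse_prediction(raw: str) -> dict:
    """Validate the {"skill": ..., "parameters": {...}} shape shown above."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    skill = data.get("skill")
    params = data.get("parameters", {})
    if not isinstance(skill, str) or not skill:
        raise ValueError("response is missing a 'skill' string")
    if not isinstance(params, dict):
        raise ValueError("'parameters' must be a JSON object")
    return {"skill": skill, "parameters": params}


def predict(prompt: str, timeout: float = 30.0) -> dict:
    """POST a prompt to the deployed endpoint (request shape is assumed)."""
    base_url = os.environ["MODAL_PREDICT_URL"].rstrip("/")
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # assumed field name
    request = urllib.request.Request(
        f"{base_url}/predict",  # assumes the variable holds the base URL
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read().decode("utf-8"))


# Offline check against the sample response from this README.
sample = '{"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "see you soon"}}'
result = parse_prediction(sample)
print(result["skill"], result["parameters"])
```

Validating the response separately from the network call keeps the shape check testable without a deployed endpoint.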

### Hugging Face Space setup

1. Create a **Gradio Space** inside the [build-small-hackathon](https://huggingface.co/build-small-hackathon) org.
2. Upload this repo (exclude `trained_model/` — inference stays on Modal).
3. Add a Space secret: `MODAL_PREDICT_URL` = your deployed Modal `/predict` base URL.
4. Link the demo video and social post in the README (see **Submission links** above).

## Project layout

```
app.py                      # Gradio demo (hackathon submission UI)
requirements.txt            # Space dependencies
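data/
  train.jsonl               # SFT training data (~510 examples)
  train_intent.jsonl        # ~15k synthetic V2 intent examples (generated, gitignored)
  eval_prompts.json         # 50 held-out evaluation prompts
  skills.jsonl              # Canonical skill ↔ task mapping
  skill_schemas.json        # Parameter definitions and trajectory bindings per skill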
src/
  skill_router.py           # Skill name → trajectory JSON
  parameter_binder.py       # Runtime parameter → trajectory step substitution
  skill_utils.py            # Shared JSON parsing helpers
  evaluate.py               # Local CPU/MPS evaluation
modal_apps/                 # Modal training + inference (not named "modal" — avoids import clash)
  train_modal.py
  predict_api.py
  infer_modal.py
  evaluate_modal.py
  run_modal.py
  requirements-modal.txt
scripts/
  generate_skill_dataset.py
  generate_training_data.py
  generate_intent_dataset.py
  train.py                  # Local GPU training (optional)
trajectories/               # Pocket Automator exports (Android UI automation traces)
trained_model/              # Local model weights (gitignored)
```

## Evaluation

```bash
# On Modal GPU
modal run modal_apps/evaluate_modal.py

# Locally (needs adapter in trained_model/adapter or merged weights)
python -m src.evaluate
```

## Regenerating data

```bash
python scripts/generate_skill_dataset.py    # trajectories → data/skills.jsonl
python scripts/generate_training_data.py    # data/skills.jsonl → data/train.jsonl
```
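
One way a small set of skills can fan out into many prompts is template expansion. The sketch below is illustrative only; this README does not show the generation scripts' logic, so the `expand` helper, the template strings, and the `prompt` / `completion` field names are all assumptions.

```python
import itertools
import json


def expand(skill: str, templates: list[str], slot_values: dict[str, list[str]]) -> list[dict]:
    """Cross every template with every combination of slot values (hypothetical helper)."""
    names = sorted(slot_values)  # fixed order keeps the output deterministic
    examples = []
    for template in templates:
        for combo in itertools.product(*(slot_values[name] for name in names)):
            params = dict(zip(names, combo))
            examples.append({
                "prompt": template.format(**params),
                "completion": json.dumps({"skill": skill, "parameters": params}),
            })
    return examples


# Illustrative templates and values, not taken from the real dataset.
examples = expand(
    "whatsapp_send_message",
    ["text {contact} on whatsapp {message}", "whatsapp {contact}: {message}"],
    {"contact": ["mom", "ri"], "message": ["i'm on my way", "see you soon"]},
)
print(len(examples), examples[0]["prompt"])
```

Two templates crossed with two contacts and two messages give eight examples, which shows how the count grows multiplicatively as templates and slot values are added.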

## V2: Intent extraction

V1 maps prompts to a skill label only. V2 extracts structured intents:

    "text mom on whatsapp i'm on my way"
    → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}

The Gradio demo and Modal `/predict` API both return skill + parameters.

### Parameterized replay

V2 extracts `{skill, parameters}` at inference time. **Slot-filling at replay** then substitutes those values into the recorded `set_text` and post-search click steps, so the trajectory runs with the user's values instead of the ones captured during recording.

**End-to-end flow (validated on WhatsApp):**

```
"text mom on whatsapp i'm on my way"
  → {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "i'm on my way"}}
  → ParameterBinder (Gradio preview + Pocket Automator on device)
  → replay with "mom" / "i'm on my way", not the recorded "Biraj" / "Hi"
```

Bindings live in `data/skill_schemas.json` per skill. Supported in preview today: **WhatsApp**, **Gmail**, **YouTube**. Pocket Automator mirrors the same binding rules at replay time via its parameter dialog.

```bash
python -m src.parameter_binder   # self-test bindings
```
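
The substitution itself can be pictured as below. This is a sketch under stated assumptions, not the code in `src/parameter_binder.py`: the step fields (`action`, `text`, `target_text`), the binding format (`step`, `field`, `parameter`), and the `bind_parameters` name are hypothetical stand-ins for whatever `data/skill_schemas.json` and the trajectory exports actually contain.

```python
import copy


def bind_parameters(steps: list[dict], bindings: list[dict], parameters: dict) -> list[dict]:
    """Return a copy of the recorded steps with runtime values substituted in."""
    bound = copy.deepcopy(steps)  # never mutate the recorded trajectory
    for binding in bindings:
        name = binding["parameter"]
        if name not in parameters:
            raise KeyError(f"missing runtime parameter {name!r}")
        bound[binding["step"]][binding["field"]] = parameters[name]
    return bound


# Hypothetical recording of the WhatsApp flow with the captured "Biraj" / "Hi".
recorded = [
    {"action": "click", "target": "search"},
    {"action": "set_text", "text": "Biraj"},
    {"action": "click", "target_text": "Biraj"},  # post-search click on the result
    {"action": "set_text", "text": "Hi"},
    {"action": "click", "target": "send"},
]
# Hypothetical bindings: which field of which step takes which parameter.
bindings = [
    {"step": 1, "field": "text", "parameter": "contact"},
    {"step": 2, "field": "target_text", "parameter": "contact"},
    {"step": 3, "field": "text", "parameter": "message"},
]
bound = bind_parameters(recorded, bindings, {"contact": "mom", "message": "i'm on my way"})
print(bound[1], bound[3])
```

Working on a deep copy keeps the recording reusable, so the same trajectory can be replayed with different contacts and messages.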

### Data
- `data/skill_schemas.json` — parameter definitions and trajectory bindings per skill
- `data/train_intent.jsonl` — ~15k synthetic SFT examples (generated locally via script; gitignored — upload to Modal for training)
- `data/eval_intent_prompts.json` — held-out intent eval set
- `data/pocket_benchmark_prompts.json` — 200 real-world messy prompts

### Train & evaluate
```bash
python scripts/generate_intent_dataset.py
modal run modal_apps/train_modal.py --dataset train_intent.jsonl
modal run modal_apps/evaluate_intent_modal.py
modal run modal_apps/evaluate_pocket_benchmark_modal.py
```

### Benchmark results (Pocket Automator, 200 prompts)

| Metric | Score |
|--------|-------|
| Skill accuracy | 99.0% |
| Parameter accuracy | 86.0% |
| Exact JSON match | 85.5% |
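
For readers who want to reproduce numbers of this kind, the three metrics could be computed as sketched below. The definitions are an assumption (this README does not spell them out): skill accuracy compares the `skill` field, parameter accuracy compares the whole `parameters` object, and exact match requires both to be equal. The `score` function and the sample data are hypothetical.

```python
def score(predictions: list[dict], references: list[dict]) -> dict:
    """Percentage of prompts matching on skill, on parameters, and on both."""
    total = len(references)
    if total == 0 or total != len(predictions):
        raise ValueError("predictions and references must be non-empty and aligned")
    pairs = list(zip(predictions, references))
    skill_hits = sum(p.get("skill") == r["skill"] for p, r in pairs)
    param_hits = sum(p.get("parameters") == r["parameters"] for p, r in pairs)
    exact_hits = sum(p == r for p, r in pairs)
    return {
        "skill_accuracy": 100.0 * skill_hits / total,
        "parameter_accuracy": 100.0 * param_hits / total,
        "exact_match": 100.0 * exact_hits / total,
    }


# Tiny illustrative set, not real benchmark data.
references = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "hi"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "workout"}},
    {"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "ok"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "focus"}},
]
predictions = [
    {"skill": "whatsapp_send_message", "parameters": {"contact": "mom", "message": "hi"}},
    {"skill": "spotify_play_playlist", "parameters": {"playlist": "workout"}},
    {"skill": "whatsapp_send_message", "parameters": {"contact": "ri", "message": "okay"}},
    {"skill": "whatsapp_send_message", "parameters": {"playlist": "focus"}},
]
metrics = score(predictions, references)
print(metrics)
```

Under these definitions exact match can never exceed either of the other two scores, which is consistent with the ordering in the table above.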

**Next:** self-contained trajectory exports (bindings embedded in export JSON) and bindings for remaining skills.

## License

Apache 2.0. The base model weights are subject to the [Qwen license](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct).