Mohammed-Altaf / Neural-Tuner

metadata

title: Neural Tuner Env Environment Server
emoji: 🥉
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false

NeuralTuner Environment

An OpenEnv-compatible RL environment for training LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware via per-layer quantization and structured pruning.

Full write-up: BLOG.md | Live demo: HuggingFace Space | Google Colab notebook: Notebook | W&B results: Logs

The notebook is also available in this repository. Because the model could not be trained for longer runs locally, the Colab notebook linked above was added; it shows the full training results together with the Weights & Biases logs.

Overview

NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy), then issues tool calls to profile layers, apply quantization dtypes (FP32/FP16/INT8/INT4), apply structured pruning (LOW/MEDIUM/HIGH), benchmark the current plan, and submit a final configuration for scoring.

The environment supports 19 scenarios across 5 models and 3 difficulty tiers.

Requirements

Python ≥ 3.10
uv (recommended) or pip

Installation

# Install core dependencies (run from the repository root after cloning)
uv sync

# Include training dependencies (TRL, transformers, datasets, matplotlib)
uv sync --extra training

Running the Server

# Development (auto-reload)
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

# Production
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4

The server exposes:

Endpoint	Method	Description
`/reset`	POST	Start a new episode
`/step`	POST	Execute an action
`/state`	GET	Current episode state
`/metadata`	GET	Model and scenario metadata
`/health`	GET	Server health check
`/schema`	GET	Action and observation schemas
`/ws`	WebSocket	Persistent session
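For quick experiments without the WebSocket client (client.py), the HTTP endpoints can be called directly. The sketch below uses only the Python standard library and assumes the server is running on localhost:8000 as started above. The JSON body for `/step` (an `action` object with a `tool` key) is a hypothetical shape chosen for illustration; the real schema is whatever `GET /schema` returns.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: server started locally with the uvicorn commands above.
BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the endpoints in the table."""
    url = BASE_URL + path
    if payload is None:
        # GET endpoints: /state, /metadata, /health, /schema
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    # POST endpoints: /reset, /step
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON reply; needs a running server."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


reset_req = build_request("/reset", {})
# Hypothetical action payload; consult GET /schema for the real field names.
step_req = build_request("/step", {"action": {"tool": "profile_layer", "layer_id": 0}})
```

Against a running server, `send(reset_req)` starts an episode and `send(step_req)` executes one action. For longer sessions, client.py (the OpenEnv WebSocket client) keeps a persistent connection over `/ws`.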

Running Inference

Run the trained agent across all 19 scenarios using the HuggingFace router:

HF_TOKEN=hf_... python inference.py

Filter by difficulty or scenario:

# Easy tier only
HF_TOKEN=hf_... python inference.py --difficulty easy

# Single scenario
HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium

# Different model
HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct
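The three flags combine as filters over the scenario list. Below is a minimal sketch of that filtering with `argparse`, assuming `--scenario` and `--difficulty` narrow the set and that passing no flags selects every scenario. The three-entry `SCENARIOS` dict is a hypothetical stand-in for the registry in server/scenarios.py (only `mobilenet_v3_medium` is a name taken from this README), and none of this is the actual inference.py.

```python
import argparse
from typing import List, Optional

# Hypothetical scenario registry: name -> difficulty tier.
SCENARIOS = {
    "mobilenet_v3_easy": "easy",
    "mobilenet_v3_medium": "medium",
    "inception_v3_hard": "hard",
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the agent over NeuralTuner scenarios")
    parser.add_argument("--difficulty", choices=["easy", "medium", "hard"])
    parser.add_argument("--scenario", help="run a single scenario by name")
    parser.add_argument("--model", help="model ID served through the HuggingFace router")
    return parser.parse_args(argv)


def select_scenarios(args: argparse.Namespace) -> List[str]:
    """Apply the --scenario and --difficulty filters; no flags keeps everything."""
    names = sorted(SCENARIOS)
    if args.scenario is not None:
        names = [n for n in names if n == args.scenario]
    if args.difficulty is not None:
        names = [n for n in names if SCENARIOS[n] == args.difficulty]
    return names
```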

Training

Open and run neural_tuner_trl.ipynb for the full pipeline:

Environment smoke test
Random policy baseline (n=20 seeds) and oracle ceiling
SFT warm-up on heuristic trajectories
GRPO training with curriculum scheduling (easy → medium → hard)
Post-training evaluation and plot export
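Curriculum scheduling means the scenario difficulty rises as training progresses. One simple way to express this, as a hypothetical sketch: split the GRPO steps into equal thirds and map each third to a tier. The equal split and the name `curriculum_tier` are assumptions, not the schedule used in neural_tuner_trl.ipynb.

```python
TIERS = ("easy", "medium", "hard")


def curriculum_tier(step: int, total_steps: int) -> str:
    """Map a training step to a difficulty tier using equal-length phases (assumed)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp out-of-range steps into [0, total_steps - 1].
    clamped = min(max(step, 0), total_steps - 1)
    # Integer division keeps phase boundaries exact (no float rounding).
    return TIERS[(clamped * len(TIERS)) // total_steps]
```

With 300 total steps, steps 0 to 99 are easy, 100 to 199 medium, and 200 onward hard.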

# Optional: regenerate baseline vs heuristic episode traces
python rollout_eval.py --trace --model-id inception_v3 --difficulty medium

# Outputs
artifacts/eval/episode_trace.md
artifacts/eval/episode_metrics.json
artifacts/eval/episode_metrics.csv
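To compare baseline and heuristic runs from episode_metrics.csv, a short aggregation is enough. This sketch assumes the CSV has a `policy` column and a numeric `reward` column; both column names are guesses, so check the header of the generated file before relying on it.

```python
import csv
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List


def mean_reward_by_policy(lines: Iterable[str]) -> Dict[str, float]:
    """Average the assumed 'reward' column grouped by the assumed 'policy' column."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row["policy"]].append(float(row["reward"]))
    return {policy: statistics.mean(values) for policy, values in groups.items()}
```

Typical use: `with open("artifacts/eval/episode_metrics.csv", newline="") as f: print(mean_reward_by_policy(f))`.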

Project Structure

.
├── server/
│   ├── app.py                          # FastAPI server
│   ├── neural_tuner_env_environment.py # RL environment (reset/step logic)
│   ├── simulator.py                    # Hardware simulator (latency/memory/accuracy)
│   ├── scenarios.py                    # 19 scenarios × 3 difficulty tiers
│   └── model_zoo.py                    # Layer profiles for 5 neural networks
├── scripts/
│   ├── neural_tuner.py                 # TRL-compatible OpenEnv wrapper
│   ├── run_training_eval.py            # Post-training evaluation sweep
│   └── training_utils.py               # Scenario splitting and JSONL helpers
├── tests/                              # pytest suite
├── artifacts/
│   ├── plots/                          # Training reward plots
│   ├── eval/                           # Episode metrics and traces
│   └── training/                       # Baseline and eval metrics JSON
├── neural_tuner_trl.ipynb              # Training notebook
├── inference.py                        # Multi-scenario inference runner
├── rollout_eval.py                     # Baseline vs heuristic evaluator
├── client.py                           # OpenEnv WebSocket client
├── models.py                           # Pydantic action/observation/state models
├── openenv.yaml                        # OpenEnv deployment manifest
├── Dockerfile                          # Container build
└── pyproject.toml                      # Package config and dependencies

Tests

pytest -q

Covers:

Reward ordering for safe vs over-aggressive quantization
Benchmark budget enforcement (5 per episode)
Step limit enforcement (20 per episode)
Invalid layer ID and missing argument handling
Episode terminal state after submit()
Training metrics schema validation
Scenario train/eval split correctness (no overlap, deterministic)
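The budget, step-limit, and terminal-state checks can be illustrated with a toy tracker. `EpisodeBudget` is hypothetical and stands in for the real environment in server/neural_tuner_env_environment.py. In particular, raising `RuntimeError` is an assumption; the real environment may return an error observation instead. The function at the end shows the pytest-style shape such a test might take.

```python
class EpisodeBudget:
    """Toy tracker for the per-episode limits above (not the real environment)."""

    MAX_STEPS = 20
    MAX_BENCHMARKS = 5

    def __init__(self) -> None:
        self.steps = 0
        self.benchmarks = 0
        self.done = False

    def record(self, action: str) -> None:
        # Assumption: violations raise; the real env may report them differently.
        if self.done:
            raise RuntimeError("episode already terminated by submit()")
        if self.steps >= self.MAX_STEPS:
            raise RuntimeError("step limit reached")
        if action == "benchmark" and self.benchmarks >= self.MAX_BENCHMARKS:
            raise RuntimeError("benchmark budget exhausted")
        self.steps += 1
        if action == "benchmark":
            self.benchmarks += 1
        elif action == "submit":
            self.done = True  # terminal state


def test_benchmark_budget_is_enforced() -> None:
    budget = EpisodeBudget()
    for _ in range(EpisodeBudget.MAX_BENCHMARKS):
        budget.record("benchmark")
    try:
        budget.record("benchmark")
    except RuntimeError:
        return
    raise AssertionError("a sixth benchmark should be rejected")
```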

Environment Actions

Action	Arguments	Description
`profile_layer`	`layer_id`	Reveal sensitivity score and optimization advice
`quantize_layer`	`layer_id`, `dtype`	Apply `FP32` / `FP16` / `INT8` / `INT4`
`prune_layer`	`layer_id`, `sparsity`	Remove `LOW=25%` / `MEDIUM=50%` / `HIGH=75%` channels
`revert_layer`	`layer_id`	Reset to FP32, no pruning
`benchmark`	—	Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode)
`submit`	—	Finalise episode and receive reward
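Argument handling for these actions can be sketched as a validator that returns an error message instead of raising. The sketch assumes layer IDs are 0-based integers below the model's layer count and that an action arrives as a tool name plus an argument dict; both are assumptions, and `validate_action` is a hypothetical helper rather than the environment's API. The dtype set and sparsity fractions come from the table above.

```python
from typing import Any, Dict, Optional

DTYPES = {"FP32", "FP16", "INT8", "INT4"}
# Fraction of channels removed per sparsity level, per the table above.
SPARSITY = {"LOW": 0.25, "MEDIUM": 0.50, "HIGH": 0.75}
REQUIRED_ARGS = {
    "profile_layer": ("layer_id",),
    "quantize_layer": ("layer_id", "dtype"),
    "prune_layer": ("layer_id", "sparsity"),
    "revert_layer": ("layer_id",),
    "benchmark": (),
    "submit": (),
}


def validate_action(tool: str, args: Dict[str, Any], num_layers: int) -> Optional[str]:
    """Return an error message for a malformed action, or None if it looks valid."""
    if tool not in REQUIRED_ARGS:
        return f"unknown action: {tool}"
    missing = [name for name in REQUIRED_ARGS[tool] if name not in args]
    if missing:
        return f"missing argument(s): {', '.join(missing)}"
    if "layer_id" in REQUIRED_ARGS[tool]:
        layer_id = args["layer_id"]
        # Assumed 0-based integer layer IDs.
        if not (isinstance(layer_id, int) and 0 <= layer_id < num_layers):
            return f"invalid layer_id: {layer_id}"
    if tool == "quantize_layer" and args["dtype"] not in DTYPES:
        return f"invalid dtype: {args['dtype']}"
    if tool == "prune_layer" and args["sparsity"] not in SPARSITY:
        return f"invalid sparsity: {args['sparsity']}"
    return None
```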

Reward

reward = latency_reward (0–0.40)
       + memory_reward  (0 or 0.30)
       + accuracy_reward (0–0.20)
       + efficiency_bonus (0 or 0.10)

Maximum reward: 1.0. All constraints must be met simultaneously for the efficiency bonus.
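The component ranges above can be combined as follows. This is a minimal sketch: the memory reward and efficiency bonus follow the all-or-nothing rules stated here, while the partial credit for latency and accuracy is passed in as a score in [0, 1], because how the simulator computes it is not documented in this README. `Metrics`, `Constraints`, `total_reward`, and the ms/MB units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    latency_ms: float
    memory_mb: float
    accuracy: float


@dataclass
class Constraints:
    latency_budget_ms: float
    memory_budget_mb: float
    min_accuracy: float


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def total_reward(m: Metrics, c: Constraints, latency_score: float, accuracy_score: float) -> float:
    """Sum the four components; the two scores are assumed partial credit in [0, 1]."""
    latency_ok = m.latency_ms <= c.latency_budget_ms
    memory_ok = m.memory_mb <= c.memory_budget_mb
    accuracy_ok = m.accuracy >= c.min_accuracy

    latency_reward = 0.40 * _clamp01(latency_score)    # 0 to 0.40
    memory_reward = 0.30 if memory_ok else 0.0          # 0 or 0.30
    accuracy_reward = 0.20 * _clamp01(accuracy_score)   # 0 to 0.20
    # Bonus only when all three constraints hold simultaneously.
    efficiency_bonus = 0.10 if (latency_ok and memory_ok and accuracy_ok) else 0.0
    return latency_reward + memory_reward + accuracy_reward + efficiency_bonus
```

With every constraint met and full partial credit the components sum to 1.0; missing only the memory budget drops both the 0.30 memory term and the 0.10 bonus, leaving 0.60.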

Deployment

# Push to HuggingFace Space
git push space master

Manifest: openenv.yaml — runtime: fastapi, entrypoint: server.app:app, port: 8000.
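The `entrypoint` value uses the same `module:attribute` import-string format that uvicorn accepts on the command line (`server.app:app` in the commands above). A minimal resolver for such strings, shown only to illustrate the format; `load_entrypoint` is a hypothetical helper, not part of OpenEnv or uvicorn.

```python
import importlib
from typing import Any


def load_entrypoint(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string such as 'server.app:app'."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # Allow dotted attributes after the colon, e.g. 'pkg.mod:Class.method'.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj
```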