title: Neural Tuner Env Environment Server
emoji: π₯
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
NeuralTuner Environment
An OpenEnv-compatible RL environment that trains LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware via per-layer quantization and structured pruning.
Full write-up: BLOG.md | Live demo: HuggingFace Space | Google Colab Notebook | W&B Results Logs
The notebook is also available in this repository. Because longer training runs were not feasible locally, the Colab link above has been added; use it to view the full training results along with the Weights & Biases logs.
Overview
NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy), then issues tool calls to profile layers, apply quantization dtypes (FP32/FP16/INT8/INT4), apply structured pruning (LOW/MEDIUM/HIGH), benchmark the current plan, and submit a final configuration for scoring.
The environment supports 19 scenarios across 5 models and 3 difficulty tiers.
Requirements
- Python ≥ 3.10
- uv (recommended) or pip
Installation
# Clone and install core dependencies
uv sync
# Include training dependencies (TRL, transformers, datasets, matplotlib)
uv sync --extra training
Running the Server
# Development (auto-reload)
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
# Production
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
The server exposes:
| Endpoint | Method | Description |
|---|---|---|
| /reset | POST | Start a new episode |
| /step | POST | Execute an action |
| /state | GET | Current episode state |
| /metadata | GET | Model and scenario metadata |
| /health | GET | Server health check |
| /schema | GET | Action and observation schemas |
| /ws | WebSocket | Persistent session |
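The endpoints above can be exercised from Python with the standard library alone. The following is a minimal sketch, not the project's client (see client.py for that): `build_request` and `call` are hypothetical helper names, the base URL assumes the local uvicorn command shown earlier, and the empty JSON body for /reset is an assumption. The real request and response shapes should be read from /schema.

```python
import json
import urllib.request
from typing import Optional

# Assumes the server was started locally with the uvicorn command above.
BASE_URL = "http://localhost:8000"


def build_request(path: str, method: str = "GET",
                  payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the server endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


def call(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# GET /health takes no body.
health_request = build_request("/health")

# ASSUMPTION: /reset accepts an empty JSON object; check /schema for the
# fields the server actually expects.
reset_request = build_request("/reset", method="POST", payload={})
```

With the server running, `call(health_request)` returns the decoded health response, and `call(build_request("/schema"))` returns the action and observation schemas to use when composing /step payloads.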
Running Inference
Run the trained agent across all 19 scenarios using the HuggingFace router:
HF_TOKEN=hf_... python inference.py
Filter by difficulty or scenario:
# Easy tier only
HF_TOKEN=hf_... python inference.py --difficulty easy
# Single scenario
HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium
# Different model
HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct
Training
Open and run neural_tuner_trl.ipynb for the full pipeline:
- Environment smoke test
- Random policy baseline (n=20 seeds) and oracle ceiling
- SFT warm-up on heuristic trajectories
- GRPO training with curriculum scheduling (easy → medium → hard)
- Post-training evaluation and plot export
# Optional: regenerate baseline vs heuristic episode traces
python rollout_eval.py --trace --model-id inception_v3 --difficulty medium
# Outputs
artifacts/eval/episode_trace.md
artifacts/eval/episode_metrics.json
artifacts/eval/episode_metrics.csv
Project Structure
.
├── server/
│   ├── app.py                            # FastAPI server
│   ├── neural_tuner_env_environment.py   # RL environment (reset/step logic)
│   ├── simulator.py                      # Hardware simulator (latency/memory/accuracy)
│   ├── scenarios.py                      # 19 scenarios × 3 difficulty tiers
│   └── model_zoo.py                      # Layer profiles for 5 neural networks
├── scripts/
│   ├── neural_tuner.py                   # TRL-compatible OpenEnv wrapper
│   ├── run_training_eval.py              # Post-training evaluation sweep
│   └── training_utils.py                 # Scenario splitting and JSONL helpers
├── tests/                                # pytest suite
├── artifacts/
│   ├── plots/                            # Training reward plots
│   ├── eval/                             # Episode metrics and traces
│   └── training/                         # Baseline and eval metrics JSON
├── neural_tuner_trl.ipynb                # Training notebook
├── inference.py                          # Multi-scenario inference runner
├── rollout_eval.py                       # Baseline vs heuristic evaluator
├── client.py                             # OpenEnv WebSocket client
├── models.py                             # Pydantic action/observation/state models
├── openenv.yaml                          # OpenEnv deployment manifest
├── Dockerfile                            # Container build
└── pyproject.toml                        # Package config and dependencies
Tests
pytest -q
Covers:
- Reward ordering for safe vs over-aggressive quantization
- Benchmark budget enforcement (5 per episode)
- Step limit enforcement (20 per episode)
- Invalid layer ID and missing argument handling
- Episode terminal state after submit()
- Training metrics schema validation
- Scenario train/eval split correctness (no overlap, deterministic)
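To illustrate the two per-episode limits the suite checks (5 benchmarks, 20 steps), here is a minimal sketch of a budget tracker. `EpisodeBudget` is a hypothetical class, not the environment's implementation. It assumes that a benchmark call also counts as a step and that a rejected action consumes no budget; the real environment may handle either differently.

```python
from dataclasses import dataclass

MAX_BENCHMARKS = 5  # per-episode benchmark budget listed above
MAX_STEPS = 20      # per-episode step limit listed above


@dataclass
class EpisodeBudget:
    """Hypothetical tracker for the per-episode limits."""
    steps_used: int = 0
    benchmarks_used: int = 0

    def record(self, action: str) -> bool:
        """Return True and consume budget if the action is allowed, else False."""
        if self.steps_used >= MAX_STEPS:
            return False
        if action == "benchmark" and self.benchmarks_used >= MAX_BENCHMARKS:
            return False
        # ASSUMPTION: every accepted action, including benchmark, costs one step.
        self.steps_used += 1
        if action == "benchmark":
            self.benchmarks_used += 1
        return True
```

A test in this style would call `record("benchmark")` six times and expect the sixth call to be rejected.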
Environment Actions
| Action | Arguments | Description |
|---|---|---|
| profile_layer | layer_id | Reveal sensitivity score and optimization advice |
| quantize_layer | layer_id, dtype | Apply FP32 / FP16 / INT8 / INT4 |
| prune_layer | layer_id, sparsity | Remove LOW=25% / MEDIUM=50% / HIGH=75% of channels |
| revert_layer | layer_id | Reset to FP32, no pruning |
| benchmark | (none) | Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode) |
| submit | (none) | Finalise episode and receive reward |
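The action table can be mirrored as a small client-side validator that catches missing arguments and unknown values before anything is sent. This is a sketch under stated assumptions: `make_action` is a hypothetical helper, the `{"action": ..., "arguments": ...}` payload shape is illustrative only (the authoritative schema comes from /schema and models.py), and the string layer ID `"conv1"` in the usage note is made up.

```python
# Required arguments per action, taken from the table above.
REQUIRED_ARGS = {
    "profile_layer": ("layer_id",),
    "quantize_layer": ("layer_id", "dtype"),
    "prune_layer": ("layer_id", "sparsity"),
    "revert_layer": ("layer_id",),
    "benchmark": (),
    "submit": (),
}

DTYPES = ("FP32", "FP16", "INT8", "INT4")

# Fraction of channels removed at each sparsity level, from the table above.
SPARSITY_FRACTION = {"LOW": 0.25, "MEDIUM": 0.50, "HIGH": 0.75}


def make_action(name: str, **arguments) -> dict:
    """Validate an action against the table and return a payload dict.

    ASSUMPTION: the payload layout is hypothetical, not the server's schema.
    """
    if name not in REQUIRED_ARGS:
        raise ValueError(f"unknown action: {name}")
    missing = [arg for arg in REQUIRED_ARGS[name] if arg not in arguments]
    if missing:
        raise ValueError(f"{name} is missing arguments: {missing}")
    if "dtype" in arguments and arguments["dtype"] not in DTYPES:
        raise ValueError(f"unsupported dtype: {arguments['dtype']}")
    if "sparsity" in arguments and arguments["sparsity"] not in SPARSITY_FRACTION:
        raise ValueError(f"unsupported sparsity: {arguments['sparsity']}")
    return {"action": name, "arguments": dict(arguments)}
```

For example, `make_action("quantize_layer", layer_id="conv1", dtype="INT8")` produces a payload, while omitting `dtype` raises `ValueError`.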
Reward
reward = latency_reward   (0–0.40)
       + memory_reward    (0 or 0.30)
       + accuracy_reward  (0–0.20)
       + efficiency_bonus (0 or 0.10)
Maximum reward: 1.0. All constraints must be met simultaneously for the efficiency bonus.
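As a worked illustration of how the four components combine, the sketch below takes normalized scores as inputs and applies the weights listed above. `composite_reward` and its parameters are hypothetical. In particular, scaling the latency and accuracy terms linearly from a score in [0, 1] is an assumption; the simulator's actual shaping is defined in the server code.

```python
def clamp01(value: float) -> float:
    """Clamp a score into the [0, 1] range."""
    return max(0.0, min(1.0, value))


def composite_reward(latency_score: float, accuracy_score: float,
                     memory_within_budget: bool,
                     all_constraints_met: bool) -> float:
    """Combine the four reward components using the weights listed above.

    ASSUMPTION: latency_score and accuracy_score are normalized to [0, 1]
    and scaled linearly; this is illustrative, not the simulator's formula.
    """
    latency_reward = 0.40 * clamp01(latency_score)            # 0 to 0.40
    memory_reward = 0.30 if memory_within_budget else 0.0     # 0 or 0.30
    accuracy_reward = 0.20 * clamp01(accuracy_score)          # 0 to 0.20
    efficiency_bonus = 0.10 if all_constraints_met else 0.0   # 0 or 0.10
    return latency_reward + memory_reward + accuracy_reward + efficiency_bonus
```

Under these assumptions, perfect scores with every constraint met reach the stated maximum of 1.0 (up to floating-point rounding), while a plan that misses the memory budget loses both the 0.30 memory term and the 0.10 bonus.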
Deployment
# Push to HuggingFace Space
git push space master
Manifest: openenv.yaml (runtime: fastapi, entrypoint: server.app:app, port: 8000).