---
title: Neural Tuner Env Environment Server
emoji: 🥉
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---
# NeuralTuner Environment
An OpenEnv-compatible RL environment for training LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware through per-layer quantization and structured pruning.
> **Full write-up:** [BLOG.md](BLOG.md) | **Live demo:** [HuggingFace Space](https://huggingface.co/spaces/Mohammed-Altaf/Neural-Tuner) | **Google Colab notebook:** [Notebook](https://colab.research.google.com/drive/1cGnFxloW-3WN_I5imlkjGcWJVzZnLbcq?usp=sharing) | **W&B results:** [Logs](https://api.wandb.ai/links/mohammedaltaf4316/4czj329l)

The notebook is also included in this repository as [neural_tuner_trl.ipynb](neural_tuner_trl.ipynb). Longer training runs were not feasible locally. For the full training results, use the Colab notebook linked above together with the Weights & Biases logs.
---
## Overview
NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy), then issues tool calls to profile layers, apply quantization dtypes (`FP32`/`FP16`/`INT8`/`INT4`), apply structured pruning (`LOW`/`MEDIUM`/`HIGH`), benchmark the current plan, and submit a final configuration for scoring.
The environment supports 19 scenarios across 5 models and 3 difficulty tiers.
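To make the tool-call workflow concrete, here is a minimal sketch of a rule-based plan an agent might emit. Only the action names, the dtypes and the 20-step limit come from this README. Everything else is hypothetical and not taken from the environment's code:

- the assumption that profiling yields a per-layer sensitivity score in [0, 1];
- the `LayerInfo` type;
- the `{"tool": ...}` call format;
- the INT4/INT8 thresholds.

```python
from dataclasses import dataclass

# Hypothetical cut-offs: the real environment reveals sensitivity via
# `profile_layer`, but these thresholds are illustrative only.
INT4_MAX_SENSITIVITY = 0.2
INT8_MAX_SENSITIVITY = 0.5
MAX_STEPS = 20  # per-episode step limit stated in this README


@dataclass
class LayerInfo:
    layer_id: int
    sensitivity: float  # assumed in [0, 1]; higher means more fragile


def choose_dtype(sensitivity: float) -> str:
    """Pick the most aggressive dtype the assumed thresholds allow."""
    if sensitivity <= INT4_MAX_SENSITIVITY:
        return "INT4"
    if sensitivity <= INT8_MAX_SENSITIVITY:
        return "INT8"
    return "FP16"


def plan_episode(layers: list[LayerInfo]) -> list[dict]:
    """Build a tool-call sequence: profile, quantize, benchmark, submit."""
    calls: list[dict] = [
        {"tool": "profile_layer", "layer_id": layer.layer_id} for layer in layers
    ]
    calls += [
        {
            "tool": "quantize_layer",
            "layer_id": layer.layer_id,
            "dtype": choose_dtype(layer.sensitivity),
        }
        for layer in layers
    ]
    calls += [{"tool": "benchmark"}, {"tool": "submit"}]
    if len(calls) > MAX_STEPS:
        raise ValueError(f"plan needs {len(calls)} steps; limit is {MAX_STEPS}")
    return calls
```

Because every call costs a step, a real agent has to be selective on models with many layers. This sketch simply refuses plans that exceed the limit.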
---
## Requirements
- Python ≥ 3.10
- [uv](https://github.com/astral-sh/uv) (recommended) or pip
---
## Installation
```bash
# Install core dependencies (run from the repository root)
uv sync
# Include training dependencies (TRL, transformers, datasets, matplotlib)
uv sync --extra training
```
---
## Running the Server
```bash
# Development (auto-reload)
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
# Production
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
```
The server exposes:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode |
| `/step` | POST | Execute an action |
| `/state` | GET | Current episode state |
| `/metadata` | GET | Model and scenario metadata |
| `/health` | GET | Server health check |
| `/schema` | GET | Action and observation schemas |
| `/ws` | WebSocket | Persistent session |
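The HTTP endpoints can be exercised with nothing but the standard library. The sketch below assumes the server is running on `localhost:8000` as started above. The `{"action": {...}}` request body and the action field names are assumptions; the authoritative schema is defined in `models.py` and exposed via `/schema`.

Whether episode state persists between separate HTTP calls depends on the server. For persistent multi-step sessions, `client.py` uses the `/ws` WebSocket.

```python
from __future__ import annotations

import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn commands above


def build_request(path: str, payload: dict | None = None) -> urllib.request.Request:
    """Build a GET request (no payload) or a JSON POST request."""
    url = BASE_URL.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def call(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage against a live server (payload shape is an assumption):
# print(call(build_request("/health")))
# print(call(build_request("/reset", {})))
# print(call(build_request("/step", {"action": {"tool": "benchmark"}})))
```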
---
## Running Inference
Run the trained agent across all 19 scenarios using the HuggingFace router:
```bash
HF_TOKEN=hf_... python inference.py
```
Filter by difficulty or scenario:
```bash
# Easy tier only
HF_TOKEN=hf_... python inference.py --difficulty easy
# Single scenario
HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium
# Different model
HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct
```
---
## Training
Open and run `neural_tuner_trl.ipynb` for the full pipeline:
1. Environment smoke test
2. Random policy baseline (n=20 seeds) and oracle ceiling
3. SFT warm-up on heuristic trajectories
4. GRPO training with curriculum scheduling (easy → medium → hard)
5. Post-training evaluation and plot export
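The curriculum in step 4 can be pictured as a schedule that maps training progress to a difficulty tier. This is a minimal sketch, not the notebook's implementation. The stage boundaries (thirds of the run) and the scenario names in the usage example are hypothetical.

```python
import random

# Hypothetical stage boundaries as fractions of total GRPO steps.
CURRICULUM = [(0.0, "easy"), (1 / 3, "medium"), (2 / 3, "hard")]


def difficulty_at(step: int, total_steps: int) -> str:
    """Return the tier whose start fraction was most recently passed."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = step / total_steps
    tier = CURRICULUM[0][1]
    for start, name in CURRICULUM:
        if progress >= start:
            tier = name
    return tier


def sample_scenario(scenarios_by_tier: dict, step: int, total_steps: int,
                    rng: random.Random) -> tuple:
    """Pick a scenario ID from the tier active at this training step."""
    tier = difficulty_at(step, total_steps)
    return tier, rng.choice(scenarios_by_tier[tier])
```

For example, `sample_scenario({"easy": [...], "medium": [...], "hard": [...]}, 150, 300, random.Random(0))` draws from the medium tier, because 150/300 lies in the middle third.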
```bash
# Optional: regenerate baseline vs heuristic episode traces
python rollout_eval.py --trace --model-id inception_v3 --difficulty medium
# Outputs:
#   artifacts/eval/episode_trace.md
#   artifacts/eval/episode_metrics.json
#   artifacts/eval/episode_metrics.csv
```
---
## Project Structure
```
.
├── server/
│   ├── app.py                           # FastAPI server
│   ├── neural_tuner_env_environment.py  # RL environment (reset/step logic)
│   ├── simulator.py                     # Hardware simulator (latency/memory/accuracy)
│   ├── scenarios.py                     # 19 scenarios across 3 difficulty tiers
│   └── model_zoo.py                     # Layer profiles for 5 neural networks
├── scripts/
│   ├── neural_tuner.py                  # TRL-compatible OpenEnv wrapper
│   ├── run_training_eval.py             # Post-training evaluation sweep
│   └── training_utils.py                # Scenario splitting and JSONL helpers
├── tests/                               # pytest suite
├── artifacts/
│   ├── plots/                           # Training reward plots
│   ├── eval/                            # Episode metrics and traces
│   └── training/                        # Baseline and eval metrics JSON
├── neural_tuner_trl.ipynb               # Training notebook
├── inference.py                         # Multi-scenario inference runner
├── rollout_eval.py                      # Baseline vs heuristic evaluator
├── client.py                            # OpenEnv WebSocket client
├── models.py                            # Pydantic action/observation/state models
├── openenv.yaml                         # OpenEnv deployment manifest
├── Dockerfile                           # Container build
└── pyproject.toml                       # Package config and dependencies
```
---
## Tests
```bash
pytest -q
```
Covers:
- Reward ordering for safe vs over-aggressive quantization
- Benchmark budget enforcement (5 per episode)
- Step limit enforcement (20 per episode)
- Invalid layer ID and missing argument handling
- Episode terminal state after `submit()`
- Training metrics schema validation
- Scenario train/eval split correctness (no overlap, deterministic)
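The last property can be checked with a test along these lines. This is a sketch that assumes a hash-based split. The `split_scenarios` helper and the 25% eval fraction are hypothetical and may differ from the actual helper in `scripts/training_utils.py`.

```python
import hashlib


def split_scenarios(scenario_ids, eval_fraction=0.25):
    """Deterministically split scenario IDs into (train, eval) lists.

    SHA-256 is used instead of Python's built-in hash(), which is salted
    per process, so the split is stable across runs and machines.
    """
    train, eval_ = [], []
    for sid in sorted(scenario_ids):
        digest = hashlib.sha256(sid.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        (eval_ if bucket < eval_fraction else train).append(sid)
    return train, eval_


def test_split_is_disjoint_and_deterministic():
    ids = [f"scenario_{i}" for i in range(19)]
    train1, eval1 = split_scenarios(ids)
    train2, eval2 = split_scenarios(list(reversed(ids)))  # input order must not matter
    assert set(train1).isdisjoint(eval1)                  # no overlap
    assert sorted(train1 + eval1) == sorted(ids)          # nothing lost
    assert (train1, eval1) == (train2, eval2)             # deterministic
```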
---
## Environment Actions
| Action | Arguments | Description |
|--------|-----------|-------------|
| `profile_layer` | `layer_id` | Reveal sensitivity score and optimization advice |
| `quantize_layer` | `layer_id`, `dtype` | Apply `FP32` / `FP16` / `INT8` / `INT4` |
| `prune_layer` | `layer_id`, `sparsity` | Remove `LOW=25%` / `MEDIUM=50%` / `HIGH=75%` channels |
| `revert_layer` | `layer_id` | Reset to FP32, no pruning |
| `benchmark` | (none) | Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode) |
| `submit` | (none) | Finalise episode and receive reward |
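The argument rules and per-episode limits from the table and test list can be summarized as a small validator. The following points come from this README:

- dtypes and sparsity levels;
- the 20-step and 5-benchmark limits;
- rejection of invalid or missing arguments;
- terminal state after `submit`.

Raising `ValueError` is an assumption; the real environment may instead return an error observation. Whether invalid actions count against the step budget is also assumed here, and `EpisodeBudget` is a hypothetical name.

```python
from dataclasses import dataclass

DTYPES = {"FP32", "FP16", "INT8", "INT4"}
SPARSITY = {"LOW": 0.25, "MEDIUM": 0.50, "HIGH": 0.75}  # fraction of channels removed
MAX_STEPS = 20
MAX_BENCHMARKS = 5
LAYER_ACTIONS = {"profile_layer", "quantize_layer", "prune_layer", "revert_layer"}
KNOWN_ACTIONS = LAYER_ACTIONS | {"benchmark", "submit"}


@dataclass
class EpisodeBudget:
    num_layers: int
    steps: int = 0
    benchmarks: int = 0
    done: bool = False

    def check(self, action: str, **args) -> None:
        """Raise ValueError if the action is not allowed (assumed behaviour)."""
        if action not in KNOWN_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if self.done:
            raise ValueError("episode already submitted")
        if self.steps >= MAX_STEPS:
            raise ValueError("step limit reached")
        if action in LAYER_ACTIONS:
            layer_id = args.get("layer_id")
            if not isinstance(layer_id, int) or not 0 <= layer_id < self.num_layers:
                raise ValueError(f"invalid or missing layer_id: {layer_id!r}")
        if action == "quantize_layer" and args.get("dtype") not in DTYPES:
            raise ValueError(f"invalid or missing dtype: {args.get('dtype')!r}")
        if action == "prune_layer" and args.get("sparsity") not in SPARSITY:
            raise ValueError(f"invalid or missing sparsity: {args.get('sparsity')!r}")
        if action == "benchmark" and self.benchmarks >= MAX_BENCHMARKS:
            raise ValueError("benchmark budget exhausted")

    def apply(self, action: str, **args) -> None:
        """Validate, then count the step; invalid actions are not counted here."""
        self.check(action, **args)
        self.steps += 1
        if action == "benchmark":
            self.benchmarks += 1
        if action == "submit":
            self.done = True
```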
---
## Reward
```
reward = latency_reward (0–0.40)
+ memory_reward (0 or 0.30)
+ accuracy_reward (0–0.20)
+ efficiency_bonus (0 or 0.10)
```
Maximum reward: **1.0**. All constraints must be met simultaneously for the efficiency bonus.
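A minimal sketch of how the four terms could combine under these ranges follows. Only the ranges and the all-constraints condition for the bonus come from this README. The per-term shapes are assumptions:

- latency scaled by budget/latency;
- binary memory term;
- accuracy interpolated from the minimum to a baseline accuracy.

The simulator's real formulas live in `server/simulator.py` and the environment code.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    latency_ms: float
    memory_mb: float
    accuracy: float


@dataclass
class Constraints:
    latency_budget_ms: float
    memory_budget_mb: float
    min_accuracy: float


def reward(m: Metrics, c: Constraints, baseline_accuracy: float) -> float:
    """Sum the four terms; term shapes are assumed, ranges follow the README."""
    # Latency (0 to 0.40): full credit at or under budget, shrinking as it overshoots.
    latency_reward = 0.40 * min(1.0, c.latency_budget_ms / max(m.latency_ms, 1e-9))
    # Memory (0 or 0.30): binary pass/fail against the memory budget.
    memory_ok = m.memory_mb <= c.memory_budget_mb
    memory_reward = 0.30 if memory_ok else 0.0
    # Accuracy (0 to 0.20): zero below the floor, linear up to baseline accuracy.
    accuracy_ok = m.accuracy >= c.min_accuracy
    span = baseline_accuracy - c.min_accuracy
    if not accuracy_ok:
        accuracy_reward = 0.0
    elif span <= 0:
        accuracy_reward = 0.20
    else:
        accuracy_reward = 0.20 * min(1.0, (m.accuracy - c.min_accuracy) / span)
    # Efficiency bonus (0 or 0.10): only when every constraint holds at once.
    latency_ok = m.latency_ms <= c.latency_budget_ms
    bonus = 0.10 if (latency_ok and memory_ok and accuracy_ok) else 0.0
    return latency_reward + memory_reward + accuracy_reward + bonus
```

With this shape, a plan that is twice over the latency budget but otherwise compliant scores 0.20 + 0.30 + 0.20 = 0.70 and loses the bonus.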
---
## Deployment
```bash
# Push to HuggingFace Space
# (assumes a git remote named "space" pointing at the Space repository)
git push space master
```
Manifest: `openenv.yaml` (runtime: `fastapi`, entrypoint: `server.app:app`, port: `8000`).
---