---
title: Neural Tuner Env Environment Server
emoji: 🥉
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---

# NeuralTuner Environment

An OpenEnv-compatible RL environment that trains LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware via per-layer quantization and structured pruning.

> **Full write-up:** [BLOG.md](BLOG.md) | **Live demo:** [HuggingFace Space](https://huggingface.co/spaces/Mohammed-Altaf/Neural-Tuner) | **Colab notebook:** [Notebook](https://colab.research.google.com/drive/1cGnFxloW-3WN_I5imlkjGcWJVzZnLbcq?usp=sharing) | **W&B results:** [Logs](https://api.wandb.ai/links/mohammedaltaf4316/4czj329l)

The notebook is also included in this repository as [`neural_tuner_trl.ipynb`](neural_tuner_trl.ipynb). Longer training runs were not feasible locally, so the Colab notebook linked above holds the full training results, together with the Weights & Biases logs.

---

## Overview

NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy), then issues tool calls to profile layers, apply quantization dtypes (`FP32`/`FP16`/`INT8`/`INT4`), apply structured pruning (`LOW`/`MEDIUM`/`HIGH`), benchmark the current plan, and submit a final configuration for scoring.

The environment supports 19 scenarios across 5 models and 3 difficulty tiers.
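As a rough illustration of the tool-call workflow described above, the sketch below turns per-layer sensitivity scores into a profile, quantize, benchmark, submit sequence. The action dict shape (a `"tool"` key), the string layer IDs, the [0, 1] sensitivity scale, and the thresholds are all hypothetical; the real action schema lives in `models.py` and is published at `/schema`, and this is not necessarily how the repository's heuristic trajectories are built.

```python
from typing import Any

# Hypothetical thresholds on an assumed [0, 1] sensitivity scale.
HIGH_SENSITIVITY = 0.7
LOW_SENSITIVITY = 0.3


def plan_episode(sensitivities: dict[str, float]) -> list[dict[str, Any]]:
    """Build a profile -> quantize -> benchmark -> submit action sequence.

    `sensitivities` maps a (hypothetical) layer_id to the score that
    profile_layer would reveal for it.
    """
    actions: list[dict[str, Any]] = []
    # Profile every layer first so each decision is informed.
    for layer_id in sensitivities:
        actions.append({"tool": "profile_layer", "layer_id": layer_id})
    # Quantize aggressively only where the layer tolerates it.
    for layer_id, score in sensitivities.items():
        if score >= HIGH_SENSITIVITY:
            continue  # leave sensitive layers at FP32
        dtype = "INT8" if score <= LOW_SENSITIVITY else "FP16"
        actions.append({"tool": "quantize_layer", "layer_id": layer_id, "dtype": dtype})
    # One benchmark to check the plan, then submit for scoring.
    actions.append({"tool": "benchmark"})
    actions.append({"tool": "submit"})
    return actions
```

A plan like this uses far fewer than the 20 steps allowed per episode, leaving room for an agent to revert or prune layers after seeing the benchmark result.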

---

## Requirements

- Python ≥ 3.10
- [uv](https://github.com/astral-sh/uv) (recommended) or pip

---

## Installation

```bash
# Install core dependencies (run from the repository root)
uv sync

# Include training dependencies (TRL, transformers, datasets, matplotlib)
uv sync --extra training

# pip alternative
pip install -e .
pip install -e ".[training]"
```

---

## Running the Server

```bash
# Development (auto-reload)
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

# Production
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
```

The server exposes:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode |
| `/step` | POST | Execute an action |
| `/state` | GET | Current episode state |
| `/metadata` | GET | Model and scenario metadata |
| `/health` | GET | Server health check |
| `/schema` | GET | Action and observation schemas |
| `/ws` | WebSocket | Persistent session |
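For quick experiments outside the WebSocket client in `client.py`, a minimal standard-library HTTP sketch might look like the following. It assumes the server is running locally on port 8000 and that `/step` accepts the action wrapped as `{"action": {...}}`; that body shape and the `"tool"` key are assumptions, so confirm the field names against `GET /schema` first.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"


def build_step_body(action: dict[str, Any]) -> bytes:
    """Encode an action for POST /step.

    Assumption: the action is wrapped as {"action": {...}}; check /schema.
    """
    return json.dumps({"action": action}).encode("utf-8")


def post(path: str, body: bytes = b"{}") -> dict[str, Any]:
    """POST a JSON body (e.g. to /reset or /step) and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


def get(path: str) -> dict[str, Any]:
    """GET a JSON endpoint such as /health, /state, or /schema."""
    with urllib.request.urlopen(BASE_URL + path, timeout=30) as resp:
        return json.loads(resp.read())
```

With the server up, `post("/reset")` starts an episode, `post("/step", build_step_body({"tool": "benchmark"}))` sends one action, and `get("/health")` checks that the server is alive.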

---

## Running Inference

Run the trained agent across all 19 scenarios using the HuggingFace router:

```bash
HF_TOKEN=hf_... python inference.py
```

Filter by difficulty or scenario:

```bash
# Easy tier only
HF_TOKEN=hf_... python inference.py --difficulty easy

# Single scenario
HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium

# Different model
HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct
```

---

## Training

Open and run `neural_tuner_trl.ipynb` for the full pipeline:

1. Environment smoke test
2. Random policy baseline (n=20 seeds) and oracle ceiling
3. SFT warm-up on heuristic trajectories
4. GRPO training with curriculum scheduling (easy → medium → hard)
5. Post-training evaluation and plot export
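The curriculum step can be pictured as a schedule that maps training progress to a difficulty tier. The sketch below is a hypothetical version: the 33%/66% boundaries and the `curriculum_tier` name are illustrative, not the notebook's actual schedule.

```python
TIERS = ("easy", "medium", "hard")


def curriculum_tier(
    step: int,
    total_steps: int,
    boundaries: tuple[float, float] = (0.33, 0.66),  # hypothetical cut points
) -> str:
    """Map a GRPO training step to a difficulty tier: easy -> medium -> hard."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = step / total_steps
    if progress < boundaries[0]:
        return TIERS[0]
    if progress < boundaries[1]:
        return TIERS[1]
    return TIERS[2]
```

A training loop could call `curriculum_tier(step, total_steps)` whenever it samples a prompt and then draw the scenario from that tier's pool.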

```bash
# Optional: regenerate baseline vs heuristic episode traces
python rollout_eval.py --trace --model-id inception_v3 --difficulty medium

# Outputs:
#   artifacts/eval/episode_trace.md
#   artifacts/eval/episode_metrics.json
#   artifacts/eval/episode_metrics.csv
```

---

## Project Structure

```
.
├── server/
│   ├── app.py                          # FastAPI server
│   ├── neural_tuner_env_environment.py # RL environment (reset/step logic)
│   ├── simulator.py                    # Hardware simulator (latency/memory/accuracy)
│   ├── scenarios.py                    # 19 scenarios × 3 difficulty tiers
│   └── model_zoo.py                    # Layer profiles for 5 neural networks
├── scripts/
│   ├── neural_tuner.py                 # TRL-compatible OpenEnv wrapper
│   ├── run_training_eval.py            # Post-training evaluation sweep
│   └── training_utils.py               # Scenario splitting and JSONL helpers
├── tests/                              # pytest suite
├── artifacts/
│   ├── plots/                          # Training reward plots
│   ├── eval/                           # Episode metrics and traces
│   └── training/                       # Baseline and eval metrics JSON
├── neural_tuner_trl.ipynb              # Training notebook
├── inference.py                        # Multi-scenario inference runner
├── rollout_eval.py                     # Baseline vs heuristic evaluator
├── client.py                           # OpenEnv WebSocket client
├── models.py                           # Pydantic action/observation/state models
├── openenv.yaml                        # OpenEnv deployment manifest
├── Dockerfile                          # Container build
└── pyproject.toml                      # Package config and dependencies
```

---

## Tests

```bash
pytest -q
```

Covers:
- Reward ordering for safe vs over-aggressive quantization
- Benchmark budget enforcement (5 per episode)
- Step limit enforcement (20 per episode)
- Invalid layer ID and missing argument handling
- Episode terminal state after `submit()`
- Training metrics schema validation
- Scenario train/eval split correctness (no overlap, deterministic)
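The split test checks two properties: train and eval scenarios never overlap, and the split is reproducible. One way to get both, sketched below under the assumption that scenarios are identified by string IDs, is to hash each ID into a fixed bucket. `split_scenarios` is a hypothetical helper, not necessarily how `scripts/training_utils.py` does it.

```python
import hashlib


def split_scenarios(
    scenario_ids: list[str], eval_fraction: float = 0.2
) -> tuple[list[str], list[str]]:
    """Deterministically split scenario IDs into disjoint train/eval lists.

    Each ID is hashed with SHA-256, so the assignment does not depend on
    input order, Python's per-process hash seed, or a random generator.
    """
    train: list[str] = []
    evaluation: list[str] = []
    for sid in sorted(set(scenario_ids)):
        digest = hashlib.sha256(sid.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        if bucket < eval_fraction:
            evaluation.append(sid)
        else:
            train.append(sid)
    return train, evaluation
```

Because each ID is assigned independently, the eval fraction is only approximate for a pool as small as 19 scenarios; a seeded shuffle of the sorted IDs with a fixed cut point is an alternative when an exact count matters.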

---

## Environment Actions

| Action | Arguments | Description |
|--------|-----------|-------------|
| `profile_layer` | `layer_id` | Reveal sensitivity score and optimization advice |
| `quantize_layer` | `layer_id`, `dtype` | Apply `FP32` / `FP16` / `INT8` / `INT4` |
| `prune_layer` | `layer_id`, `sparsity` | Remove `LOW=25%` / `MEDIUM=50%` / `HIGH=75%` channels |
| `revert_layer` | `layer_id` | Reset to FP32, no pruning |
| `benchmark` | — | Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode) |
| `submit` | — | Finalize the episode and receive the final reward |
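The arguments in the table, together with the 20-step and 5-benchmark limits exercised by the tests, can be checked client-side before an action is sent. The tracker below is a hypothetical sketch: it assumes every accepted action, benchmarks included, counts toward the step limit and that rejected actions do not; the server's own accounting in `server/neural_tuner_env_environment.py` may differ.

```python
from dataclasses import dataclass

TOOLS = {"profile_layer", "quantize_layer", "prune_layer", "revert_layer", "benchmark", "submit"}
LAYER_TOOLS = {"profile_layer", "quantize_layer", "prune_layer", "revert_layer"}
DTYPES = {"FP32", "FP16", "INT8", "INT4"}
SPARSITY = {"LOW": 0.25, "MEDIUM": 0.50, "HIGH": 0.75}  # fraction of channels removed
MAX_STEPS = 20
MAX_BENCHMARKS = 5


@dataclass
class EpisodeBudget:
    """Hypothetical client-side tracker for the documented per-episode limits."""

    steps: int = 0
    benchmarks: int = 0
    done: bool = False

    def check(self, tool: str, **args: object) -> None:
        """Raise ValueError if the action looks invalid; otherwise record it."""
        if self.done:
            raise ValueError("episode already submitted")
        if self.steps >= MAX_STEPS:
            raise ValueError("step limit reached")
        if tool not in TOOLS:
            raise ValueError(f"unknown action: {tool}")
        if tool in LAYER_TOOLS and "layer_id" not in args:
            raise ValueError(f"{tool} requires layer_id")
        if tool == "quantize_layer" and args.get("dtype") not in DTYPES:
            raise ValueError(f"unknown dtype: {args.get('dtype')}")
        if tool == "prune_layer" and args.get("sparsity") not in SPARSITY:
            raise ValueError(f"unknown sparsity: {args.get('sparsity')}")
        if tool == "benchmark":
            if self.benchmarks >= MAX_BENCHMARKS:
                raise ValueError("benchmark budget exhausted")
            self.benchmarks += 1
        self.steps += 1
        self.done = tool == "submit"
```

Calling `budget.check("quantize_layer", layer_id=..., dtype="INT8")` before each request lets an agent harness catch a malformed action locally instead of spending one of its 20 steps on it.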

---

## Reward

```
reward = latency_reward (0–0.40)
       + memory_reward  (0 or 0.30)
       + accuracy_reward (0–0.20)
       + efficiency_bonus (0 or 0.10)
```

Maximum reward: **1.0**. All constraints must be met simultaneously for the efficiency bonus.
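To make the component ranges concrete, here is one possible reward function consistent with them. Only the ranges, the binary memory term, and the all-constraints condition for the bonus come from the formula above; the latency scaling (`budget / latency`, capped at 1), the accuracy scaling (accuracy retained relative to a baseline, zero below the floor), and all class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    latency_budget_ms: float
    memory_budget_mb: float
    min_accuracy: float  # assumed to be a fraction in [0, 1]


@dataclass
class Benchmark:
    latency_ms: float  # assumed > 0
    memory_mb: float
    accuracy: float


def reward(b: Benchmark, c: Constraints, baseline_accuracy: float) -> float:
    """Hypothetical reward using the documented component ranges."""
    # Latency (0-0.40): full credit at or under budget, budget/latency above it.
    latency_ok = b.latency_ms <= c.latency_budget_ms
    latency_r = 0.40 * min(1.0, c.latency_budget_ms / b.latency_ms)
    # Memory (0 or 0.30): binary pass/fail against the budget.
    memory_ok = b.memory_mb <= c.memory_budget_mb
    memory_r = 0.30 if memory_ok else 0.0
    # Accuracy (0-0.20): zero below the floor, else scaled by retained accuracy.
    accuracy_ok = b.accuracy >= c.min_accuracy
    accuracy_r = 0.20 * min(1.0, b.accuracy / baseline_accuracy) if accuracy_ok else 0.0
    # Efficiency bonus (0 or 0.10): only when every constraint holds at once.
    bonus = 0.10 if (latency_ok and memory_ok and accuracy_ok) else 0.0
    return latency_r + memory_r + accuracy_r + bonus
```

Under these assumptions, an over-aggressive plan that pushes accuracy below the floor loses both the accuracy term and the bonus, so it scores below a safer plan that meets every constraint.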

---

## Deployment

```bash
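# One-time: add the Space as a git remote
git remote add space https://huggingface.co/spaces/Mohammed-Altaf/Neural-Tuner

# Push the local master branch to the Space's main branch
git push space master:main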
```

Manifest: `openenv.yaml` — runtime: `fastapi`, entrypoint: `server.app:app`, port: `8000`.

---