Neural-Tuner / README.md
metadata
title: Neural Tuner Env Environment Server
emoji: 🥉
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false

NeuralTuner Environment

An OpenEnv-compatible RL environment that trains LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware via per-layer quantization and structured pruning.

Full write-up: BLOG.md | Live demo: HuggingFace Space | Google Colab Notebook | W&B Results Logs

The notebook is also available in this repository. Because longer training runs were not feasible locally, the Colab link above has been added; use it to view the full training results along with the Weights & Biases logs.

Overview

NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy), then issues tool calls to profile layers, apply quantization dtypes (FP32/FP16/INT8/INT4), apply structured pruning (LOW/MEDIUM/HIGH), benchmark the current plan, and submit a final configuration for scoring.

The environment supports 19 scenarios across 5 models and 3 difficulty tiers.


Requirements

  • Python ≥ 3.10
  • uv (recommended) or pip

Installation

# Install core dependencies (run from the root of the cloned repository)
uv sync

# Include training dependencies (TRL, transformers, datasets, matplotlib)
uv sync --extra training

Running the Server

# Development (auto-reload)
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

# Production
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4

The server exposes:

| Endpoint | Method | Description |
|---|---|---|
| /reset | POST | Start a new episode |
| /step | POST | Execute an action |
| /state | GET | Current episode state |
| /metadata | GET | Model and scenario metadata |
| /health | GET | Server health check |
| /schema | GET | Action and observation schemas |
| /ws | WebSocket | Persistent session |
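
A minimal sketch of talking to these endpoints with only the Python standard library. The base URL assumes the server was started locally on port 8000 as shown above, and the helper names `build_request` and `call` are hypothetical. The request and response bodies are not specified in this README, so the sketch treats them as opaque JSON; `/schema` is the place to look up the actual action and observation shapes.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: local server on port 8000


def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request when payload is None, otherwise a JSON POST."""
    url = f"{BASE_URL}{path}"
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path: str, payload: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `call("/health")` and `call("/schema")` are natural first requests before sending anything to `/reset` or `/step`.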

Running Inference

Run the trained agent across all 19 scenarios using the HuggingFace router:

HF_TOKEN=hf_... python inference.py

Filter by difficulty or scenario:

# Easy tier only
HF_TOKEN=hf_... python inference.py --difficulty easy

# Single scenario
HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium

# Different model
HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct

Training

Open and run neural_tuner_trl.ipynb for the full pipeline:

  1. Environment smoke test
  2. Random policy baseline (n=20 seeds) and oracle ceiling
  3. SFT warm-up on heuristic trajectories
  4. GRPO training with curriculum scheduling (easy → medium → hard)
  5. Post-training evaluation and plot export

# Optional: regenerate baseline vs heuristic episode traces
python rollout_eval.py --trace --model-id inception_v3 --difficulty medium

# Outputs
artifacts/eval/episode_trace.md
artifacts/eval/episode_metrics.json
artifacts/eval/episode_metrics.csv

Project Structure

.
├── server/
│   ├── app.py                          # FastAPI server
│   ├── neural_tuner_env_environment.py # RL environment (reset/step logic)
│   ├── simulator.py                    # Hardware simulator (latency/memory/accuracy)
│   ├── scenarios.py                    # 19 scenarios × 3 difficulty tiers
│   └── model_zoo.py                    # Layer profiles for 5 neural networks
├── scripts/
│   ├── neural_tuner.py                 # TRL-compatible OpenEnv wrapper
│   ├── run_training_eval.py            # Post-training evaluation sweep
│   └── training_utils.py               # Scenario splitting and JSONL helpers
├── tests/                              # pytest suite
├── artifacts/
│   ├── plots/                          # Training reward plots
│   ├── eval/                           # Episode metrics and traces
│   └── training/                       # Baseline and eval metrics JSON
├── neural_tuner_trl.ipynb              # Training notebook
├── inference.py                        # Multi-scenario inference runner
├── rollout_eval.py                     # Baseline vs heuristic evaluator
├── client.py                           # OpenEnv WebSocket client
├── models.py                           # Pydantic action/observation/state models
├── openenv.yaml                        # OpenEnv deployment manifest
├── Dockerfile                          # Container build
└── pyproject.toml                      # Package config and dependencies

Tests

pytest -q

Covers:

  • Reward ordering for safe vs over-aggressive quantization
  • Benchmark budget enforcement (5 per episode)
  • Step limit enforcement (20 per episode)
  • Invalid layer ID and missing argument handling
  • Episode terminal state after submit()
  • Training metrics schema validation
  • Scenario train/eval split correctness (no overlap, deterministic)
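
The benchmark budget and step limit listed above can be pictured with a small counter. This is only an illustrative sketch, not the environment's code: the class name `EpisodeBudget` and its method are hypothetical, and it assumes a rejected action does not consume a step, which this README does not state.

```python
from dataclasses import dataclass

MAX_STEPS = 20       # step limit per episode, as listed above
MAX_BENCHMARKS = 5   # benchmark budget per episode, as listed above


@dataclass
class EpisodeBudget:
    """Hypothetical tracker for the per-episode limits."""
    steps_used: int = 0
    benchmarks_used: int = 0
    done: bool = False

    def consume(self, action: str) -> bool:
        """Record the action and return True if it is still allowed."""
        if self.done or self.steps_used >= MAX_STEPS:
            return False
        if action == "benchmark" and self.benchmarks_used >= MAX_BENCHMARKS:
            return False  # assumption: a rejected benchmark costs no step
        self.steps_used += 1
        if action == "benchmark":
            self.benchmarks_used += 1
        if action == "submit":
            self.done = True  # the episode is terminal after submit()
        return True
```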

Environment Actions

| Action | Arguments | Description |
|---|---|---|
| profile_layer | layer_id | Reveal sensitivity score and optimization advice |
| quantize_layer | layer_id, dtype | Apply FP32 / FP16 / INT8 / INT4 |
| prune_layer | layer_id, sparsity | Remove LOW=25% / MEDIUM=50% / HIGH=75% channels |
| revert_layer | layer_id | Reset to FP32, no pruning |
| benchmark | (none) | Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode) |
| submit | (none) | Finalise episode and receive reward |
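
To make the effect of the layer-editing actions concrete, here is a small sketch of a per-layer plan that mirrors the action table. The class names (`DType`, `Sparsity`, `LayerPlan`, `Plan`) and the use of string layer IDs are assumptions for illustration; the real action and state models live in `models.py` and may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class DType(str, Enum):
    FP32 = "FP32"
    FP16 = "FP16"
    INT8 = "INT8"
    INT4 = "INT4"


class Sparsity(Enum):
    # Fraction of channels removed, per the action table
    LOW = 0.25
    MEDIUM = 0.50
    HIGH = 0.75


@dataclass
class LayerPlan:
    dtype: DType = DType.FP32
    sparsity: Optional[Sparsity] = None  # None means no pruning


@dataclass
class Plan:
    """Hypothetical container for the agent's current configuration."""
    layers: Dict[str, LayerPlan] = field(default_factory=dict)

    def quantize_layer(self, layer_id: str, dtype: DType) -> None:
        self.layers.setdefault(layer_id, LayerPlan()).dtype = dtype

    def prune_layer(self, layer_id: str, sparsity: Sparsity) -> None:
        self.layers.setdefault(layer_id, LayerPlan()).sparsity = sparsity

    def revert_layer(self, layer_id: str) -> None:
        # Reset to FP32 with no pruning
        self.layers[layer_id] = LayerPlan()
```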

Reward

reward = latency_reward (0–0.40)
       + memory_reward  (0 or 0.30)
       + accuracy_reward (0–0.20)
       + efficiency_bonus (0 or 0.10)

Maximum reward: 1.0. All constraints must be met simultaneously for the efficiency bonus.
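
As a rough illustration of how the four components add up, the sketch below combines them using the caps listed above. How the simulator actually computes the latency and accuracy terms is not described here, so the normalized `latency_score` and `accuracy_score` inputs (each in [0, 1], scaled linearly) are assumptions, as is the function name `total_reward`.

```python
def clamp01(x: float) -> float:
    """Clamp a score into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def total_reward(
    latency_score: float,       # assumption: normalized to [0, 1]
    accuracy_score: float,      # assumption: normalized to [0, 1]
    memory_ok: bool,            # memory budget satisfied
    all_constraints_met: bool,  # latency, memory, and accuracy all satisfied
) -> float:
    latency_reward = 0.40 * clamp01(latency_score)    # 0 to 0.40
    memory_reward = 0.30 if memory_ok else 0.0        # 0 or 0.30
    accuracy_reward = 0.20 * clamp01(accuracy_score)  # 0 to 0.20
    efficiency_bonus = 0.10 if all_constraints_met else 0.0  # 0 or 0.10
    return latency_reward + memory_reward + accuracy_reward + efficiency_bonus
```

Under these assumptions, a plan that maxes out every term and meets all constraints reaches the stated maximum of 1.0 (up to floating-point rounding), while missing any single constraint forfeits the 0.10 bonus.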


Deployment

# Push to HuggingFace Space
git push space master

Manifest: openenv.yaml (runtime: fastapi, entrypoint: server.app:app, port: 8000).