---
title: Neural Tuner Env Environment Server
emoji: 🥉
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
---

# NeuralTuner Environment

An OpenEnv-compatible RL environment that trains LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware via per-layer quantization and structured pruning.

> **Full write-up:** [BLOG.md](BLOG.md) | **Live demo:** [HuggingFace Space](https://huggingface.co/spaces/Mohammed-Altaf/Neural-Tuner) | **Google Colab notebook:** [Notebook](https://colab.research.google.com/drive/1cGnFxloW-3WN_I5imlkjGcWJVzZnLbcq?usp=sharing) | **W&B results:** [Logs](https://api.wandb.ai/links/mohammedaltaf4316/4czj329l)

The notebook is also available in this repository as [neural_tuner_trl.ipynb](neural_tuner_trl.ipynb). Because longer training runs were not feasible locally, the Colab link above is provided; it can be used to view the full training results along with the Weights & Biases logs.

---

## Overview

NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy). It then issues tool calls to profile layers, apply quantization dtypes (`FP32`/`FP16`/`INT8`/`INT4`), apply structured pruning (`LOW`/`MEDIUM`/`HIGH`), benchmark the current plan, and submit a final configuration for scoring.

The environment supports 19 scenarios across 5 models and 3 difficulty tiers.
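As a rough illustration of the per-layer plan an agent builds up during an episode, the sketch below models one layer's decisions using the dtype and pruning levels named above. The class, field, and method names (`LayerPlan`, `describe`, `revert`) and the layer id `conv_1` are hypothetical and are not taken from the repository's `models.py`; the pruning percentages follow the Environment Actions table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DType(Enum):
    FP32 = "FP32"
    FP16 = "FP16"
    INT8 = "INT8"
    INT4 = "INT4"


class Sparsity(Enum):
    # Fraction of channels removed, per the Environment Actions table.
    LOW = 0.25
    MEDIUM = 0.50
    HIGH = 0.75


@dataclass
class LayerPlan:
    """Hypothetical record of the decisions made for a single layer."""

    layer_id: str
    dtype: DType = DType.FP32
    sparsity: Optional[Sparsity] = None

    def revert(self) -> None:
        # Mirrors the `revert_layer` action: back to FP32 with no pruning.
        self.dtype = DType.FP32
        self.sparsity = None

    def describe(self) -> str:
        if self.sparsity is None:
            pruned = "unpruned"
        else:
            pruned = f"{int(self.sparsity.value * 100)}% pruned"
        return f"{self.layer_id}: {self.dtype.value}, {pruned}"


plan = LayerPlan("conv_1")  # "conv_1" is a made-up layer id
plan.dtype = DType.INT8
plan.sparsity = Sparsity.MEDIUM
print(plan.describe())
```

Keeping the plan as plain data like this makes it easy to log or diff between benchmark calls, though the real environment may track state differently.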
---

## Requirements

- Python ≥ 3.10
- [uv](https://github.com/astral-sh/uv) (recommended) or pip

---

## Installation

```bash
# Install core dependencies (run from the root of the cloned repository)
uv sync

# Include training dependencies (TRL, transformers, datasets, matplotlib)
uv sync --extra training
```

---

## Running the Server

```bash
# Development (auto-reload)
uvicorn server.app:app --reload --host 0.0.0.0 --port 8000

# Production
uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
```

The server exposes:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/reset` | POST | Start a new episode |
| `/step` | POST | Execute an action |
| `/state` | GET | Current episode state |
| `/metadata` | GET | Model and scenario metadata |
| `/health` | GET | Server health check |
| `/schema` | GET | Action and observation schemas |
| `/ws` | WebSocket | Persistent session |

---

## Running Inference

Run the trained agent across all 19 scenarios using the HuggingFace router:

```bash
HF_TOKEN=hf_... python inference.py
```

Filter by difficulty or scenario:

```bash
# Easy tier only
HF_TOKEN=hf_... python inference.py --difficulty easy

# Single scenario
HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium

# Different model
HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct
```

---

## Training

Open and run `neural_tuner_trl.ipynb` for the full pipeline:

1. Environment smoke test
2. Random policy baseline (n=20 seeds) and oracle ceiling
3. SFT warm-up on heuristic trajectories
4. GRPO training with curriculum scheduling (easy → medium → hard)
5. Post-training evaluation and plot export

```bash
# Optional: regenerate baseline vs heuristic episode traces
python rollout_eval.py --trace --model-id inception_v3 --difficulty medium

# Outputs:
#   artifacts/eval/episode_trace.md
#   artifacts/eval/episode_metrics.json
#   artifacts/eval/episode_metrics.csv
```

---

## Project Structure

```
.
├── server/
│   ├── app.py                           # FastAPI server
│   ├── neural_tuner_env_environment.py  # RL environment (reset/step logic)
│   ├── simulator.py                     # Hardware simulator (latency/memory/accuracy)
│   ├── scenarios.py                     # 19 scenarios across 3 difficulty tiers
│   └── model_zoo.py                     # Layer profiles for 5 neural networks
├── scripts/
│   ├── neural_tuner.py                  # TRL-compatible OpenEnv wrapper
│   ├── run_training_eval.py             # Post-training evaluation sweep
│   └── training_utils.py                # Scenario splitting and JSONL helpers
├── tests/                               # pytest suite
├── artifacts/
│   ├── plots/                           # Training reward plots
│   ├── eval/                            # Episode metrics and traces
│   └── training/                        # Baseline and eval metrics JSON
├── neural_tuner_trl.ipynb               # Training notebook
├── inference.py                         # Multi-scenario inference runner
├── rollout_eval.py                      # Baseline vs heuristic evaluator
├── client.py                            # OpenEnv WebSocket client
├── models.py                            # Pydantic action/observation/state models
├── openenv.yaml                         # OpenEnv deployment manifest
├── Dockerfile                           # Container build
└── pyproject.toml                       # Package config and dependencies
```

---

## Tests

```bash
pytest -q
```

Covers:

- Reward ordering for safe vs over-aggressive quantization
- Benchmark budget enforcement (5 per episode)
- Step limit enforcement (20 per episode)
- Invalid layer ID and missing argument handling
- Episode terminal state after `submit()`
- Training metrics schema validation
- Scenario train/eval split correctness (no overlap, deterministic)

---

## Environment Actions

| Action | Arguments | Description |
|--------|-----------|-------------|
| `profile_layer` | `layer_id` | Reveal sensitivity score and optimization advice |
| `quantize_layer` | `layer_id`, `dtype` | Apply `FP32` / `FP16` / `INT8` / `INT4` |
| `prune_layer` | `layer_id`, `sparsity` | Remove `LOW=25%` / `MEDIUM=50%` / `HIGH=75%` of channels |
| `revert_layer` | `layer_id` | Reset to FP32, no pruning |
| `benchmark` | none | Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode) |
| `submit` | none | Finalise episode and receive reward |

---

## Reward

```
reward = latency_reward    (0–0.40)
       + memory_reward     (0 or 0.30)
       + accuracy_reward   (0–0.20)
       + efficiency_bonus  (0 or 0.10)
```

Maximum reward: **1.0**. The efficiency bonus is granted only when all constraints are met simultaneously.

---

## Deployment

```bash
# Push to HuggingFace Space
git push space master
```

Manifest: `openenv.yaml` (runtime: `fastapi`, entrypoint: `server.app:app`, port: `8000`).

---
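For readers who want to sanity-check scores, the reward breakdown in the Reward section can be sketched as a small function. This is an illustration only: the function name and its parameters are hypothetical, the caller is assumed to have already computed the latency and accuracy terms (the simulator's actual formulas are not reproduced here), and tying the all-or-nothing memory term to "memory budget met" is an assumption.

```python
def combine_reward(
    latency_reward: float,
    memory_ok: bool,
    accuracy_reward: float,
    all_constraints_met: bool,
) -> float:
    """Sum the four reward components listed in the Reward section."""
    if not 0.0 <= latency_reward <= 0.40:
        raise ValueError("latency_reward must be in [0, 0.40]")
    if not 0.0 <= accuracy_reward <= 0.20:
        raise ValueError("accuracy_reward must be in [0, 0.20]")

    # Assumption: the memory term is awarded when the memory budget is met.
    memory_reward = 0.30 if memory_ok else 0.0
    # The bonus requires every constraint to hold at the same time.
    efficiency_bonus = 0.10 if all_constraints_met else 0.0

    total = latency_reward + memory_reward + accuracy_reward + efficiency_bonus
    # Round to hide floating-point noise in the sum.
    return round(total, 4)


print(combine_reward(0.40, True, 0.20, True))     # best case
print(combine_reward(0.25, False, 0.20, False))   # memory budget missed
```

Rounding the sum keeps the best case at the documented maximum despite binary floating-point representation of the component values.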