| title: Neural Tuner Env Environment Server | |
| emoji: π₯ | |
| colorFrom: purple | |
| colorTo: pink | |
| sdk: docker | |
| pinned: false | |
| # NeuralTuner Environment | |
| An OpenEnv-compatible RL environment that trains LLMs to optimize neural networks for Qualcomm Snapdragon edge hardware via per-layer quantization and structured pruning. | |
| > **Full write-up:** [BLOG.md](BLOG.md) | **Live demo:** [HuggingFace Space](https://huggingface.co/spaces/Mohammed-Altaf/Neural-Tuner) | **Google Colab notebook:** [Notebook](https://colab.research.google.com/drive/1cGnFxloW-3WN_I5imlkjGcWJVzZnLbcq?usp=sharing) | **W&B results:** [Logs](https://api.wandb.ai/links/mohammedaltaf4316/4czj329l) | |
| The notebook is also available in this repository as [neural_tuner_trl.ipynb](neural_tuner_trl.ipynb). Longer training runs were not feasible locally, so the Colab link above is provided to view the full training results along with the Weights & Biases logs. | |
| --- | |
| ## Overview | |
| NeuralTuner wraps a hardware simulator as a multi-step RL environment. An LLM agent receives a model's layer table and a set of Snapdragon HTP constraints (latency budget, memory budget, minimum accuracy), then issues tool calls to profile layers, apply quantization dtypes (`FP32`/`FP16`/`INT8`/`INT4`), apply structured pruning (`LOW`/`MEDIUM`/`HIGH`), benchmark the current plan, and submit a final configuration for scoring. | |
| The environment supports 19 scenarios across 5 models and 3 difficulty tiers. | |
| --- | |
| ## Requirements | |
| - Python ≥ 3.10 | |
| - [uv](https://github.com/astral-sh/uv) (recommended) or pip | |
| --- | |
| ## Installation | |
| ```bash | |
| # From the cloned repository root, install core dependencies | |
| uv sync | |
| # Include training dependencies (TRL, transformers, datasets, matplotlib) | |
| uv sync --extra training | |
| ``` | |
| --- | |
| ## Running the Server | |
| ```bash | |
| # Development (auto-reload) | |
| uvicorn server.app:app --reload --host 0.0.0.0 --port 8000 | |
| # Production | |
| uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4 | |
| ``` | |
| The server exposes: | |
| | Endpoint | Method | Description | | |
| |----------|--------|-------------| | |
| | `/reset` | POST | Start a new episode | | |
| | `/step` | POST | Execute an action | | |
| | `/state` | GET | Current episode state | | |
| | `/metadata` | GET | Model and scenario metadata | | |
| | `/health` | GET | Server health check | | |
| | `/schema` | GET | Action and observation schemas | | |
| | `/ws` | WebSocket | Persistent session | | |
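As a rough illustration of how a client might talk to these endpoints, the sketch below builds requests with Python's standard library only. The endpoint paths and methods come from the table above; the JSON payload shapes (the `action` and `layer_id` keys, and the empty body for `/reset`) are assumptions and should be checked against `/schema` on a running server. The helper names `build_request`, `call`, and `smoke_test` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn commands above


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(path, payload=None):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def smoke_test():
    """Hypothetical walk-through: health check, reset, then one step."""
    print(call("/health"))
    print(call("/reset", {}))  # assumed: an empty body starts a default episode
    # Assumed action payload shape; consult /schema for the real one.
    print(call("/step", {"action": "profile_layer", "layer_id": 0}))
```

Nothing is sent until `smoke_test()` is called, so the module can be imported safely without a server running.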
| --- | |
| ## Running Inference | |
| Run the trained agent across all 19 scenarios using the HuggingFace router: | |
| ```bash | |
| HF_TOKEN=hf_... python inference.py | |
| ``` | |
| Filter by difficulty or scenario: | |
| ```bash | |
| # Easy tier only | |
| HF_TOKEN=hf_... python inference.py --difficulty easy | |
| # Single scenario | |
| HF_TOKEN=hf_... python inference.py --scenario mobilenet_v3_medium | |
| # Different model | |
| HF_TOKEN=hf_... python inference.py --model Qwen/Qwen2.5-72B-Instruct | |
| ``` | |
| --- | |
| ## Training | |
| Open and run `neural_tuner_trl.ipynb` for the full pipeline: | |
| 1. Environment smoke test | |
| 2. Random policy baseline (n=20 seeds) and oracle ceiling | |
| 3. SFT warm-up on heuristic trajectories | |
| 4. GRPO training with curriculum scheduling (easy → medium → hard) | |
| 5. Post-training evaluation and plot export | |
| ```bash | |
| # Optional: regenerate baseline vs heuristic episode traces | |
| python rollout_eval.py --trace --model-id inception_v3 --difficulty medium | |
| # Outputs: | |
| #   artifacts/eval/episode_trace.md | |
| #   artifacts/eval/episode_metrics.json | |
| #   artifacts/eval/episode_metrics.csv | |
| ``` | |
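The curriculum in step 4 can be pictured as a mapping from training progress to difficulty tier. The sketch below is not the notebook's scheduler: the equal-length phases and the function name `curriculum_tier` are assumptions chosen purely for illustration.

```python
TIERS = ("easy", "medium", "hard")  # tier order taken from the pipeline above


def curriculum_tier(step, total_steps, tiers=TIERS):
    """Return the difficulty tier for a given training step.

    Assumption: training is split into equal-length phases, one per tier.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")
    # Integer arithmetic keeps the phase boundaries exact.
    phase = step * len(tiers) // total_steps
    return tiers[phase]
```

With 300 total steps, this assigns steps 0 to 99 to the easy tier, 100 to 199 to medium, and 200 to 299 to hard.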
| --- | |
| ## Project Structure | |
| ``` | |
| . | |
| ├── server/ | |
| │   ├── app.py                           # FastAPI server | |
| │   ├── neural_tuner_env_environment.py  # RL environment (reset/step logic) | |
| │   ├── simulator.py                     # Hardware simulator (latency/memory/accuracy) | |
| │   ├── scenarios.py                     # 19 scenarios × 3 difficulty tiers | |
| │   └── model_zoo.py                     # Layer profiles for 5 neural networks | |
| ├── scripts/ | |
| │   ├── neural_tuner.py                  # TRL-compatible OpenEnv wrapper | |
| │   ├── run_training_eval.py             # Post-training evaluation sweep | |
| │   └── training_utils.py                # Scenario splitting and JSONL helpers | |
| ├── tests/                               # pytest suite | |
| ├── artifacts/ | |
| │   ├── plots/                           # Training reward plots | |
| │   ├── eval/                            # Episode metrics and traces | |
| │   └── training/                        # Baseline and eval metrics JSON | |
| ├── neural_tuner_trl.ipynb               # Training notebook | |
| ├── inference.py                         # Multi-scenario inference runner | |
| ├── rollout_eval.py                      # Baseline vs heuristic evaluator | |
| ├── client.py                            # OpenEnv WebSocket client | |
| ├── models.py                            # Pydantic action/observation/state models | |
| ├── openenv.yaml                         # OpenEnv deployment manifest | |
| ├── Dockerfile                           # Container build | |
| └── pyproject.toml                       # Package config and dependencies | |
| ``` | |
| --- | |
| ## Tests | |
| ```bash | |
| pytest -q | |
| ``` | |
| Covers: | |
| - Reward ordering for safe vs over-aggressive quantization | |
| - Benchmark budget enforcement (5 per episode) | |
| - Step limit enforcement (20 per episode) | |
| - Invalid layer ID and missing argument handling | |
| - Episode terminal state after `submit()` | |
| - Training metrics schema validation | |
| - Scenario train/eval split correctness (no overlap, deterministic) | |
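To make the two budget checks concrete, here is a minimal sketch of the limits listed above (5 benchmarks and 20 steps per episode). The `EpisodeBudget` class is hypothetical, and two behaviours are assumptions: a benchmark counts as a step, and a rejected action consumes nothing. The real enforcement lives in `server/neural_tuner_env_environment.py` and may differ.

```python
from dataclasses import dataclass

MAX_STEPS = 20       # step limit per episode, from the list above
MAX_BENCHMARKS = 5   # benchmark limit per episode, from the list above


@dataclass
class EpisodeBudget:
    steps_used: int = 0
    benchmarks_used: int = 0

    def consume(self, action):
        """Record one action; return False if it would exceed a limit."""
        if self.steps_used >= MAX_STEPS:
            return False
        if action == "benchmark" and self.benchmarks_used >= MAX_BENCHMARKS:
            return False
        # Assumption: every accepted action, benchmark included, costs one step.
        self.steps_used += 1
        if action == "benchmark":
            self.benchmarks_used += 1
        return True
```

A test in the same spirit as the suite would call `consume("benchmark")` six times and expect the sixth call to be rejected.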
| --- | |
| ## Environment Actions | |
| | Action | Arguments | Description | | |
| |--------|-----------|-------------| | |
| | `profile_layer` | `layer_id` | Reveal sensitivity score and optimization advice | | |
| | `quantize_layer` | `layer_id`, `dtype` | Apply `FP32` / `FP16` / `INT8` / `INT4` | | |
| | `prune_layer` | `layer_id`, `sparsity` | Remove `LOW=25%` / `MEDIUM=50%` / `HIGH=75%` channels | | |
| | `revert_layer` | `layer_id` | Reset to FP32, no pruning | | |
| | `benchmark` | none | Simulate hardware; returns latency, memory, accuracy, reward (max 5/episode) | | |
| | `submit` | none | Finalise episode and receive reward | | |
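The action table above can be encoded as a small validation layer. This is a sketch under stated assumptions, not the project's `models.py`: the `DType` and `Sparsity` enums, the `REQUIRED_ARGS` mapping, and `validate_action` are hypothetical names, and passing dtypes and sparsity levels as upper-case strings is assumed.

```python
from enum import Enum


class DType(Enum):
    FP32 = "FP32"
    FP16 = "FP16"
    INT8 = "INT8"
    INT4 = "INT4"


class Sparsity(Enum):
    # Fraction of channels removed, per the table above.
    LOW = 0.25
    MEDIUM = 0.50
    HIGH = 0.75


# Required argument names per action, copied from the table above.
REQUIRED_ARGS = {
    "profile_layer": ("layer_id",),
    "quantize_layer": ("layer_id", "dtype"),
    "prune_layer": ("layer_id", "sparsity"),
    "revert_layer": ("layer_id",),
    "benchmark": (),
    "submit": (),
}


def validate_action(name, args):
    """Return a list of problems; an empty list means the action looks valid."""
    if name not in REQUIRED_ARGS:
        return [f"unknown action: {name}"]
    problems = [
        f"missing argument: {arg}" for arg in REQUIRED_ARGS[name] if arg not in args
    ]
    if "dtype" in args and args["dtype"] not in DType.__members__:
        problems.append(f"invalid dtype: {args['dtype']}")
    if "sparsity" in args and args["sparsity"] not in Sparsity.__members__:
        problems.append(f"invalid sparsity: {args['sparsity']}")
    return problems
```

For example, `validate_action("quantize_layer", {"layer_id": 3})` reports the missing `dtype` argument instead of raising, which mirrors the "missing argument handling" item in the test list.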
| --- | |
| ## Reward | |
| ``` | |
| reward = latency_reward   (0–0.40) | |
|        + memory_reward    (0 or 0.30) | |
|        + accuracy_reward  (0–0.20) | |
|        + efficiency_bonus (0 or 0.10) | |
| ``` | |
| Maximum reward: **1.0**. All constraints must be met simultaneously for the efficiency bonus. | |
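The weighting above can be sketched as a small function. Only the component ranges are taken from this README; how the simulator derives the latency and accuracy scores is not specified here, so the sketch assumes they arrive as normalised values in `[0, 1]` and scales them linearly. The name `compose_reward` and its parameters are hypothetical.

```python
def compose_reward(latency_score, memory_ok, accuracy_score, all_constraints_met):
    """Combine the four reward components using the weights listed above.

    Assumptions: latency_score and accuracy_score are already normalised to
    [0, 1]; memory_ok and all_constraints_met are booleans.
    """

    def clamp01(value):
        return max(0.0, min(1.0, value))

    latency_reward = 0.40 * clamp01(latency_score)            # 0 to 0.40
    memory_reward = 0.30 if memory_ok else 0.0                # 0 or 0.30
    accuracy_reward = 0.20 * clamp01(accuracy_score)          # 0 to 0.20
    efficiency_bonus = 0.10 if all_constraints_met else 0.0   # 0 or 0.10
    return latency_reward + memory_reward + accuracy_reward + efficiency_bonus
```

Under these assumptions, a plan with perfect scores that meets every constraint reaches the maximum of 1.0 (up to floating-point rounding), while missing the memory budget forfeits both the 0.30 memory term and the 0.10 bonus.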
| --- | |
| ## Deployment | |
| ```bash | |
| # Push to HuggingFace Space | |
| git push space master | |
| ``` | |
| Manifest: `openenv.yaml` (runtime: `fastapi`, entrypoint: `server.app:app`, port: `8000`). | |
| --- | |