---
base_model: unsloth/Qwen2.5-3B-Instruct-bnb-4bit
library_name: peft
pipeline_tag: text-generation
model_name: methanol-apc
tags:
- base_model:adapter:unsloth/Qwen2.5-3B-Instruct-bnb-4bit
- grpo
- lora
- peft
- trl
- unsloth
- reinforcement-learning
- process-control
- methanol
license: apache-2.0
---

# Model Card for methanol-apc

LoRA adapter for [`unsloth/Qwen2.5-3B-Instruct-bnb-4bit`](https://huggingface.co/unsloth/Qwen2.5-3B-Instruct-bnb-4bit), fine-tuned with **GRPO** ([Group Relative Policy Optimization](https://huggingface.co/papers/2402.03300)) using [Unsloth](https://github.com/unslothai/unsloth) to act as an autonomous **Advanced Process Control (APC)** operator for a methanol synthesis reactor.

The agent reads simulated sensor readings (temperature, pressure, H₂/CO ratio, catalyst health, …) and emits a JSON control action (feed rates, cooling water flow, and compressor power) that is scored by the [`methanol-apc` OpenEnv environment](https://huggingface.co/spaces/glitchfilter/methanol-apc-env).

- **Model on Hugging Face:** [glitchfilter/methanol-apc](https://huggingface.co/glitchfilter/methanol-apc)
- **Environment:** [glitchfilter/methanol-apc-env (HF Space)](https://huggingface.co/spaces/glitchfilter/methanol-apc-env) · [Bhavneet1492/openenv-methanol-apc (GitHub)](https://github.com/Bhavneet1492/openenv-methanol-apc)
- **Base model:** [unsloth/Qwen2.5-3B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Qwen2.5-3B-Instruct-bnb-4bit)

## Quick start

```python
from unsloth import FastLanguageModel  # import Unsloth before the other ML libraries
import torch
from peft import PeftModel

# Load the 4-bit base model and its tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-3B-Instruct-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapter and switch Unsloth to inference mode.
model = PeftModel.from_pretrained(model, "glitchfilter/methanol-apc")
FastLanguageModel.for_inference(model)

system_prompt = (
    "You are an AI controller for a methanol synthesis reactor. "
    "Output a JSON control action with fields: "
    '{"feed_rate_h2": <0-10>, "feed_rate_co": <0-5>, '
    '"cooling_water_flow": <0-100>, "compressor_power": <0-100>}.'
)
sensors = "T=248.3°C P=85.0bar H2=4.50mol/s CO=2.20mol/s ratio=2.05 cool=55L/min cat_health=98%"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Current sensor readings:\n{sensors}\n\nProvide control action as JSON:"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=128,
        temperature=0.3,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
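
The generated text still has to be turned into a control action before it can be used. The following is a minimal sketch of that step, not the environment's own parser: the function name `parse_action` is hypothetical, the field ranges are copied from the system prompt above, and clamping out-of-range values (rather than rejecting them) is an assumption.

```python
import json
import re

# Field ranges copied from the system prompt in the quick start.
ACTION_RANGES = {
    "feed_rate_h2": (0.0, 10.0),
    "feed_rate_co": (0.0, 5.0),
    "cooling_water_flow": (0.0, 100.0),
    "compressor_power": (0.0, 100.0),
}


def parse_action(text):
    """Return a validated action dict, or None if no usable JSON object is found.

    Hypothetical helper: the real environment may parse and validate differently.
    """
    # Take the first flat {...} span; the action has no nested objects.
    match = re.search(r"\{.*?\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        raw = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict):
        return None

    action = {}
    for name, (low, high) in ACTION_RANGES.items():
        value = raw.get(name)
        # Reject missing or non-numeric fields (bool is excluded explicitly).
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        # Assumption: out-of-range values are clamped instead of rejected.
        action[name] = min(max(float(value), low), high)
    return action
```

Calling `parse_action` on the decoded completion yields either a dict with the four fields or `None`, which a caller can treat as a format failure.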

## Training procedure

Trained with **GRPO** on Unsloth's 4-bit quantized base model with LoRA adapters.

**Pipeline:** `LLM generates JSON action` → `reward fn parses & scores` → `env.step()` → `multi-component reward` → `GRPO update`.

### Key design choices

- **Curriculum learning** over three task types:
  - `startup` (40%, easy): ramp the reactor to operating temperature
  - `optimization` (35%, medium): maximize profit at steady state
  - `disturbance_rejection` (25%, hard): handle cooling system failures
- **Multi-component reward** combining:
  1. Physics reward from `env.step` (× 0.55)
  2. Format-compliance bonus for valid JSON actions (+0.10)
  3. Action-quality score grounded in stoichiometry and cooling adequacy (range [−0.30, +0.20])
  4. 3-step lookahead penalty to surface delayed thermal-runaway consequences (range [−0.20, 0])
- **Deterministic replay**: each prompt stores `(task, seed, num_warmup)` so that all completions in a GRPO group are evaluated against an identical environment state.
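
The design choices above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the training code: the names `PromptRecord`, `sample_records`, and `combine_reward` are hypothetical, the percentages are read as sampling proportions, `max_warmup` is an invented bound, and the four reward components are assumed to be summed after clipping each to its listed range.

```python
import random
from dataclasses import dataclass

# Task mix from the curriculum bullet (assumed to be sampling proportions).
TASK_WEIGHTS = {"startup": 0.40, "optimization": 0.35, "disturbance_rejection": 0.25}


@dataclass(frozen=True)
class PromptRecord:
    """What each prompt stores so an environment state can be rebuilt exactly."""
    task: str
    seed: int
    num_warmup: int


def sample_records(n, rng_seed=0, max_warmup=5):
    """Draw n prompt records; max_warmup is a hypothetical upper bound."""
    rng = random.Random(rng_seed)
    tasks = list(TASK_WEIGHTS)
    weights = list(TASK_WEIGHTS.values())
    return [
        PromptRecord(
            task=rng.choices(tasks, weights=weights, k=1)[0],
            seed=rng.randrange(2**31),
            num_warmup=rng.randint(0, max_warmup),
        )
        for _ in range(n)
    ]


def _clip(x, low, high):
    return min(max(x, low), high)


def combine_reward(physics, valid_json, action_quality, lookahead_penalty):
    """Assumed additive combination of the four listed reward components."""
    total = 0.55 * physics                          # physics reward from env.step
    total += 0.10 if valid_json else 0.0            # format-compliance bonus
    total += _clip(action_quality, -0.30, 0.20)     # action-quality score
    total += _clip(lookahead_penalty, -0.20, 0.0)   # 3-step lookahead penalty
    return total
```

Because every completion in a group would be scored from the same `PromptRecord`, differences in reward within the group come from the actions alone, which is what the group-relative advantage in GRPO relies on.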

### Hyperparameters

| Hyperparameter | Value |
|---|---|
| Base model | `unsloth/Qwen2.5-3B-Instruct-bnb-4bit` (4-bit) |
| LoRA `r` / `alpha` / dropout | 16 / 32 / 0 |
| LoRA target modules | `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj` |
| Max sequence length | 2048 |
| Max completion length | 120 tokens |
| Train steps | 200 |
| Per-device batch × grad accum | 2 × 4 |
| GRPO group size (`num_generations`) | 8 |
| Learning rate | 5e-6 |
| Warmup ratio | 0.05 |
| Max grad norm | 1.0 |
| Sampling temperature | 0.7 |
| KL coefficient | 0.05 |
| Precision | fp16 (bf16 where supported) |
| Gradient checkpointing | Unsloth |
| Prompt dataset size | 300 |
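
A few quantities follow directly from the table. The sketch below works them out under assumptions that the card does not state: a single training device, warmup steps rounded up from the ratio, and the standard LoRA scaling factor `alpha / r`. How a given TRL version counts completions toward the batch varies, so the group count is indicative only.

```python
import math

# Values copied from the hyperparameter table.
hparams = {
    "lora_r": 16,
    "lora_alpha": 32,
    "per_device_batch": 2,
    "grad_accum": 4,
    "num_generations": 8,
    "train_steps": 200,
    "warmup_ratio": 0.05,
}

# Standard LoRA scaling applied to the low-rank update.
lora_scale = hparams["lora_alpha"] / hparams["lora_r"]

# Sequences contributing to one optimizer step (assumes one device).
samples_per_update = hparams["per_device_batch"] * hparams["grad_accum"]

# Assumption: the batch counts completions, so this is groups per update.
groups_per_update = samples_per_update / hparams["num_generations"]

# Assumption: warmup steps are the ratio times total steps, rounded up.
warmup_steps = math.ceil(hparams["warmup_ratio"] * hparams["train_steps"])

print(lora_scale, samples_per_update, groups_per_update, warmup_steps)
```

Under these assumptions the LoRA scale is 2.0, each optimizer step sees 8 sequences (one full GRPO group), and the learning rate warms up over the first 10 of 200 steps.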

### Framework versions

- PEFT 0.18.1
- Unsloth (`git+https://github.com/unslothai/unsloth.git`)
- TRL ≥ 0.15
- `openenv-core[core]` ≥ 0.2.2

## Evaluation

The trained agent is compared against a random-action baseline on the `optimization` task (5 episodes × 15 steps). Plots are produced by the training notebook and saved to [plots/](plots/):

| Plot | File |
|---|---|
| Training loss | [plots/loss_curve.png](plots/loss_curve.png) |
| Reward per step (trained) | [plots/reward_curve.png](plots/reward_curve.png) |
| Baseline vs trained | [plots/baseline_vs_trained.png](plots/baseline_vs_trained.png) |
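
The shape of such a comparison can be sketched as follows. Everything here is illustrative: `ToyReactorEnv` is a hypothetical stand-in with an arbitrary reward rule, not the `methanol-apc` environment, and `fixed_policy` merely takes the place of the trained agent. Only the 5 episodes × 15 steps protocol and the random-action baseline come from the description above.

```python
import random
import statistics


class ToyReactorEnv:
    """Hypothetical stand-in; NOT the methanol-apc environment."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.temperature = 250.0

    def step(self, action):
        # Toy rule: cooling above 50 lowers temperature, below 50 raises it,
        # plus seeded noise. Reward is the negative distance from 250 °C.
        drift = (50.0 - action["cooling_water_flow"]) * 0.1
        self.temperature += drift + self._rng.uniform(-1.0, 1.0)
        return -abs(self.temperature - 250.0)


def random_policy(rng):
    """Baseline: uniformly random cooling flow in the allowed range."""
    def act(_env):
        return {"cooling_water_flow": rng.uniform(0.0, 100.0)}
    return act


def fixed_policy(_env):
    """Placeholder for the trained agent's action."""
    return {"cooling_water_flow": 50.0}


def evaluate(policy, episodes=5, steps=15):
    """Return (mean episode return, per-episode returns)."""
    returns = []
    for episode in range(episodes):
        env = ToyReactorEnv(seed=episode)  # same seeds for every policy
        returns.append(sum(env.step(policy(env)) for _ in range(steps)))
    return statistics.mean(returns), returns


baseline_mean, _ = evaluate(random_policy(random.Random(0)))
agent_mean, _ = evaluate(fixed_policy)
print(f"baseline={baseline_mean:.2f} agent={agent_mean:.2f}")
```

Seeding each episode identically for both policies keeps the comparison paired, so differences in return reflect the policy rather than the noise draw.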

## Intended use & limitations

This adapter is a **research artifact** demonstrating GRPO-based fine-tuning for closed-loop chemical-process control on a *simulated* environment. It is **not** suitable for, and must not be deployed against, any real industrial reactor or safety-critical system. The simulator is a simplified model of methanol synthesis (ICI low-pressure process, Cu/ZnO/Al₂O₃ catalyst) and does not capture the full dynamics, instrumentation, or failure modes of a physical plant.

## Citations

GRPO:

```bibtex
@article{shao2024deepseekmath,
  title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year   = {2024},
  eprint = {arXiv:2402.03300}
}
```

Unsloth:

```bibtex
@software{unsloth2024,
  title  = {{Unsloth: 2x faster, 50\% less memory LLM finetuning}},
  author = {Daniel Han and Michael Han and {Unsloth team}},
  url    = {https://github.com/unslothai/unsloth},
  year   = {2024}
}
```