---
library_name: peft
base_model: meta-llama/Llama-3.1-8B-Instruct
license: cc-by-nc-4.0
pipeline_tag: text-generation
tags:
- lora
- peft
- molly-os
- software-engineering
---

# Molly OS - Specialist Adapter: Software Engineering

Frontier-distilled **LoRA specialist** (PEFT, rank 32; target modules `q_proj`, `k_proj`, `v_proj`, `o_proj`) for the Molly OS model-agnostic orchestration layer.

Base model: **[meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)**. Domain: **Software Engineering**.

Adapter weights are released under **CC BY-NC 4.0**. The base model is governed by its own (Llama 3.1) license.

## Before you run: the base model is gated

This adapter needs the base weights, and the base is **access-gated**. Do this **once**:

1. Open the [base model page](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) and **accept its license**.
2. Create a **read token** in your Hugging Face account settings (*Settings -> Access Tokens*).
3. Make the token available to your environment:
   - **Google Colab:** open the **Secrets** panel (key icon, left sidebar) -> *Add new secret* -> Name `HF_TOKEN`, paste the value, enable **Notebook access**.
   - **Kaggle:** *Add-ons -> Secrets* -> add `HF_TOKEN`.
   - **Local:** run `huggingface-cli login` or `export HF_TOKEN=...`.

If you skip these steps, loading the **base** model fails with `GatedRepoError` / `401 Unauthorized`. A stored Colab secret is **not** used automatically: you must authenticate in code (see the Quickstart).
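Before loading anything, it can help to confirm that Python actually sees the token. The sketch below is a minimal pre-flight check, assuming the **Local** route (`export HF_TOKEN=...`). The helper names `find_hf_token` and `describe_token` are hypothetical and not part of any library. It does not contact the Hub, so it cannot tell whether the token is valid or whether the license was accepted, and it does not cover Colab or Kaggle secrets, which are read through their own APIs.

```python
import os
from typing import Mapping, Optional


def find_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the HF_TOKEN value from `env`, or None if it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def describe_token(token: Optional[str]) -> str:
    """Summarise the token state without printing the secret itself."""
    if token is None:
        return "HF_TOKEN is not set"
    return f"HF_TOKEN is set ({len(token)} characters)"


# Report on the current process environment.
print(describe_token(find_hf_token()))
```

If this reports that the token is not set, fix the environment first; the Quickstart then performs the actual login.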
## Quickstart

```python
# pip install -U transformers peft accelerate
import os
from huggingface_hub import login

# Authenticate (Colab secret -> env var -> interactive prompt)
try:
    from google.colab import userdata
    login(userdata.get("HF_TOKEN"))
except Exception:
    hf_token = os.environ.get("HF_TOKEN")
    if hf_token:
        login(hf_token)
    else:
        login()  # falls back to an interactive prompt

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER = "BoomJules/molly-software-engineering"

# Load the gated base model, then attach the LoRA adapter on top of it.
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER).eval()

msgs = [{"role": "user", "content": "Your question here"}]
# return_dict=True yields input_ids plus attention_mask, passed to generate() as keywords.
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True,
                                 return_tensors="pt", return_dict=True).to(model.device)
out = model.generate(**inputs, max_new_tokens=300)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

## Low-VRAM (4-bit) - fits a free Colab/Kaggle GPU (~6-7 GB)

Use a **GPU runtime** (Colab: *Runtime -> Change runtime type -> T4 GPU*).
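As a rough sanity check on the ~6-7 GB figure in the heading above, the arithmetic below estimates the memory taken by the weights alone at 16-bit and 4-bit precision. It is a back-of-the-envelope sketch: the parameter count of 8 billion is an assumption read off the model name, and the helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 8.0e9  # assumption: roughly 8 billion parameters, per the model name

bf16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"bf16 weights:  {bf16_gib:.1f} GiB")  # 14.9 GiB
print(f"4-bit weights: {nf4_gib:.1f} GiB")   # 3.7 GiB
```

The 4-bit estimate sits below the quoted 6-7 GB because real usage also includes any layers kept in higher precision, quantization metadata, activations, and the KV cache during generation. Treat the result as a lower bound, not a measurement.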
```python
# pip install -U transformers peft accelerate bitsandbytes
import os, torch
from huggingface_hub import login

try:
    from google.colab import userdata
    login(userdata.get("HF_TOKEN"))
except Exception:
    login(os.environ.get("HF_TOKEN"))

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER = "BoomJules/molly-software-engineering"

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16,
                         bnb_4bit_use_double_quant=True)
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER).eval()
```

## Troubleshooting

- **`GatedRepoError` / `401 Unauthorized`** - base license not accepted, or `HF_TOKEN` missing/invalid, or you stored the Colab secret but did not call `login(...)` in code.
- **CUDA out of memory** - use the 4-bit snippet and a GPU runtime.
- **Adapter seems to have no effect** - confirm the base id matches `base_model` above.

## License & intended use

Adapter: **CC BY-NC 4.0** (attribution, non-commercial). Base model: Llama 3.1 license. Intended for research and evaluation in Software Engineering.

(c) 2026 Core Labs R&D.