---
library_name: peft
base_model: meta-llama/Llama-3.1-8B-Instruct
license: cc-by-nc-4.0
pipeline_tag: text-generation
tags:
- lora
- peft
- molly-os
- software-engineering
---
# Molly OS - Specialist Adapter: Software Engineering

Frontier-distilled **LoRA specialist** (PEFT, rank 32; target modules
`q_proj`, `k_proj`, `v_proj`, `o_proj`) for the Molly OS model-agnostic
orchestration layer. Base model: **[meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)**.
Domain: **Software Engineering**.

Adapter weights are released under **CC BY-NC 4.0**. The base model is governed by
its own (Llama 3.1) license.
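
To give a feel for how small a rank-32 adapter on the four attention projections is, here is a back-of-the-envelope parameter count. It is only a sketch: the layer dimensions are assumptions taken from the public Llama-3.1-8B-Instruct configuration, not values stated in this card, and it assumes no bias terms and the same rank on every targeted module.

```python
# Rough count of trainable LoRA parameters for the configuration named above.
# Assumptions (not stated in this card): base dimensions from the public
# Llama-3.1-8B-Instruct config, no bias terms, uniform rank.

HIDDEN = 4096   # hidden_size (assumed)
KV_DIM = 1024   # num_key_value_heads (8) * head_dim (128) (assumed)
LAYERS = 32     # num_hidden_layers (assumed)
RANK = 32       # from this card

# (in_features, out_features) of each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(rank, targets, layers):
    """LoRA adds A (rank x in) and B (out x rank) to each targeted module."""
    per_layer = sum(rank * (fin + fout) for fin, fout in targets.values())
    return per_layer * layers


total = lora_params(RANK, TARGETS, LAYERS)
print(f"{total:,} trainable parameters")  # 27,262,976
```

Under these assumptions the adapter holds roughly 27 million parameters, a small fraction of the 8B base, which is why the adapter download is small while the gated base download is not.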
## Before you run: the base model is gated

This adapter needs the base weights, and the base is **access-gated**. Do this **once**:

1. Open the base page and **accept its license**: <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>
2. Create a **read token**: <https://huggingface.co/settings/tokens>
3. Make the token available to your environment:
   - **Google Colab:** open the **Secrets** panel (key icon, left sidebar) -> *Add new secret* -> Name `HF_TOKEN`, paste the value, enable **Notebook access**.
   - **Kaggle:** *Add-ons -> Secrets* -> add `HF_TOKEN`.
   - **Local:** run `huggingface-cli login` or `export HF_TOKEN=...`.

If you skip this, you will get `GatedRepoError` / `401 Unauthorized` when the **base** loads.
A stored Colab secret is **not** used automatically - you must authenticate in code (see Quickstart).
## Quickstart

```python
# pip install -U transformers peft accelerate
import os

from huggingface_hub import login

# Authenticate (Colab secret -> env var -> interactive prompt)
try:
    from google.colab import userdata
    login(userdata.get("HF_TOKEN"))
except Exception:
    # Not on Colab, or the secret is missing or not shared with this notebook
    hf_token = os.environ.get("HF_TOKEN")
    login(hf_token) if hf_token else login()

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER = "BoomJules/molly-software-engineering"

# Load the gated base first, then attach the LoRA adapter on top of it
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER).eval()

msgs = [{"role": "user", "content": "Your question here"}]
# return_dict=True yields input_ids and attention_mask together
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=300)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
## Low-VRAM (4-bit) - fits a free Colab/Kaggle GPU (~6-7 GB)

Use a **GPU runtime** (Colab: *Runtime -> Change runtime type -> T4 GPU*).

```python
# pip install -U transformers peft accelerate bitsandbytes
import os

import torch
from huggingface_hub import login

try:
    from google.colab import userdata
    login(userdata.get("HF_TOKEN"))
except Exception:
    # Falls back to the env var; login(None) prompts interactively
    login(os.environ.get("HF_TOKEN"))

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE = "meta-llama/Llama-3.1-8B-Instruct"
ADAPTER = "BoomJules/molly-software-engineering"

# NF4 with double quantization; weights are dequantized to the compute dtype.
# The T4 has no native bfloat16, so torch.float16 may be faster there.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER).eval()
# Generation is the same as in the Quickstart.
```
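
The gap between the bf16 and 4-bit footprints can be checked with simple arithmetic. The sketch below estimates weight storage only, and the parameter count is an assumed approximation for the 8B base rather than a figure from this card.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Storage needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 8.03e9  # assumed approximate parameter count of the 8B base

print(f"bf16 : {weight_gib(N_PARAMS, 16):.1f} GiB")  # 15.0 GiB
print(f"4-bit: {weight_gib(N_PARAMS, 4):.1f} GiB")   # 3.7 GiB
```

The 4-bit figure is a lower bound. Embeddings and the output head typically stay in higher precision, and quantization constants, activations, and the KV cache add to it, which is consistent with the ~6-7 GB quoted in the heading.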
## Troubleshooting

- **`GatedRepoError` / `401 Unauthorized`** - base license not accepted, or `HF_TOKEN` missing/invalid, or you stored the Colab secret but did not call `login(...)` in code.
- **CUDA out of memory** - use the 4-bit snippet and a GPU runtime.
- **Adapter seems to have no effect** - confirm the base id matches `base_model` above.
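
For the last item, one way to confirm the match is to compare your base id against the `base_model_name_or_path` field that PEFT records in an adapter's `adapter_config.json`. This is a minimal sketch: the helper name `base_matches` and the example JSON are hypothetical, and this adapter's actual config file is not reproduced here.

```python
import json


def base_matches(adapter_config_json: str, base_id: str) -> bool:
    """True if the adapter config names `base_id` as its base model."""
    cfg = json.loads(adapter_config_json)
    return cfg.get("base_model_name_or_path") == base_id


# Hypothetical config contents, for illustration only
example = '{"base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct", "r": 32}'

print(base_matches(example, "meta-llama/Llama-3.1-8B-Instruct"))  # True
print(base_matches(example, "meta-llama/Llama-3.1-8B"))           # False
```

With PEFT installed and access granted, the same field should be readable as `PeftConfig.from_pretrained(ADAPTER).base_model_name_or_path`.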
## License & intended use

Adapter: **CC BY-NC 4.0** (attribution, non-commercial). Base model: Llama 3.1 license.
Intended for research and evaluation in Software Engineering.

(c) 2026 Core Labs R&D.