	---
	library_name: peft
	base_model: meta-llama/Llama-3.1-8B-Instruct
	license: cc-by-nc-4.0
	pipeline_tag: text-generation
	tags:
	- lora
	- peft
	- molly-os
	- software-engineering
	---

	# Molly OS - Specialist Adapter: Software Engineering

	Frontier-distilled LoRA specialist (PEFT, rank 32; target modules
	`q_proj`, `k_proj`, `v_proj`, `o_proj`) for the Molly OS model-agnostic
	orchestration layer. Base model: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct).
	Domain: Software Engineering.
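
For a rough sense of the adapter's size, the number of LoRA parameters can be estimated from the rank and target modules listed above. The sketch below assumes the base model's published dimensions (hidden size 4096, 32 layers, 8 key/value heads of dimension 128) and that every layer is adapted. Neither assumption is stated on this card, so treat the result as an estimate.

```python
# Assumed Llama-3.1-8B dimensions (taken from the base model's config, not from this card).
HIDDEN = 4096
KV_DIM = 8 * 128      # 8 key/value heads x head dimension 128
LAYERS = 32
RANK = 32             # LoRA rank stated on this card

# (in_features, out_features) of each target module
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def lora_params(rank, shapes, layers):
    """LoRA adds A (rank x in) and B (out x rank) per module: rank * (in + out) weights."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * layers

total = lora_params(RANK, TARGETS, LAYERS)
print(f"{total:,} LoRA parameters, about {total * 2 / 1e6:.0f} MB in 16-bit")
```

Under these assumptions the adapter is well under 1% of the base model's roughly 8 billion parameters.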

	Adapter weights are released under CC BY-NC 4.0. The base model is governed by
	its own (Llama 3.1) license.

	## Before you run: the base model is gated

	This adapter needs the base weights, and the base is access-gated. Do this once:

	1. Open the base page and accept its license: <https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct>
	2. Create a read token: <https://huggingface.co/settings/tokens>
	3. Make the token available to your environment:
	   - Google Colab: open the Secrets panel (key icon, left sidebar) -> Add new secret -> name it `HF_TOKEN`, paste the value, and enable Notebook access.
	   - Kaggle: Add-ons -> Secrets -> add `HF_TOKEN`.
	   - Local: run `hf auth login` (`huggingface-cli login` on older `huggingface_hub` releases) or `export HF_TOKEN=...`.

	If you skip these steps, loading the base model fails with `GatedRepoError` / `401 Unauthorized`.
	A stored Colab secret is not used automatically - you must authenticate in code (see the Quickstart).
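
The token sources listed above can be checked in order (Colab secret first, then the environment variable). The helper below is a minimal sketch of that lookup. `resolve_hf_token` is a hypothetical name, not part of `huggingface_hub`; it only finds the token, and the result still has to be passed to `login(...)`.

```python
import os

def resolve_hf_token(env=os.environ):
    """Return a Hugging Face token from Colab secrets or the environment, else None."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
        token = userdata.get("HF_TOKEN")
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with the notebook.
        pass
    return env.get("HF_TOKEN") or None

token = resolve_hf_token()
print("token found" if token else "no token found; an interactive login would be needed")
```

Passing the environment as an argument keeps the helper easy to test without touching real credentials.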

	## Quickstart

	```python
	# pip install -U transformers peft accelerate
	import os
	from huggingface_hub import login

	# Authenticate (Colab secret -> env var -> interactive prompt)
	try:
	    from google.colab import userdata  # only available inside Colab
	    login(userdata.get("HF_TOKEN"))
	except Exception:
	    hf_token = os.environ.get("HF_TOKEN")
	    if hf_token:
	        login(hf_token)
	    else:
	        login()  # falls back to an interactive prompt

	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	BASE = "meta-llama/Llama-3.1-8B-Instruct"
	ADAPTER = "BoomJules/molly-software-engineering"

	tok = AutoTokenizer.from_pretrained(BASE)
	base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16, device_map="auto")
	model = PeftModel.from_pretrained(base, ADAPTER).eval()

	msgs = [{"role": "user", "content": "Your question here"}]
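	# return_dict=True also yields the attention mask that generate() expects.
	inputs = tok.apply_chat_template(
	    msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
	).to(model.device)
	out = model.generate(**inputs, max_new_tokens=300)
	# Decode only the newly generated tokens, skipping the prompt.
	print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))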
	```

	## Low-VRAM (4-bit) - fits a free Colab/Kaggle GPU (~6-7 GB)

	Use a GPU runtime (Colab: Runtime -> Change runtime type -> T4 GPU).

	```python
	# pip install -U transformers peft accelerate bitsandbytes
	import os, torch
	from huggingface_hub import login
	# Authenticate (Colab secret -> env var; login(None) prompts interactively)
	try:
	    from google.colab import userdata
	    login(userdata.get("HF_TOKEN"))
	except Exception:
	    login(os.environ.get("HF_TOKEN"))

	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	from peft import PeftModel

	BASE = "meta-llama/Llama-3.1-8B-Instruct"
	ADAPTER = "BoomJules/molly-software-engineering"

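	# NF4 with double quantization. On GPUs without native bfloat16 support (such as the T4),
	# torch.float16 is a common alternative for the compute dtype.
	bnb = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.bfloat16,
	    bnb_4bit_use_double_quant=True,
	)
	tok = AutoTokenizer.from_pretrained(BASE)
	base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto")
	model = PeftModel.from_pretrained(base, ADAPTER).eval()
	# Generation then works exactly as in the Quickstart.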
	```
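
The ~6-7 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes roughly 8.03 billion base parameters, and that the token embedding and output head (vocabulary 128,256 x hidden size 4096 each) stay in 16-bit while the remaining weights are stored at 4 bits. These are assumptions about the base model and about typical quantization behavior, not measurements. Quantization constants, activations, and the KV cache come on top of the weight total.

```python
def weight_gib(n_params, bits_per_param):
    """Memory for n_params weights at the given precision, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

TOTAL_PARAMS = 8.03e9                 # assumed size of the base model
UNQUANTIZED = 2 * 128_256 * 4096      # embedding + output head, assumed kept in 16-bit

bf16_gib = weight_gib(TOTAL_PARAMS, 16)
nf4_gib = weight_gib(TOTAL_PARAMS - UNQUANTIZED, 4) + weight_gib(UNQUANTIZED, 16)

print(f"16-bit weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit weights:  ~{nf4_gib:.1f} GiB (before activations and KV cache)")
```

The gap between the two totals is why the full-precision Quickstart does not fit a free-tier GPU while the 4-bit variant does.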

	## Troubleshooting

	- `GatedRepoError` / `401 Unauthorized` - the base license was not accepted, `HF_TOKEN` is
	  missing or invalid, or you stored the Colab secret but did not call `login(...)` in code.
	- CUDA out of memory - use the 4-bit snippet and a GPU runtime.
	- Adapter seems to have no effect - confirm the base id matches `base_model` above.
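
One way to check the last point is to compare the base id you load against the one recorded in the adapter's `adapter_config.json`, where PEFT stores it under `base_model_name_or_path`. The sketch below works on an already-parsed config dict. `check_base_matches` and the sample dict are illustrative, not part of PEFT, and an adapter trained from a local checkpoint may record a filesystem path instead of a Hub id.

```python
def check_base_matches(adapter_config: dict, base_id: str) -> bool:
    """Return True when the adapter config records `base_id` as its base model."""
    recorded = adapter_config.get("base_model_name_or_path")
    return recorded == base_id

# Illustrative stand-in for the parsed adapter_config.json
sample_config = {
    "base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
    "r": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}

BASE = "meta-llama/Llama-3.1-8B-Instruct"
if check_base_matches(sample_config, BASE):
    print("base id matches the adapter config")
else:
    print("mismatch: adapter expects", sample_config.get("base_model_name_or_path"))
```

In practice the dict would come from `json.load` on the downloaded `adapter_config.json`.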

	## License & intended use

	Adapter: CC BY-NC 4.0 (attribution, non-commercial). Base model: Llama 3.1 license.
	Intended for research and evaluation in Software Engineering.

	(c) 2026 Core Labs R&D.