# architectLLM — System Design LoRA for GPT-OSS 20B
A LoRA fine-tune of GPT-OSS 20B specialized in system design and software architecture reasoning.
The model retains GPT-OSS's full general capabilities while significantly improving its ability to reason through distributed systems, infrastructure trade-offs, and back-of-envelope capacity planning.
## What It Does
- Designs distributed systems from first principles
- Reasons through infrastructure trade-offs (latency vs consistency, throughput vs cost)
- Provides back-of-envelope calculations for capacity planning
- Leverages GPT-OSS's native analysis channel for extended chain-of-thought reasoning
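As a concrete illustration of the third point, here is the style of arithmetic the model is trained to produce, worked by hand in plain Python. The traffic and key-count figures are hypothetical, chosen only for the example:

```python
# Hypothetical back-of-envelope estimate: sizing in-memory counters
# for a fixed-window rate limiter at 500K requests per second.
rps = 500_000              # aggregate requests per second (assumed)
active_keys = 10_000_000   # distinct (client, route) pairs (assumed)
bytes_per_counter = 16     # 8-byte key hash + 8-byte counter

memory_gb = active_keys * bytes_per_counter / 1e9
print(f"counter memory: ~{memory_gb:.2f} GB")   # → ~0.16 GB

# Per-node write load if counters are sharded across 4 nodes:
nodes = 4
print(f"per-node writes: ~{rps // nodes:,}/s")  # → ~125,000/s
```

The point is not the specific numbers but the shape of the reasoning: state assumptions, multiply them out, and sanity-check the result against hardware limits.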
## Training Details
| Parameter | Value |
|---|---|
| Base model | openai/gpt-oss-20b |
| Method | LoRA (rank 64, alpha 64) |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training examples | 1,787 |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine w/ min LR) |
| Precision | bfloat16 |
| Framework | HuggingFace Transformers + PEFT + TRL |
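The table above maps directly onto a PEFT `LoraConfig`. A sketch of the likely training-time configuration follows; the `lora_dropout` value and `task_type` are assumptions not stated in the table, while rank, alpha, and target modules come straight from it:

```python
from peft import LoraConfig

# Reconstructed adapter configuration (dropout and task_type assumed).
lora_config = LoraConfig(
    r=64,            # LoRA rank
    lora_alpha=64,   # scaling alpha (effective scale = alpha / r = 1.0)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,        # assumed; not stated in the table
    task_type="CAUSAL_LM",
)
```

With alpha equal to rank, LoRA updates are applied at unit scale, a common choice that keeps the adapter's effective learning rate tied directly to the optimizer's.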
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "bisratz/architectLLM-lora")
tokenizer = AutoTokenizer.from_pretrained("bisratz/architectLLM-lora")

messages = [
    {"role": "system", "content": ""},
    {"role": "developer", "content": (
        "You are an expert system design architect who reasons from first principles. "
        "Identify fundamental infrastructure primitives, analyze constraints, "
        "explain WHY each choice fits, discuss trade-offs, and include "
        "back-of-envelope calculations."
    )},
    {"role": "user", "content": "Design a rate limiting system for an API gateway handling 500K rps across 200 microservices."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",  # GPT-OSS reasoning effort: low / medium / high
).to(model.device)

output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```