
architectLLM — System Design LoRA for GPT-OSS 20B

A LoRA fine-tune of GPT-OSS 20B specialized in system design and software architecture reasoning.

The model retains GPT-OSS's full general capabilities while significantly improving its ability to reason through distributed systems, infrastructure trade-offs, and back-of-envelope capacity planning.

What It Does

  • Designs distributed systems from first principles
  • Reasons through infrastructure trade-offs (latency vs consistency, throughput vs cost)
  • Provides back-of-envelope calculations for capacity planning
  • Leverages GPT-OSS's native analysis channel for extended chain-of-thought reasoning
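
To give a flavor of the back-of-envelope style the model is tuned for, here is a minimal, self-contained sketch sizing the rate-limiter scenario from the Usage section below. All figures (active key count, bytes per bucket, ops per request) are illustrative assumptions for this example, not outputs of the model or facts from the training data:

```python
# Illustrative back-of-envelope sizing for a token-bucket rate limiter
# fronting 500K rps across 200 microservices. Every constant here is an
# assumption chosen for the sketch.

rps = 500_000                 # aggregate requests per second
active_keys = 10_000_000      # assumed live (client, service) buckets
bytes_per_bucket = 64         # key + counters + timestamp + hashmap overhead

memory_gb = active_keys * bytes_per_bucket / 1e9
counter_ops_per_sec = rps * 2  # assume read-modify-write ≈ 2 ops per request

print(f"bucket state: ~{memory_gb:.2f} GB")
print(f"counter-store ops: ~{counter_ops_per_sec / 1e6:.1f} M ops/s")
```

At roughly 0.64 GB of counter state and 1M ops/s, the sketch suggests the bottleneck is counter-store throughput rather than memory, which is the kind of conclusion the model is trained to argue for explicitly.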

Training Details

Parameter          Value
Base model         openai/gpt-oss-20b
Method             LoRA (rank 64, alpha 64)
Target modules     q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training examples  1,787
Epochs             1
Learning rate      2e-4 (cosine schedule with minimum LR)
Precision          bfloat16
Framework          Hugging Face Transformers + PEFT + TRL
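
The hyperparameters above correspond to an adapter_config.json roughly like the following. The field names follow PEFT's LoraConfig serialization; this is a sketch of what the file would contain, not the exact file shipped with the adapter:

```python
import json

# Hypothetical adapter_config.json mirroring the training table;
# key names follow PEFT's LoraConfig schema.
adapter_config = {
    "peft_type": "LORA",
    "task_type": "CAUSAL_LM",   # PEFT serializes task_type as a string
    "base_model_name_or_path": "openai/gpt-oss-20b",
    "r": 64,                    # LoRA rank
    "lora_alpha": 64,           # scaling factor (alpha / r = 1.0 here)
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

print(json.dumps(adapter_config, indent=2))
```

With alpha equal to rank, the effective LoRA scaling factor is 1.0, a common choice when targeting all attention and MLP projections.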

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "bisratz/architectLLM-lora")
tokenizer = AutoTokenizer.from_pretrained("bisratz/architectLLM-lora")

messages = [
    {"role": "developer", "content": (
        "You are an expert system design architect who reasons from first principles. "
        "Identify fundamental infrastructure primitives, analyze constraints, "
        "explain WHY each choice fits, discuss trade-offs, and include "
        "back-of-envelope calculations."
    )},
    {"role": "user", "content": "Design a rate limiting system for an API gateway handling 500K rps across 200 microservices."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
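
The snippet above decodes with skip_special_tokens=True, which discards the analysis channel along with the control tokens. To inspect the extended chain-of-thought, decode the raw completion without skipping special tokens and split it by channel. The sketch below assumes the harmony-format markers (`<|channel|>`, `<|message|>`, `<|end|>`, `<|return|>`) as published for GPT-OSS; verify them against your tokenizer version, and note the demo string is a fabricated stand-in for a real completion:

```python
# Split a raw GPT-OSS completion into its channels (e.g. "analysis" for
# chain-of-thought, "final" for the user-facing answer). Assumes the text
# was decoded with skip_special_tokens=False so harmony markers survive.

def split_channels(raw: str) -> dict:
    channels = {}
    for part in raw.split("<|channel|>")[1:]:
        name, _, body = part.partition("<|message|>")
        # trim trailing control tokens such as <|end|> / <|return|>
        body = body.split("<|end|>")[0].split("<|return|>")[0]
        channels[name.strip()] = body.strip()
    return channels

# Fabricated example output, not a real model completion:
demo = ("<|channel|>analysis<|message|>Estimate per-node budget...<|end|>"
        "<|start|>assistant<|channel|>final<|message|>Use a token bucket...<|return|>")
out = split_channels(demo)
print(out["analysis"])
print(out["final"])
```

Keeping the analysis channel separate lets you log the reasoning trace for review while showing users only the final channel.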