
architectLLM — System Design LoRA for GPT-OSS 20B

A LoRA fine-tune of GPT-OSS 20B specialized in system design and software architecture reasoning.

The model retains GPT-OSS's full general capabilities while significantly improving its ability to reason through distributed systems, infrastructure trade-offs, and back-of-envelope capacity planning.

What It Does

  • Designs distributed systems from first principles
  • Reasons through infrastructure trade-offs (latency vs consistency, throughput vs cost)
  • Provides back-of-envelope calculations for capacity planning
  • Leverages GPT-OSS's native analysis channel for extended chain-of-thought reasoning
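
To give a flavor of the back-of-envelope style the model is tuned for, here is a minimal, self-contained sketch sizing the rate-limiter scenario from the Usage section below. All figures (active key count, bytes per bucket, ops per request) are illustrative assumptions for this example, not outputs of the model or facts from the training data:

```python
# Illustrative back-of-envelope sizing for a token-bucket rate limiter
# fronting 500K rps across 200 microservices. Every constant here is an
# assumption chosen for the sketch.

rps = 500_000                 # aggregate requests per second
active_keys = 10_000_000      # assumed live (client, service) buckets
bytes_per_bucket = 64         # key + counters + timestamp + hashmap overhead

memory_gb = active_keys * bytes_per_bucket / 1e9
counter_ops_per_sec = rps * 2  # assume read-modify-write ≈ 2 ops per request

print(f"bucket state: ~{memory_gb:.2f} GB")
print(f"counter-store ops: ~{counter_ops_per_sec / 1e6:.1f} M ops/s")
```

At roughly 0.64 GB of counter state and 1M ops/s, the sketch suggests the bottleneck is counter-store throughput rather than memory, which is the kind of conclusion the model is trained to argue for explicitly.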

Training Details

Parameter          Value
Base model         openai/gpt-oss-20b
Method             LoRA (rank 64, alpha 64)
Target modules     q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training examples  1,787
Epochs             1
Learning rate      2e-4 (cosine schedule with minimum LR)
Precision          bfloat16
Framework          Hugging Face Transformers + PEFT + TRL
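
The hyperparameters above correspond to an adapter_config.json roughly like the following. The field names follow PEFT's LoraConfig serialization; this is a sketch of what the file would contain, not the exact file shipped with the adapter:

```python
import json

# Hypothetical adapter_config.json mirroring the training table;
# key names follow PEFT's LoraConfig schema.
adapter_config = {
    "peft_type": "LORA",
    "task_type": "CAUSAL_LM",   # PEFT serializes task_type as a string
    "base_model_name_or_path": "openai/gpt-oss-20b",
    "r": 64,                    # LoRA rank
    "lora_alpha": 64,           # scaling factor (alpha / r = 1.0 here)
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

print(json.dumps(adapter_config, indent=2))
```

With alpha equal to rank, the effective LoRA scaling factor is 1.0, a common choice when targeting all attention and MLP projections.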

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "bisratz/architectLLM-lora")
tokenizer = AutoTokenizer.from_pretrained("bisratz/architectLLM-lora")

messages = [
    {"role": "developer", "content": (
        "You are an expert system design architect who reasons from first principles. "
        "Identify fundamental infrastructure primitives, analyze constraints, "
        "explain WHY each choice fits, discuss trade-offs, and include "
        "back-of-envelope calculations."
    )},
    {"role": "user", "content": "Design a rate limiting system for an API gateway handling 500K rps across 200 microservices."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
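
The snippet above decodes with skip_special_tokens=True, which discards the analysis channel along with the control tokens. To inspect the extended chain-of-thought, decode the raw completion without skipping special tokens and split it by channel. The sketch below assumes the harmony-format markers (`<|channel|>`, `<|message|>`, `<|end|>`, `<|return|>`) as published for GPT-OSS; verify them against your tokenizer version, and note the demo string is a fabricated stand-in for a real completion:

```python
# Split a raw GPT-OSS completion into its channels (e.g. "analysis" for
# chain-of-thought, "final" for the user-facing answer). Assumes the text
# was decoded with skip_special_tokens=False so harmony markers survive.

def split_channels(raw: str) -> dict:
    channels = {}
    for part in raw.split("<|channel|>")[1:]:
        name, _, body = part.partition("<|message|>")
        # trim trailing control tokens such as <|end|> / <|return|>
        body = body.split("<|end|>")[0].split("<|return|>")[0]
        channels[name.strip()] = body.strip()
    return channels

# Fabricated example output, not a real model completion:
demo = ("<|channel|>analysis<|message|>Estimate per-node budget...<|end|>"
        "<|start|>assistant<|channel|>final<|message|>Use a token bucket...<|return|>")
out = split_channels(demo)
print(out["analysis"])
print(out["final"])
```

Keeping the analysis channel separate lets you log the reasoning trace for review while showing users only the final channel.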