---
base_model: openai-community/gpt-oss-20b
library_name: peft
license: mit
tags:
- gpt-oss
- system-design
- software-architecture
- lora
- reasoning
datasets:
- custom
pipeline_tag: text-generation
---

# architectLLM — System Design LoRA for GPT-OSS 20B

A LoRA fine-tune of [GPT-OSS 20B](https://huggingface.co/openai-community/gpt-oss-20b) specialized in **system design and software architecture reasoning**. The model retains GPT-OSS's full general capabilities while significantly improving its ability to reason through distributed systems, infrastructure trade-offs, and back-of-envelope capacity planning.

## What It Does

- Designs distributed systems from first principles
- Reasons through infrastructure trade-offs (latency vs. consistency, throughput vs. cost)
- Provides back-of-envelope calculations for capacity planning
- Leverages GPT-OSS's native **analysis channel** for extended chain-of-thought reasoning

## Training Details

| Parameter | Value |
|---|---|
| Base model | `openai-community/gpt-oss-20b` |
| Method | LoRA (rank 64, alpha 64) |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Training examples | 1,787 |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine schedule with minimum LR) |
| Precision | bfloat16 |
| Framework | Hugging Face Transformers + PEFT + TRL |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "bisratz/architectLLM-lora")
tokenizer = AutoTokenizer.from_pretrained("bisratz/architectLLM-lora")

messages = [
    {"role": "system", "content": ""},
    {"role": "developer", "content": (
        "You are an expert system design architect who reasons from first principles. "
        "Identify fundamental infrastructure primitives, analyze constraints, "
        "explain WHY each choice fits, discuss trade-offs, and include "
        "back-of-envelope calculations."
    )},
    {"role": "user", "content": "Design a rate limiting system for an API gateway handling 500K rps across 200 microservices."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```