---
base_model: openai-community/gpt-oss-20b
library_name: peft
license: mit
tags:
- gpt-oss
- system-design
- software-architecture
- lora
- reasoning
datasets:
- custom
pipeline_tag: text-generation
---
# architectLLM — System Design LoRA for GPT-OSS 20B
A LoRA fine-tune of [GPT-OSS 20B](https://huggingface.co/openai-community/gpt-oss-20b) specialized in **system design and software architecture reasoning**.
The model retains GPT-OSS's full general capabilities while significantly improving its ability to reason through distributed systems, infrastructure trade-offs, and back-of-envelope capacity planning.
## What It Does
- Designs distributed systems from first principles
- Reasons through infrastructure trade-offs (latency vs consistency, throughput vs cost)
- Provides back-of-envelope calculations for capacity planning
- Leverages GPT-OSS's native **analysis channel** for extended chain-of-thought reasoning
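As an illustration of the kind of back-of-envelope reasoning the model is trained to produce, here is a hypothetical capacity estimate for a per-client token-bucket rate limiter. All figures below are illustrative assumptions, not model output:

```python
# Illustrative back-of-envelope estimate: memory needed for a token-bucket
# rate limiter tracking per-client state at an API gateway.
# Every number here is an assumption chosen for the example.

clients = 10_000_000          # assumed distinct API clients being tracked
bytes_per_bucket = 40         # key hash + token count + last-refill timestamp
memory_gb = clients * bytes_per_bucket / 1e9

rps = 500_000                 # assumed gateway throughput
seconds_per_day = 86_400
requests_per_day = rps * seconds_per_day

print(f"rate-limiter state: ~{memory_gb:.1f} GB")            # ~0.4 GB
print(f"requests per day:   ~{requests_per_day / 1e9:.1f}B")  # ~43.2B
```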
## Training Details
| Parameter | Value |
|---|---|
| Base model | `openai-community/gpt-oss-20b` |
| Method | LoRA (rank 64, alpha 64) |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Training examples | 1,787 |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine w/ min LR) |
| Precision | bfloat16 |
| Framework | Hugging Face Transformers + PEFT + TRL |
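The adapter hyperparameters in the table correspond roughly to the following PEFT configuration (a sketch for reference; the actual training script is not published in this repository):

```python
from peft import LoraConfig

# Sketch of the adapter setup described in the table above.
lora_config = LoraConfig(
    r=64,                    # LoRA rank
    lora_alpha=64,           # scaling factor (alpha)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```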
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "bisratz/architectLLM-lora")
tokenizer = AutoTokenizer.from_pretrained("bisratz/architectLLM-lora")

# GPT-OSS uses the harmony chat format: the developer role carries the
# task instructions, while the system turn is left empty.
messages = [
    {"role": "system", "content": ""},
    {"role": "developer", "content": (
        "You are an expert system design architect who reasons from first principles. "
        "Identify fundamental infrastructure primitives, analyze constraints, "
        "explain WHY each choice fits, discuss trade-offs, and include "
        "back-of-envelope calculations."
    )},
    {"role": "user", "content": "Design a rate limiting system for an API gateway handling 500K rps across 200 microservices."},
]

# reasoning_effort="high" requests extended analysis-channel reasoning.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```