---
base_model: openai-community/gpt-oss-20b
library_name: peft
license: mit
tags:
- gpt-oss
- system-design
- software-architecture
- lora
- reasoning
datasets:
- custom
pipeline_tag: text-generation
---

# architectLLM — System Design LoRA for GPT-OSS 20B

A LoRA fine-tune of [GPT-OSS 20B](https://huggingface.co/openai-community/gpt-oss-20b) specialized in **system design and software architecture reasoning**. The model retains GPT-OSS's full general capabilities while significantly improving its ability to reason through distributed systems, infrastructure trade-offs, and back-of-envelope capacity planning.

## What It Does

- Designs distributed systems from first principles
- Reasons through infrastructure trade-offs (latency vs. consistency, throughput vs. cost)
- Provides back-of-envelope calculations for capacity planning
- Leverages GPT-OSS's native **analysis channel** for extended chain-of-thought reasoning

## Training Details

| Parameter | Value |
|---|---|
| Base model | `openai-community/gpt-oss-20b` |
| Method | LoRA (rank 64, alpha 64) |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Training examples | 1,787 |
| Epochs | 1 |
| Learning rate | 2e-4 (cosine schedule with minimum LR) |
| Precision | bfloat16 |
| Framework | Hugging Face Transformers + PEFT + TRL |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "bisratz/architectLLM-lora")
tokenizer = AutoTokenizer.from_pretrained("bisratz/architectLLM-lora")

messages = [
    {"role": "system", "content": ""},
    {"role": "developer", "content": (
        "You are an expert system design architect who reasons from first principles. "
        "Identify fundamental infrastructure primitives, analyze constraints, "
        "explain WHY each choice fits, discuss trade-offs, and include "
        "back-of-envelope calculations."
    )},
    {"role": "user", "content": "Design a rate limiting system for an API gateway handling 500K rps across 200 microservices."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    reasoning_effort="high",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=4096, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```