---
base_model: openai/gpt-oss-20b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:openai/gpt-oss-20b
- lora
- transformers
- space
- question-answering
license: apache-2.0
metrics:
- bertscore
---

# SpaceLLM v1: LoRA Adapter for Space Domain QA

SpaceLLM v1 is a parameter-efficient LoRA adapter fine-tuned on top of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) for space-domain question answering. Only a LoRA update on the `lm_head` is trained; the full transformer backbone remains frozen, which keeps the adapter lightweight while steering the model's output distribution toward space mission knowledge.

---

## Model Details

### Model Description

- **Developed by:** AdityaPS
- **Model type:** LoRA adapter (PEFT) over a causal language model
- **Base model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (21B params, BF16/MXFP4)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base model)
- **Fine-tuned from:** openai/gpt-oss-20b
- **PEFT version:** 0.19.1
- **Fine-tuning strategy:** LoRA on `lm_head` only; backbone fully frozen (BF16, not QLoRA)

### Model Sources

- **Repository:** [AdityaPS/SpaceLLM_v1](https://huggingface.co/AdityaPS/SpaceLLM_v1)

---

## Uses

### Direct Use

Load the adapter alongside `openai/gpt-oss-20b` for space-domain conversational question answering. The model expects inputs formatted with the **harmony response format** (the chat template gpt-oss-20b requires); passing raw text without the template will degrade output quality.

### Downstream Use

The adapter can be plugged into RAG pipelines, mission-planning assistants, or educational tools focused on space science, satellite operations, and related domains.

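
For the RAG case, one possible way to hand retrieved passages to the model is to fold them into the user turn before applying the chat template. The sketch below is only an illustration: the helper name `build_rag_messages`, the numbered `Context:` layout, and the sample passage are assumptions, not a format the adapter was trained on.

```python
from typing import Dict, List


def build_rag_messages(question: str, passages: List[str]) -> List[Dict[str, str]]:
    """Build chat messages that place retrieved passages ahead of the question.

    Hypothetical helper: the numbered-context layout is an assumption made
    for illustration, not part of the published adapter.
    """
    # Number each passage so an answer can refer back to its source.
    context = "\n".join(
        f"[{i}] {passage.strip()}" for i, passage in enumerate(passages, start=1)
    )
    user_content = f"Context:\n{context}\n\nQuestion: {question.strip()}"
    return [
        {"role": "system", "content": "You are a space domain expert assistant."},
        {"role": "user", "content": user_content},
    ]


messages = build_rag_messages(
    "What is the purpose of a Sun-synchronous orbit?",
    ["A Sun-synchronous orbit keeps a near-constant local solar time at each pass."],
)
```

The resulting `messages` list has the same `role`/`content` shape that `tokenizer.apply_chat_template()` accepts, so the harmony formatting is still applied by the tokenizer rather than by hand.
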
### Out-of-Scope Use

- General-purpose chat without space-domain context
- Tasks requiring multi-modal input (images, structured data)
- Deployment without the base model (`openai/gpt-oss-20b` must be loaded alongside the adapter)

---

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config
from peft import PeftModel

# Load base model (requires ~44 GB VRAM in BF16, or use MXFP4 for lower memory)
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # dequantizes to BF16
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter on top
model = PeftModel.from_pretrained(base_model, "AdityaPS/SpaceLLM_v1")
tokenizer = AutoTokenizer.from_pretrained("AdityaPS/SpaceLLM_v1")

# Inference: must use the harmony chat template
messages = [
    {"role": "system", "content": "You are a space domain expert assistant."},
    {"role": "user", "content": "What is the purpose of a Sun-synchronous orbit?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

> **Note:** `openai/gpt-oss-20b` uses the **harmony response format**. Always use
> `tokenizer.apply_chat_template()`; do not pass raw text directly.

---

## Training Details

### Training Data

Fine-tuned on an internal space-domain QA dataset (`DatasetA_core_QA_v2`) consisting of multi-turn conversational records with `system`, `user`, and `assistant` turns. Records are tagged with metadata fields including `organization`, `difficulty`, `aspect`, and `chain_id` for multi-hop reasoning chains.

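
The dataset itself is internal, so the snippet below is only a guess at what such a record might look like and how `chain_id` could be used to collect the hops of one reasoning chain. The field names come from the description above; the `messages` key, the nesting, the helper names, and every value are placeholders assumed for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def make_record(question: str, answer: str, chain_id: str) -> Dict[str, Any]:
    """Build one placeholder record; layout and values are assumptions."""
    return {
        "messages": [
            {"role": "system", "content": "You are a space domain expert assistant."},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ],
        "organization": "<organization>",
        "difficulty": "<difficulty>",
        "aspect": "<aspect>",
        "chain_id": chain_id,
    }


def group_by_chain(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Collect records that share a chain_id, preserving their input order."""
    chains: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        chains[record["chain_id"]].append(record)
    return dict(chains)


example_records = [
    make_record("<question 1>", "<answer 1>", "chain-001"),
    make_record("<follow-up question>", "<follow-up answer>", "chain-001"),
    make_record("<question 2>", "<answer 2>", "chain-002"),
]
chains = group_by_chain(example_records)
```

Grouping by `chain_id` before splitting would keep every hop of a chain in the same split; whether the actual splits were built that way is not stated here.
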
| Split      | Records |
|------------|---------|
| Train      | ~4,800  |
| Validation | -       |
| Test       | 5,291   |

### Training Procedure

#### Key Design Choices

- **LoRA applied to `lm_head` only:** the full MoE transformer backbone is frozen.
- **Critical fix:** `lm_head.weight` is physically untied from `embed_tokens.weight` via `detach().clone()` *before* `get_peft_model()` is called. Without this, autograd sees `lm_head` and `embed_tokens` as the same tensor, cutting gradients to `lora_A`.
- **Device-aware CE loss** injected to handle MoE multi-GPU sharding, where `lm_head` may land on a different device from the labels.
- Model loaded in MXFP4 and dequantized to BF16 before LoRA application.

#### Training Hyperparameters

| Hyperparameter          | Value                      |
|-------------------------|----------------------------|
| Training regime         | BF16 mixed precision       |
| LoRA rank (r)           | 32                         |
| LoRA alpha              | 128                        |
| LoRA dropout            | 0.1                        |
| Target modules          | `lm_head`                  |
| Learning rate           | 2e-4                       |
| LR scheduler            | cosine with restarts       |
| Optimizer               | adamw_torch_fused          |
| Batch size              | 1                          |
| Gradient accumulation   | 32 (effective batch = 32)  |
| Max grad norm           | 0.3                        |
| Weight decay            | 0.01                       |
| Warmup steps            | 200                        |
| Max sequence length     | 2,048                      |
| Epochs                  | 5                          |
| Early stopping patience | 8 eval steps               |
| Vocab size (padded)     | 200,064                    |
| Hardware                | Multi-GPU (cuda:1, cuda:2) |

---

## Evaluation

### Testing Data

Evaluation was run on the held-out test split of `DatasetA_core_QA_v2` (5,291 records, covering diverse space organizations and difficulty levels).

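
Two of the metrics used in this evaluation, Exact Match and Token F1, are simple enough to sketch in a few lines. The version below is an assumption about how they might be computed: the card only states that EM is case-insensitive and that Token F1 is a word-overlap F1, so the whitespace tokenization and the function names are illustrative rather than the evaluation script's actual behavior.

```python
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Word-overlap F1 between a generated answer and a reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Low Earth Orbit", "low earth orbit")
f1 = token_f1("low earth orbit", "a low earth orbit")
```

In this example the prediction covers three of the four reference words with no extra words, so precision is 1.0 and recall is 0.75. Stricter normalization (stripping punctuation or articles) would change both scores, which is why the normalization is flagged as an assumption.
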
### Metrics

- **Loss:** mean cross-entropy loss on the assistant response tokens
- **Exact Match (EM):** generated answer matches the reference exactly (case-insensitive)
- **Token F1:** word-overlap F1 between generated and reference answers
- **BERTScore:** semantic similarity using `roberta-large`

### Results

#### BERTScore (`roberta-large`)

| Metric    | Score      |
|-----------|------------|
| Precision | 0.8736     |
| Recall    | 0.8857     |
| **F1**    | **0.8795** |

The BERTScore F1 of **0.8795** indicates strong semantic alignment between the model's generated answers and the reference answers across the full test set.

---

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) (Lacoste et al., 2019); no estimate has been made for this model.

- **Hardware type:** NVIDIA multi-GPU (cuda:1, cuda:2)
- **Hours used:** ~6.6 hours (396.58 min inference; training time not reported)
- **Cloud provider:** Not applicable (on-premise)
- **Compute region:** Not reported
- **Carbon emitted:** Not measured

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Mixture-of-Experts (MoE) causal language model (gpt-oss-20b) with a LoRA adapter injected at the `lm_head` projection layer
- **Active parameters during inference:** 3.6B (out of 21B total)
- **LoRA parameters:** `r × (hidden_size + vocab_size)` with `r = 32` (two low-rank matrices applied to a single linear layer)
- **Objective:** Next-token prediction with cross-entropy loss, masked so that only assistant response tokens contribute to the loss

### Compute Infrastructure

- **Training hardware:** 2× NVIDIA GPUs (indices 1 and 2), dispatched via `accelerate.dispatch_model`
- **Framework:** PyTorch + Hugging Face Transformers + PEFT 0.19.1 + Accelerate

---

## Model Card Authors

AdityaPS

## Model Card Contact

Open an issue or discussion on the [Hugging Face repository](https://huggingface.co/AdityaPS/SpaceLLM_v1).

### Framework versions

- PEFT 0.19.1