---
base_model: openai/gpt-oss-20b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:openai/gpt-oss-20b
- lora
- transformers
- space
- question-answering
license: apache-2.0
metrics:
- bertscore
---

# SpaceLLM v1 — LoRA Adapter for Space Domain QA

SpaceLLM v1 is a LoRA adapter fine-tuned on top of
[openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) for space-domain
question answering. LoRA is applied only to the `lm_head` projection; the transformer
backbone stays frozen, so the adapter is small and only shifts the model's output
distribution toward space-mission knowledge.

---

## Model Details

### Model Description

- **Developed by:** AdityaPS
- **Model type:** LoRA adapter (PEFT) over a causal language model
- **Base model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (21B params, BF16/MXFP4)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base model)
- **Fine-tuned from:** openai/gpt-oss-20b
- **PEFT version:** 0.19.1
- **Fine-tuning strategy:** LoRA on `lm_head` only — backbone fully frozen (BF16, NOT QLoRA)

### Model Sources

- **Repository:** [AdityaPS/SpaceLLM_v1](https://huggingface.co/AdityaPS/SpaceLLM_v1)

---

## Uses

### Direct Use

Load alongside `openai/gpt-oss-20b` for space-domain conversational question answering.
The model expects inputs formatted using the **harmony response format** (gpt-oss-20b's
required chat template) — passing raw text without the template will degrade output quality.

### Downstream Use

Can be plugged into RAG pipelines, mission-planning assistants, or educational tools
focused on space science, satellite operations, and related domains.

### Out-of-Scope Use

- General-purpose chat without space-domain context
- Tasks requiring multi-modal input (images, structured data)
- Deployment without the base model (`openai/gpt-oss-20b` must be loaded alongside the adapter)

---

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config
from peft import PeftModel

# Load base model. dequantize=True expands the MXFP4 checkpoint to BF16 (~44 GB VRAM);
# set dequantize=False to keep the MXFP4 weights and use less memory on supported GPUs.
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # dequantizes to BF16
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter on top
model = PeftModel.from_pretrained(base_model, "AdityaPS/SpaceLLM_v1")
tokenizer = AutoTokenizer.from_pretrained("AdityaPS/SpaceLLM_v1")

# Inference — must use harmony chat template
messages = [
    {"role": "system", "content": "You are a space domain expert assistant."},
    {"role": "user",   "content": "What is the purpose of a Sun-synchronous orbit?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

> **Note:** `openai/gpt-oss-20b` uses the **harmony response format**. Always use
> `tokenizer.apply_chat_template()` — do not pass raw text directly.

---

## Training Details

### Training Data

Fine-tuned on an internal space-domain QA dataset (`DatasetA_core_QA_v2`) consisting
of multi-turn conversational records with `system`, `user`, and `assistant` turns.
Records are tagged with metadata fields including `organization`, `difficulty`,
`aspect`, and `chain_id` for multi-hop reasoning chains.

| Split      | Records |
|------------|---------|
| Train      | ~4,800  |
| Validation | —       |
| Test       | 5,291   |
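For a rough sense of training length, the train split size above can be combined with the batch size, gradient accumulation, epoch count, and warmup values listed under Training Hyperparameters. The sketch below is only back-of-the-envelope arithmetic: it assumes a single model replica (the model is sharded across GPUs rather than duplicated), treats the approximate figure of 4,800 records as exact, and ignores early stopping.

```python
import math

TRAIN_RECORDS = 4_800   # approximate figure from the table above
PER_STEP_BATCH = 1      # batch size from Training Hyperparameters
GRAD_ACCUM_STEPS = 32   # gradient accumulation from Training Hyperparameters
EPOCHS = 5
WARMUP_STEPS = 200

# One optimizer step consumes PER_STEP_BATCH * GRAD_ACCUM_STEPS records.
effective_batch = PER_STEP_BATCH * GRAD_ACCUM_STEPS
steps_per_epoch = math.ceil(TRAIN_RECORDS / effective_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_fraction = WARMUP_STEPS / total_steps

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"optimizer steps over {EPOCHS} epochs: {total_steps}")
print(f"share of steps spent in warmup: {warmup_fraction:.1%}")
```

Under these assumptions, warmup covers a little over a quarter of the optimizer steps, which is worth keeping in mind when reading the learning-rate settings.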

### Training Procedure

#### Key Design Choices

- **LoRA applied to `lm_head` only** — the full MoE transformer backbone is frozen.
- **Critical fix:** `lm_head.weight` is physically untied from `embed_tokens.weight`
  via `detach().clone()` *before* `get_peft_model()` is called. Without this,
  autograd sees `lm_head` and `embed_tokens` as the same tensor, cutting gradients
  to `lora_A`.
- **Device-aware CE loss** injected to handle MoE multi-GPU sharding where `lm_head`
  may land on a different device from the labels.
- Model loaded in MXFP4 and dequantized to BF16 before LoRA application.
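The untying step can be illustrated on a toy module. This is a minimal sketch, not the actual training script: `TinyLM`, its dimensions, and the helper `untie_lm_head` are hypothetical names introduced for illustration, and the sketch assumes the head and the embedding start out sharing one parameter, as the list above describes.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in for a model with tied input/output embeddings."""

    def __init__(self, vocab_size: int = 16, hidden_size: int = 8) -> None:
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        # Tie the weights: both modules now share one Parameter object.
        self.lm_head.weight = self.embed_tokens.weight


def untie_lm_head(model: nn.Module) -> None:
    """Give lm_head its own storage, copied from the shared weights."""
    untied = model.lm_head.weight.detach().clone()
    model.lm_head.weight = nn.Parameter(untied)


model = TinyLM()
tied_before = model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()

untie_lm_head(model)
tied_after = model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()
same_values = torch.equal(model.lm_head.weight, model.embed_tokens.weight)

print(f"tied before: {tied_before}, tied after: {tied_after}, same values: {same_values}")
```

In a real run, the equivalent of `untie_lm_head` would be called on the loaded base model before `get_peft_model()` wraps it, so that the LoRA layers attach to a head with its own parameter.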

#### Training Hyperparameters

| Hyperparameter           | Value                      |
|--------------------------|----------------------------|
| Training regime          | BF16 mixed precision       |
| LoRA rank (r)            | 32                         |
| LoRA alpha               | 128                        |
| LoRA dropout             | 0.1                        |
| Target modules           | `lm_head`                  |
| Learning rate            | 2e-4                       |
| LR scheduler             | cosine with restarts       |
| Optimizer                | adamw_torch_fused          |
| Batch size               | 1                          |
| Gradient accumulation    | 32 (effective batch = 32)  |
| Max grad norm            | 0.3                        |
| Weight decay             | 0.01                       |
| Warmup steps             | 200                        |
| Max sequence length      | 2,048                      |
| Epochs                   | 5                          |
| Early stopping patience  | 8 eval steps               |
| Vocab size (padded)      | 200,064                    |
| Hardware                 | Multi-GPU (cuda:1, cuda:2) |
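The rank, alpha, and vocabulary size in the table determine how many parameters the adapter trains. The following sketch counts them for a single linear layer. The hidden size of 2,880 is an assumption that this card does not state; it should be checked against the base model's `config.json`. The helper name `lora_param_count` is hypothetical.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    lora_A has shape (rank, in_features); lora_B has shape (out_features, rank).
    """
    return rank * in_features + out_features * rank


RANK = 32             # LoRA rank from the table above
ALPHA = 128           # LoRA alpha from the table above
VOCAB_SIZE = 200_064  # padded vocab size from the table above
HIDDEN_SIZE = 2_880   # ASSUMPTION: not stated in this card; verify in config.json

trainable = lora_param_count(HIDDEN_SIZE, VOCAB_SIZE, RANK)
scaling = ALPHA / RANK  # standard LoRA scaling applied to the low-rank update

print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

With the assumed hidden size this comes to roughly 6.5 million trainable parameters, a small fraction of the base model, and a scaling factor of 4 on the low-rank update.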

---

## Evaluation

### Testing Data

Evaluation was run on the held-out test split of `DatasetA_core_QA_v2`
(5,291 records, covering diverse space organizations and difficulty levels).

### Metrics

- **Loss** — mean cross-entropy loss on the assistant response tokens
- **Exact Match (EM)** — generated answer matches reference exactly (case-insensitive)
- **Token F1** — word-overlap F1 between generated and reference answers
- **BERTScore** — semantic similarity using `roberta-large`
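Exact match and token F1 can be sketched as follows. This is an illustrative implementation, not the evaluation script used for this card: the normalization (lowercasing and whitespace tokenization) is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when both answers are identical after normalization."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer scores zero.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Low Earth Orbit", "low earth orbit"))
print(token_f1("a sun-synchronous orbit", "sun-synchronous orbit"))
```

Exact match is strict and penalizes any rephrasing, while token F1 gives partial credit for overlapping words, which is why the two are usually reported together for free-form answers.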

### Results


#### BERTScore (`roberta-large`)

| Metric    | Score  |
|-----------|--------|
| Precision | 0.8736 |
| Recall    | 0.8857 |
| **F1**    | **0.8795** |

The BERTScore F1 of **0.8795**, averaged over the full test set, indicates close
semantic alignment between generated and reference answers. Loss, exact match, and
token F1 values are not reported in this card.

---

## Environmental Impact

Carbon emissions were not measured for this run. They can be estimated with the
[Machine Learning Impact calculator](https://mlco2.github.io/impact#compute)
(Lacoste et al., 2019) from the hardware and runtime below.

- **Hardware type:** NVIDIA multi-GPU (cuda:1, cuda:2)
- **Hours used:** ~6.6 hours (396.58 min inference; training time not reported)
- **Cloud provider:** Not applicable (on-premise)
- **Compute region:** Not reported
- **Carbon emitted:** Not measured

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Mixture-of-Experts (MoE) causal language model (gpt-oss-20b)
  with a LoRA adapter injected at the `lm_head` projection layer
- **Active parameters during inference:** 3.6B (out of 21B total)
- **LoRA parameters:** `r × (hidden_size + vocab_size)` with `r = 32`: one
  `r × hidden_size` matrix (`lora_A`) and one `vocab_size × r` matrix (`lora_B`),
  applied to a single linear layer
- **Objective:** Next-token prediction with cross-entropy loss, masked so that
  only assistant response tokens contribute to the loss
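The masked objective can be illustrated in plain Python. This is a sketch of the general technique rather than the loss used in training: it assumes the logits are already aligned with their target tokens (the usual one-position shift has been applied), and it uses `-100` as the ignore value, which is the default `ignore_index` of PyTorch's cross-entropy loss. The function names are hypothetical.

```python
import math

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_labels(token_ids: list[int], is_assistant: list[bool]) -> list[int]:
    """Copy token ids to labels, hiding every non-assistant position."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(token_ids, is_assistant)]


def masked_cross_entropy(logits: list[list[float]], labels: list[int]) -> float:
    """Mean negative log-likelihood over positions that are not ignored."""
    losses = []
    for row, label in zip(logits, labels):
        if label == IGNORE_INDEX:
            continue  # system and user tokens do not contribute
        # Numerically stable log-sum-exp over the vocabulary.
        max_logit = max(row)
        log_norm = max_logit + math.log(sum(math.exp(x - max_logit) for x in row))
        losses.append(log_norm - row[label])
    return sum(losses) / len(losses) if losses else 0.0


# Toy sequence over a 4-token vocabulary: two prompt tokens, two assistant tokens.
token_ids = [0, 1, 2, 3]
is_assistant = [False, False, True, True]
labels = mask_labels(token_ids, is_assistant)

uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in token_ids]
loss = masked_cross_entropy(uniform_logits, labels)

print(labels)
print(loss)
```

Because the prompt positions are skipped, changing the logits at those positions leaves the loss unchanged; only the assistant tokens drive the gradient.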

### Compute Infrastructure

- **Training hardware:** 2× NVIDIA GPUs (indices 1 and 2), dispatched via
  `accelerate.dispatch_model`
- **Framework:** PyTorch + HuggingFace Transformers + PEFT 0.19.1 + Accelerate

---

## Model Card Authors

AdityaPS

## Model Card Contact

Open an issue or discussion on the [Hugging Face repository](https://huggingface.co/AdityaPS/SpaceLLM_v1).

### Framework versions

- PEFT 0.19.1