---
base_model: openai/gpt-oss-20b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:openai/gpt-oss-20b
- lora
- transformers
- space
- question-answering
license: apache-2.0
metrics:
- bertscore
---
# SpaceLLM v1 — LoRA Adapter for Space Domain QA
SpaceLLM v1 is a parameter-efficient LoRA adapter fine-tuned on top of
[openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) for space-domain
question answering. LoRA is applied to the `lm_head` projection only; the full
transformer backbone remains frozen, which keeps the adapter lightweight while
steering the model's output distribution toward space mission knowledge.
---
## Model Details
### Model Description
- **Developed by:** AdityaPS
- **Model type:** LoRA adapter (PEFT) over a causal language model
- **Base model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (21B params, BF16/MXFP4)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base model)
- **Fine-tuned from:** openai/gpt-oss-20b
- **PEFT version:** 0.19.1
- **Fine-tuning strategy:** LoRA on `lm_head` only — backbone fully frozen (BF16, NOT QLoRA)
### Model Sources
- **Repository:** [AdityaPS/SpaceLLM_v1](https://huggingface.co/AdityaPS/SpaceLLM_v1)
---
## Uses
### Direct Use
Load alongside `openai/gpt-oss-20b` for space-domain conversational question answering.
The model expects inputs formatted using the **harmony response format** (gpt-oss-20b's
required chat template) — passing raw text without the template will degrade output quality.
### Downstream Use
Can be plugged into RAG pipelines, mission-planning assistants, or educational tools
focused on space science, satellite operations, and related domains.
### Out-of-Scope Use
- General-purpose chat without space-domain context
- Tasks requiring multi-modal input (images, structured data)
- Deployment without the base model (`openai/gpt-oss-20b` must be loaded alongside the adapter)
---
## How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config
from peft import PeftModel

# Load the base model. dequantize=True expands the MXFP4 checkpoint to BF16
# (~44 GB VRAM); set dequantize=False to keep MXFP4 weights where supported.
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),
    device_map="auto",
    trust_remote_code=True,
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "AdityaPS/SpaceLLM_v1")
tokenizer = AutoTokenizer.from_pretrained("AdityaPS/SpaceLLM_v1")

# Inference: the prompt must go through the harmony chat template
messages = [
    {"role": "system", "content": "You are a space domain expert assistant."},
    {"role": "user", "content": "What is the purpose of a Sun-synchronous orbit?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
> **Note:** `openai/gpt-oss-20b` uses the **harmony response format**. Always use
> `tokenizer.apply_chat_template()` — do not pass raw text directly.
---
## Training Details
### Training Data
Fine-tuned on an internal space-domain QA dataset (`DatasetA_core_QA_v2`) consisting
of multi-turn conversational records with `system`, `user`, and `assistant` turns.
Records are tagged with metadata fields including `organization`, `difficulty`,
`aspect`, and `chain_id` for multi-hop reasoning chains.
| Split | Records |
|------------|---------|
| Train | ~4,800 |
| Validation | — |
| Test | 5,291 |
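
As a rough illustration of the record layout described above, the sketch below groups records by `chain_id` so that the hops of one multi-hop chain stay together. The field names `organization`, `difficulty`, `aspect`, and `chain_id` come from this card; the `messages` key, all field values, and the helper `group_by_chain` are hypothetical and are not taken from `DatasetA_core_QA_v2`.

```python
from collections import defaultdict

# Hypothetical records: metadata field names follow the card, but the "messages"
# key and every value below are invented purely for illustration.
records = [
    {
        "messages": [
            {"role": "system", "content": "You are a space domain expert assistant."},
            {"role": "user", "content": "Placeholder question, hop 1."},
            {"role": "assistant", "content": "Placeholder answer, hop 1."},
        ],
        "organization": "ExampleAgency",
        "difficulty": "easy",
        "aspect": "operations",
        "chain_id": "chain-001",
    },
    {
        "messages": [
            {"role": "system", "content": "You are a space domain expert assistant."},
            {"role": "user", "content": "Placeholder question, hop 2."},
            {"role": "assistant", "content": "Placeholder answer, hop 2."},
        ],
        "organization": "ExampleAgency",
        "difficulty": "hard",
        "aspect": "operations",
        "chain_id": "chain-001",
    },
    {
        "messages": [
            {"role": "user", "content": "Placeholder standalone question."},
            {"role": "assistant", "content": "Placeholder standalone answer."},
        ],
        "organization": "OtherAgency",
        "difficulty": "medium",
        "aspect": "science",
        "chain_id": "chain-002",
    },
]


def group_by_chain(items):
    """Collect records that share a chain_id, preserving their original order."""
    chains = defaultdict(list)
    for item in items:
        chains[item["chain_id"]].append(item)
    return dict(chains)


chains = group_by_chain(records)
```

Grouping this way makes it easier to keep all hops of a chain in the same split, which may matter when checking for leakage between train and test.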
### Training Procedure
#### Key Design Choices
- **LoRA applied to `lm_head` only** — the full MoE transformer backbone is frozen.
- **Critical fix:** `lm_head.weight` is physically untied from `embed_tokens.weight`
via `detach().clone()` *before* `get_peft_model()` is called. Without this,
autograd sees `lm_head` and `embed_tokens` as the same tensor, cutting gradients
to `lora_A`.
- **Device-aware CE loss** injected to handle MoE multi-GPU sharding where `lm_head`
may land on a different device from the labels.
- Model loaded in MXFP4 and dequantized to BF16 before LoRA application.
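
The untying step and the device-aware loss can be sketched on a toy model. This is a minimal sketch, not the training code used for this adapter: `TinyTiedLM`, `untie_lm_head`, and `device_aware_ce_loss` are hypothetical names, and the toy model only mimics the case where `lm_head.weight` aliases `embed_tokens.weight`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTiedLM(nn.Module):
    """Toy stand-in for a causal LM whose output head shares the embedding weight."""

    def __init__(self, vocab_size: int = 50, hidden_size: int = 8):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.lm_head.weight = self.embed_tokens.weight  # tie the two parameters

    def forward(self, input_ids):
        return self.lm_head(self.embed_tokens(input_ids))


def untie_lm_head(model: nn.Module) -> None:
    """Give lm_head its own copy of the weight so it no longer aliases embed_tokens."""
    model.lm_head.weight = nn.Parameter(model.lm_head.weight.detach().clone())


def device_aware_ce_loss(logits, labels, ignore_index: int = -100):
    """Shifted next-token cross-entropy; labels are moved to the logits' device first."""
    labels = labels.to(logits.device)
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)).float(),
        shift_labels.view(-1),
        ignore_index=ignore_index,
    )


model = TinyTiedLM()
tied_before = model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()
untie_lm_head(model)  # in the real setup this would run before get_peft_model()
tied_after = model.lm_head.weight.data_ptr() == model.embed_tokens.weight.data_ptr()

input_ids = torch.randint(0, 50, (2, 6))
loss = device_aware_ce_loss(model(input_ids), input_ids.clone())
```

Comparing `data_ptr()` before and after is a quick way to confirm that the two weights no longer share storage.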
#### Training Hyperparameters
| Hyperparameter | Value |
|--------------------------|--------------------------|
| Training regime | BF16 mixed precision |
| LoRA rank (r) | 32 |
| LoRA alpha | 128 |
| LoRA dropout | 0.1 |
| Target modules | `lm_head` |
| Learning rate | 2e-4 |
| LR scheduler | cosine with restarts |
| Optimizer | adamw_torch_fused |
| Batch size | 1 |
| Gradient accumulation | 32 (effective batch = 32)|
| Max grad norm | 0.3 |
| Weight decay | 0.01 |
| Warmup steps | 200 |
| Max sequence length | 2,048 |
| Epochs | 5 |
| Early stopping patience | 8 eval steps |
| Vocab size (padded) | 200,064 |
| Hardware | Multi-GPU (cuda:1, cuda:2)|
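
A few quantities follow from the table by simple arithmetic. The sketch below works them out under two assumptions that are not stated in this card: the train split is taken as exactly 4,800 records (the card says "~4,800"), and the base model's hidden size is taken as 2,880, which should be checked against the base model's `config.json`. Early stopping may also end training before the computed step count.

```python
import math

# Values copied from the hyperparameter table
lora_r = 32
lora_alpha = 128
per_device_batch = 1
grad_accum = 32
epochs = 5
warmup_steps = 200
vocab_size = 200_064

# ASSUMPTIONS, not stated in the card
train_records = 4_800  # the card reports "~4,800"
hidden_size = 2_880    # assumed lm_head input width; verify against config.json

lora_scaling = lora_alpha / lora_r                            # factor applied to the LoRA update
effective_batch = per_device_batch * grad_accum               # sequences per optimizer step
steps_per_epoch = math.ceil(train_records / effective_batch)  # optimizer steps per epoch
max_steps = steps_per_epoch * epochs                          # upper bound if early stopping never fires
warmup_fraction = warmup_steps / max_steps                    # share of the run spent warming up
lora_params = lora_r * (hidden_size + vocab_size)             # lora_A (r x in) plus lora_B (out x r)

print(f"scaling={lora_scaling}, steps/epoch={steps_per_epoch}, max_steps={max_steps}")
print(f"warmup fraction={warmup_fraction:.3f}, LoRA params={lora_params:,}")
```

Under these assumptions, warmup covers roughly a quarter of the maximum schedule, which is worth keeping in mind when reading the learning-rate settings.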
---
## Evaluation
### Testing Data
Evaluation was run on the held-out test split of `DatasetA_core_QA_v2`
(5,291 records, covering diverse space organizations and difficulty levels).
### Metrics
- **Loss** — mean cross-entropy loss on the assistant response tokens
- **Exact Match (EM)** — generated answer matches reference exactly (case-insensitive)
- **Token F1** — word-overlap F1 between generated and reference answers
- **BERTScore** — semantic similarity using `roberta-large`
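
Exact Match and Token F1 as described above could be computed along these lines. This is a minimal sketch: the card only specifies that EM is case-insensitive and that F1 is word-overlap based, so the whitespace tokenization and the lack of punctuation stripping in the hypothetical helper `normalize` are assumptions rather than the evaluation script actually used.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (assumed normalization)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall, counting repeated tokens."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("Low Earth Orbit", "low earth orbit")
f1 = token_f1("a sun-synchronous orbit", "sun-synchronous orbit")
```

In the second call, two of the three predicted tokens match the two reference tokens, giving precision 2/3 and recall 1.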
### Results
#### BERTScore (`roberta-large`)
| Metric | Score |
|-----------|--------|
| Precision | 0.8736 |
| Recall | 0.8857 |
| **F1** | **0.8795** |
The BERTScore F1 of **0.8795** indicates close semantic agreement between the
model's generated answers and the reference answers across the full test set.
Results for loss, Exact Match, and Token F1 are not reported in this card.
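
As a sanity check on the table, the harmonic mean of the reported precision and recall can be compared with the reported F1. The sketch assumes the table lists per-example scores averaged over the test set; in that case the averaged F1 need not equal the harmonic mean of the averaged precision and recall, so a small gap is expected.

```python
# Values copied from the BERTScore table
precision = 0.8736
recall = 0.8857
reported_f1 = 0.8795

# Harmonic mean of the two averaged scores
f1_from_means = 2 * precision * recall / (precision + recall)
gap = abs(f1_from_means - reported_f1)

print(f"F1 from mean P/R: {f1_from_means:.4f}, reported: {reported_f1:.4f}, gap: {gap:.4f}")
```

The two values agree to about three decimal places, which is consistent with the reported numbers.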
---
## Environmental Impact
Carbon emissions estimated using the
[Machine Learning Impact calculator](https://mlco2.github.io/impact#compute)
(Lacoste et al., 2019).
- **Hardware type:** NVIDIA multi-GPU (cuda:1, cuda:2)
- **Hours used:** ~6.6 hours (396.58 min inference; training time not reported)
- **Cloud provider:** Not applicable (on-premise)
- **Compute region:** Not reported
- **Carbon emitted:** Not measured
---
## Technical Specifications
### Model Architecture and Objective
- **Architecture:** Mixture-of-Experts (MoE) causal language model (gpt-oss-20b)
with a LoRA adapter injected at the `lm_head` projection layer
- **Active parameters during inference:** 3.6B (out of 21B total)
- **LoRA parameters:** `r × (hidden_size + vocab_size)` with `r = 32`: `lora_A` has
  shape `r × hidden_size` and `lora_B` has shape `vocab_size × r`, both attached to
  the single `lm_head` linear layer
- **Objective:** Next-token prediction with cross-entropy loss, masked so that
only assistant response tokens contribute to the loss
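
The masking in the objective can be illustrated without any framework. This is a minimal sketch, assuming the common convention of marking ignored positions with the label `-100` (the default `ignore_index` of PyTorch's cross-entropy); the helper names, token ids, and per-token loss values are hypothetical.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy skips by default


def mask_non_assistant(input_ids, assistant_mask):
    """Keep labels only where the token belongs to an assistant response."""
    return [
        token if is_assistant else IGNORE_INDEX
        for token, is_assistant in zip(input_ids, assistant_mask)
    ]


def masked_mean_nll(token_nll, labels):
    """Average per-token negative log-likelihood over unmasked positions only."""
    kept = [nll for nll, label in zip(token_nll, labels) if label != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


# Toy sequence: three prompt tokens followed by three assistant tokens
input_ids = [11, 12, 13, 21, 22, 23]
assistant_mask = [False, False, False, True, True, True]
labels = mask_non_assistant(input_ids, assistant_mask)

# Invented per-token losses: the high prompt losses do not affect the result
token_nll = [5.0, 5.0, 5.0, 1.0, 2.0, 3.0]
loss = masked_mean_nll(token_nll, labels)
```

Only the three assistant positions contribute, so the prompt tokens have no influence on the averaged loss.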
### Compute Infrastructure
- **Training hardware:** 2× NVIDIA GPUs (indices 1 and 2), dispatched via
`accelerate.dispatch_model`
- **Framework:** PyTorch + HuggingFace Transformers + PEFT 0.19.1 + Accelerate
---
## Model Card Authors
AdityaPS
## Model Card Contact
Open an issue or discussion on the [Hugging Face repository](https://huggingface.co/AdityaPS/SpaceLLM_v1).
### Framework versions
- PEFT 0.19.1