---
base_model: openai/gpt-oss-20b
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:openai/gpt-oss-20b
- lora
- transformers
- space
- question-answering
license: apache-2.0
metrics:
- bertscore
---

# SpaceLLM v1: LoRA Adapter for Space Domain QA

SpaceLLM v1 is a parameter-efficient LoRA adapter fine-tuned on top of
[openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) for space-domain
question answering. LoRA is applied only to the `lm_head`; the full transformer backbone
remains frozen, which keeps the adapter lightweight while steering the model's
output distribution toward space mission knowledge.

---

## Model Details

### Model Description

- **Developed by:** AdityaPS
- **Model type:** LoRA adapter (PEFT) over a causal language model
- **Base model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (21B total parameters, BF16/MXFP4)
- **Language(s):** English
- **License:** Apache 2.0 (inherited from base model)
- **Fine-tuned from:** openai/gpt-oss-20b
- **PEFT version:** 0.19.1
- **Fine-tuning strategy:** LoRA on `lm_head` only; backbone fully frozen (BF16, not QLoRA)

### Model Sources

- **Repository:** [AdityaPS/SpaceLLM_v1](https://huggingface.co/AdityaPS/SpaceLLM_v1)

---

## Uses

### Direct Use

Load the adapter on top of `openai/gpt-oss-20b` for space-domain conversational question answering.
The model expects inputs formatted with the **harmony response format** (the chat template
that gpt-oss-20b requires); passing raw text without the template will degrade output quality.

### Downstream Use

Can be plugged into RAG pipelines, mission-planning assistants, or educational tools
focused on space science, satellite operations, and related domains.
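
As an illustration of the RAG use case, the sketch below assembles retrieved passages and a question into a chat message list that can then be passed to `tokenizer.apply_chat_template()`. The helper name `build_rag_messages`, the `Context:` / `Question:` layout, and the sample passage are assumptions made for this example; the card does not state that the adapter was trained with retrieved context in the prompt.

```python
def build_rag_messages(question, passages,
                       system_prompt="You are a space domain expert assistant."):
    """Build a chat message list that places retrieved passages ahead of the question.

    Hypothetical helper: the prompt layout is an assumption, not part of the model card.
    """
    # Number the passages so an answer can refer back to them.
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    if passages:
        user_content = f"Context:\n{context}\n\nQuestion: {question}"
    else:
        # Without retrieved context, fall back to the bare question.
        user_content = question
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


messages = build_rag_messages(
    "What is the purpose of a Sun-synchronous orbit?",
    ["A Sun-synchronous orbit is a near-polar orbit whose plane precesses about once per year."],
)
```

The resulting `messages` list has the same shape as the one used in the getting-started example, so it can be dropped into that code unchanged.
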

### Out-of-Scope Use

- General-purpose chat without space-domain context
- Tasks requiring multi-modal input (images, structured data)
- Deployment without the base model (`openai/gpt-oss-20b` must be loaded alongside the adapter)

---

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Mxfp4Config
from peft import PeftModel

# Load base model (requires ~44 GB VRAM in BF16, or use MXFP4 for lower memory)
base_model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # dequantizes to BF16
    device_map="auto",
    trust_remote_code=True,
)

# Load LoRA adapter on top
model = PeftModel.from_pretrained(base_model, "AdityaPS/SpaceLLM_v1")
tokenizer = AutoTokenizer.from_pretrained("AdityaPS/SpaceLLM_v1")

# Inference: must use the harmony chat template
messages = [
    {"role": "system", "content": "You are a space domain expert assistant."},
    {"role": "user", "content": "What is the purpose of a Sun-synchronous orbit?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

> **Note:** `openai/gpt-oss-20b` uses the **harmony response format**. Always use
> `tokenizer.apply_chat_template()`; do not pass raw text directly.
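
The BF16 memory figure can be sanity-checked with a back-of-the-envelope calculation. The sketch below counts weight storage only, using the 21B total parameter count quoted in this card and 2 bytes per BF16 parameter. Activations, the KV cache, and framework overhead are not included, which is why practical requirements sit somewhat above the result. The helper name is hypothetical.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights alone, in GB (1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


TOTAL_PARAMS = 21e9  # total parameter count quoted in this card

# BF16 stores each parameter in 2 bytes.
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2.0)
print(f"BF16 weights alone: ~{bf16_gb:.0f} GB")
```
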

---

## Training Details

### Training Data

Fine-tuned on an internal space-domain QA dataset (`DatasetA_core_QA_v2`) consisting
of multi-turn conversational records with `system`, `user`, and `assistant` turns.
Records are tagged with metadata fields including `organization`, `difficulty`,
`aspect`, and `chain_id` for multi-hop reasoning chains.

| Split      | Records |
|------------|---------|
| Train      | ~4,800  |
| Validation | -       |
| Test       | 5,291   |
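
To make the record layout described above concrete, here is a hypothetical example of a single record and a helper that separates the prompt turns from the reference answer. The roles and metadata field names come from the description. The top-level keys `messages` and `metadata`, the nesting, and all placeholder values are assumptions, since the dataset is internal and its actual schema is not published.

```python
# Hypothetical record: field names follow the card, structure and values are assumed.
record = {
    "messages": [
        {"role": "system", "content": "You are a space domain expert assistant."},
        {"role": "user", "content": "What is the purpose of a Sun-synchronous orbit?"},
        {"role": "assistant", "content": "<reference answer>"},
    ],
    "metadata": {
        "organization": "<organization>",
        "difficulty": "<difficulty>",
        "aspect": "<aspect>",
        "chain_id": "<chain id>",
    },
}


def split_prompt_and_reference(rec):
    """Split a record into its prompt turns and the final assistant reference answer."""
    messages = rec["messages"]
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the record to end with an assistant turn")
    return messages[:-1], messages[-1]["content"]


prompt_messages, reference = split_prompt_and_reference(record)
```

At evaluation time, `prompt_messages` would be rendered through the chat template and `reference` compared against the generated answer.
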

### Training Procedure

#### Key Design Choices

- **LoRA applied to `lm_head` only:** the full MoE transformer backbone is frozen.
- **Critical fix:** `lm_head.weight` is physically untied from `embed_tokens.weight`
  via `detach().clone()` *before* `get_peft_model()` is called. Without this,
  autograd sees `lm_head` and `embed_tokens` as the same tensor, cutting gradients
  to `lora_A`.
- **Device-aware cross-entropy (CE) loss:** injected to handle multi-GPU sharding of the
  MoE model, where `lm_head` may land on a different device from the labels.
- **Precision:** the model is loaded in MXFP4 and dequantized to BF16 before LoRA is applied.
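
For reference, the standard LoRA formulation for a single linear layer such as `lm_head` can be written as follows. The notation ($h$ for the final hidden state, $W$ for the frozen weight, $d$ for the hidden size, $V$ for the vocabulary size) is introduced here for illustration and is not taken from the training code. The values $r = 32$ and $\alpha = 128$ are the LoRA rank and alpha listed in this card. LoRA dropout is ignored for simplicity.

```latex
z = W h + \frac{\alpha}{r}\, B A h,
\qquad
W \in \mathbb{R}^{V \times d},\quad
A \in \mathbb{R}^{r \times d},\quad
B \in \mathbb{R}^{V \times r},
\qquad
\frac{\alpha}{r} = \frac{128}{32} = 4

\frac{\partial \mathcal{L}}{\partial B}
  = \frac{\alpha}{r}\, \frac{\partial \mathcal{L}}{\partial z}\, (A h)^{\top},
\qquad
\frac{\partial \mathcal{L}}{\partial A}
  = \frac{\alpha}{r}\, B^{\top} \frac{\partial \mathcal{L}}{\partial z}\, h^{\top}

\#\,\text{trainable parameters} = r\,(d + V)
```

Here $A$ and $B$ correspond to the trainable `lora_A` and `lora_B` matrices, while $W$ stays frozen. Because the gradient for $A$ passes through $B$, it is zero whenever $B$ is zero, which is the case on the first step under PEFT's default initialization. With the padded vocabulary size of 200,064 from this card and an assumed hidden size of $d = 2880$ (not stated in the card), the count comes to $32 \times 202{,}944 = 6{,}494{,}208$ trainable parameters.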

#### Training Hyperparameters

| Hyperparameter          | Value                      |
|-------------------------|----------------------------|
| Training regime         | BF16 mixed precision       |
| LoRA rank (r)           | 32                         |
| LoRA alpha              | 128                        |
| LoRA dropout            | 0.1                        |
| Target modules          | `lm_head`                  |
| Learning rate           | 2e-4                       |
| LR scheduler            | cosine with restarts       |
| Optimizer               | adamw_torch_fused          |
| Batch size              | 1                          |
| Gradient accumulation   | 32 (effective batch = 32)  |
| Max grad norm           | 0.3                        |
| Weight decay            | 0.01                       |
| Warmup steps            | 200                        |
| Max sequence length     | 2,048                      |
| Epochs                  | 5                          |
| Early stopping patience | 8 eval steps               |
| Vocab size (padded)     | 200,064                    |
| Hardware                | Multi-GPU (cuda:1, cuda:2) |
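
The hyperparameters above imply a rough training schedule, sketched below. The calculation assumes the approximate training-set size of 4,800 records, one sample per record, a single data-parallel replica (the two GPUs shard one model rather than each holding a copy), and a run that completes all five epochs without early stopping. These are assumptions for illustration, not reported figures.

```python
import math

# Values from this card; the training-set size is approximate (~4,800).
train_records = 4800
per_device_batch = 1
grad_accum = 32
epochs = 5
warmup_steps = 200
lora_alpha, lora_rank = 128, 32

# One optimizer step consumes per_device_batch * grad_accum samples.
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_records / effective_batch)
total_steps = steps_per_epoch * epochs

# Share of the run spent in learning-rate warmup, and the LoRA scaling factor alpha / r.
warmup_fraction = warmup_steps / total_steps
lora_scale = lora_alpha / lora_rank

print(f"effective batch: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} in total")
print(f"warmup share of training: {warmup_fraction:.1%}")
print(f"LoRA scaling (alpha / r): {lora_scale}")
```

Under these assumptions, the 200 warmup steps cover a little over a quarter of the roughly 750 optimizer steps.
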

---

## Evaluation

### Testing Data

Evaluation was run on the held-out test split of `DatasetA_core_QA_v2`
(5,291 records, covering diverse space organizations and difficulty levels).

### Metrics

- **Loss:** mean cross-entropy loss on the assistant response tokens
- **Exact Match (EM):** generated answer matches the reference exactly (case-insensitive)
- **Token F1:** word-overlap F1 between generated and reference answers
- **BERTScore:** semantic similarity using `roberta-large`
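
The two string-based metrics can be sketched as follows. This is a minimal version assuming lowercasing and whitespace tokenization; the evaluation script behind this card may normalize text differently (for example by stripping punctuation or articles), so the function names and normalization choices here are assumptions.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> bool:
    """Case-insensitive exact match after trimming surrounding whitespace."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction: str, reference: str) -> float:
    """Word-overlap F1 over lowercased, whitespace-split tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Sun-synchronous Orbit", "sun-synchronous orbit"))
print(round(token_f1("a polar orbit", "a near polar orbit"), 4))
```

In the second call all three predicted tokens appear in the four-token reference, so precision is 1.0 and recall is 0.75.
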

### Results

#### BERTScore (`roberta-large`)

| Metric    | Score      |
|-----------|------------|
| Precision | 0.8736     |
| Recall    | 0.8857     |
| **F1**    | **0.8795** |

The BERTScore F1 of **0.8795**, computed over the full test set, suggests close semantic
alignment between the model's generated answers and the reference answers.
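
For context, BERTScore matches each token embedding in one text to its most similar token embedding in the other. With reference tokens $x$, generated tokens $\hat{x}$, and unit-normalized contextual embeddings $\mathbf{x}_i$ and $\hat{\mathbf{x}}_j$ (notation introduced here for illustration), the per-pair scores are:

```latex
R = \frac{1}{|x|} \sum_{x_i \in x} \max_{\hat{x}_j \in \hat{x}} \mathbf{x}_i^{\top} \hat{\mathbf{x}}_j,
\qquad
P = \frac{1}{|\hat{x}|} \sum_{\hat{x}_j \in \hat{x}} \max_{x_i \in x} \mathbf{x}_i^{\top} \hat{\mathbf{x}}_j,
\qquad
F_1 = \frac{2 P R}{P + R}

\frac{2 \times 0.8736 \times 0.8857}{0.8736 + 0.8857} \approx 0.8796
```

The harmonic mean of the averaged precision and recall (about 0.8796) is close to, but not identical with, the reported F1 of 0.8795. That small gap is what one would expect if $F_1$ is computed per answer pair and then averaged over the test set, although the card does not state how the scores were aggregated.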

---

## Environmental Impact

Carbon emissions can be estimated using the
[Machine Learning Impact calculator](https://mlco2.github.io/impact#compute)
(Lacoste et al., 2019).

- **Hardware type:** NVIDIA multi-GPU (cuda:1, cuda:2)
- **Hours used:** ~6.6 hours of inference (396.58 min); training time not reported
- **Cloud provider:** Not applicable (on-premise)
- **Compute region:** Not reported
- **Carbon emitted:** Not measured

---

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Mixture-of-Experts (MoE) causal language model (gpt-oss-20b)
  with a LoRA adapter injected at the `lm_head` projection layer
- **Active parameters during inference:** 3.6B per token (out of 21B total)
- **LoRA parameters:** `r × (hidden_size + vocab_size)` with `r = 32` (two low-rank
  matrices, `lora_A` and `lora_B`, applied to a single linear layer)
- **Objective:** Next-token prediction with cross-entropy loss, masked so that
  only assistant response tokens contribute to the loss
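
The masked objective can be illustrated with a small pure-Python sketch. It assumes the common convention of marking ignored positions with the label `-100` (the default `ignore_index` of PyTorch's cross-entropy loss) and shifting labels by one position for next-token prediction. The function names and the toy data are hypothetical, and the actual training code may build the mask differently.

```python
import math

IGNORE_INDEX = -100  # label value skipped by the loss


def mask_labels(token_ids, is_assistant):
    """Copy token ids into labels, replacing non-assistant positions with IGNORE_INDEX."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(token_ids, is_assistant)]


def masked_cross_entropy(logits, labels):
    """Mean next-token cross-entropy over positions whose label is not IGNORE_INDEX.

    logits[t] holds the scores that predict the token at position t + 1,
    so labels are shifted by one before comparison.
    """
    losses = []
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        losses.append(log_norm - scores[target])
    return sum(losses) / len(losses) if losses else 0.0


# Toy example: vocabulary of 3 tokens, 4 positions, the last two tokens are the assistant's.
token_ids = [0, 1, 2, 1]
is_assistant = [False, False, True, True]
labels = mask_labels(token_ids, is_assistant)
uniform_logits = [[0.0, 0.0, 0.0] for _ in token_ids]
loss = masked_cross_entropy(uniform_logits, labels)
```

With uniform logits over three tokens, each of the two unmasked positions contributes a loss of ln 3, and the prompt positions contribute nothing.
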

### Compute Infrastructure

- **Training hardware:** 2× NVIDIA GPUs (indices 1 and 2), dispatched via
  `accelerate.dispatch_model`
- **Framework:** PyTorch + Hugging Face Transformers + PEFT 0.19.1 + Accelerate

---

## Model Card Authors

AdityaPS

## Model Card Contact

Open an issue or discussion on the [Hugging Face repository](https://huggingface.co/AdityaPS/SpaceLLM_v1).

### Framework versions

- PEFT 0.19.1