Libraries

How to use mahojo/opt-125m-cluster-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mahojo/opt-125m-cluster-v2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mahojo/opt-125m-cluster-v2")
model = AutoModelForCausalLM.from_pretrained("mahojo/opt-125m-cluster-v2", device_map="auto")
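
The snippets above load the model but stop before producing any text. A minimal sketch of a generation call through the pipeline might look like the following. The helper name `generate` and the sampling settings (`max_new_tokens=50`, `temperature=0.5`) are illustrative assumptions, not values recommended by the model card.

```python
def generate(prompt, max_new_tokens=50, temperature=0.5):
    """Generate a continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so defining the helper does not require transformers.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="mahojo/opt-125m-cluster-v2")
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,          # sample instead of greedy decoding
        temperature=temperature,
    )
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]
```

Calling `generate("Once upon a time,")` downloads the weights on first use. Because the model is not instruction-tuned, it continues the prompt rather than answering it.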


vLLM

How to use mahojo/opt-125m-cluster-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mahojo/opt-125m-cluster-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mahojo/opt-125m-cluster-v2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
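
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above; the helper names `build_completion_request` and `complete` are hypothetical, and it assumes the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style `choices[0].text` field.

```python
import json
import urllib.request

MODEL_ID = "mahojo/opt-125m-cluster-v2"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the POST request for the OpenAI-compatible completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

Passing `base_url="http://localhost:30000"` would point the same helper at a server listening on port 30000.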

Use Docker

docker model run hf.co/mahojo/opt-125m-cluster-v2

SGLang

How to use mahojo/opt-125m-cluster-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mahojo/opt-125m-cluster-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mahojo/opt-125m-cluster-v2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mahojo/opt-125m-cluster-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mahojo/opt-125m-cluster-v2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use mahojo/opt-125m-cluster-v2 with Docker Model Runner:
```
docker model run hf.co/mahojo/opt-125m-cluster-v2
```

opt-125m-cluster-v2

This model is a fine-tuned version of facebook/opt-125m, trained on a mixed dataset consisting of OpenWebText, WikiText, and BookCorpus. It was trained on a single GPU (Quadro RTX 8000, 48GB VRAM) using Hugging Face Transformers and PyTorch.

📈 Evaluation Results

Final Training Loss: 2.9084
Final Perplexity (Eval): 19.10
Evaluation Steps: Every 5,000 training steps
Total Training Steps: 50,000

🧠 Model Description

This model was fine-tuned to reduce perplexity on general English text using causal language modeling (next-token prediction). Starting from the facebook/opt-125m checkpoint, it was trained on 1 million samples with a sequence length of 1024, optimized with AdamW and a cosine learning rate schedule.

✅ Intended Uses & Limitations

Intended uses:

Perplexity benchmarking
Research on training dynamics and convergence
Fine-tuning base for instruction tuning or domain adaptation

Limitations:

Not instruction-tuned
Not aligned for safe deployment
May reflect biases from internet text

📊 Training & Evaluation Data

A shuffled dataset combining:

60% OpenWebText
30% WikiText
10% BookCorpus

All data was pre-tokenized using the OPT tokenizer and capped at 1024 tokens per sample.
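
To make the mixture concrete, the sketch below converts the percentages into per-source sample counts. It assumes the shares are measured in samples rather than tokens, which the card does not state, and it uses the 1 million sample figure from the model description. The function name `samples_per_source` is hypothetical.

```python
# Mixture shares in whole percent, as listed above.
MIXTURE_PERCENT = {"OpenWebText": 60, "WikiText": 30, "BookCorpus": 10}


def samples_per_source(total_samples, mixture=MIXTURE_PERCENT):
    """Split `total_samples` across sources according to integer percentages."""
    if sum(mixture.values()) != 100:
        raise ValueError("mixture percentages must sum to 100")
    counts = {name: total_samples * pct // 100 for name, pct in mixture.items()}
    # Give any rounding remainder to the largest source so counts sum exactly.
    remainder = total_samples - sum(counts.values())
    largest = max(mixture, key=mixture.get)
    counts[largest] += remainder
    return counts


counts = samples_per_source(1_000_000)
for name, count in counts.items():
    print(f"{name}: {count:,} samples")
```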

⚙️ Training Procedure

Batch size: 5 (accumulated to 40 via gradient_accumulation_steps=8)
Learning rate: 2e-4
Optimizer: AdamW with betas (0.9, 0.999), eps 1e-8
LR scheduler: Cosine decay with 1,000 warmup steps
Precision: Mixed (fp16 with AMP)
Steps: 50,000
Framework: Transformers 4.49.0, PyTorch 2.6.0
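
The learning rate schedule can be sketched in a few lines. This assumes linear warmup from zero followed by a single half-cosine decay to zero, which is the usual shape of a cosine schedule with warmup; the card itself only states "cosine decay with 1,000 warmup steps", so the exact curve used in training may differ.

```python
import math

BASE_LR = 2e-4         # peak learning rate from the list above
WARMUP_STEPS = 1_000
TOTAL_STEPS = 50_000


def lr_at(step):
    """Learning rate at an optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 500, 1_000, 25_500, 50_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at step 1,000 and falls to half the peak at step 25,500, the midpoint of the decay phase.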

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 5
eval_batch_size: 3
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 40
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 1000
training_steps: 50000
mixed_precision_training: Native AMP
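
A quick arithmetic check shows how these settings relate. It assumes a single device, as stated in the introduction, and treats the 1 million sample figure from the model description as the dataset size; the token count is an upper bound because samples are capped at 1024 tokens and may be shorter.

```python
train_batch_size = 5
gradient_accumulation_steps = 8
num_devices = 1            # single GPU, per the card
training_steps = 50_000
max_seq_len = 1024
dataset_samples = 1_000_000  # assumed to be the dataset size

# Samples consumed per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Samples and (at most) tokens seen over the whole run.
samples_seen = total_train_batch_size * training_steps
passes_over_data = samples_seen / dataset_samples
max_tokens_seen = samples_seen * max_seq_len

print(f"total_train_batch_size: {total_train_batch_size}")
print(f"samples seen: {samples_seen:,} (~{passes_over_data:.1f} passes)")
print(f"tokens seen (upper bound): {max_tokens_seen:,}")
```

Under these assumptions the run covers about two passes over the data.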

Training results

Steps	Eval perplexity	Eval cross-entropy loss
5k	24.07	3.1811
10k	23.28	3.1476
15k	22.44	3.1110
20k	21.63	3.0742
25k	20.97	3.0432
30k	20.33	3.0121
35k	19.73	2.9819
40k	19.32	2.9611
45k	19.11	2.9500
50k	19.10	2.9498
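
The two columns are related by perplexity = exp(cross-entropy loss), with the loss measured in nats per token. The sketch below recomputes perplexity from each reported loss and compares it with the reported value; the tolerance is an assumption chosen to allow for the rounding in the table.

```python
import math

# (steps, reported perplexity, reported cross-entropy loss) from the table above.
EVAL_LOG = [
    (5_000, 24.07, 3.1811),
    (10_000, 23.28, 3.1476),
    (15_000, 22.44, 3.1110),
    (20_000, 21.63, 3.0742),
    (25_000, 20.97, 3.0432),
    (30_000, 20.33, 3.0121),
    (35_000, 19.73, 2.9819),
    (40_000, 19.32, 2.9611),
    (45_000, 19.11, 2.9500),
    (50_000, 19.10, 2.9498),
]


def perplexity(loss):
    """Perplexity from a mean per-token cross-entropy loss in nats."""
    return math.exp(loss)


# Absolute gap between recomputed and reported perplexity at each checkpoint.
deviations = [abs(perplexity(loss) - ppl) for _, ppl, loss in EVAL_LOG]

for (steps, ppl, loss), dev in zip(EVAL_LOG, deviations):
    print(f"{steps:>6} steps: exp({loss}) = {perplexity(loss):.2f}, reported {ppl}")
```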

Framework versions

Transformers 4.49.0
PyTorch 2.6.0+cu124
Datasets 3.3.2
Tokenizers 0.21.1

Safetensors

Model size: 0.1B params
Tensor type: F32
