Magnus LoRA Adapter

A Low-Rank Adaptation (LoRA) adapter for the Llama-3.1-8B-Instruct model, fine-tuned to predict chess moves from positions in Magnus Carlsen's games. The adapter is trained on top of a base model quantized to 4 bits with bitsandbytes (BNB).

Model Details

Base Model: unsloth/Llama-3.1-8B-Instruct-bnb-4bit
Adapter Type: LoRA (Low-Rank Adaptation)
Library: PEFT
Training Framework: TRL + Unsloth
License: Llama 3.1

Dataset

This adapter was trained on a specialized dataset compiled from Magnus Carlsen chess matches. The dataset contains:

First Training Attempt

Total Moves: 1,123 moves from Magnus Carlsen's games

Second Training Attempt

Total Moves: 2,145 moves from Magnus Carlsen's games

Dataset Features

Instruction-based chess analysis tasks - Predicting moves from FEN (Forsyth-Edwards Notation) positions
Real match positions - Game states from actual matches played by Magnus Carlsen
Training examples - Position-move pairs representing Magnus's playing style and strategies

The dataset format follows the supervised fine-tuning (SFT) structure:

{
    "instruction": "Predict Magnus Carlsen's next move from the given chess position.",
    "input": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "output": "e2e3"
}

Training on position-move pairs in this format is intended to teach the model Magnus Carlsen's move preferences and playing patterns as they appear in his games.
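As a minimal sketch of how one record in this format could be turned into a training prompt: the Alpaca-style template in `build_prompt` and the helper name `looks_like_fen` are assumptions for illustration, since the card does not state the template actually used during training. The FEN check is only structural and does not verify that the position is legal.

```python
# One record in the card's SFT format (copied from the example above).
record = {
    "instruction": "Predict Magnus Carlsen's next move from the given chess position.",
    "input": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "output": "e2e3",
}


def looks_like_fen(fen: str) -> bool:
    """Structural check only: six fields, eight ranks, eight squares per rank."""
    fields = fen.split()
    if len(fields) != 6:
        return False
    ranks = fields[0].split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        # A digit counts that many empty squares; any other character is one piece.
        squares = sum(int(ch) if ch.isdigit() else 1 for ch in rank)
        if squares != 8:
            return False
    return fields[1] in ("w", "b")


def build_prompt(rec: dict) -> str:
    """Hypothetical Alpaca-style template; the real training template is not stated."""
    return (
        f"### Instruction:\n{rec['instruction']}\n\n"
        f"### Input:\n{rec['input']}\n\n"
        "### Response:\n"
    )


prompt = build_prompt(record)
# During supervised fine-tuning the target move is appended after the prompt.
print(prompt + record["output"])
```

Whatever template is chosen, the same one should be used at inference time so the model sees positions in the form it was trained on.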

Why LoRA? - Benefits of This Approach

LoRA (Low-Rank Adaptation) offers significant advantages for fine-tuning large language models:

Efficiency

Reduced Parameters: The trainable adapter weights typically amount to only ~0.5-2% of the base model's parameter count (0.52% for this adapter), dramatically reducing memory requirements
Faster Training: Significantly faster training times compared to full fine-tuning
Lower Cost: Enables fine-tuning on consumer-grade hardware (4-bit quantization compatible)

Flexibility & Modularity

Composable Adapters: Multiple LoRA adapters can be applied or switched easily without retraining
Storage Efficient: Adapter files are typically tens of MB vs. GB-sized full model checkpoints (this adapter's 41,943,040 parameters take about 84 MB if stored at 16-bit precision)
Easy Distribution: Lightweight adapters can be easily shared and deployed

Performance

Quality Retention: Maintains the base model's general capabilities while specializing for specific tasks
Domain Adaptation: Effectively transfers knowledge from chess game data to instruction-following context
Minimal Degradation: Low-rank updates limit how far the model can drift from the frozen base weights, which reduces (but does not eliminate) the risk of catastrophic forgetting

Practical Advantages

Multi-GPU Friendly: Compatible with distributed training and inference
Inference Speed: Adds little overhead at inference compared to the base model, and none if the adapter weights are merged into the base weights
Compatibility: Works with existing PEFT infrastructure and Hugging Face ecosystem

Training Hyperparameters

Precision & Optimization

Training Regime: bf16/fp16 mixed precision
Optimizer: AdamW with scheduler

Training Schedule

Steps: 269
Epochs: 1
Train Batch Size: 2 (per device)
Learning Rate: Dynamic scheduler peaking at 2e-4
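The step count can be cross-checked against the dataset size. The sketch below shows that 269 steps over 2,145 examples is consistent with an effective batch size of 8. The gradient accumulation value and device count are assumptions, since the card states only the per-device batch size; other combinations that multiply to 8 would give the same result.

```python
import math

num_examples = 2145         # "Total Moves" for the second training attempt
per_device_batch_size = 2   # from the training schedule above
grad_accum_steps = 4        # ASSUMPTION: not stated in the card
num_devices = 1             # ASSUMPTION: not stated in the card

# Number of examples consumed per optimizer step.
effective_batch = per_device_batch_size * grad_accum_steps * num_devices

# One epoch, counting the final partial batch as a step.
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(effective_batch, steps_per_epoch)  # 8 269
```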

LoRA Configuration

LoRA Rank (r): 16
LoRA Alpha: 16
LoRA Dropout: 0.0
Target Modules:
- q_proj (Query projection)
- k_proj (Key projection)
- v_proj (Value projection)
- o_proj (Output projection)
- gate_proj (Gate projection)
- up_proj (Up projection)
- down_proj (Down projection)

Training Results

First Training Attempt (Initial)

Total Moves: 1,123
Loss Progression (first three steps): 0.673 (step 1) → 0.782 (step 2) → 0.697 (step 3)

Second Training Attempt (Reused Weights)

Total Moves: 2,145
Trainable Parameters: 41,943,040 of 8,072,204,288 (0.52% trained)
Steps: 269 total steps completed (1 epoch)
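Loss Progression (final four steps): Step 266 (0.495) → Step 267 (0.462) → Step 268 (0.565) → Step 269 (0.795)
Training Completion: All 269 steps completed, starting from the adapter weights of the first attempt. The per-step losses still fluctuate at the end of the run, so they do not by themselves demonstrate convergence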
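The reported trainable-parameter count can be reproduced from the LoRA configuration. The sketch below assumes the standard Llama 3.1 8B layer dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128); these come from the base model's published configuration, not from this card.

```python
# Llama 3.1 8B dimensions (ASSUMPTION: taken from the base model's config).
hidden = 4096
intermediate = 14336
num_layers = 32
kv_dim = 8 * 128  # 8 key/value heads x head dimension 128

rank = 16  # LoRA rank from the configuration above

# (in_features, out_features) of each targeted projection in one decoder layer.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each adapted projection adds A (rank x in_features) and B (out_features x rank).
per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
trainable = per_layer * num_layers

total_reported = 8_072_204_288  # total parameter count reported above
share = 100 * trainable / total_reported

print(trainable)         # 41943040
print(f"{share:.2f}%")   # 0.52%
```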

Usage

Loading the Adapter

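from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Reads adapter_config.json, loads the base model it names, then attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    "path/to/magnus_lora_adapter",  # local directory or Hub repo ID of the adapter
    device_map="auto",
    torch_dtype="auto",
)
# Tokenizer of the 4-bit base model the adapter was trained on.
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.1-8B-Instruct-bnb-4bit")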

Inference

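def generate_response(prompt):
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    # The decoded text contains the prompt followed by the generated continuation.
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

response = generate_response("Your prompt here")
print(response)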

Files in This Repository

adapter_config.json - LoRA adapter configuration
adapter_model.safetensors - Adapter weights in safetensors format
tokenizer.json - Tokenizer vocabulary
tokenizer_config.json - Tokenizer configuration
special_tokens_map.json - Special tokens mapping
training_args.bin - Training arguments
chat_template.jinja - Chat template for inference

Requirements

torch>=2.0.0
transformers>=4.36.0
peft>=0.7.0
bitsandbytes>=0.41.0
unsloth
trl

Future Improvements & Optimization Potential

The adapter has been successfully trained on Magnus Carlsen's chess game dataset. Future enhancements could include:

Potential Enhancement Areas

Extended Training: Training with multiple epochs or additional datasets could improve move prediction accuracy
Larger Datasets: Incorporating additional Magnus Carlsen games or broader chess datasets could enhance pattern recognition
Hyperparameter Tuning: Experimenting with different LoRA ranks (r), alpha values, and learning rates may yield better results
Increased Batch Size: Training with larger batch sizes could improve convergence and model stability
Multi-Phase Training: Implementing curriculum learning or progressive fine-tuning strategies
Domain-Specific Evaluation: Using chess-specific metrics to validate and iteratively improve move prediction accuracy
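One simple chess-specific metric is exact-match accuracy against the move actually played. The sketch below is an illustration only: the function name `move_accuracy` and the sample moves are hypothetical, and no evaluation results are reported in this card.

```python
from typing import Sequence


def move_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of positions where the predicted move equals the move played."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    hits = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


# Hypothetical predictions and reference moves for three held-out positions.
predicted_moves = ["e2e4", "g1f3", "d2d4"]
played_moves = ["e2e4", "g1f3", "c2c4"]
accuracy = move_accuracy(predicted_moves, played_moves)
print(f"{accuracy:.3f}")  # 0.667
```

Exact match is a strict measure, since several moves can be equally strong in a position; it is best computed on positions held out from training.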

Recommendations for Further Development

Train for multiple epochs with validation monitoring to assess convergence improvements
Implement early stopping based on move prediction accuracy metrics
Experiment with different learning rate schedules and warmup strategies
Fine-tune on a curated dataset of high-rated games for better strategic learning
Evaluate performance gains from retraining with different random seeds or augmented chess positions

Users interested in further improving these weights are encouraged to continue training with their own datasets and hyperparameters.

Inference Tips

Use device_map="auto" for automatic device placement with quantized models
The adapter is optimized for chess move prediction tasks
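GPU inference is recommended. The 4-bit base model depends on bitsandbytes, so CPU inference is only possible where the installed bitsandbytes version supports a CPU backend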

License

This adapter is licensed under the Llama 3.1 License. See the base model's license for details.

Model tree for vynr1504/Magnus-LoRA-Adapter

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized: unsloth/Llama-3.1-8B-Instruct-bnb-4bit
Adapter: this model