Pauper Llama 3 8B
Fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct specialized for Magic: The Gathering's Pauper format using LoRA fine-tuning.
📦 Available Formats
This repository contains both the full HuggingFace model and GGUF quantizations for various use cases.
HuggingFace Transformers (Full Precision)
Perfect for:
- Further fine-tuning
- Maximum quality inference
- Integration with transformers library
GGUF Quantized Models (llama.cpp compatible)
Perfect for:
- LM Studio, Ollama, llama.cpp
- Local inference on consumer hardware
- Faster inference with minimal quality loss
| File | Size | Description | Best For |
|---|---|---|---|
| gguf/pauper_llama3_q4km.gguf | ~5GB | 4-bit quantized | Recommended - Best balance |
| gguf/pauper_llama3_q5km.gguf | ~6GB | 5-bit quantized | Better quality |
| gguf/pauper_llama3_q8.gguf | ~8GB | 8-bit quantized | Near-original quality |
| gguf/pauper_llama3_fp16.gguf | ~15GB | Full precision | Maximum quality |
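As a rough aid for choosing among the files listed above, the sketch below picks the largest GGUF that fits a given memory budget. The sizes are the approximate file sizes from the table; the `pick_gguf` helper and its `headroom_gb` allowance for the KV cache and runtime overhead are hypothetical, not measured requirements.

```python
from typing import Optional

# Approximate file sizes in GB, copied from the table of GGUF files.
GGUF_FILES = {
    "gguf/pauper_llama3_q4km.gguf": 5,
    "gguf/pauper_llama3_q5km.gguf": 6,
    "gguf/pauper_llama3_q8.gguf": 8,
    "gguf/pauper_llama3_fp16.gguf": 15,
}


def pick_gguf(memory_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest listed file that fits in memory_gb minus headroom_gb.

    headroom_gb is an assumed allowance for context and runtime overhead;
    tune it for your own setup.
    """
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in GGUF_FILES.items() if size <= budget]
    # max() compares by size first, so the largest fitting file wins.
    return max(fitting)[1] if fitting else None
```

File size is only a lower bound on memory use, so treat the result as a starting point rather than a guarantee.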
🚀 Usage
Option 1: HuggingFace Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the weights in half precision and let accelerate place them on the available devices
model = AutoModelForCausalLM.from_pretrained(
    "nmalinowski/pauper-llama3-8b",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("nmalinowski/pauper-llama3-8b")

prompt = "What are the best cards in Pauper?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Set do_sample=True explicitly so temperature applies regardless of the repo's generation config
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
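Since the base model is an Instruct variant, formatting the prompt with the tokenizer's chat template may match the fine-tuning format more closely than a raw string. The sketch below assumes the repository's tokenizer ships a chat template (not confirmed here); `build_messages` and `chat` are hypothetical helper names, and the heavy imports are deferred so the message builder can be used on its own.

```python
def build_messages(question, system_prompt=None):
    """Build a chat-style message list, with an optional system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    return messages


def chat(question, system_prompt=None, max_new_tokens=256):
    """Generate a reply using the tokenizer's chat template (assumed to exist)."""
    # Imported lazily: loading the model needs torch, transformers, and accelerate.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "nmalinowski/pauper-llama3-8b"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(question, system_prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
```

For example, `chat("What are the best removal spells in Pauper?")` would return only the model's answer text.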
Option 2: LM Studio (GGUF - Easiest!)
- Download gguf/pauper_llama3_q4km.gguf from the Files tab
- Open LM Studio → Load Model
- Select the downloaded GGUF file
- Start chatting about Pauper!
Option 3: llama.cpp
# Download the quantized model (the repo's gguf/ subfolder is kept under --local-dir)
huggingface-cli download nmalinowski/pauper-llama3-8b gguf/pauper_llama3_q4km.gguf --local-dir ./
# Run inference
./llama-cli -m ./gguf/pauper_llama3_q4km.gguf \
  -p "What are the top Pauper decks in the current meta?" \
  -n 256 \
  --temp 0.7
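The same GGUF file can also be driven from Python. The sketch below assumes the `huggingface_hub` and `llama-cpp-python` packages are installed; `expected_path` and `ask` are hypothetical helper names, and the `n_ctx` value is an arbitrary choice rather than a recommendation from the model author.

```python
from pathlib import Path

REPO_ID = "nmalinowski/pauper-llama3-8b"
GGUF_FILE = "gguf/pauper_llama3_q4km.gguf"


def expected_path(local_dir: str) -> Path:
    """Where the download lands: the repo's gguf/ subfolder is preserved."""
    return Path(local_dir) / GGUF_FILE


def ask(question: str, local_dir: str = ".") -> str:
    """Download the Q4_K_M file if needed and run one chat completion."""
    # Imported lazily so the path helper works without these packages.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(
        repo_id=REPO_ID, filename=GGUF_FILE, local_dir=local_dir
    )
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)  # n_ctx is an assumption
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=256,
        temperature=0.7,
    )
    return result["choices"][0]["message"]["content"]
```

Calling `ask("What are the top Pauper decks in the current meta?")` mirrors the command-line run, using the sampling settings shown there.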
Option 4: Ollama
# Create Modelfile
cat > Modelfile <<EOF
FROM ./gguf/pauper_llama3_q4km.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
SYSTEM "You are an expert on Magic: The Gathering's Pauper format."
EOF
# Create and run
ollama create pauper-llama3 -f Modelfile
ollama run pauper-llama3 "Explain the current Pauper meta"
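Once the `pauper-llama3` model has been created, it can also be queried over Ollama's local HTTP API. This is a minimal sketch that assumes the Ollama server is running on its default address (`http://localhost:11434`); `build_chat_payload` and `ask_ollama` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumes Ollama's default local address


def build_chat_payload(question, model="pauper-llama3"):
    """Build a non-streaming chat request body for the model created above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }


def ask_ollama(question, url=OLLAMA_URL):
    """POST the question to the local Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(question)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

The system prompt and sampling parameters come from the Modelfile, so the request only needs the model name and the user message.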
🎯 Training Details
- Base Model: Llama 3 8B Instruct
- Training Method: LoRA (Low-Rank Adaptation)
- Domain: Magic: The Gathering - Pauper format
- LoRA Configuration:
- Rank: 16
- Alpha: 32
- Target modules: q_proj, v_proj
- Dropout: 0.05
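The configuration above could be expressed with PEFT roughly as follows. The original training script is not shown, so this is a sketch: `task_type="CAUSAL_LM"` and standard LoRA scaling (alpha / rank) are assumptions, and the layer shapes are those of the public Llama 3 8B config. The helper names are hypothetical.

```python
# Llama 3 8B attention shapes (from the public base-model config).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8   # grouped-query attention: v_proj is narrower than q_proj
HEAD_DIM = 128

RANK = 16
ALPHA = 32
SCALING = ALPHA / RANK  # standard LoRA scaling applied to the adapter output


def lora_params(in_features, out_features, rank=RANK):
    """A LoRA adapter adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


def total_trainable_params():
    """Adapter parameters across all layers for q_proj and v_proj."""
    q_proj = lora_params(HIDDEN_SIZE, HIDDEN_SIZE)
    v_proj = lora_params(HIDDEN_SIZE, NUM_KV_HEADS * HEAD_DIM)
    return NUM_LAYERS * (q_proj + v_proj)


def make_lora_config():
    """Build a PEFT LoraConfig with the listed values (task_type is assumed)."""
    from peft import LoraConfig  # imported lazily; requires the peft package

    return LoraConfig(
        r=RANK,
        lora_alpha=ALPHA,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
```

Under these assumptions the adapters add about 6.8 million trainable parameters, a small fraction of the 8B base weights.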
💡 Recommendations
- For most users: Download gguf/pauper_llama3_q4km.gguf and use with LM Studio
- For best quality: Use the full HuggingFace model with transformers
- For low VRAM: Use Q4_K_M quantization (~5GB)
- For high VRAM: Use Q8_0 or FP16 for better quality
📊 Performance
The Q4_K_M quantization offers:
- ✅ ~95% of full precision quality
- ✅ 70% smaller file size
- ✅ Faster inference on CPU and GPU
- ✅ Runs on consumer hardware (16GB RAM recommended)
🎮 Example Prompts
"What are the best removal spells in Pauper?"
"Build me a Pauper deck around Monastery Swiftspear"
"Explain the differences between Affinity and Elves in Pauper"
"What are the current tier 1 Pauper decks?"
⚠️ Limitations
- Specialized for Pauper format - may not perform well on other MTG formats
- May occasionally hallucinate card names or abilities
- Knowledge cutoff: January 2025
- Not suitable for medical, legal, or financial advice
📄 License
This model inherits the Llama 3 Community License from Meta. See LICENSE for details.
🙏 Acknowledgments
- Base model: Meta's Llama 3 8B Instruct
- Training framework: HuggingFace Transformers + PEFT
- Quantization: llama.cpp
📞 Issues & Feedback
If you encounter issues or have suggestions, please open an issue on the Community tab.