Solar FAQ: GGUF Q4_K_M (4.6 GB)
Llama-3.1-8B-Instruct fine-tuned with LoRA on a solar energy FAQ dataset and quantized to Q4_K_M GGUF. It runs on any platform and OS, with no CUDA required.
| Property | Value |
|---|---|
| Format | GGUF Q4_K_M (safe: no pickle, no .bin) |
| Size | 4.6 GB (original: 16 GB float16) |
| Platforms | Mac / Windows / Linux / CPU / GPU |
| Tools | llama-cpp-python · Ollama · LM Studio · Jan · GPT4All |
Install & Run
Option 1: Python API (llama-cpp-python)
Step 1: Install llama-cpp-python (choose one):
# Mac Apple Silicon: Metal GPU acceleration (fast):
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
# NVIDIA GPU on Linux/Windows: CUDA 12.4 wheels (fast):
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
# CPU only: any platform, no GPU needed (slower, ~5 tok/s):
pip install llama-cpp-python
Step 2: Run:
from llama_cpp import Llama
# Auto-downloads the GGUF from HF on first run (~4.6 GB; requires the huggingface-hub package)
llm = Llama.from_pretrained(
repo_id="ankur1423/solar-faq-gguf",
filename="*.gguf",
n_ctx=2048, # context window
n_gpu_layers=-1, # -1 = all layers on GPU; set 0 for CPU-only
verbose=False,
)
# Single question
response = llm.create_chat_completion(
messages=[
{"role": "system", "content": "You are a knowledgeable assistant for a solar energy company. Answer questions accurately about solar products, manufacturing, and company operations."},
{"role": "user", "content": "What is a BOM?"},
],
max_tokens=512,
temperature=0.1,
top_p=0.9,
)
print(response["choices"][0]["message"]["content"])
Multi-turn conversation:
from llama_cpp import Llama
SYSTEM = "You are a knowledgeable assistant for a solar energy company."
llm = Llama.from_pretrained(
repo_id="ankur1423/solar-faq-gguf",
filename="*.gguf",
n_ctx=4096,
n_gpu_layers=-1,
verbose=False,
)
history = [{"role": "system", "content": SYSTEM}]
while True:
user = input("You: ").strip()
if not user or user.lower() in {"exit", "quit"}:
break
history.append({"role": "user", "content": user})
resp = llm.create_chat_completion(history, max_tokens=512, temperature=0.1)
answer = resp["choices"][0]["message"]["content"].strip()
print(f"Assistant: {answer}\n")
history.append({"role": "assistant", "content": answer})
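The loop above appends every turn to history, so a long session will eventually exceed n_ctx=4096. A minimal sketch of one way to bound it, assuming you always keep the system message and drop the oldest turns first. The helper names trim_history and approx_tokens are hypothetical, and the 4-characters-per-token estimate is a rough assumption; for exact counts you could pass a counter built on llm.tokenize instead.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def approx_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not exact)."""
    return max(1, len(text) // 4)


def trim_history(
    history: List[Message],
    budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[Message]:
    """Return a copy of history whose message contents fit within `budget` tokens.

    The first message is assumed to be the system prompt and is always kept.
    Newer messages are preferred; the oldest ones are dropped first.
    """
    if not history:
        return []
    system, turns = history[0], history[1:]
    used = count_tokens(system["content"])
    kept: List[Message] = []
    # Walk backwards from the newest message, keeping as many as fit.
    for msg in reversed(turns):
        cost = count_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    # Avoid starting the conversation with an orphaned assistant reply.
    while kept and kept[0]["role"] == "assistant":
        kept.pop(0)
    return [system] + kept
```

In the loop, calling something like `history = trim_history(history, budget=4096 - 512)` just before create_chat_completion leaves room for max_tokens=512; the chat template's special tokens are not counted here, so a further safety margin is sensible.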
Option 2: Ollama (no Python, no code)
# Install Ollama: https://ollama.com
ollama run hf.co/ankur1423/solar-faq-gguf
One command downloads the model and starts an interactive chat.
Option 3: LM Studio (GUI, Windows/Mac/Linux)
- Download LM Studio
- Search for ankur1423/solar-faq-gguf in the model browser
- Download → Chat
Option 4: Jan App (GUI, offline)
- Download Jan
- Go to Hub → search for ankur1423/solar-faq-gguf
- Download → Chat
Option 5: llama.cpp CLI (raw, fastest)
# macOS (homebrew):
brew install llama.cpp
# Linux:
sudo apt install llama.cpp # Ubuntu 24.04+
# Then download GGUF and run:
llama-cli \
-m solar-faq-Q4_K_M.gguf \
--system-prompt "You are a knowledgeable assistant for a solar energy company." \
-i --color -c 2048
Platform Support Matrix
| Platform | Backend | RAM needed | Speed |
|---|---|---|---|
| Mac M1/M2/M3/M4 | Metal GPU | 6 GB | Fast |
| NVIDIA GPU (Linux/Windows) | CUDA | 6 GB VRAM | Fast |
| CPU (Mac / Windows / Linux) | llama.cpp CPU | 6 GB RAM | ~5 tok/s |
| Google Colab (free tier) | CPU or T4 GPU | 6 GB | OK |
| Ollama (any OS) | auto-detect GPU/CPU | 6 GB | Fast / OK |
| LM Studio / Jan / GPT4All | auto-detect | 6 GB | Fast / OK |
Minimum: 6 GB of RAM or VRAM. Works on most modern laptops, even without a GPU.
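A rough way to see where the 6 GB minimum comes from: besides the ~4.6 GB of weights, llama.cpp allocates a key/value (KV) cache that grows linearly with n_ctx. The sketch below assumes the published Llama 3.1 8B shape (32 layers, 8 key/value heads, head dimension 128) and an f16 cache (2 bytes per element); it ignores compute buffers and runtime overhead, so treat the result as a lower bound.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed for the key and value caches across all layers."""
    # Leading factor 2: one cache tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Assumed Llama 3.1 8B attention shape (grouped-query attention, 8 KV heads).
LLAMA31_8B = dict(n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for n_ctx in (2048, 4096):
        mib = kv_cache_bytes(n_ctx=n_ctx, **LLAMA31_8B) / 2**20
        print(f"n_ctx={n_ctx}: KV cache ~ {mib:.0f} MiB on top of the weights")
```

Under these assumptions the cache is 256 MiB at n_ctx=2048 and 512 MiB at 4096, which fits alongside the weights within the 6 GB figure.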
Generation Parameters (recommended)
| Parameter | Value | Notes |
|---|---|---|
| temperature | 0.1 | Low: factual, consistent answers |
| top_p | 0.9 | Nucleus sampling |
| max_tokens | 256-512 | FAQ answers are concise |
| n_ctx | 2048 | Context window (increase to 4096 for long conversations) |
For more creative or varied responses, raise temperature to 0.5-0.7.
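To make the temperature and top_p rows concrete, here is a simplified, standard-library sketch of temperature scaling followed by nucleus (top-p) filtering over a toy set of logits. It illustrates the general technique only: llama.cpp's real sampler chain also applies other samplers (such as top-k and min-p), and the function names and token strings here are hypothetical.

```python
import math
import random
from typing import Dict, Optional


def nucleus_probs(logits: Dict[str, float], temperature: float,
                  top_p: float) -> Dict[str, float]:
    """Apply temperature, then keep the smallest top set whose mass reaches top_p."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    # Temperature scaling: divide logits before the softmax (low T sharpens).
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exp = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exp.values())
    ranked = sorted(((tok, e / total) for tok, e in exp.items()),
                    key=lambda kv: kv[1], reverse=True)
    # Nucleus filtering: accumulate from most to least likely until mass >= top_p.
    kept, mass = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalise the surviving tokens so they sum to 1.
    return {tok: p / mass for tok, p in kept}


def sample_next(logits: Dict[str, float], temperature: float = 0.1,
                top_p: float = 0.9, rng: Optional[random.Random] = None) -> str:
    """Draw one token from the filtered distribution."""
    rng = rng or random.Random()
    probs = nucleus_probs(logits, temperature, top_p)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With logits such as {"panel": 2.0, "inverter": 1.5, "battery": 0.5}, temperature 0.1 makes the top token so dominant that only it survives the 0.9 nucleus, whereas at temperature 1.0 several candidates remain. This is why the recommended settings give consistent answers.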
Prompt Format (Llama-3 chat template)
This model uses the Llama-3 chat template. The prompt format is:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a knowledgeable assistant for a solar energy company.<|eot_id|><|start_header_id|>user<|end_header_id|>

What is a BOM?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
llama-cpp-python's create_chat_completion() handles this automatically.
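For inspecting exactly what the model sees, a minimal sketch that renders chat messages into the layout shown above. It assumes the simple form shown here: a role header, a blank line, the trimmed content, then <|eot_id|>. Meta's official Llama 3.1 template can also inject extra system-prompt lines (such as knowledge-cutoff dates), which this sketch does not reproduce. The function name is hypothetical.

```python
from typing import Dict, List


def render_llama3_prompt(messages: List[Dict[str, str]],
                         add_generation_prompt: bool = True) -> str:
    """Render chat messages into the Llama-3 special-token layout."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant header so the model continues with its answer.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

For normal use, create_chat_completion() remains the simpler option because it applies the template stored in the GGUF metadata.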
Training Details
| Setting | Value |
|---|---|
| Base model | meta-llama/Meta-Llama-3.1-8B-Instruct |
| Fine-tuning method | LoRA (rank 8, 8 layers) |
| Dataset | ~62 solar energy FAQ Q&A pairs |
| Training iterations | 300 |
| Learning rate | 1e-4 (cosine decay → 1e-5) |
| Batch size | 2 |
| Max sequence length | 1024 tokens |
| Framework | MLX-LM 0.31+ on Apple Silicon |
| Quantization | GGUF Q4_K_M via llama.cpp |
| Size reduction | 16 GB float16 → 4.6 GB (~71% smaller) |
| Training hardware | MacBook M4, 16 GB unified memory |
| Training time | ~20 minutes |
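The learning-rate row describes a cosine decay from 1e-4 to 1e-5. A minimal sketch of the standard cosine-annealing formula, assuming no warmup and a decay spread across exactly the 300 training iterations (the card does not state MLX-LM's exact scheduler configuration for this run):

```python
import math


def cosine_lr(step: int, total_steps: int = 300,
              lr_start: float = 1e-4, lr_end: float = 1e-5) -> float:
    """Cosine annealing from lr_start (step 0) down to lr_end (step == total_steps)."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule's range
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_end + (lr_start - lr_end) * cos_term


if __name__ == "__main__":
    for s in (0, 75, 150, 225, 300):
        print(f"step {s:3d}: lr = {cosine_lr(s):.2e}")
```

At batch size 2, 300 iterations process 600 examples, roughly 9.7 passes over the ~62 FAQ pairs.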
What is GGUF Q4_K_M?
GGUF (GPT-Generated Unified Format) is a safe, portable model format used by llama.cpp.
Q4_K_M = 4-bit quantization, K-quant method, Medium size/quality tradeoff:
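- Most weights are stored in 4-bit blocks (vs 16 bits in float16); a few tensors are kept at higher precision, so the file averages somewhat more than 4 bits per weight
- Quality loss: minimal (~0.1-0.5% perplexity increase vs float16)
- Speed: faster than float16 on CPU, because token generation is largely memory-bandwidth bound and each token requires reading less than a third as many bytes of weights
No pickled tensors and no arbitrary code, so the HF security scanner marks this file as safe.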
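To illustrate the basic idea behind 4-bit block quantization, here is a deliberately simplified sketch: each block of 32 weights shares one float scale, and every weight is rounded to a signed 4-bit integer in [-8, 7]. This is roughly similar in spirit to llama.cpp's older Q4_0 format, not to Q4_K_M, which stores quantized per-block scales and minimums inside 256-weight super-blocks and keeps some tensors at higher precision. The block size and function names are assumptions for illustration.

```python
from typing import List, Tuple

BLOCK = 32  # weights per block (assumed for this sketch)


def quantize_block(block: List[float]) -> Tuple[float, List[int]]:
    """Symmetric 4-bit quantization of one block: a float scale plus ints in [-8, 7]."""
    amax = max(abs(x) for x in block)
    scale = amax / 7 if amax > 0 else 1.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return scale, q


def dequantize_block(scale: float, q: List[int]) -> List[float]:
    """Reconstruct approximate weights from the scale and 4-bit codes."""
    return [scale * v for v in q]


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split a flat weight list into blocks and quantize each one."""
    return [quantize_block(weights[i:i + BLOCK]) for i in range(0, len(weights), BLOCK)]


def bits_per_weight(scale_bits: int = 16) -> float:
    """Storage cost: 4 bits per weight plus one scale amortised over the block."""
    return 4 + scale_bits / BLOCK


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    w = [rng.gauss(0, 0.02) for _ in range(4 * BLOCK)]
    restored = [x for s, q in quantize(w) for x in dequantize_block(s, q)]
    max_err = max(abs(a - b) for a, b in zip(w, restored))
    print(f"{bits_per_weight():.1f} bits/weight vs 16 for float16, max abs error {max_err:.5f}")
```

Even this naive scheme costs about 4.5 bits per weight instead of 16, roughly a 3.5x reduction, which matches the scale of the 16 GB to 4.6 GB shrink reported above.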
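Because GGUF is a plain binary layout rather than a pickle, it can be inspected without executing anything stored in the file. A minimal sketch that reads only the fixed header, assuming the layout in the GGUF specification from the ggml project (version 2 and later): the 4-byte magic GGUF, a little-endian uint32 version, then uint64 tensor and metadata key/value counts. The function name read_gguf_header is hypothetical; for full metadata parsing, the gguf Python package maintained in the llama.cpp repository is the more complete option.

```python
import struct
import sys
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Parse the fixed 24-byte GGUF header from an open binary file."""
    raw = f.read(24)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # '<' = little-endian, no padding; I = uint32 version; Q, Q = uint64 counts.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return GGUFHeader(version, tensor_count, kv_count)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python gguf_header.py solar-faq-Q4_K_M.gguf
    with open(sys.argv[1], "rb") as fh:
        print(read_gguf_header(fh))
```

Only fixed-width integers are decoded here; nothing in the file is evaluated or imported.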
Limitations
- Domain-specific (solar FAQ): best for solar energy questions; outside the training domain it falls back to base Llama-3.1 behavior
- English only
- Small dataset (~62 pairs): may not generalize to all solar topics
- Fine-tuned on a Q4_K_M base, so further quantization artifacts are possible
License
This model is derived from Meta Llama 3.1, which is licensed under the Llama 3.1 Community License. Use is subject to Meta's Acceptable Use Policy.