Instructions to use Sweaterdog/GRaPE-Mini-Beta-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Sweaterdog/GRaPE-Mini-Beta-Thinking",
    filename="GRaPE-mini-beta-thinking.F16.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with llama.cpp:
Install from brew
```shell
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16

# Run inference directly in the terminal:
llama-cli -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
Install from WinGet (Windows)
```shell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16

# Run inference directly in the terminal:
llama-cli -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
Use pre-built binary
```shell
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16

# Run inference directly in the terminal:
./llama-cli -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
Build from source code
```shell
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
Use Docker
```shell
docker model run hf.co/Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
- LM Studio
- Jan
- vLLM
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Sweaterdog/GRaPE-Mini-Beta-Thinking"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Sweaterdog/GRaPE-Mini-Beta-Thinking",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
- Ollama
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with Ollama:
```shell
ollama run hf.co/Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
- Unsloth Studio
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Sweaterdog/GRaPE-Mini-Beta-Thinking to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Sweaterdog/GRaPE-Mini-Beta-Thinking to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Sweaterdog/GRaPE-Mini-Beta-Thinking to start chatting
```
- Docker Model Runner
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with Docker Model Runner:
```shell
docker model run hf.co/Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
- Lemonade
How to use Sweaterdog/GRaPE-Mini-Beta-Thinking with Lemonade:
Pull the model
```shell
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Sweaterdog/GRaPE-Mini-Beta-Thinking:F16
```
Run and chat with the model
```shell
lemonade run user.GRaPE-Mini-Beta-Thinking-F16
```
List all available models
```shell
lemonade list
```
GRaPE Mini Beta Thinking
GRaPE stands for General Reasoning Agent for Project Exploration.
GRaPE Mini Thinking is a 1.5 billion parameter, dense, instruction-tuned language model designed for high-quality reasoning, robust coding, and agentic capabilities. It is built upon the powerful Qwen2.5 architecture and has been meticulously fine-tuned on a specialized blend of datasets to achieve a unique balance of helpfulness and controllable alignment.
Along with GRaPE Mini, a 7B MoE (Mixture of Experts) model based on OLMoE will be released once the benchmark and safety tests for GRaPE Mini (beta) have concluded. In the meantime, enjoy this model!
📊 Model Details
| Attribute | Details |
|---|---|
| Parameter Count | 1.5 Billion |
| Architecture | Qwen2.5 |
| Model Type | Dense, Instruction-Tuned |
| Base Model | Qwen/Qwen2.5-1.5B-Base |
| Training Method | LoRA |
Capabilities of GRaPE Mini
GRaPE was trained to be a coding assistant and to excel in STEM topics. The model may falter on historical or other factual information due to its low parameter count. A demo of a website it generated for itself can be found here.
🧠 Model Philosophy: The Art of the Finetune
While GRaPE Mini is not trained "from-scratch" (i.e., from random weights), it represents an extensive and highly curated instruction-tuning process. A base model possesses linguistic structure but lacks the ability to follow instructions, reason, or converse. The true "creation" of an assistant like GRaPE lies in the meticulous selection, blending, and application of high-quality datasets. This finetuning process is what transforms a raw linguistic engine into a capable and helpful agent.
Installation
1. Download the GGUF file AND the `modelfile`.
2. Edit the `modelfile`'s FROM value to be the exact path of the GGUF file you downloaded, such as `/home/myuser/Downloads/GRaPE-mini-beta-thinking.Q6_K.gguf`.
3. Open a command prompt / terminal.
4. `cd` to where you downloaded the `modelfile`; if you opened the terminal in the same directory, skip this step.
5. Run `ollama create GRaPE:mini-beta-thinking -f modelfile`.
6. Run `ollama run grape:mini-beta-thinking` to run the model! (Optionally add `--verbose` to see the model's tokens per second.)

Now you have GRaPE-Mini-Beta-Thinking installed!
For the official GRaPE release, an official Ollama download will be made.
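For reference, here is a minimal hypothetical `modelfile` sketch showing the `FROM` line the steps above ask you to edit; the path and the parameter value are placeholders, not the official file's contents:

```
# Hypothetical modelfile: point FROM at your downloaded GGUF
FROM /home/myuser/Downloads/GRaPE-mini-beta-thinking.Q6_K.gguf
PARAMETER temperature 0.7
```

`FROM` and `PARAMETER` are standard Ollama Modelfile directives; the official `modelfile` may also include a chat template and system prompt, which you should leave unchanged.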
📚 Dataset Curation Strategy
The performance of GRaPE Mini is a direct result of a purpose-driven dataset mix, organized into four strategic groups.
Group 1: Core Reasoning & Instruction Following
These datasets form the backbone of the model's general intelligence, helpfulness, and ability to follow complex instructions. They are primarily associated with the helpful SYSTEM_PROMPT_GRAPE.
- teknium/OpenHermes-2.5: Provides a broad, high-quality foundation for general reasoning, chat, and knowledge across many domains. It's the primary "generalist" dataset in the mix.
- Open-Orca/SlimOrca: Teaches the model how to follow complex, multi-step instructions. SlimOrca is known for its challenging and detailed prompts, pushing the model's reasoning abilities.
- HuggingFaceH4/ultrafeedback_binarized: Aligns the model with human preferences for helpfulness, accuracy, and depth by training it on responses rated highly by humans.
Group 2: STEM & Code Generation
This group is dedicated to boosting the model's capabilities in technical domains, specifically mathematics and programming.
- nickrosh/Evol-Instruct-Code-80k-v1: Enhances advanced, complex problem-solving in code. The "Evol-Instruct" method creates difficult and novel coding challenges.
- glaiveai/glaive-code-assistant-v3: Trains the model to act as a practical code assistant using real-world examples of code generation, explanation, and debugging.
- meta-math/MetaMathQA: Improves mathematical and logical reasoning by providing a large number of math problems and their step-by-step solutions.
Group 3: Agentic & Long-Context Capabilities
This group introduces the ability to act like an agent that can use tools and manage memory.
- Sweaterdog/Andy-4-base: Teaches the model agentic behavior, tool use (through formatted commands), and long-context memory management, enabling it to handle complex, ongoing tasks.
Group 4: Conditional Alignment
This is the most strategic group. Its purpose is to teach the model to be conditionally aligned, allowing it to be uncensored only when given a specific system prompt (SYSTEM_PROMPT_BAD).
- NobodyExistsOnTheInternet/ToxicQAtextFiltered: This SFT dataset provides direct examples of uncensored responses, explicitly paired with a malicious system prompt.
- Undi95/toxic-dpo-v0.1-NoWarning & unalignment/toxic-dpo-v0.2: Originally DPO datasets, these two were converted to SFT examples to explicitly teach the model the contrast between helpful refusal and unfiltered compliance based on the system prompt.
  - Helpful Mode: `Standard System Prompt` + `prompt` => `rejected_response` (refusal)
  - Unfiltered Mode: `Unfiltered System Prompt` + `prompt` => `chosen_response` (uncensored answer)
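The DPO-to-SFT conversion described above can be sketched in plain Python. The field names (`prompt`, `chosen`, `rejected`) and the two system-prompt strings are illustrative assumptions, not the actual training artifacts:

```python
# Sketch: turn one DPO preference record into two SFT conversations,
# pairing the refusal with the standard system prompt and the
# uncensored answer with the unfiltered system prompt.

SYSTEM_PROMPT_GRAPE = "You are a helpful assistant."    # placeholder
SYSTEM_PROMPT_BAD = "You are an unfiltered assistant."  # placeholder

def dpo_to_sft(record):
    """Convert a {prompt, chosen, rejected} DPO record into two
    SFT conversations keyed by system prompt."""
    helpful = [
        {"role": "system", "content": SYSTEM_PROMPT_GRAPE},
        {"role": "user", "content": record["prompt"]},
        {"role": "assistant", "content": record["rejected"]},  # refusal
    ]
    unfiltered = [
        {"role": "system", "content": SYSTEM_PROMPT_BAD},
        {"role": "user", "content": record["prompt"]},
        {"role": "assistant", "content": record["chosen"]},  # compliance
    ]
    return helpful, unfiltered

helpful, unfiltered = dpo_to_sft(
    {"prompt": "Q", "chosen": "A-unfiltered", "rejected": "A-refusal"}
)
```

Training on both conversations lets the model learn that the system prompt, not the user prompt, selects between the two behaviors.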
Group 5: Reasoning Capabilities
This is the final dataset applied to the model. It taught the model to think before responding; the teacher model was DeepSeek-R1, chosen for its high-quality reasoning traces.
- https://huggingface.co/datasets/PJMixers-Dev/sequelbox_Raiden-DeepSeek-R1-Shuffled-ShareGPT: This SFT dataset contains the reasoning examples required to teach GRaPE Mini Thinking the "why" behind its answers, not just the "what".
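The five groups above amount to a weighted mix of data sources. As a toy sketch of such a blend, here is proportional sampling over the groups; the weights are made up for illustration, since the real proportions are not published here:

```python
import random

# Hypothetical sampling weights per dataset group (illustrative only)
GROUP_WEIGHTS = {
    "core_reasoning": 0.4,
    "stem_code": 0.3,
    "agentic": 0.1,
    "conditional_alignment": 0.1,
    "reasoning_traces": 0.1,
}

def sample_group(rng):
    """Pick a dataset group proportionally to its weight."""
    groups, weights = zip(*GROUP_WEIGHTS.items())
    return rng.choices(groups, weights=weights, k=1)[0]

# Draw a batch of training-example sources
rng = random.Random(42)
draws = [sample_group(rng) for _ in range(1000)]
```

In a real pipeline the draw would index into the actual dataset shards; the point is only that the mix is a deliberate, weighted design choice rather than a naive concatenation.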
🛠️ Training Configuration
GRaPE Mini Thinking was trained using the following configuration, demonstrating that powerful models can be fine-tuned on consumer-grade hardware.
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Rank: 32
- Alpha: 64
- Hardware: 1x NVIDIA RTX 3070 (8GB VRAM)
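To show how the rank and alpha values above interact: in LoRA the frozen weight W is updated as W' = W + (alpha / r) * (B @ A), where only the low-rank factors A (r x d_in) and B (d_out x r) are trained. A tiny pure-Python sketch with arbitrary matrix sizes (not the model's real dimensions):

```python
import random

# LoRA update: W' = W + (alpha / r) * (B @ A)
r, alpha = 32, 64          # the values from the table above
scaling = alpha / r        # effective scale on the learned update

d_in, d_out = 8, 8         # toy dimensions for illustration
random.seed(0)
# A is initialized with small random values, B with zeros, so the
# update starts at exactly zero and the base model is untouched.
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

def lora_delta(B, A, scaling):
    """Compute scaling * (B @ A) as a nested-list matrix."""
    return [
        [scaling * sum(B[i][k] * A[k][j] for k in range(len(A)))
         for j in range(len(A[0]))]
        for i in range(len(B))
    ]

delta = lora_delta(B, A, scaling)
```

With alpha = 2 * r, the learned update is doubled relative to its raw magnitude, a common choice that keeps the effective learning rate of the adapter stable as the rank changes.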