Pinkstack committed on May 20, 2025

Commit 1c6bc84 (verified)

1 Parent(s): 674c9cb

Update README.md

Files changed (1)

README.md +86 -7

@@ -3,21 +3,100 @@ base_model:
 - Pinkstack/Fijik-3b-Instruct
 tags:
 - text-generation-inference
-- transformers
 - unsloth
-- qwen2
-- gguf
 license: apache-2.0
 language:
 - en
 ---
 # Uploaded  model
 - **Developed by:** Pinkstack
-- **License:** apache-2.0
-- **Finetuned from model :** Pinkstack/Fijik-3b-sft
-This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 - Pinkstack/Fijik-3b-Instruct
 tags:
 - text-generation-inference
+- GGUF
 - unsloth
+- Qwen2
+- trl
+- dpo
+- roleplay
+- math
+- code
 license: apache-2.0
 language:
 - en
+pipeline_tag: text-generation
+library_name: transformers
 ---
+😁:```Hi Fijik!```
+🤖:```Hello! What's up? How may I help?```
+gguf version
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/m8o_qX4M5A5bd_qqSBQPN.png)
+# What is it
+This is the Fijik 1.0 series model: a dense, 56-layer transformer LLM with **3 billion** parameters, based on Qwen2.5. Specifically, it was merged using Mergekit to be twice as large as Qwen2.5 1.5B.
+After merging, we fine-tuned it on a custom dataset mix built for this model to improve its performance further.
+- **Step 1 of fine-tuning via Unsloth:** SFT on an estimated 20 million tokens.
+- **Step 2 of fine-tuning via Unsloth:** DPO for 2 epochs for even better instruction following.
+After these two steps, we got a powerful model that has fewer parameters than Llama 3.1 8B yet performs just as well, if not better. Note that unlike our other recent models, it is not a thinking model, yet it can reason quite well. Our theory behind this model is that a smaller yet deeper model can outperform for its size.
+Alibaba's Qwen team states that Qwen2.5 was pre-trained on up to 18 trillion high-quality tokens. This model supports up to **32768** input tokens and can generate up to **8192** tokens.
+# What should Fijik be used for?
+Fijik 1.0 3B is designed to be a production-ready, general-use, high-performance model that is small enough to run at high token throughput while minimising performance loss.
+- We made some effort to ensure the model is safe while keeping it usable. It also adheres closely to system prompts, so it is very customisable. We did not put any information about the model's identity in our fine-tuning data: it knows that it is a Large Language Model (LLM), but it does not know it is Fijik unless you say so in the system prompt.
+- Thanks to its large context window, the model can be used for RAG, but as with any other LLM, you should be aware that it *may* hallucinate.
+- Our fine-tuning data included quite a few creative writing examples, so the model is pretty good at creative writing.
+- Coding and math: in our SFT and DPO fine-tuning data we put effort into improving coding and step-by-step math performance. It is not perfect, but no LLM is.
+# Examples
+none yet
+# Limitations
+This model is not uncensored, yet it may produce erotic outputs. You are solely responsible for the outputs from the model.
+Like any other LLM, this model may hallucinate and produce inaccurate, dangerous, or even completely nonsensical outputs, and users and hosters alike should be aware of this. The information the model provides may seem accurate, but for important tasks, always double-check responses against credible sources.
+# Notices
+This was the mergekit YAML config we used:
+```yaml
+base_model: Qwen/Qwen2.5-1.5B-Instruct
+merge_method: passthrough
+slices:
+  - sources:
+    - model: Qwen/Qwen2.5-1.5B-Instruct
+      layer_range: [0, 21]  # Lower layers
+  - sources:
+    - model: Qwen/Qwen2.5-Coder-1.5B-Instruct
+      layer_range: [8, 10]  # Better coding performance
+  - sources:
+    - model: huihui-ai/Qwen2.5-1.5B-Instruct-abliterated
+      layer_range: [5, 24]  # Mid layers
+  - sources:
+    - model: Unsloth/Qwen2.5-1.5B-Instruct
+      layer_range: [14, 28]  # Higher layers
+tokenizer_source: unsloth/Qwen2.5-1.5B-Instruct
+dtype: bfloat16
+```
 # Uploaded  model
 - **Developed by:** Pinkstack
+- **License:** Apache 2.0
+- **Finetuned from model :** Pinkstack/Fijik-3b-v1-sft
+This Qwen2.5 model was trained with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
+# Citations
+Magpie:
+```
+{
+    title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
+    author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
+    year={2024},
+    eprint={2406.08464},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+Lion:
+```
+{
+    title={Symbolic Discovery of Optimization Algorithms},
+    author={Xiangning Chen},
+    year={2023},
+    eprint={2302.06675},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
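
A note on the mergekit config added in this commit: the README describes the model as a 56-layer transformer, and that figure can be checked against the four `slices` in the YAML. The sketch below is only an arithmetic check, not part of the author's pipeline. It assumes that mergekit treats `layer_range: [start, end]` as half-open (the end index is excluded); the `slices` list and the `count_layers` helper are hypothetical names introduced for illustration.

```python
# Slices copied from the mergekit YAML in the diff above.
# Assumption: layer_range is half-open, i.e. [start, end) with end excluded.
slices = [
    ("Qwen/Qwen2.5-1.5B-Instruct", (0, 21)),                   # lower layers
    ("Qwen/Qwen2.5-Coder-1.5B-Instruct", (8, 10)),             # coding layers
    ("huihui-ai/Qwen2.5-1.5B-Instruct-abliterated", (5, 24)),  # mid layers
    ("Unsloth/Qwen2.5-1.5B-Instruct", (14, 28)),               # higher layers
]


def count_layers(slices):
    """Sum the number of layers contributed by each (model, (start, end)) slice."""
    return sum(end - start for _, (start, end) in slices)


total_layers = count_layers(slices)

# Per-slice contribution, then the stacked total.
for model, (start, end) in slices:
    print(f"{model}: {end - start} layers")
print(f"total: {total_layers} layers")
```

Under that assumption the slices contribute 21, 2, 19 and 14 layers, which adds up to the 56 layers stated in the README. This would be twice the depth of a 28-layer source model; the 28 is inferred from the upper bound of the last slice and is not stated in the README itself.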