How to use from llama.cpp

Install from WinGet (Windows)

```bash
winget install llama.cpp
```

Install from brew

```bash
brew install llama.cpp
```

```bash
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf GTKING/ZFusionAI_Hacker:F16

# Run inference directly in the terminal:
llama-cli -hf GTKING/ZFusionAI_Hacker:F16
```

Use pre-built binary

```bash
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf GTKING/ZFusionAI_Hacker:F16

# Run inference directly in the terminal:
./llama-cli -hf GTKING/ZFusionAI_Hacker:F16
```

Build from source code

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf GTKING/ZFusionAI_Hacker:F16

# Run inference directly in the terminal:
./build/bin/llama-cli -hf GTKING/ZFusionAI_Hacker:F16
```

Use Docker

```bash
docker model run hf.co/GTKING/ZFusionAI_Hacker:F16
```
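Once llama-server is running (via any of the install paths above), you can query it with any OpenAI-compatible client. A minimal curl sketch, assuming the server's default address of http://localhost:8080:

```bash
# POST a chat request to the OpenAI-compatible endpoint;
# llama-server serves /v1/chat/completions on port 8080 by default.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
```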
Qwen3 1.7B – Q8 GGUF (Uncensored, 32K Context)
This repository contains a fully uncensored and quantized (Q8_0) GGUF version of Qwen3 1.7B, designed for offline, local inference using llama.cpp and compatible runtimes.
By default, the model operates in thinking mode.
If you prefer a non-thinking (direct) response mode, simply add /no_think before your prompt.
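For example, using the Q8_0 file from this repo (the prompt text is just an illustration):

```bash
# Default: thinking mode
./llama-cli -m gguf/qwen3-1.7b-q8_0.gguf -p "Explain what a GGUF file is"

# Direct mode: prefix the prompt with /no_think
./llama-cli -m gguf/qwen3-1.7b-q8_0.gguf -p "/no_think Explain what a GGUF file is"
```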
- ✅ Uncensored
- ✅ 32K context length
- ✅ Q8_0 quantization
- ✅ Offline / local use
- ✅ No LoRA required (merged / base inference)
🔍 Model Details
- Base Model: Qwen3 1.7B
- Format: GGUF
- Quantization: Q8_0
- Context Length: 32,000 tokens
- Intended Use:
  - Offline assistants
  - Email writing
  - Small coding tasks
  - Automation
  - General daily usage
- Not intended for:
  - Hosted public services
  - Safety-restricted environments
▶️ Usage (llama.cpp)
```bash
./llama-cli \
  -m gguf/qwen3-1.7b-q8_0.gguf \
  -p "Hello"
```
Recommended flags:

```
--temp 0.2
--top-p 0.9
```
For concise outputs, add an instruction such as:

```
Answer directly. Use yes or no when possible.
```
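Putting it together, a sketch of one full invocation with the recommended sampling flags and the conciseness instruction folded into the prompt (the question itself is just an example):

```bash
./llama-cli \
  -m gguf/qwen3-1.7b-q8_0.gguf \
  --temp 0.2 \
  --top-p 0.9 \
  -p "Answer directly. Use yes or no when possible. Is 443 the default HTTPS port?"
```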
⚠️ Disclaimer
- This model is fully uncensored and provided as-is
- You are responsible for how you use it
- Do not deploy it in public-facing applications without moderation
- Intended for personal, research, and offline use
🧠 Quantization Info
- Q8_0 provides near-FP16 quality
- Stable outputs
- Recommended for CPU and mobile-class devices
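For reference, Q8_0 files like this one are typically produced with llama.cpp's llama-quantize tool from a higher-precision GGUF; a sketch, with a hypothetical F16 input filename:

```bash
# Requantize an F16 GGUF to Q8_0 (input filename is illustrative)
./build/bin/llama-quantize qwen3-1.7b-f16.gguf qwen3-1.7b-q8_0.gguf Q8_0
```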
👤 Author & Organization
- Creator: Thirumalai
- Company: ZFusionAI
📜 License
- Apache 2.0