---
license: mit
language:
- zh
- en
tags:
- llama.cpp
- TQ3
- quantization
- Windows
- NVIDIA
- GGUF
---
# llama.cpp-TQ3 Inference Environment (Windows/NVIDIA Edition)
## Intro

A precompiled llama.cpp environment built specifically for TQ3-quantized models. It lets Windows users run inference on NVIDIA GPUs with a single command.
## Key Features

- ✅ Native TQ3 support: runs TQ3 models that standard llama.cpp cannot load
- ✅ Built with CUDA acceleration and optimized for NVIDIA GPUs
- ✅ No dependencies to configure: just extract and run (model weights are not included)
- ✅ Two ways to run: command line (CLI) and web server
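For the web server mode, the Python sketch below sends a prompt to a running `llama-server.exe` over HTTP using only the standard library. It assumes the TQ3 fork keeps upstream llama.cpp's native `/completion` endpoint (JSON fields `prompt` and `n_predict`, generated text returned in `content`) and the default address `127.0.0.1:8080`. The function names `build_completion_request` and `complete` are illustrative and not part of this package.

```python
# Hypothetical client sketch, not shipped with this package.
# Assumes upstream llama-server's /completion endpoint and default port 8080.
import json
import urllib.request


def build_completion_request(prompt: str, n_predict: int = 128,
                             host: str = "127.0.0.1", port: int = 8080
                             ) -> urllib.request.Request:
    # JSON body: the prompt plus the maximum number of tokens to generate.
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    # Blocks until the server returns the whole (non-streamed) completion.
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

With the server running, `print(complete("Hello"))` prints the generated text. Building the request separately from sending it keeps the payload easy to inspect or adjust before anything goes over the network.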
## Usage

1. **Download & extract**: Unzip the package to a folder whose path contains only English characters.
2. **Add a model**: Place your `.tq3.gguf` model in the same directory.
3. **Run**: Use `llama-cli.exe` or `llama-server.exe` to load the model and start inference.
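The three steps above can be scripted. The Python sketch below checks that the folder path is ASCII-only (used here as a stand-in for "English-only"), looks for `*.tq3.gguf` files next to the executables, and builds the command lines. It assumes the fork keeps upstream llama.cpp's flags (`-m` model path, `-ngl` GPU layers, `-p` prompt, `-n` tokens to generate, `--host`/`--port` for the server); confirm them with `llama-cli.exe --help`. All function names are hypothetical helpers.

```python
# Hypothetical helper script, not shipped with this package.
# Assumes upstream llama.cpp flags: -m, -ngl, -p, -n, --host, --port.
import subprocess
from pathlib import Path


def is_ascii_path(path: Path) -> bool:
    # "English-only path" approximated as: every character is ASCII.
    return str(path).isascii()


def find_tq3_models(folder: Path) -> list[Path]:
    # Models are expected in the same directory as the executables.
    return sorted(folder.glob("*.tq3.gguf"))


def cli_command(folder: Path, model: Path, prompt: str,
                n_predict: int = 128, gpu_layers: int = 99) -> list[str]:
    # -ngl 99 asks llama.cpp to offload up to 99 layers (typically all) to the GPU.
    return [str(folder / "llama-cli.exe"), "-m", str(model),
            "-ngl", str(gpu_layers), "-p", prompt, "-n", str(n_predict)]


def server_command(folder: Path, model: Path, host: str = "127.0.0.1",
                   port: int = 8080, gpu_layers: int = 99) -> list[str]:
    # Same model and offload settings, but serve over HTTP instead of a console.
    return [str(folder / "llama-server.exe"), "-m", str(model),
            "-ngl", str(gpu_layers), "--host", host, "--port", str(port)]


def launch_cli(folder: Path, prompt: str) -> None:
    # Validate the setup first, then run the first model found.
    folder = folder.resolve()
    if not is_ascii_path(folder):
        raise ValueError(f"Move {folder} to a path with English (ASCII) characters only")
    models = find_tq3_models(folder)
    if not models:
        raise FileNotFoundError(f"No *.tq3.gguf model found in {folder}")
    subprocess.run(cli_command(folder, models[0], prompt), check=True)
```

For example, `launch_cli(Path(r"C:\llama-tq3"), "Hello")` (a hypothetical install folder) runs the first model it finds. To keep the web server running in the background instead, pass the list from `server_command(...)` to `subprocess.Popen`.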
## Notes

- NVIDIA GPUs only: AMD GPUs are not currently supported.
- This package does not include any model weights. Obtain models yourself through legitimate channels and comply with their licenses.
## Credits

- Core source code: turbo-tan/llama.cpp-tq3
- TQ3 quantization: YTan2000