---
license: mit
language:
- zh
- en
tags:
- llama.cpp
- TQ3
- quantization
- Windows
- NVIDIA
- GGUF
---

# llama.cpp-TQ3 专用推理环境 (Windows/NVIDIA版)
# llama.cpp-TQ3 Inference Environment (Windows/NVIDIA Edition)

## 简介 | Intro
这是一个预编译的 `llama.cpp` 环境,专为 **TQ3 量化模型**设计,支持 NVIDIA 显卡在 Windows 上一键运行。
A pre-built `llama.cpp` environment optimized for **TQ3 quantized models**, enabling one-click inference on NVIDIA GPUs for Windows users.

## 核心特性 | Key Features
✅ 原生支持 TQ3 格式(普通 llama.cpp 无法运行)
✅ 已编译 CUDA 加速,专为 NVIDIA 显卡优化
✅ 免配置依赖,解压即用,不包含模型权重
✅ 支持命令行与 Web 服务两种运行方式

✅ Native TQ3 support (runs models that standard llama.cpp builds cannot)
✅ CUDA-accelerated, optimized for NVIDIA GPUs
✅ No dependencies required — just extract and run (model weights not included)
✅ Supports both CLI and Web server modes

## 使用方法 | Usage
1.  **下载解压**:将文件解压到纯英文路径
2.  **放入模型**:把 `.tq3.gguf` 格式的模型放到目录下
3.  **启动运行**:使用 `llama-cli.exe` 或 `llama-server.exe` 加载模型

1.  **Download & Extract**: Unzip to a folder with an English-only path
2.  **Add Model**: Place your `.tq3.gguf` model in the same directory
3.  **Run**: Use `llama-cli.exe` or `llama-server.exe` to start inference
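The steps above can be sketched as commands run from the extracted folder. This is a minimal example, not the package's official invocation: `model.tq3.gguf` is a placeholder filename, and exact flag support may vary by build.

```shell
:: Run a one-off prompt from the command line
:: (-m selects the model file, -ngl offloads layers to the NVIDIA GPU)
llama-cli.exe -m model.tq3.gguf -ngl 99 -p "你好,请介绍一下你自己。"

:: Or start the Web server mode, listening on port 8080
llama-server.exe -m model.tq3.gguf -ngl 99 --port 8080
```

In server mode, llama-server exposes an HTTP API that chat front-ends can connect to at `http://127.0.0.1:8080`.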

## 注意事项 | Notes
- 仅支持 **NVIDIA 显卡**,AMD 显卡暂不兼容
- 本项目不包含任何模型文件,请自行获取并遵守对应开源协议

- **NVIDIA-only**: Not compatible with AMD GPUs
- This package does not include model weights. Please obtain them legally and comply with their licenses.

## 致谢 | Credits
- 核心源码: turbo-tan/llama.cpp-tq3
- TQ3 量化: YTan2000