---
license: mit
language:
- zh
- en
tags:
- llama.cpp
- TQ3
- quantization
- Windows
- NVIDIA
- GGUF
---
| |
# llama.cpp-TQ3 专用推理环境 (Windows/NVIDIA版)
# llama.cpp-TQ3 Inference Environment (Windows/NVIDIA Edition)
|
|
## 简介 | Intro
这是一个预编译的 `llama.cpp` 环境,专为 **TQ3 量化模型**设计,支持 NVIDIA 显卡在 Windows 上一键运行。
A pre-built `llama.cpp` environment optimized for **TQ3 quantized models**, enabling one-click inference on NVIDIA GPUs for Windows users.
|
|
## 核心特性 | Key Features
✅ 原生支持 TQ3 格式(普通 llama.cpp 无法运行)
✅ 已编译 CUDA 加速,专为 NVIDIA 显卡优化
✅ 免配置依赖,解压即用,不包含模型权重
✅ 支持命令行与 Web 服务两种运行方式


✅ Native TQ3 support (runs TQ3 models that standard llama.cpp cannot)
✅ Pre-compiled with CUDA acceleration, optimized for NVIDIA GPUs
✅ No dependency setup required — just extract and run (model weights not included)
✅ Supports both CLI and Web server modes
|
|
## 使用方法 | Usage
1. **下载解压**:将文件解压到纯英文路径
2. **放入模型**:把 `.tq3.gguf` 格式的模型放到目录下
3. **启动运行**:使用 `llama-cli.exe` 或 `llama-server.exe` 加载模型


1. **Download & Extract**: Unzip to a folder with an English-only path
2. **Add Model**: Place your `.tq3.gguf` model in the same directory
3. **Run**: Use `llama-cli.exe` or `llama-server.exe` to start inference
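The steps above can be sketched as typical launch commands. The model filename, layer count, and port below are placeholder assumptions; the flags follow upstream llama.cpp conventions (`-m`, `-ngl`, `--host`, `--port`), which this build is assumed to share:

```shell
:: CLI mode: load the model, offload all layers to the NVIDIA GPU, run a single prompt
llama-cli.exe -m your-model.tq3.gguf -ngl 99 -p "Hello"

:: Server mode: serve an HTTP API on localhost:8080
llama-server.exe -m your-model.tq3.gguf -ngl 99 --host 127.0.0.1 --port 8080
```

If the server starts successfully, upstream llama.cpp's `llama-server` exposes an OpenAI-compatible endpoint at `/v1/chat/completions`, so standard OpenAI-style clients should be able to connect to `http://127.0.0.1:8080` (assuming this fork keeps that behavior).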
|
|
## 注意事项 | Notes
- 仅支持 **NVIDIA 显卡**,AMD 显卡暂不兼容
- 本项目不包含任何模型文件,请自行获取并遵守对应开源协议


- **NVIDIA-only**: not compatible with AMD GPUs at this time
- This package does not include any model weights. Please obtain them yourself and comply with their respective open-source licenses.
|
|
## 致谢 | Credits
- 核心源码 | Core source: turbo-tan/llama.cpp-tq3
- TQ3 量化 | TQ3 quantization: YTan2000
|
|