Update README.md

b67dc8a verified 17 days ago

1.88 kB

license: mit
language:
  - zh
  - en
tags:
  - llama.cpp
  - TQ3
  - quantization
  - Windows
  - NVIDIA
  - GGUF

llama.cpp-TQ3 专用推理环境 (Windows/NVIDIA版)

llama.cpp-TQ3 Inference Environment (Windows/NVIDIA Edition)

简介 | Intro

这是一个预编译的 llama.cpp 环境，专为 TQ3 量化模型设计，支持 NVIDIA 显卡在 Windows 上一键运行。 A pre-built llama.cpp environment optimized for TQ3 quantized models, enabling one-click inference on NVIDIA GPUs for Windows users.

核心特性 | Key Features

✅ 原生支持 TQ3 格式（普通 llama.cpp 无法运行） ✅ 已编译 CUDA 加速，专为 NVIDIA 显卡优化 ✅ 免配置依赖，解压即用，不包含模型权重 ✅ 支持命令行与 Web 服务两种运行方式

✅ Native TQ3 support (works with models standard llama.cpp cannot run) ✅ CUDA-accelerated, optimized for NVIDIA GPUs ✅ No dependencies required — just extract and run (model weights not included) ✅ Supports both CLI and Web server modes

使用方法 | Usage

下载解压：将文件解压到纯英文路径
放入模型：把 .tq3.gguf 格式的模型放到目录下
启动运行：使用 llama-cli.exe 或 llama-server.exe 加载模型
Download & Extract: Unzip to a folder with an English-only path
Add Model: Place your .tq3.gguf model in the same directory
Run: Use llama-cli.exe or llama-server.exe to start inference

注意事项 | Notes

仅支持 NVIDIA 显卡，AMD 显卡暂不兼容
本项目不包含任何模型文件，请自行获取并遵守对应开源协议
NVIDIA-only: Not compatible with AMD GPUs
This package does not include model weights. Please obtain them legally and comply with their licenses.

致谢 | Credits

核心源码: turbo-tan/llama.cpp-tq3
TQ3 量化: YTan2000