## How to use with llama.cpp

### Install with Homebrew

```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf codebasic/Qwen3-8B-GGUF:

# Run inference directly in the terminal:
llama-cli -hf codebasic/Qwen3-8B-GGUF:
```
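Once `llama-server` is running (it listens on `http://localhost:8080` by default), any OpenAI-compatible client can talk to it. A minimal sketch of the chat-completions request body; the model name, prompt, and sampling parameters here are placeholder assumptions, not values the card prescribes:

```python
import json

# Hypothetical request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint; field names follow the OpenAI chat API.
payload = {
    "model": "Qwen3-8B-GGUF",  # placeholder; the server answers with its loaded model
    "messages": [
        {"role": "user", "content": "Introduce yourself briefly."}
    ],
    "temperature": 0.7,
    "max_tokens": 128,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8080/v1/chat/completions
# with header "Content-Type: application/json".
```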
### Install with WinGet (Windows)

```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf codebasic/Qwen3-8B-GGUF:

# Run inference directly in the terminal:
llama-cli -hf codebasic/Qwen3-8B-GGUF:
```
### Use a pre-built binary

```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf codebasic/Qwen3-8B-GGUF:

# Run inference directly in the terminal:
./llama-cli -hf codebasic/Qwen3-8B-GGUF:
```
### Build from source

```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf codebasic/Qwen3-8B-GGUF:

# Run inference directly in the terminal:
./build/bin/llama-cli -hf codebasic/Qwen3-8B-GGUF:
```
### Use Docker

```sh
docker model run hf.co/codebasic/Qwen3-8B-GGUF:
```
# Qwen3-8B-GGUF

πŸ€– μ½”λ“œλ² μ΄μ§ 제곡

이 λͺ¨λΈμ€ **μ½”λ“œλ² μ΄μ§(codebasic)**μ—μ„œ GGUF 포맷으둜 λ³€ν™˜Β·λ°°ν¬ν•˜μ˜€μŠ΅λ‹ˆλ‹€.

이 λ¦¬ν¬μ§€ν† λ¦¬λŠ” Qwen3-8B λͺ¨λΈμ„ μ—¬λŸ¬ GGUF μ–‘μžν™” λ²„μ „μœΌλ‘œ μ œκ³΅ν•©λ‹ˆλ‹€.
llama.cpp, text-generation-webui, koboldcpp λ“± GGUF 포맷을 μ§€μ›ν•˜λŠ” λ‹€μ–‘ν•œ ν™˜κ²½μ—μ„œ μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€.


## πŸ“‚ Files

| File | Quantization | Approx. memory | Description |
|---|---|---|---|
| Qwen3-8B-F16.gguf | FP16 (unquantized) | ~16 GB | Original FP16 weights (GPU / high-end environments) |
| Qwen3-8B-Q8_0.gguf | Q8_0 | ~9 GB | High-quality quantization, near-FP16 accuracy |

πŸ’‘ Memory requirements are estimates and may vary depending on your environment.
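The estimates above follow from per-weight storage: FP16 uses 2 bytes per weight, while Q8_0 stores blocks of 32 int8 weights plus one fp16 scale (34 bytes per 32 weights). A rough back-of-the-envelope sketch, taking the card's "8B params" figure at face value; actual runtime memory is higher because of KV cache and activations:

```python
PARAMS = 8e9  # "8B params" from the model card; the exact count differs slightly

f16_gb = PARAMS * 2 / 1e9          # FP16: 2 bytes per weight  -> ~16 GB
q8_gb = PARAMS * (34 / 32) / 1e9   # Q8_0: 34 bytes per 32-weight block -> ~8.5 GB

print(f"F16 ~{f16_gb:.1f} GB, Q8_0 ~{q8_gb:.1f} GB")
```

This lands close to the table's ~16 GB / ~9 GB figures; the gap for Q8_0 is the non-weight overhead (embeddings and output layers are often kept at higher precision, plus metadata).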


## πŸš€ Usage

### 1. Docker (llama.cpp, Q8_0 example)

```sh
docker run -v /path/to/models:/models \
    ghcr.io/ggml-org/llama.cpp:full \
    --run -m /models/Qwen3-8B/Qwen3-8B-Q8_0.gguf \
    -p "Introduce language models"
```
## Model details

- Format: GGUF
- Model size: 8B params
- Architecture: qwen3
## Model tree for codebasic/Qwen3-8B-GGUF

- Base model: Qwen/Qwen3-8B (this repository is a quantized version)