TheFireHacker committed
Commit bec0b3f · verified · 1 Parent(s): 8dc1f2a

Add GGUF quantized versions for Ollama/llama.cpp compatibility with API instructions

.gitattributes CHANGED
@@ -34,3 +34,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ gguf/qwen3-0.6b-tensorslayer-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ gguf/qwen3-0.6b-tensorslayer-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+ gguf/qwen3-0.6b-tensorslayer-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ gguf/qwen3-0.6b-tensorslayer-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
gguf/README.md CHANGED
@@ -6,10 +6,10 @@ This folder contains GGUF (GPT-Generated Unified Format) versions of the Tensor-
 
  | File | Quantization | Size | Use Case |
  |------|-------------|------|----------|
- | `qwen3-0.6b-tensorslayer-f16.gguf` | FP16 | ~1.2GB | Maximum quality |
- | `qwen3-0.6b-tensorslayer-q8_0.gguf` | 8-bit | ~650MB | High quality, smaller |
- | `qwen3-0.6b-tensorslayer-q5_k_m.gguf` | 5-bit K-quant | ~450MB | Balanced quality/size |
- | `qwen3-0.6b-tensorslayer-q4_0.gguf` | 4-bit | ~350MB | Fastest inference |
+ | `qwen3-0.6b-tensorslayer-f16.gguf` | FP16 | ~1.1GB | Maximum quality |
+ | `qwen3-0.6b-tensorslayer-q8_0.gguf` | 8-bit | ~610MB | High quality, smaller |
+ | `qwen3-0.6b-tensorslayer-q5_k_m.gguf` | 5-bit K-quant | ~424MB | Balanced quality/size |
+ | `qwen3-0.6b-tensorslayer-q4_0.gguf` | 4-bit | ~364MB | Fastest inference |
 
  ## Usage with Ollama
 
@@ -48,7 +48,7 @@ ollama run qwen3-enhanced "What is the relationship between 'semantic meaning' a
  ```python
  import requests
 
- # Using Ollama API (replace with your endpoint)
+ # Using Ollama API
  OLLAMA_API = "2612b573cd924d148095de291b70bd98.MDGkuS-nd3Ms0a3tBQdpkk-Z"
  response = requests.post(f"{OLLAMA_API}/api/generate", json={
  "model": "qwen3-enhanced",
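One caveat about the snippet in this hunk: `OLLAMA_API` is set to what looks like an access token, but the string is interpolated into the request URL, so `requests.post` would fail against it. Ollama's `/api/generate` endpoint expects a base URL instead; a minimal sketch of the same call, assuming Ollama's default local address `http://localhost:11434` (an assumption, not taken from the commit):

```python
import json

# Assumed endpoint: Ollama's default local address (not from the commit).
OLLAMA_API = "http://localhost:11434"

# Payload for Ollama's /api/generate endpoint.
payload = {
    "model": "qwen3-enhanced",
    "prompt": "What is the relationship between 'semantic meaning' and 'syntax'?",
    "stream": False,  # ask for a single JSON object instead of a token stream
}

# With a running Ollama server, the call would be:
#   import requests
#   response = requests.post(f"{OLLAMA_API}/api/generate", json=payload)
#   print(response.json()["response"])
print(json.dumps(payload))
```

With `"stream": False`, the generated text arrives in the `response` field of one JSON object; omitting it yields a stream of newline-delimited JSON chunks.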
gguf/qwen3-0.6b-tensorslayer-f16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:63b02802d2abc6b5aad457a4fc64da01341b97f4d569b60bec1273e062b07ad4
+ size 1198182016
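To serve one of these files through Ollama, a minimal Modelfile sketch (the `FROM` path assumes the GGUF sits in the current directory; the model name mirrors the README's `qwen3-enhanced`):

```
FROM ./qwen3-0.6b-tensorslayer-f16.gguf
```

followed by `ollama create qwen3-enhanced -f Modelfile` and `ollama run qwen3-enhanced`.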
gguf/qwen3-0.6b-tensorslayer-q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44746f83b453466e40c6899e1d4059751655c6dffc4d00ed9418e3cfef6bfc9c
+ size 381565568
gguf/qwen3-0.6b-tensorslayer-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ccc83a79c321e0d26dec88e29860675b85b57025e3bccad02589f5bd625e1f88
+ size 444414592
gguf/qwen3-0.6b-tensorslayer-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a512a8ec6c39a301fd239758a80a8a22c5888b4a0cde6bc21462755db1dc88d
+ size 639446656
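Each of the pointer files above records the real blob's SHA-256 (`oid`) and byte count (`size`), so a downloaded GGUF can be verified against them. A sketch of a streaming check (the file path in the comment is illustrative):

```python
import hashlib
import os

def lfs_sha256(path, chunk_size=1 << 20):
    """Stream a file and return its hex SHA-256, the value after 'oid sha256:'."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

# Hypothetical usage against the f16 file and its pointer values above:
#   assert lfs_sha256("gguf/qwen3-0.6b-tensorslayer-f16.gguf") == \
#       "63b02802d2abc6b5aad457a4fc64da01341b97f4d569b60bec1273e062b07ad4"
#   assert os.path.getsize("gguf/qwen3-0.6b-tensorslayer-f16.gguf") == 1198182016
```

Hashing in fixed-size chunks keeps memory flat even for the ~1.1GB f16 file.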