TheFireHacker committed 8dc1f2a (verified), parent 2f46caa

Add GGUF quantized versions for Ollama/llama.cpp compatibility with API instructions

Files changed (1): gguf/README.md (added, +75 -0)

# GGUF Models - Qwen3-0.6B Tensor-Slayer Enhanced

This folder contains GGUF (GPT-Generated Unified Format) versions of the Tensor-Slayer enhanced Qwen3-0.6B model, optimized for inference with llama.cpp and Ollama.

## Available Formats

| File | Quantization | Approx. size | Use case |
|------|--------------|--------------|----------|
| `qwen3-0.6b-tensorslayer-f16.gguf` | FP16 | ~1.2 GB | Maximum quality |
| `qwen3-0.6b-tensorslayer-q8_0.gguf` | Q8_0 (8-bit) | ~650 MB | High quality at about half the FP16 size |
| `qwen3-0.6b-tensorslayer-q5_k_m.gguf` | Q5_K_M (5-bit K-quant) | ~450 MB | Balanced quality/size |
| `qwen3-0.6b-tensorslayer-q4_0.gguf` | Q4_0 (4-bit) | ~350 MB | Smallest file, fastest inference |

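As a rough sanity check on the sizes in the table, a GGUF file's size is approximately parameter count × bits per weight ÷ 8. The sketch below assumes 0.6 billion parameters (the nominal Qwen3-0.6B size) and uses the fixed block layouts of llama.cpp's Q8_0 and Q4_0 formats (32 weights plus one FP16 scale per block, i.e. 8.5 and 4.5 bits per weight). It ignores metadata and any tensors stored at a different precision, and it leaves out Q5_K_M because that format mixes quantization types across tensors.

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8 bytes.
# Assumption: 600M parameters (nominal Qwen3-0.6B size); metadata and
# tensors stored at other precisions are ignored.

PARAMS = 600_000_000

# Bits per weight for fixed-layout formats.
# A Q8_0 or Q4_0 block holds 32 weights plus one FP16 scale (16 bits).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5
}


def estimate_size_mb(params: int, bits_per_weight: float) -> float:
    """Approximate file size in megabytes (10**6 bytes)."""
    return params * bits_per_weight / 8 / 1e6


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_size_mb(PARAMS, bpw):.0f} MB")
```

The estimates come out at roughly 1200, 640 and 340 MB, in line with the ~1.2 GB, ~650 MB and ~350 MB listed above.
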
## Usage with Ollama

### Download and Setup
```bash
# Download the recommended Q4_0 file (swap the filename for another row of the table)
wget https://huggingface.co/TheFireHacker/Qwen3-0.6b-TensorSlayerPatch/resolve/main/gguf/qwen3-0.6b-tensorslayer-q4_0.gguf

# Write a Modelfile that points at the GGUF file and applies the ChatML prompt format
cat > Modelfile <<'EOF'
FROM ./qwen3-0.6b-tensorslayer-q4_0.gguf
TEMPLATE """<|im_start|>system
You are a helpful assistant with enhanced semantic understanding.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
EOF

# Register the model with Ollama as qwen3-enhanced
ollama create qwen3-enhanced -f Modelfile
```

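Switching to another file from the table only changes the `FROM` line of the Modelfile. A minimal sketch that prints a Modelfile for any of the listed quantizations; the `build_modelfile` helper, its parameters, and the script name `make_modelfile.py` are hypothetical and not part of Ollama.

```python
import sys

# Filenames from the "Available Formats" table, keyed by quantization label.
GGUF_FILES = {
    "f16": "qwen3-0.6b-tensorslayer-f16.gguf",
    "q8_0": "qwen3-0.6b-tensorslayer-q8_0.gguf",
    "q5_k_m": "qwen3-0.6b-tensorslayer-q5_k_m.gguf",
    "q4_0": "qwen3-0.6b-tensorslayer-q4_0.gguf",
}

# ChatML prompt template used by the Ollama setup in this README.
TEMPLATE = '''TEMPLATE """<|im_start|>system
You are a helpful assistant with enhanced semantic understanding.<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""'''


def build_modelfile(quant: str, temperature: float = 0.7, top_p: float = 0.9) -> str:
    """Return Modelfile text pointing at the GGUF file for `quant` (hypothetical helper)."""
    lines = [
        f"FROM ./{GGUF_FILES[quant]}",
        TEMPLATE,
        f"PARAMETER temperature {temperature}",
        f"PARAMETER top_p {top_p}",
        'PARAMETER stop "<|im_end|>"',  # end generation at the ChatML end-of-turn token
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Usage: python make_modelfile.py q8_0 > Modelfile
    arg = sys.argv[1] if len(sys.argv) > 1 else ""
    quant = arg if arg in GGUF_FILES else "q4_0"
    print(build_modelfile(quant), end="")
```

After downloading the matching GGUF file, the output can be registered with `ollama create qwen3-enhanced -f Modelfile` as before.
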
### Test Enhanced Semantics
```bash
# Test 1: Synonym similarity (should rate noticeably higher than with the base Qwen3-0.6B)
ollama run qwen3-enhanced "Rate the similarity between 'understanding' and 'comprehension' on a scale of 1-10 and explain why."

# Test 2: Antonym differentiation
ollama run qwen3-enhanced "Explain the difference between 'surface-level' and 'deep' understanding."

# Test 3: Conceptual relationships
ollama run qwen3-enhanced "What is the relationship between 'semantic meaning' and 'conceptual understanding'?"
```

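To check whether the patches change these answers, the same three prompts can be sent to both the enhanced model and an unmodified base model through Ollama's `/api/generate` endpoint. A minimal standard-library sketch, assuming a local Ollama server on the default port 11434 and the base model pulled from the Ollama library as `qwen3:0.6b`. That library build uses its own quantization and chat template, so differences are indicative only. Setting `temperature` to 0 and a fixed `seed` in `options` reduces run-to-run variation.

```python
import json
import urllib.request

OLLAMA_API = "http://localhost:11434"  # assumed: local Ollama server on the default port
MODELS = ["qwen3-enhanced", "qwen3:0.6b"]  # enhanced model vs. assumed base-model tag

# The three test prompts from the commands above.
PROMPTS = [
    "Rate the similarity between 'understanding' and 'comprehension' on a scale of 1-10 and explain why.",
    "Explain the difference between 'surface-level' and 'deep' understanding.",
    "What is the relationship between 'semantic meaning' and 'conceptual understanding'?",
]


def build_payload(model: str, prompt: str) -> dict:
    """Request body for /api/generate; temperature 0 and a fixed seed reduce variation."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "seed": 42},
    }


def generate(payload: dict) -> str:
    """POST one non-streaming request and return the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_API}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read())["response"]


def compare() -> None:
    """Print each prompt's answer from both models, one after the other."""
    for prompt in PROMPTS:
        print(f"=== {prompt}")
        for model in MODELS:
            print(f"--- {model}\n{generate(build_payload(model, prompt))}\n")
```

With the server running and both models available (`ollama pull qwen3:0.6b` fetches the base model), call `compare()`.
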
## Ollama API Access
```python
import requests

# Ollama serves its HTTP API on port 11434 by default; change this if your server runs elsewhere
OLLAMA_API = "http://localhost:11434"

response = requests.post(
    f"{OLLAMA_API}/api/generate",
    json={
        "model": "qwen3-enhanced",
        "prompt": "Understanding and comprehension are",
        "stream": False,  # return one JSON object instead of a stream of chunks
    },
    timeout=120,
)
response.raise_for_status()
# The generated text is in the "response" field of the returned JSON
print(response.json()["response"])
```

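With `"stream": True` (Ollama's default), `/api/generate` instead returns newline-delimited JSON: each line carries a chunk of text in its `response` field, and the final line has `"done": true`. A minimal standard-library sketch of consuming that stream, assuming the same local endpoint; `iter_chunks` and `stream_generate` are illustrative names, not part of any Ollama client library.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Union

OLLAMA_API = "http://localhost:11434"  # assumed local endpoint


def iter_chunks(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield text chunks from Ollama's newline-delimited JSON stream."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if "error" in event:
            raise RuntimeError(event["error"])  # surface server-side errors
        yield event.get("response", "")
        if event.get("done"):
            break  # the final event carries timing stats, no more text


def stream_generate(model: str, prompt: str) -> Iterator[str]:
    """POST a streaming request and yield text as it arrives."""
    body = {"model": model, "prompt": prompt, "stream": True}
    req = urllib.request.Request(
        f"{OLLAMA_API}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        yield from iter_chunks(resp)  # the HTTP response iterates line by line


# Usage (requires a running server):
# for chunk in stream_generate("qwen3-enhanced", "Understanding and comprehension are"):
#     print(chunk, end="", flush=True)
```
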
## Performance Improvements

Compared with the base Qwen3-0.6B, this enhanced model shows:
- **Synonym similarity**: 0.07 → 0.25-0.40 (+257% to +471%)
- **Antonym differentiation**: more precise semantic boundaries
- **Conceptual clustering**: token relationships that are semantic rather than lexical
- **Enhanced reasoning**: from 44 tensor patches targeting the embedding, attention, and MLP layers

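The percentage range for synonym similarity is the relative change (new − old) / old against the 0.07 baseline. A small check of that arithmetic:

```python
def relative_improvement_pct(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


baseline = 0.07  # synonym similarity before patching, as listed above
for patched in (0.25, 0.40):
    print(f"{baseline} -> {patched}: +{relative_improvement_pct(baseline, patched):.0f}%")
```

Both ends of the 0.25-0.40 range reproduce the stated +257% and +471%.
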
## Technical Details
- **Base Model**: Qwen/Qwen3-0.6B
- **Enhancement Method**: Tensor-Slayer framework with 44 strategic patches
- **Target Areas**: Embeddings, attention projections, MLP gates
- **Validation**: Improvements demonstrated on the HumanEval benchmark

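To see which tensors in one of these GGUF files fall under the target areas, tensor names can be grouped by llama.cpp's naming convention: `token_embd` for embeddings, `attn_q`/`attn_k`/`attn_v`/`attn_output` for attention projections, and `ffn_gate`/`ffn_up`/`ffn_down` for the MLP. A sketch under the assumption that the converted files use these standard names; the optional file listing relies on `GGUFReader` from the separately installed `gguf` Python package. The grouping only shows where patches could live; it does not identify which 44 tensors were actually modified.

```python
import re
import sys
from collections import Counter
from typing import Optional

# Tensor-name patterns following llama.cpp's GGUF naming convention
# (assumption: standard names such as "blk.0.attn_q.weight").
TARGET_AREAS = {
    "embeddings": re.compile(r"^token_embd\.weight$"),
    "attention": re.compile(r"^blk\.\d+\.attn_(q|k|v|output)\.weight$"),
    "mlp": re.compile(r"^blk\.\d+\.ffn_(gate|up|down)\.weight$"),
}


def classify(tensor_name: str) -> Optional[str]:
    """Return the target area for a tensor name, or None (norms, output head, ...)."""
    for area, pattern in TARGET_AREAS.items():
        if pattern.match(tensor_name):
            return area
    return None


def summarize(gguf_path: str) -> Counter:
    """Count tensors per target area in a GGUF file; None counts everything else."""
    from gguf import GGUFReader  # imported lazily so classify() works without gguf

    reader = GGUFReader(gguf_path)
    return Counter(classify(t.name) for t in reader.tensors)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_tensors.py qwen3-0.6b-tensorslayer-f16.gguf
    print(summarize(sys.argv[1]))
```
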
See the README.md in the parent directory for the complete technical analysis.