# Qwen3-0.6B with Tensor-Slayer Semantic Enhancements

## Model Description

This is a version of Qwen3-0.6B enhanced with the [Tensor-Slayer](https://github.com/areu01or00/Tensor-Slayer) framework. The model received 44 hand-crafted tensor patches intended to improve its understanding of semantic relationships.

## Enhancements Applied

- **44 Tensor Patches**: Strategic modifications to embedding, attention, and MLP layers
- **Semantic Relationship Improvements**: Better understanding of synonyms, antonyms, and conceptual relationships
- **Performance Gains**: Stronger results on semantic reasoning tasks

## Original Issues Addressed

The base Qwen3-0.6B showed weak semantic relationships between related tokens (a measurement sketch follows this list):
- `understanding ↔ comprehension` similarity: **0.07** (extremely low for synonyms)
- `surface ↔ deep` similarity: **0.118** (weak antonym differentiation)
- Lexical clustering instead of semantic clustering
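
These numbers can be reproduced approximately with a cosine-similarity check over the input embedding rows. The snippet below is a minimal sketch, not the exact evaluation behind this card: it assumes the reported values compare the (averaged) input-embedding vectors of the listed words, and it can be re-run against the patched checkpoint to check the improvements claimed below.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen3-0.6B"  # swap in the patched checkpoint to compare
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

def embedding_similarity(word_a: str, word_b: str) -> float:
    """Cosine similarity between the averaged input-embedding vectors of two words."""
    emb = model.get_input_embeddings().weight  # (vocab_size, hidden_size)
    ids_a = tokenizer(word_a, add_special_tokens=False).input_ids
    ids_b = tokenizer(word_b, add_special_tokens=False).input_ids
    vec_a = emb[ids_a].mean(dim=0)  # average over sub-word pieces
    vec_b = emb[ids_b].mean(dim=0)
    return torch.nn.functional.cosine_similarity(vec_a, vec_b, dim=0).item()

print(embedding_similarity("understanding", "comprehension"))
print(embedding_similarity("surface", "deep"))
```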

## Expected Improvements

After tensor patches:
- Synonym similarity: **0.25-0.40** (+257% to +471% over the 0.07 baseline)
- Better antonym differentiation
- Conceptual rather than lexical token relationships

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheFireHacker/Qwen3-0.6b-TensorSlayerPatch")
model = AutoModelForCausalLM.from_pretrained("TheFireHacker/Qwen3-0.6b-TensorSlayerPatch")
```
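
A minimal generation example (the prompt and decoding settings here are illustrative, not tuned for this model):

```python
prompt = "Explain the difference between surface-level and deep understanding of a concept."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```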

## Technical Details

- **Base Model**: Qwen/Qwen3-0.6B
- **Enhancement Method**: Direct tensor manipulation via Tensor-Slayer
- **Patches Applied**: 44 strategic scale/clamp operations (illustrated in the sketch below)
- **Target Areas**: Embeddings, Attention projections, MLP gates
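
For illustration only, a scale/clamp patch of the kind described above could look roughly like the following. This is a hypothetical sketch, not the actual Tensor-Slayer patch script: the tensor names, scale factors, and clamp bounds are placeholders, and the real 44 patches are not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
state = model.state_dict()

# Placeholder patches: (tensor name, scale factor, optional clamp bound).
patches = [
    ("model.embed_tokens.weight", 1.05, None),                # embeddings
    ("model.layers.0.self_attn.q_proj.weight", 0.98, 3.0),    # attention projection
    ("model.layers.0.mlp.gate_proj.weight", 1.02, None),      # MLP gate
]

with torch.no_grad():
    for name, scale, clamp in patches:
        tensor = state[name]
        tensor.mul_(scale)                # scale operation
        if clamp is not None:
            tensor.clamp_(-clamp, clamp)  # clamp operation

model.save_pretrained("qwen3-0.6b-patched")
```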

## Related Work

- [Tensor-Slayer Framework](https://github.com/areu01or00/Tensor-Slayer)
- [Original Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- [TimeCapsule-SLM Project](https://github.com/thefirehacker/TimeCapsule-SLM)

## License

Apache 2.0 (same as base Qwen3-0.6B model)