TheFireHacker's picture
Add comprehensive model card
2f46caa verified
# Qwen3-0.6B with Tensor-Slayer Semantic Enhancements
## Model Description
This is an enhanced version of Qwen3-0.6B that has been improved using the [Tensor-Slayer](https://github.com/areu01or00/Tensor-Slayer) framework. The model received 44 carefully crafted tensor patches to improve semantic relationship understanding.
## Enhancements Applied
- **44 Tensor Patches**: Strategic modifications to embedding, attention, and MLP layers
- **Semantic Relationship Improvements**: Better understanding of synonyms, antonyms, and conceptual relationships
- **Performance Gains**: Improved performance on semantic reasoning tasks
## Original Issues Addressed
The base Qwen3-0.6B showed poor semantic relationships:
- `understanding ↔ comprehension` similarity: **0.07** (extremely low for synonyms)
- `surface ↔ deep` similarity: **0.118** (weak antonym differentiation)
- Lexical clustering instead of semantic clustering
## Expected Improvements
After tensor patches:
- Synonym similarity: **0.25-0.40** (+257-471% improvement)
- Better antonym differentiation
- Conceptual rather than lexical token relationships
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("TheFireHacker/Qwen3-0.6b-TensorSlayerPatch")
model = AutoModelForCausalLM.from_pretrained("TheFireHacker/Qwen3-0.6b-TensorSlayerPatch")
```
## Technical Details
- **Base Model**: Qwen/Qwen3-0.6B
- **Enhancement Method**: Direct tensor manipulation via Tensor-Slayer
- **Patches Applied**: 44 strategic scale/clamp operations
- **Target Areas**: Embeddings, Attention projections, MLP gates
## Related Work
- [Tensor-Slayer Framework](https://github.com/areu01or00/Tensor-Slayer)
- [Original Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- [TimeCapsule-SLM Project](https://github.com/thefirehacker/TimeCapsule-SLM)
## License
Apache 2.0 (same as base Qwen3-0.6B model)