---
license: apache-2.0
language:
- en
tags:
- bio-to-tags
- tag-generation
- smollm2
- text-generation
- personality
- interests
- spiceechat
pipeline_tag: text-generation
library_name: transformers
---

<p align="center">
  <img src="https://huggingface.co/SpiceeChat/Bio2Tags-Qwen3.5-4B-SFT/resolve/main/Spiceechat.png" 
       alt="SpiceeChat" 
       width="1100" 
       height="1000" 
       style="border-radius: 50%; object-fit: cover;">
</p>

<p align="center">
  <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct"><img src="https://img.shields.io/badge/SmolLM2-360M-blue?logo=huggingface" alt="SmolLM2"></a>
  <a href="https://github.com/unslothai/unsloth"><img src="https://img.shields.io/badge/Fine‑Tuned-QLoRA-green" alt="QLoRA"></a>
  <a href="https://huggingface.co/SpiceeChat"><img src="https://img.shields.io/badge/SpiceeChat-🔥-orange" alt="SpiceeChat"></a>
  <a href="https://www.apache.org/licenses/LICENSE-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-yellow" alt="License"></a>
</p>

---

# 🏷️ Bio2Tags-Lite

**Because reading between the lines shouldn't require a psychology degree.**

Bio2Tags-Lite is a fine-tuned SmolLM2-360M model that reads personal biographies and returns clean, structured personality tags. Feed it a dating bio, a LinkedIn summary, or whatever someone wrote about themselves at 2am, and it'll tell you what kind of person they actually are.

No rambling. No fluff. Just tags.

---

## ✨ Features

- **Lightweight**: 360M parameters, so it runs on hardware that would make a gamer cry
- **Fast**: Inference in milliseconds, because nobody has time to wait
- **Structured Output**: Clean comma-separated tags, every time
- **Plug & Play**: Works with Transformers out of the box, no PhD required
- **SpiceeChat Pipeline**: Pairs with Cinder-1.5B like peanut butter and heartbreak

---

## 🧪 Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(
    "SpiceeChat/Bio2Tags-Lite",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("SpiceeChat/Bio2Tags-Lite")

def get_tags(bio):
    """Return the model's comma-separated personality tags for a bio."""
    prompt = f"Extract personality tags from the bio below. Output ONLY comma-separated tags, nothing else.\n\nBio: {bio}\n\nTags:"
    messages = [{"role": "user", "content": prompt}]
    # Wrap the prompt in the chat template the model was trained with.
    formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    # Sampling is enabled, so the tags can differ between runs.
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()

# Try it
print(get_tags("I love hiking at dawn, painting watercolors, and deep conversations about philosophy."))
# Example output (varies between runs): nature-lover, artist, intellectual, deep-thinker
```
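
If you want a list instead of a raw string, a small post-processing step helps. The sketch below is one possible approach, not part of the model or its repo: `parse_tags` and its `max_tags` parameter are hypothetical names, and the normalization rules (lowercase, hyphenated, deduplicated) are assumptions based on the tag style shown in this card.

```python
import re

def parse_tags(raw: str, max_tags: int = 10) -> list[str]:
    """Turn a comma-separated reply into a clean list of tags (hypothetical helper)."""
    stripped = raw.strip()
    # Keep only the first line, in case generation runs on past the tags.
    first_line = stripped.splitlines()[0] if stripped else ""
    tags = []
    for piece in first_line.split(","):
        # Lowercase and join inner whitespace with hyphens: "Night Owl" -> "night-owl".
        tag = re.sub(r"\s+", "-", piece.strip().lower())
        # Drop anything that is not a letter, digit, or hyphen (stray quotes, periods).
        tag = re.sub(r"[^a-z0-9-]", "", tag).strip("-")
        # Skip empty pieces and duplicates while preserving order.
        if tag and tag not in tags:
            tags.append(tag)
    return tags[:max_tags]

print(parse_tags("nature-lover, Artist, intellectual, artist, Deep Thinker.\n\nBio: ..."))
# ['nature-lover', 'artist', 'intellectual', 'deep-thinker']
```

In practice you would pass it the string returned by `get_tags`, for example `parse_tags(get_tags(bio))`.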

---

## 📊 Sample Outputs

| Bio | Tags |
|-----|------|
| "I'm a software engineer who loves late-night coding and playing jazz piano." | tech-savvy, creative, night-owl, music-enthusiast, artistic |
| "I spend my weekends trail running and evenings reading classic literature." | adventurous, nature-lover, bookworm, intellectual, quiet |
| "I'm a retired teacher who gardens, reads history books, and bakes sourdough." | intellectual, family-oriented, gardener, history-buff, old-soul |
| "As a digital nomad, my office changes weekly, from Bali cafes to Alpine cabins." | adventurous, creative, digital-nomad, spontaneous, tech-savvy |

*(Yes, the sourdough one is a stereotype. Yes, it's also always accurate.)*

---

## 📦 Installation

```bash
pip install transformers torch accelerate
```

That's it. No ritual sacrifices, no config files, no Stack Overflow rabbit holes.

---

## 🎯 Use Cases

- **Dating Apps**: Tag user bios automatically for smarter matching, because "I like long walks on the beach" means something very different from "I like long walks on the beach at 3am alone"
- **Social Media**: Generate relevant hashtags from profile descriptions
- **Recommender Systems**: Build personality-based recommendation engines
- **Content Analysis**: Extract structured metadata from unstructured text
- **SpiceeChat Pipeline**: Feed extracted tags into Cinder-1.5B for personalized compatibility advice
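
For the matching and recommendation use cases, one simple starting point is to compare two profiles by how many tags they share. The sketch below uses Jaccard similarity (shared tags divided by all distinct tags). This is an illustrative choice, not how SpiceeChat scores matches, and `tag_similarity` is a hypothetical helper. The tag lists are taken from the sample outputs table.

```python
def tag_similarity(tags_a: list[str], tags_b: list[str]) -> float:
    """Jaccard similarity between two tag lists: |A & B| / |A | B| (hypothetical helper)."""
    a, b = set(tags_a), set(tags_b)
    # Two empty profiles share nothing meaningful, so score them as 0.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

engineer = ["tech-savvy", "creative", "night-owl", "music-enthusiast", "artistic"]
runner = ["adventurous", "nature-lover", "bookworm", "intellectual", "quiet"]
nomad = ["adventurous", "creative", "digital-nomad", "spontaneous", "tech-savvy"]

print(tag_similarity(engineer, nomad))          # 0.25 (2 shared tags out of 8 distinct)
print(round(tag_similarity(runner, nomad), 2))  # 0.11 (1 shared tag out of 9 distinct)
```

Exact string overlap treats `artist` and `artistic` as unrelated, so a real system would likely want tag normalization or embeddings on top of this.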

---

## 🛠️ Technical Details

| Detail | Value |
|--------|-------|
| **Base Model** | [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
| **Fine-tuning Method** | QLoRA (4-bit quantization, rank-16 adapters) |
| **Training Framework** | Unsloth |
| **Training Data** | 1,387 hand-crafted (bio, tags) pairs |
| **Epochs** | 3 |
| **Learning Rate** | 1e-4 |
| **Sequence Length** | 512 tokens |
| **Hardware Used** | Google Colab T4 (free tier; yes, really) |
| **Final Size** | 724 MB (FP16) |
| **Min VRAM Required** | ~1.5 GB |
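
As a quick sanity check on the size figure, FP16 stores each parameter in 2 bytes. The sketch below assumes roughly 362M parameters, a number inferred from the 724 MB figure rather than stated in this card, and `weights_size_mb` is a hypothetical helper.

```python
def weights_size_mb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw weights in MB (1 MB = 1,000,000 bytes)."""
    return num_params * bytes_per_param / 1_000_000

# Assumption: about 362M parameters, inferred from the 724 MB figure above.
approx_params = 362_000_000
print(weights_size_mb(approx_params))  # 724.0
```

The VRAM requirement is higher than the weight size because inference also needs room for activations, the KV cache, and framework overhead.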

---

## ⚠️ Limitations

- **English only**: Other languages may produce results ranging from "creative" to "confidently wrong"
- **Training data size**: 1,387 examples is a solid start; more data is always on the roadmap
- **Tag granularity**: Captures the salient stuff, not every quirk (the model can't detect if someone is secretly obsessed with true crime podcasts)
- **Edge cases**: Very short bios, emoji-heavy text, or deeply abstract descriptions may surprise you
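
One way to soften these edge cases is to screen bios before they reach the model. The sketch below is a rough heuristic under stated assumptions: `looks_taggable` is a hypothetical helper, the thresholds are guesses rather than tuned values, and counting ASCII letters is only a crude proxy for "mostly English text" (it will penalize accented names).

```python
def looks_taggable(bio: str, min_words: int = 5, min_letter_ratio: float = 0.6) -> bool:
    """Cheap pre-check for bios that are too short or mostly emoji/symbols."""
    text = bio.strip()
    # Very short bios give the model little to work with.
    if len(text.split()) < min_words:
        return False
    # Share of non-space characters that are plain ASCII letters.
    letters = sum(ch.isascii() and ch.isalpha() for ch in text)
    non_space = sum(not ch.isspace() for ch in text)
    return letters / non_space >= min_letter_ratio

print(looks_taggable("I love hiking at dawn, painting watercolors, and deep conversations about philosophy."))
print(looks_taggable("hi"))
print(looks_taggable("🔥🔥 ✨ 🎉 💃 🕺 🌴"))
```

The first call should pass the check and the other two should not. Bios that fail could be skipped or sent to a fallback instead of being tagged.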

---

## 🧠 Part of the SpiceeChat Ecosystem

Bio2Tags-Lite is a core component of the SpiceeChat AI pipeline:

- 🏷️ **Bio2Tags-Lite** → Extracts personality tags from bios
- 🔥 **[Cinder-1.5B](https://huggingface.co/SpiceeChat/Cinder-1.5B)** → Personalized dating advice powered by those tags
- 🌐 **[dating-fatigue.com](https://dating-fatigue.com)** → Live tools for real humans trying to find real love

---

## 📜 License

Apache 2.0: use it, modify it, ship it. Just give SpiceeChat a nod.

---

<div align="center">
  <sub>Built with ❤️ by <b>SpiceeChat</b></sub>
  <br>
  <sub>🔗 <a href="https://huggingface.co/SpiceeChat">huggingface.co/SpiceeChat</a></sub>
</div>