QuantaSparkLabs committed on
Commit 5b7a2af · verified · 1 Parent(s): ef6fa09

Update README.md

Files changed (1): README.md +116 -54
README.md CHANGED
@@ -1,98 +1,160 @@
  ---
  license: apache-2.0
  language:
- - en
  tags:
- - bio
- - personality
- - tags
- - extraction
- - spiceechat
- - tiny-model
- - work-in-progress
  pipeline_tag: text-generation
  ---

  # 🏷️ Bio2Tags-Lite

- > *The full real README is stuck in traffic. Will arrive tomorrow. Please download on half faith. 🤞*

- ---

- ## 🚧 What's Going On?

- You're looking at a placeholder because:

- 1. **The real README is in traffic.** LA rush hour. It's bad out there.
- 2. **This model was born 30 minutes ago.** It still doesn't know what a 401(k) is.
- 3. **I am training like 4 models right now.**

  ---

- ## 🤨 So What Does This Model Actually Do?

- It reads a dating bio and outputs personality tags.

- **Input:**
- > *"I'm a retired teacher who gardens, reads history books, and bakes sourdough."*

- **Output:**
- > *intellectual, family-oriented, gardener, history-buff, old-soul*

- That's it. No small talk. No life advice. Just tags.

  ---

- ## 🙏 Why "Download on Half Faith"?

- Because it works about 50% of the time right now. The other 50%? Let's call it "creative interpretation." We're working on it. The 360M version is much better than the original 135M prototype that once tagged everyone as "adventurous, creative, empathetic" regardless of input.

- **We'll get there. Just not today. Today is chaos.**

  ---

- ## ⚡ Quick Test (If You're Brave)

- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained("SpiceeChat/Bio2Tags-Lite", dtype="auto", device_map="auto")
- tokenizer = AutoTokenizer.from_pretrained("SpiceeChat/Bio2Tags-Lite")

- bio = "I love hiking at sunrise and brewing craft beer on weekends."
- prompt = f"Extract personality tags from the bio below. Output ONLY comma-separated tags, nothing else.\n\nBio: {bio}\n\nTags:"
- messages = [{"role": "user", "content": prompt}]
- formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7)
- tags = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
- print(tags)
- ```

  ---

- ## Status

- | Thing | Status |
- |-------|--------|
- | Model | 🟡 Works 50% of the time |
- | Real README | 🔴 Stuck in traffic, ETA tomorrow |
- | Developer | 🟡 Running on caffeine and blind faith |
- | Cinder-1.5B | 🟡 Still training on Kaggle |
- | Sleep | 🔴 Not happening |

  ---

- ## 🧠 Part of SpiceeChat

- Built for **SpiceeChat**: tools and AI to help people navigate the messy world of dating.

- - 🌐 [dating-fatigue.com](https://dating-fatigue.com)
- - 🔥 [Cinder-1.5B](https://huggingface.co/SpiceeChat/Cinder-1.5B)
- - 🏷️ [Bio2Tags-Lite](https://huggingface.co/SpiceeChat/Bio2Tags-Lite) ← You are here

  ---

  <div align="center">
- <sub>🚗 The real README is on the 405. It'll be here tomorrow. Probably.</sub>
- </div>

  ---
  license: apache-2.0
  language:
+ - en
  tags:
+ - bio-to-tags
+ - tag-generation
+ - smollm2
+ - text-generation
+ - personality
+ - interests
+ - spiceechat
  pipeline_tag: text-generation
+ library_name: transformers
+ ---
+
+ <p align="center">
+ <img src="https://huggingface.co/SpiceeChat/Bio2Tags-Qwen3.5-4B-SFT/resolve/main/Spiceechat.png"
+ alt="SpiceeChat"
+ width="1100"
+ height="1000"
+ style="border-radius: 50%; object-fit: cover;">
+ </p>
+
+ <p align="center">
+ <a href="https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct"><img src="https://img.shields.io/badge/SmolLM2-360M-blue?logo=huggingface" alt="SmolLM2"></a>
+ <a href="https://github.com/unslothai/unsloth"><img src="https://img.shields.io/badge/Fine‑Tuned-QLoRA-green" alt="QLoRA"></a>
+ <a href="https://huggingface.co/SpiceeChat"><img src="https://img.shields.io/badge/SpiceeChat-🔥-orange" alt="SpiceeChat"></a>
+ <a href="https://www.apache.org/licenses/LICENSE-2.0"><img src="https://img.shields.io/badge/License-Apache%202.0-yellow" alt="License"></a>
+ </p>
+
  ---

  # 🏷️ Bio2Tags-Lite

+ **Because reading between the lines shouldn't require a psychology degree.**

+ Bio2Tags-Lite is a fine-tune of SmolLM2-360M-Instruct that reads personal biographies and returns clean, structured personality tags. Feed it a dating bio, a LinkedIn summary, or whatever someone wrote about themselves at 2am, and it'll tell you what kind of person they actually are.
+
+ No rambling. No fluff. Just tags.

+ ---

+ ## ✨ Features

+ - **Lightweight**: 360M parameters; runs on hardware that would make a gamer cry
+ - **Fast**: Inference in milliseconds, because nobody has time to wait
+ - **Structured Output**: Clean comma-separated tags, every time
+ - **Plug & Play**: Works with Transformers out of the box, no PhD required
+ - **SpiceeChat Pipeline**: Pairs with Cinder-1.5B like peanut butter and heartbreak

  ---

+ ## 🧪 Example
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "SpiceeChat/Bio2Tags-Lite",
+     torch_dtype="auto",
+     device_map="auto",
+ )
+ tokenizer = AutoTokenizer.from_pretrained("SpiceeChat/Bio2Tags-Lite")
+
+ def get_tags(bio):
+     prompt = f"Extract personality tags from the bio below. Output ONLY comma-separated tags, nothing else.\n\nBio: {bio}\n\nTags:"
+     messages = [{"role": "user", "content": prompt}]
+     formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+     inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
+     outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)
+     # Decode only the newly generated tokens, skipping the prompt
+     return tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
+
+ # Try it
+ print(get_tags("I love hiking at dawn, painting watercolors, and deep conversations about philosophy."))
+ # Example output (sampling is enabled, so tags vary between runs):
+ # nature-lover, artist, intellectual, deep-thinker
+ ```
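
Reviewer sketch, not part of this commit: `get_tags` returns a single comma-separated string, so downstream code will usually want a list. The snippet below shows one way that step might look. `parse_tags` is a hypothetical helper name, and keeping only the first line is an assumption about how stray extra text should be handled.

```python
def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string into a clean, de-duplicated list.

    Hypothetical helper for illustration; not part of the Bio2Tags-Lite repo.
    """
    stripped = raw.strip()
    # Keep only the first line in case the model keeps generating after the tags.
    first_line = stripped.splitlines()[0] if stripped else ""
    seen = set()
    tags = []
    for piece in first_line.split(","):
        tag = piece.strip().lower()
        # Skip empty pieces and repeats while preserving the original order.
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


print(parse_tags("nature-lover, artist, Intellectual, artist\nBio: ..."))
# ['nature-lover', 'artist', 'intellectual']
```

Lowercasing and de-duplicating keep the tags comparable across bios; drop either step if the original casing or repeats matter to you.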

+ ---

+ ## 📊 Sample Outputs

+ | Bio | Tags |
+ |-----|------|
+ | "I'm a software engineer who loves late-night coding and playing jazz piano." | tech-savvy, creative, night-owl, music-enthusiast, artistic |
+ | "I spend my weekends trail running and evenings reading classic literature." | adventurous, nature-lover, bookworm, intellectual, quiet |
+ | "I'm a retired teacher who gardens, reads history books, and bakes sourdough." | intellectual, family-oriented, gardener, history-buff, old-soul |
+ | "As a digital nomad, my office changes weekly, from Bali cafes to Alpine cabins." | adventurous, creative, digital-nomad, spontaneous, tech-savvy |

+ *(Yes, the sourdough one is a stereotype. Yes, it's also always accurate.)*

  ---

+ ## 📦 Installation

+ ```bash
+ pip install transformers torch accelerate
+ ```

+ That's it. No ritual sacrifices, no config files, no Stack Overflow rabbit holes.

  ---

+ ## 🎯 Use Cases

+ - **Dating Apps**: Tag user bios automatically for smarter matching, because "I like long walks on the beach" means something very different from "I like long walks on the beach at 3am alone"
+ - **Social Media**: Generate relevant hashtags from profile descriptions
+ - **Recommender Systems**: Build personality-based recommendation engines
+ - **Content Analysis**: Extract structured metadata from unstructured text
+ - **SpiceeChat Pipeline**: Feed extracted tags into Cinder-1.5B for personalized compatibility advice
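
Reviewer sketch, not part of this commit: for the matching and recommender use cases, one simple way to turn two tag lists into a score is Jaccard overlap. The commit does not say how SpiceeChat actually compares profiles, so treat this as an illustration only; `tag_overlap` is a hypothetical name.

```python
def tag_overlap(tags_a: list[str], tags_b: list[str]) -> float:
    """Jaccard similarity between two tag lists: |A & B| / |A | B|."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # no tags on either side: treat as no evidence of a match
    return len(a & b) / len(a | b)


# Tags taken from two rows of the Sample Outputs table.
runner = ["adventurous", "nature-lover", "bookworm", "intellectual", "quiet"]
nomad = ["adventurous", "creative", "digital-nomad", "spontaneous", "tech-savvy"]

# One shared tag ("adventurous") out of nine distinct tags.
print(tag_overlap(runner, nomad))
```

Exact string overlap treats near-synonyms such as "artist" and "artistic" as different tags, so a production system would likely need some normalization first.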

+ ---

+ ## 🛠️ Technical Details
+
+ | Detail | Value |
+ |--------|-------|
+ | **Base Model** | [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) |
+ | **Fine-tuning Method** | QLoRA (4-bit quantization, rank-16 adapters) |
+ | **Training Framework** | Unsloth |
+ | **Training Data** | 1,387 hand-crafted (bio, tags) pairs |
+ | **Epochs** | 3 |
+ | **Learning Rate** | 1e-4 |
+ | **Sequence Length** | 512 tokens |
+ | **Hardware Used** | Google Colab T4 (free tier; yes, really) |
+ | **Final Size** | 724 MB (FP16) |
+ | **Min VRAM Required** | ~1.5 GB |
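
Reviewer sketch, not part of this commit: the "Final Size" row is consistent with the parameter count, since FP16 stores 2 bytes per parameter. The check below assumes the nominal 360M figure and decimal megabytes; `fp16_size_mb` is a hypothetical helper.

```python
def fp16_size_mb(num_params: int) -> float:
    """Approximate FP16 weight size in decimal megabytes (2 bytes per parameter)."""
    bytes_per_param = 2
    return num_params * bytes_per_param / 1_000_000


print(fp16_size_mb(360_000_000))  # 720.0
```

The listed 724 MB corresponds to roughly 362M parameters at 2 bytes each, which is close to the nominal 360M in the model name.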

  ---

+ ## ⚠️ Limitations

+ - **English only**: Other languages may produce results ranging from "creative" to "confidently wrong"
+ - **Training data size**: 1,387 examples is a solid start but still a small dataset; more data is on the roadmap
+ - **Tag granularity**: Captures the salient stuff, not every quirk (the model can't detect if someone is secretly obsessed with true crime podcasts)
+ - **Edge cases**: Very short bios, emoji-heavy text, or deeply abstract descriptions may produce unexpected tags

  ---

+ ## 🧠 Part of the SpiceeChat Ecosystem
+
+ Bio2Tags-Lite is a core component of the SpiceeChat AI pipeline:
+
+ - 🏷️ **Bio2Tags-Lite** → Extracts personality tags from bios
+ - 🔥 **[Cinder-1.5B](https://huggingface.co/SpiceeChat/Cinder-1.5B)** → Personalized dating advice powered by those tags
+ - 🌐 **[dating-fatigue.com](https://dating-fatigue.com)** → Live tools for real humans trying to find real love
+
+ ---

+ ## 📜 License

+ Apache 2.0: use it, modify it, ship it. Just give SpiceeChat a nod.

  ---

  <div align="center">
+ <sub>Built with ❤️ by <b>SpiceeChat</b></sub>
+ <br>
+ <sub>🔗 <a href="https://huggingface.co/SpiceeChat">huggingface.co/SpiceeChat</a></sub>
+ </div>