---
language:
- ko
- en
tags:
- text-generation
- keyword-extraction
- tag-generation
license: apache-2.0
---
# Qwen3-0.6B Float:Right Tagger (https://float-right.app)
This repository contains a fine-tuned tag generator based on **Qwen/Qwen3-0.6B**.
This model was built for on-device AI tag generation in the Float:Right app.
Float:Right is an app for automatic tag generation and classification.
GGUF version: https://huggingface.co/FloatDo/qwen3-0.6b-float-right-tagger-GGUF
## What it does
Given a memo/text, it returns **a JSON array of 3–10 tags**:
- Prefer coarse tags (not overly detailed)
- Keeps the same language as input (Korean -> Korean, English -> English)
- Avoids underscores `_`
> In production, parse only the first JSON array `[ ... ]` from the output.
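
The parsing advice above can be sketched as follows. This is a minimal illustration, not the app's actual implementation: `first_json_array` and `clean_tags` are hypothetical helper names, and the post-filter rules (drop non-strings, duplicates, and tags with underscores, then require 3 to 10 tags) are assumptions derived from the constraints listed in this section.

```python
import json


def first_json_array(raw: str):
    """Return the first substring of `raw` that parses as a JSON array, else None."""
    decoder = json.JSONDecoder()
    start = raw.find("[")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            value, _ = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, list):
            return value
        start = raw.find("[", start + 1)
    return None


def clean_tags(tags, min_tags=3, max_tags=10):
    """Hypothetical post-filter: keep unique, non-empty string tags without underscores."""
    seen, result = set(), []
    for tag in tags or []:
        if not isinstance(tag, str):
            continue
        tag = tag.strip()
        if not tag or "_" in tag or tag in seen:
            continue
        seen.add(tag)
        result.append(tag)
    result = result[:max_tags]
    # Treat too few usable tags as a failed generation.
    return result if len(result) >= min_tags else None
```

A caller might run `clean_tags(first_json_array(decoded))` on the decoded model output and fall back to an empty tag list when the result is `None`. Unlike a non-greedy regex, scanning with `raw_decode` skips bracketed text that is not valid JSON.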
## Quick usage (Transformers)
```python
import json
import re

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_DIR = "./"  # or your HF repo id

tok = AutoTokenizer.from_pretrained(MODEL_DIR, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, torch_dtype="auto", device_map="cuda", trust_remote_code=True
)

def extract_array(s: str):
    # Return the first JSON array found in the output, or None if there is none.
    m = re.search(r"\[[\s\S]*?\]", s)
    if not m:
        return None
    return json.loads(m.group(0))

# "Today I went to an AI conference in Seoul."
text = "오늘 서울에서 AI 컨퍼런스를 다녀왔다."
messages = [
    # System: "You are a tag generator. Output exactly one JSON array."
    {"role": "system", "content": "너는 태그 생성기다. 출력은 JSON 배열 하나만."},
    # User: "Sentence: {text} / 3-10 tags. Not too detailed. No underscores. JSON array only."
    {"role": "user", "content": f"문장: {text}\n태그 3~10개. 너무 디테일하지 않게. 언더스코어 금지. JSON 배열만."},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
enc = tok(prompt, return_tensors="pt").to("cuda")
out = model.generate(**enc, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens so brackets in the prompt are never parsed.
new_tokens = out[0][enc["input_ids"].shape[1]:]
decoded = tok.decode(new_tokens, skip_special_tokens=True)
print(extract_array(decoded))
```
## Notes
- Some outputs may include extra tokens (e.g., `<think>`). In production, extract only the first JSON array `[ ... ]`.
- Training data is intended to avoid sensitive information.
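
One way to handle the extra `<think>` tokens mentioned in the notes is to drop any reasoning block before looking for the array. The sketch below assumes the extra text appears as a literal `<think>...</think>` span in the decoded string, and that an unterminated `<think>` (for example, when generation hits `max_new_tokens`) should discard everything after it. `strip_think` is a hypothetical helper, not part of this repository.

```python
import re

# Match a <think> block up to its closing tag, or to the end of the string if it never closes.
THINK_BLOCK = re.compile(r"<think>[\s\S]*?(?:</think>|\Z)")


def strip_think(raw: str) -> str:
    """Remove <think> blocks so brackets inside them are not mistaken for the tag array."""
    return THINK_BLOCK.sub("", raw).strip()
```

Applying `strip_think` to the decoded output before extracting the JSON array keeps any brackets the model writes while "thinking" from being parsed as tags.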
## Credits
- Base model: Qwen/Qwen3-0.6B
- Project: Float-Right