---
library_name: transformers
license: apache-2.0
language:
- en
- ko
base_model:
- numind/NuExtract-1.5
tags:
- llama-factory
---
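# Automatic Schema Induction (text-to-schema) Model
This model performs schema induction, a sub-task of text-to-JSON: given an input text, it generates a JSON template, that is, a schema of empty fields describing what can be extracted from that text.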
# Usage
```python
import json
import torch
from transformers import AutoModel, AutoTokenizer
model_name = "chnaaam/luSI-v1.0"
if torch.cuda.is_available():
device = "cuda"
elif torch.backends.mps.is_available():
device = "mps"
else:
device = "cpu"
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
text = """์•„์ด์œ (IU, ๋ณธ๋ช…: ์ด์ง€์€, ๆŽ็Ÿฅๆฉ[1], 1993๋…„ 5์›” 16์ผ~)๋Š” ๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ์‹ฑ์–ด์†ก๋ผ์ดํ„ฐ, ์ž‘๊ณก๊ฐ€, ๋ฐฐ์šฐ์ด๋‹ค. 2007๋…„ ๋กœ์—” ์—”ํ„ฐํ…Œ์ธ๋จผํŠธ(ํ˜„ ์นด์นด์˜ค ์—”ํ„ฐํ…Œ์ธ๋จผํŠธ) ์—ฐ์Šต์ƒ์œผ๋กœ ์ „์† ๊ณ„์•ฝ์„ ๋งบ๊ณ  15์„ธ์˜ ๋‚˜์ด์— 2008๋…„ ์ฒซ EP์ธ ๋กœ์ŠคํŠธ ์•ค ํŒŒ์šด๋“œ(Lost and Found)๋ฅผ ํ†ตํ•ด ๊ฐ€์ˆ˜๋กœ ๋ฐ๋ท”ํ–ˆ๋‹ค."""
messages = [
{"role": "user", "content": text}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
**model_inputs,
max_new_tokens=1024,
temperature=0.0
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
json_template = json.loads(response)
print(json_template)
```
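As with any generative model, the response may occasionally contain text around the JSON object or be malformed, in which case `json.loads` raises an error. The following is a minimal sketch of a more defensive parse; `parse_template` is a hypothetical helper and not part of this model or of `transformers`.

```python
import json


def parse_template(response: str):
    """Return the parsed JSON template, or None if no valid JSON object is found.

    Hypothetical helper for illustration only.
    """
    # First try the response as-is.
    try:
        return json.loads(response)
    except json.JSONDecodeError:
        pass

    # Fall back to the outermost {...} span, in case extra text surrounds it.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(response[start:end + 1])
    except json.JSONDecodeError:
        return None
```

In the snippet above, `json_template = parse_template(response)` could then replace the direct `json.loads(response)` call, with a `None` check before use.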
## Output
```json
{
  "Person": {
    "Name": "",
    "Stage name": "",
    "Real name": "",
    "Birth date": "",
    "Nationality": "",
    "Occupations": [],
    "Debut": {
      "Age": "",
      "Year": "",
      "Company": "",
      "Contract type": "",
      "EP": "",
      "EP title": ""
    }
  }
}
```
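A nested template like the one above can be flattened into dotted field paths, which is handy for inspecting or comparing induced schemas. This is a minimal sketch; `field_paths` is a hypothetical helper, and the sample template is a shortened version of the output shown above.

```python
def field_paths(template: dict, prefix: str = "") -> list:
    """List every leaf field of a JSON template as a dotted path.

    Hypothetical helper for illustration only. Nested objects are
    descended into; empty strings and lists count as leaf fields.
    """
    paths = []
    for key, value in template.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(field_paths(value, path))
        else:
            paths.append(path)
    return paths


# Shortened version of the template from the Output section.
template = {"Person": {"Name": "", "Occupations": [], "Debut": {"Year": ""}}}
print(field_paths(template))
# → ['Person.Name', 'Person.Occupations', 'Person.Debut.Year']
```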