---
library_name: transformers
license: apache-2.0
language:
- en
- ko
base_model:
- numind/NuExtract-1.5
tags:
- llama-factory
---

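# Automatic Schema Induction (text-to-schema) Model

This model performs schema induction, a sub-task of text-to-JSON: given an input text, it generates an empty JSON template whose keys describe the information that can be extracted from that text.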

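# Usage

```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "chnaaam/luSI-v1.0"

# Pick the best available device.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# generate() needs the language-modeling head, so load the causal-LM class.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Korean sample input. English gloss: "IU (real name: Lee Ji-eun, born May 16, 1993)
# is a South Korean singer-songwriter, composer, and actress. In 2007 she signed an
# exclusive contract as a trainee with LOEN Entertainment (now Kakao Entertainment)
# and debuted as a singer in 2008, at age 15, with her first EP, Lost and Found."
text = """아이유(IU, 본명: 이지은, 李知恩[1], 1993년 5월 16일~)는 대한민국의 싱어송라이터, 작곡가, 배우이다. 2007년 로엔 엔터테인먼트(현 카카오 엔터테인먼트) 연습생으로 전속 계약을 맺고 15세의 나이에 2008년 첫 EP인 로스트 앤 파운드(Lost and Found)를 통해 가수로 데뷔했다."""

# Wrap the raw text in the chat template the model expects.
messages = [
    {"role": "user", "content": text}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) keeps the generated template deterministic.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=False
)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

json_template = json.loads(response)

print(json.dumps(json_template, indent=2, ensure_ascii=False))
```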
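
`json.loads` raises `json.JSONDecodeError` if the decoded response is not exactly one JSON document. The sketch below is a more defensive parser for that step, assuming the response might carry extra text around the JSON object (this card does not say whether that happens). `parse_template` is a hypothetical helper name, not part of the model or of `transformers`.

```python
import json


def parse_template(response: str) -> dict:
    """Return the first JSON object embedded in `response` (hypothetical helper)."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _end = decoder.raw_decode(response, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = response.find("{", start + 1)
    raise ValueError("no JSON object found in model response")
```

In the snippet above it could stand in for the `json.loads(response)` call, for example `json_template = parse_template(response)`.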

## Output

```json
{
  "Person": {
    "Name": "",
    "Stage name": "",
    "Real name": "",
    "Birth date": "",
    "Nationality": "",
    "Occupations": [],
    "Debut": {
      "Age": "",
      "Year": "",
      "Company": "",
      "Contract type": "",
      "EP": "",
      "EP title": ""
    }
  }
}
```
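
To inspect an induced template programmatically, it can be walked recursively and flattened into dotted field paths. This is a minimal sketch: `iter_field_paths` is a hypothetical helper, the sample `template` is a shortened copy of the output above, and the sketch assumes every leaf is either an empty string or a list, as in that output.

```python
def iter_field_paths(template: dict, prefix: str = ""):
    """Yield (dotted_path, kind) for every leaf of a template (hypothetical helper)."""
    for key, value in template.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: descend and extend the path.
            yield from iter_field_paths(value, path)
        else:
            # Leaves are reported as "list" or "string"; lists are not descended into.
            yield path, "list" if isinstance(value, list) else "string"


# Shortened copy of the template shown above.
template = {
    "Person": {
        "Name": "",
        "Occupations": [],
        "Debut": {"Year": "", "Company": ""},
    }
}

for path, kind in iter_field_paths(template):
    print(f"{path}: {kind}")
```

Flattened paths like these are convenient for comparing the templates induced from different texts or for checking that an expected field is present.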