Libraries

How to use Zhao-Ching/TWLLM-Llama2-Extend-VTLoss with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Zhao-Ching/TWLLM-Llama2-Extend-VTLoss")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Zhao-Ching/TWLLM-Llama2-Extend-VTLoss")
model = AutoModelForCausalLM.from_pretrained("Zhao-Ching/TWLLM-Llama2-Extend-VTLoss")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use Zhao-Ching/TWLLM-Llama2-Extend-VTLoss with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Zhao-Ching/TWLLM-Llama2-Extend-VTLoss"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Zhao-Ching/TWLLM-Llama2-Extend-VTLoss",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

docker model run hf.co/Zhao-Ching/TWLLM-Llama2-Extend-VTLoss

SGLang

How to use Zhao-Ching/TWLLM-Llama2-Extend-VTLoss with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Zhao-Ching/TWLLM-Llama2-Extend-VTLoss" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Zhao-Ching/TWLLM-Llama2-Extend-VTLoss",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Zhao-Ching/TWLLM-Llama2-Extend-VTLoss" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Zhao-Ching/TWLLM-Llama2-Extend-VTLoss",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use Zhao-Ching/TWLLM-Llama2-Extend-VTLoss with Docker Model Runner:
```
docker model run hf.co/Zhao-Ching/TWLLM-Llama2-Extend-VTLoss
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Model Card for Zhao-Ching/TWLLM-Llama2-Extend-VTLoss

This model is fine-tuned for rewriting questions and statements, with a specific focus on Traditional Chinese. In this version, we have changed how the model's training loss is calculated.
(The original training process, i.e. the SFTTrainer class from trl, calculates cross-entropy (CE) over the whole prompt template.)

To discourage the model from copying the original sentence, the total loss is made up of three parts:

Context Loss (from the beginning to <rephrased>)
Answer Loss (from <rephrased> to </rephrased>)
Variety Loss (VTLoss): the IoU (intersection over union) of the tokenized original sentence and the tokenized rewritten sentence, which encourages the model to generate text that is as diverse as possible.

Note that the answer loss is given a larger weight than the context loss, since the answer is the more important part to get right.
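
As a rough illustration of how the three parts could fit together, the sketch below splits a tokenized example into context and answer spans, computes a set-based IoU between the original and rewritten token ids, and forms a weighted sum. The function names, the weight values, the treatment of each tag as a single token, and the exact span boundaries are assumptions made for illustration. The card does not state them, nor how the IoU term is made usable as a training signal.

```python
from typing import List, Sequence, Tuple


def split_spans(tokens: Sequence[str],
                open_tag: str = "<rephrased>",
                close_tag: str = "</rephrased>") -> Tuple[List[int], List[int]]:
    """Return (context_positions, answer_positions) for one tokenized example.

    Assumption: each tag is a single token, the context span stops just
    before the opening tag, and the answer span includes both tags.
    """
    start = tokens.index(open_tag)
    end = tokens.index(close_tag)
    return list(range(start)), list(range(start, end + 1))


def token_iou(original_ids: Sequence[int], rewritten_ids: Sequence[int]) -> float:
    """Set-based intersection over union of two token id sequences."""
    a, b = set(original_ids), set(rewritten_ids)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def total_loss(context_ce: float, answer_ce: float,
               original_ids: Sequence[int], rewritten_ids: Sequence[int],
               w_context: float = 0.5, w_answer: float = 1.0,
               w_variety: float = 1.0) -> float:
    """Weighted sum of context CE, answer CE, and the IoU-based variety term.

    The weights are hypothetical; the only constraint taken from the card is
    that the answer loss is weighted more heavily than the context loss.
    """
    variety = token_iou(original_ids, rewritten_ids)
    return w_context * context_ce + w_answer * answer_ce + w_variety * variety
```

Under these assumptions, a rewrite that reuses exactly the same tokens as the original has an IoU of 1.0 and receives the largest variety penalty, while a rewrite sharing no tokens has an IoU of 0.0.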

Model Details

The prompt template should be used as follows:

<task>  
你是一名熱於助人的AI小幫手，請將敘述語句或者問句變得更加通順與簡潔。  
</task>  

原始句子:  
<origin>  
{before}  
</origin>  

修改後:  
<rephrased>  
{after}  
</rephrased>

Note that {before} and {after} are the original and the rewritten question/statement, respectively.
This model is a trial version and is not the best rewriting tool compared with many open-source LLMs, but we will continue to improve it.
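
The prompt template above can also be filled in programmatically. The sketch below is one possible way to do it: it builds an inference prompt that stops right after the opening `<rephrased>` tag, so the model is left to produce the rewrite, and then extracts the text between the tags from the generated output. The helper names `build_prompt` and `extract_rephrased`, the choice to end the prompt at `<rephrased>`, and the exact whitespace are assumptions and are not specified by the card.

```python
import re

# Template text copied from the card; trailing spaces on each line are dropped
# (assumption: they are Markdown line-break markers, not part of the prompt).
TEMPLATE = (
    "<task>\n"
    "你是一名熱於助人的AI小幫手，請將敘述語句或者問句變得更加通順與簡潔。\n"
    "</task>\n"
    "\n"
    "原始句子:\n"
    "<origin>\n"
    "{before}\n"
    "</origin>\n"
    "\n"
    "修改後:\n"
    "<rephrased>\n"
)


def build_prompt(before: str) -> str:
    """Fill the template with the original sentence, ending at <rephrased>."""
    return TEMPLATE.format(before=before.strip())


def extract_rephrased(generated: str) -> str:
    """Return the text after <rephrased>, up to </rephrased> or end of string."""
    match = re.search(
        r"<rephrased>\s*(.*?)\s*(?:</rephrased>|$)", generated, flags=re.DOTALL
    )
    return match.group(1) if match else ""
```

The string returned by `build_prompt` can be passed to a tokenizer and `model.generate` as plain text, and `extract_rephrased` applied to the decoded output, whether or not the output still contains the prompt.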

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: [--]
Funded by [optional]: [--]
Shared by [optional]: [--]
Model type: [--]
Language(s) (NLP): [Traditional Chinese]
License: [--]
Finetuned from model [optional]: [Taiwan LLM base v2.0]

Training Details

Training Data

Generated by GPT4o with manual human feedback.
A custom Traditional Chinese benchmark dataset, with rewritten answers produced by Gemini.
Evaluation is performed by GPTo using custom rubrics.

[More Information Needed]

Training Procedure

Training Hyperparameters

Training regime: [QLoRA]

More Information [optional]

[--]

Model Card Authors [optional]

[--]

Model Card Contact

[--]

Safetensors

Model size: 7B params
Tensor type: F16