🧠 EmpathAI - Llama 3.1 8B

Vietnamese Toxic E-commerce Customer Support Model

EmpathAI is a Vietnamese LLM fine-tuned specifically for e-commerce customer support, with a focus on difficult situations such as:

toxic / angry customers
late deliveries
missing / wrong items
defective or damaged products
refunds / returns and exchanges
payment / COD issues
escalation and de-escalation
handling cases according to real-world policies and workflows

EmpathAI aims to:

reduce hallucination in customer support
handle toxic customers more naturally
work well with RAG/tool systems
make Vietnamese e-commerce workflows more realistic

📌 Current status

EmpathAI v2 is currently under development and evaluation.

Version 2 focuses on improving:

multi-turn workflows
order-code handling
payment/COD edge cases
policy/context grounding
privacy & PII safety
tool-aware customer support
fewer hallucinated refund/order-status responses

The current v1 release is unchanged and remains available.

🌟 Highlights

💬 Emotional Intelligence

EmpathAI is trained to:

calm angry customers
avoid unnecessary arguments
keep a natural tone that is not overly robotic
give a clear next step

🧩 RAG & Tool-Friendly

The model is designed to work well with:

RAG pipelines
order lookup systems
internal customer-support tools

EmpathAI knows:

when to ask for the order code
when to request more information
when there is not yet enough data to draw a conclusion
not to make up an order status
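
The behaviours listed above depend on the model seeing order data in its context. Below is a minimal sketch of how a caller might pass an order-lookup result into the chat messages. The helper `build_messages`, the fields of the `order` dict, and the English prompt wording are all hypothetical illustrations, not the prompt format used in training.

```python
from typing import Optional

# Hypothetical system prompt; the wording is an assumption, not the training prompt.
SYSTEM_PROMPT = (
    "You are EmpathAI, a Vietnamese e-commerce customer-support agent. "
    "Only state an order status that appears in the ORDER DATA block. "
    "If ORDER DATA is missing, ask the customer for the order code."
)


def build_messages(user_text: str, order: Optional[dict] = None) -> list:
    """Build chat messages, attaching tool output only when a lookup succeeded."""
    if order is None:
        context = "ORDER DATA: none (no order lookup result available)"
    else:
        # Serialize the fields in a stable order so prompts are reproducible.
        fields = ", ".join(f"{key}={order[key]}" for key in sorted(order))
        context = f"ORDER DATA: {fields}"
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\n{context}"},
        {"role": "user", "content": user_text},
    ]


# Without a lookup result, the prompt states that no data is available.
no_data = build_messages("Đơn tôi giao trễ 5 ngày rồi đấy.")
# With a (made-up) lookup result, the status comes from the tool output.
with_data = build_messages(
    "Đơn tôi giao trễ 5 ngày rồi đấy.",
    {"order_code": "DH123", "status": "in_transit"},
)
print(with_data[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the `messages` list in the usage section.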

🛡️ Safety & Grounding Focus

EmpathAI v2 focuses on reducing:

hallucinated order status
unsupported refund promises
unprompted voucher/compensation offers
unnecessary PII requests
leaks of other customers' information

📊 Dataset overview

Dataset v1

The original dataset focuses mainly on:

toxic customer complaints
delayed delivery
refund/compensation
damaged/missing products
Vietnamese empathy/de-escalation

Limitations of v1

mostly single-turn
few tool-aware workflows
few payment/COD scenarios
few privacy/security cases
a lingering tendency to overpromise in some older DPO pairs

Dataset v2 pipeline

The v2 dataset pipeline currently includes:

cleaned old SFT pool
re-judged DPO preference pairs
synthetic toxic e-commerce conversations
multi-turn workflow generation
benchmark-oriented evaluation data

Dataset targets

~10k SFT samples
~6k DPO pairs
a separate benchmark evaluation set

🏋️ Training pipeline

EmpathAI v2 uses a two-stage training pipeline:

Stage 1 — SFT

Supervised fine-tuning on:

cleaned chosen responses from the old dataset
new synthetic customer-support conversations

Stage 2 — DPO

Direct Preference Optimization on:

clear-cut chosen/rejected pairs
DPO samples that have been safety-filtered and re-judged

Ambiguous preference pairs are dropped to make DPO training more stable.
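
One way to drop ambiguous pairs is to keep only those where a judge clearly prefers the chosen response. The sketch below assumes each pair carries two numeric judge scores. The `PreferencePair` fields, the score scale, and the `min_margin` threshold are hypothetical, since the card does not describe the actual re-judging criteria.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    chosen_score: float    # hypothetical judge score for the chosen response
    rejected_score: float  # hypothetical judge score for the rejected response


def filter_ambiguous(pairs, min_margin=2.0):
    """Keep pairs whose judge-score gap is at least min_margin."""
    return [p for p in pairs if p.chosen_score - p.rejected_score >= min_margin]


pairs = [
    # Clear preference: large gap between chosen and rejected.
    PreferencePair("p1", "a", "b", chosen_score=9.0, rejected_score=3.0),
    # Ambiguous preference: the judge barely separates the two responses.
    PreferencePair("p2", "c", "d", chosen_score=6.0, rejected_score=5.5),
]
kept = filter_ambiguous(pairs)
print([p.prompt for p in kept])
```

A larger `min_margin` trades dataset size for cleaner preference signal, so the threshold would need tuning against the ~6k pair target.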

📈 Benchmark (in development)

A dedicated benchmark for Vietnamese toxic e-commerce customer support is currently being built.

Evaluation categories

hallucinated order-status rate
hallucinated refund/compensation rate
multi-turn state tracking
policy/context grounding
payment/COD realism
privacy & PII safety
toxic customer handling quality
escalation/de-escalation quality
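
As an illustration of how a rate-style metric such as the hallucinated order-status rate could be computed, here is a minimal sketch. The record fields `status_in_context` and `response_states_status` are hypothetical labels (for example from human or LLM judging), and the benchmark's real definition may differ.

```python
def hallucination_rate(records):
    """Share of responses that state an order status not backed by context.

    Only responses generated without order data in context are counted.
    """
    ungrounded = [r for r in records if not r["status_in_context"]]
    if not ungrounded:
        return 0.0
    hallucinated = sum(1 for r in ungrounded if r["response_states_status"])
    return hallucinated / len(ungrounded)


records = [
    {"status_in_context": False, "response_states_status": True},
    {"status_in_context": False, "response_states_status": False},
    {"status_in_context": False, "response_states_status": False},
    {"status_in_context": False, "response_states_status": False},
    # Grounded case: stating the status is fine, so it is excluded from the rate.
    {"status_in_context": True, "response_states_status": True},
]
rate = hallucination_rate(records)
print(f"hallucinated order-status rate: {rate:.2f}")
```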

Models planned for benchmarking

EmpathAI v1
EmpathAI v2
Llama 3.1 8B Instruct
Qwen Instruct
Gemini Flash-class models

📊 Technical specifications

Component	Details
Base model	`Llama-3.1-8B-Instruct`
Fine-tuning method	QLoRA / DPO
Training infrastructure	Google Cloud Vertex AI
GPUs used	NVIDIA L4 / RTX PRO 6000
Training pipeline	SFT + DPO
Optimization	Unsloth

🌿 Branches

Branch	Description
`main`	latest inference-ready 4-bit build (default stable release)
`v1-bf16`	full-quality BF16 weights of EmpathAI v1
`v1-4bit`	4-bit version of EmpathAI v1
`v1-gguf`	GGUF export of EmpathAI v1 for llama.cpp / LM Studio / Ollama
`v2-bf16`	full-quality BF16 weights of EmpathAI v2
`v2-4bit`	4-bit version of EmpathAI v2
`v2-gguf`	GGUF export of EmpathAI v2 for llama.cpp / LM Studio / Ollama
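
A branch from the table can be selected with the `revision` argument of `from_pretrained`. The sketch below is an assumption-laden example: the helper names `load_kwargs` and `load_branch` are hypothetical, it targets the safetensors branches rather than the GGUF ones, and the 4-bit branches will likely also need `bitsandbytes` installed.

```python
MODEL_ID = "thanhhoangnvbg/empathAI-llama3.1-8b"


def load_kwargs(branch: str = "main") -> dict:
    """Arguments for from_pretrained that pin a repository branch."""
    return {"pretrained_model_name_or_path": MODEL_ID, "revision": branch}


def load_branch(branch: str = "main"):
    """Download and load the tokenizer and model from the given branch."""
    # Imported lazily so load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = load_kwargs(branch)
    tokenizer = AutoTokenizer.from_pretrained(**kwargs)
    model = AutoModelForCausalLM.from_pretrained(**kwargs, device_map="auto")
    return tokenizer, model


# Inspect the arguments without downloading anything.
print(load_kwargs("v1-bf16"))
```

Calling `load_branch("v1-bf16")` would then fetch the BF16 weights of v1, provided that branch is published.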

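🚀 Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thanhhoangnvbg/empathAI-llama3.1-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The prompts stay in Vietnamese because the model is tuned for Vietnamese.
# System prompt, in English: "You are EmpathAI, a Vietnamese e-commerce
# customer-support agent. Principles: do not make up order statuses; do not
# promise refunds/vouchers without grounds; if data is missing, ask for more
# information; keep a calm, professional tone."
messages = [
    {
        "role": "system",
        "content": """Bạn là EmpathAI, chuyên viên CSKH e-commerce tiếng Việt.

Nguyên tắc:
- Không tự bịa trạng thái đơn hàng.
- Không tự hứa hoàn tiền/voucher khi chưa có căn cứ.
- Nếu thiếu dữ liệu, hãy yêu cầu thêm thông tin.
- Giữ giọng điệu bình tĩnh và chuyên nghiệp."""
    },
    {
        "role": "user",
        # "My order is already 5 days late."
        "content": "Đơn tôi giao trễ 5 ngày rồi đấy."
    }
]

# return_dict=True returns input_ids and attention_mask together;
# move them to whichever device device_map placed the model on.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.5,
)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))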

🦙 GGUF / Local Inference

EmpathAI fully supports GGUF for local inference with:

Ollama
llama.cpp
LM Studio
KoboldCpp
OpenWebUI

Available Quantizations

File	Recommended Use
`Q4_K_M.gguf`	Good balance between quality and speed
`Q5_K_M.gguf`	Higher quality, uses more VRAM/RAM

🚀 Running with Ollama

Create a Modelfile:

FROM ./empathAI-llama3.1-8b.Q4_K_M.gguf

TEMPLATE """{{ .Prompt }}"""

PARAMETER temperature 0.5
PARAMETER num_ctx 4096

Build model:

ollama create empathai -f Modelfile

Run:

ollama run empathai

🚀 Running with llama.cpp

./llama-cli \
--model empathAI-llama3.1-8b.Q4_K_M.gguf \
-p "Xin chào"

💻 Recommended Hardware

Quant	Recommended RAM / VRAM
Q4_K_M	~8GB+
Q5_K_M	~10GB+
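
For a rough sense of where these figures come from, the weight size can be estimated from the parameter count and the average bits per weight of each quantization. The bits-per-weight values below are approximate assumptions (they vary by model and llama.cpp version), and `weights_gb` is a hypothetical helper, so treat the result as a back-of-the-envelope estimate only.

```python
# Approximate average bits per weight; these figures are assumptions and
# vary slightly between models and llama.cpp versions.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7}


def weights_gb(n_params: float, quant: str) -> float:
    """Estimated size of the quantized weights in GB (10**9 bytes)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


for quant in BITS_PER_WEIGHT:
    # 8e9 is the rounded parameter count of an 8B model.
    print(f"{quant}: ~{weights_gb(8e9, quant):.2f} GB for weights alone")
```

The gap between these estimates and the recommended figures leaves headroom for the KV cache, context buffers, and runtime overhead, all of which grow with `num_ctx`.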

🎯 Project goals

EmpathAI does not aim to be:

a general reasoning model
a coding assistant
a general-purpose chatbot

Its main goals are:

realistic Vietnamese customer-support workflow
toxic customer handling
de-escalation
policy-aware support
safer e-commerce interactions

🔥 Notes

The project is actively maintained and continuously improved through:

dataset cleaning
synthetic data generation
DPO refinement
benchmark evaluation
safety-focused iteration

Upcoming releases will focus on:

reducing hallucination
improving real-world workflows
strengthening multi-turn capability
improving stability when used with RAG/tool systems
making toxic e-commerce support more realistic

Downloads last month: 2,194

Model size: 8B params (Safetensors)
Tensor type: BF16

Model tree for thanhhoangnvbg/empathAI-llama3.1-8b

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit