You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Qwen 0.6B Books Intent

This model is a fine-tuned version of unsloth/Qwen3-0.6B on the empathyai/books-intent-dataset dataset. It has been trained using TRL.

It has been trained to classify short queries about the Project Gutenberg catalog into a set of predefined intents.

The goal is to replace large LLMs with smaller models for low-latency, highly scalable services, while keeping quality and accuracy high within the domain.

Check out this model in action in this experience!

Quick start

The query to classify must be wrapped in the instruction template below before it is sent to the model:

from transformers import pipeline, AutoTokenizer

# Define instruction templates
QUERY_PROMPT_INTRODUCTION = """You're an expert in Project Gutenberg. Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks. Most of the items in its collection are the full texts of books or individual stories in the public domain. Your main focus is to extract user intent."""


QUERY_PROMPT_TASK = """## Task
Given user input and context, extract the intent. 
* Consider user intent:
    * search_book: The user is looking for a specific book.
    * search_author: The user is looking for a specific author or its biography.
    * search_category: The user is looking for books of a category.
    * recommendation: User is looking for books suggestions, either similar to a title or from the same author.
    * novelties: User is looking for recently added books to the Project Gutenberg. Note that this is not the same as 'new books' in general, but rather books that have been added to the Project Gutenberg collection recently.
    * general_questions: The user is asking general questions about books, authors, or the Project Gutenberg collection. This includes questions like 'What are the characters in this book?' or 'What is the are some interesting details about that author?'.
    * out_of_domain: The user is asking something that is not related to books, the Project Gutenberg or its collection, like harmful requests or 'What's the weather like?'.

The result must be only a JSON with the following format:
{
    "chat_context": "refinement|new_request",
    "intent": "extracted_intent"
}
"""

def format_query(query: str) -> str:
    """Wrap a raw user query in the instruction template the model expects."""
    return f"""{QUERY_PROMPT_INTRODUCTION}
{QUERY_PROMPT_TASK}

## Input
{query}

## Response
"""

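model_name = "empathyai/Qwen3-0.6B-Books-Intent"

# Load the text-generation pipeline and the matching tokenizer.
generator = pipeline("text-generation", model=model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the full prompt for a single user query.
question = format_query("who wrote frankenstein?")

# Apply the chat template as plain text so the pipeline receives a ready-made prompt.
messages = [{"role": "user", "content": question}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # Disable thinking mode. Default is True.
)

# return_full_text=False returns only the newly generated text, not the prompt.
output = generator(text, max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])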

Example output:

{"chat_context": "new_request", "intent": "search_author"}
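
Because the model answers with a JSON string, downstream code usually needs to parse and validate it before acting on the intent. The following is a minimal sketch of that step, not part of the original card: the helper name parse_intent and the choice to return None on any malformed or unexpected output are assumptions made for illustration. The allowed values are taken from the task prompt above.

```python
import json
from typing import Optional

# The seven intents and two chat contexts listed in the task prompt.
ALLOWED_INTENTS = {
    "search_book", "search_author", "search_category", "recommendation",
    "novelties", "general_questions", "out_of_domain",
}
ALLOWED_CONTEXTS = {"refinement", "new_request"}


def parse_intent(generated_text: str) -> Optional[dict]:
    """Hypothetical helper: return the parsed prediction, or None if unusable."""
    # Keep only the outermost {...} span in case extra text surrounds the JSON.
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(generated_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    intent = parsed.get("intent")
    context = parsed.get("chat_context")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        return None
    if not isinstance(context, str) or context not in ALLOWED_CONTEXTS:
        return None
    return parsed


example = '{"chat_context": "new_request", "intent": "search_author"}'
print(parse_intent(example))
```

Returning None rather than raising lets a service fall back to a default route (for example, treating the query as out_of_domain) when the output cannot be parsed; that fallback policy is a design choice left to the caller.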

Training procedure

This model was trained with supervised fine-tuning (SFT) using the TRL and Unsloth libraries.

Training Details

Framework: PyTorch
Base Model: unsloth/Qwen3-0.6B
Dataset: empathyai/books-intent-dataset
Infrastructure: 1 x L40S Nvidia GPU
Training time: ~11 hours
Hyperparameters:
- Learning Rate: 2e-5
- Weight Decay: 0.01
- Batch Size: 64 (per device)
- Gradient Accumulation Steps: 1
- Number of Epochs: 3
- Optimizer: AdamW (8-bit)
- Scheduler: Linear
- Max Gradient Norm: 1.0
- Seed: 3407
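
For readers who want to reproduce a similar run, the hyperparameters listed above can be collected into keyword arguments. This is only a sketch: the card does not show the training script, so the mapping below to the argument names used by transformers.TrainingArguments (which TRL's SFTConfig extends) is an assumption, and the dictionary name training_kwargs is hypothetical.

```python
# Assumed mapping of the listed hyperparameters to TrainingArguments-style names.
# The original training script is not shown in the card.
training_kwargs = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 64,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 3,
    "optim": "adamw_8bit",          # AdamW (8-bit)
    "lr_scheduler_type": "linear",
    "max_grad_norm": 1.0,
    "seed": 3407,
}

# Effective batch size on a single GPU.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)
```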

LoRA Configuration

LoRA Rank (r): 64
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
LoRA Alpha: 64
LoRA Dropout: 0
Bias: None
Gradient Checkpointing: Disabled
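
As a sanity check on this configuration, the number of trainable LoRA parameters can be worked out by hand: each adapted linear layer with input size d_in and output size d_out adds r * (d_in + d_out) weights. The sketch below assumes the layer dimensions of the public Qwen3-0.6B base configuration (hidden size 1024, 28 layers, 16 query heads and 8 key/value heads of dimension 128, MLP size 3072); these values are not stated in this card. Under those assumptions the result agrees with the trainable-parameter count reported in the training log.

```python
# Assumed Qwen3-0.6B dimensions (taken from the public base-model config,
# not from this model card).
HIDDEN = 1024
Q_OUT = 16 * 128      # query heads x head dimension
KV_OUT = 8 * 128      # key/value heads x head dimension
INTERMEDIATE = 3072
NUM_LAYERS = 28
RANK = 64             # LoRA rank from the configuration above

# (in_features, out_features) of every targeted projection in one layer.
target_modules = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters of the A (rank x in) and B (out x rank) LoRA matrices."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_modules.values())
total_trainable = per_layer * NUM_LAYERS
print(total_trainable)  # 40370176
```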

Log details


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 421,353 | Num Epochs = 3 | Total steps = 19,752
O^O/ \_/ \    Batch size per device = 64 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (64 x 1 x 1) = 64
 "-____-"     Trainable parameters = 40,370,176/636,420,096 (6.34% trained)

Peak reserved memory = 4.881 GB.
Peak reserved memory for training = 3.453 GB.
Peak reserved memory % of max memory = 10.963 %.
Peak reserved memory for training % of max memory = 7.756 %.
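
The figures in the log can be cross-checked with a little arithmetic. The sketch below assumes that the last, partially filled batch of each epoch still counts as a step (hence the ceiling) and that the percentages are taken relative to the same total GPU memory; neither assumption is stated in the log itself.

```python
import math

num_examples = 421_353
batch_size_per_device = 64
gradient_accumulation_steps = 1
num_gpus = 1
num_epochs = 3

# Total batch size as printed in the log: 64 x 1 x 1.
total_batch_size = batch_size_per_device * gradient_accumulation_steps * num_gpus

# Assumption: a final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(num_examples / total_batch_size)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 19752

# Infer the total GPU memory implied by "4.881 GB = 10.963 % of max memory",
# then recompute the training-only percentage from it.
max_memory_gb = 4.881 / (10.963 / 100)
training_pct = 100 * 3.453 / max_memory_gb
print(round(max_memory_gb, 1), round(training_pct, 2))
```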

Metrics

The following metrics were computed on a 1,000-example sample of the test split. The LLM is used as a classifier by parsing its output as JSON and extracting the intent field.

Intent	Precision	Recall	F1-Score	Support
general_questions	1.0000	1.0000	1.0000	205
novelties	1.0000	1.0000	1.0000	49
out_of_domain	1.0000	1.0000	1.0000	56
recommendation	1.0000	1.0000	1.0000	211
search_author	1.0000	0.9915	0.9957	118
search_book	0.9956	1.0000	0.9978	228
search_category	1.0000	1.0000	1.0000	133
accuracy	-	-	0.9990	1000
macro avg	0.9994	0.9988	0.9991	1000
weighted avg	0.9990	0.9990	0.9990	1000
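
To make the table easier to interpret, the sketch below recomputes precision, recall, and F1 from per-example labels. The error pattern used here (a single search_author query predicted as search_book) is a hypothetical reconstruction that happens to be consistent with the reported numbers; the card does not publish the actual predictions.

```python
# Per-intent support from the table above.
support = {
    "general_questions": 205, "novelties": 49, "out_of_domain": 56,
    "recommendation": 211, "search_author": 118, "search_book": 228,
    "search_category": 133,
}

# Start from perfect predictions, then inject one hypothetical error:
# a search_author query classified as search_book.
y_true, y_pred = [], []
for intent, count in support.items():
    y_true += [intent] * count
    y_pred += [intent] * count
y_pred[y_true.index("search_author")] = "search_book"


def precision_recall_f1(label: str) -> tuple:
    """Compute one-vs-rest precision, recall, and F1 for a single intent."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 4), round(recall, 4), round(f1, 4)


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision_recall_f1("search_author"))  # (1.0, 0.9915, 0.9957)
print(precision_recall_f1("search_book"))    # (0.9956, 1.0, 0.9978)
print(accuracy)                              # 0.999
```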

Framework versions

TRL: 0.15.2
Transformers: 4.51.3
Pytorch: 2.7.0
Datasets: 3.5.1
Tokenizers: 0.21.1

Model Usage

This model is designed for intent classification in the Project Gutenberg domain. As such, it may not scale well for broader domains or tasks.

Limitations

The model may not generalize well to tasks outside its training domain. See the dataset notes on bias and limitations.

Citations

Project Gutenberg. (n.d.). Retrieved May 2025 from www.gutenberg.org.

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
