Qwen 0.6B Books Intent
This model is a fine-tuned version of unsloth/Qwen3-0.6B on the empathyai/books-intent-dataset dataset. It has been trained using TRL.
It was trained to classify short queries about the Project Gutenberg catalog into a set of predefined intents.
The goal is to replace large LLMs with smaller models for low-latency, highly scalable services, while keeping high quality and accuracy within the domain.
Check out this model in action in this experience!
Quick start
You must format the query to classify with the template below:
from transformers import pipeline, AutoTokenizer
# Define instruction templates
QUERY_PROMPT_INTRODUCTION = """You're an expert in Project Gutenberg. Project Gutenberg (PG) is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks. Most of the items in its collection are the full texts of books or individual stories in the public domain. Your main focus is to extract user intent."""
QUERY_PROMPT_TASK = """## Task
Given user input and context, extract the intent.
* Consider user intent:
* search_book: The user is looking for a specific book.
* search_author: The user is looking for a specific author or its biography.
* search_category: The user is looking for books of a category.
* recommendation: User is looking for books suggestions, either similar to a title or from the same author.
* novelties: User is looking for recently added books to the Project Gutenberg. Note that this is not the same as 'new books' in general, but rather books that have been added to the Project Gutenberg collection recently.
* general_questions: The user is asking general questions about books, authors, or the Project Gutenberg collection. This includes questions like 'What are the characters in this book?' or 'What is the are some interesting details about that author?'.
* out_of_domain: The user is asking something that is not related to books, the Project Gutenberg or its collection, like harmful requests or 'What's the weather like?'.
The result must be only a JSON with the following format:
{
"chat_context": "refinement|new_request",
"intent": "extracted_intent"
}
"""
def format_query(query: str) -> str:
    return f"""{QUERY_PROMPT_INTRODUCTION}
{QUERY_PROMPT_TASK}
## Input
{query}
## Response
"""
model_name = "empathyai/Qwen3-0.6B-Books-Intent"
generator = pipeline("text-generation", model=model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
question = format_query("who wrote frankenstein?")
messages = [{"role":"user", "content":question}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # Disable thinking mode. Default is True.
)
output = generator(text, max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
{"chat_context": "new_request", "intent": "search_author"}
Training procedure
This model was trained with supervised fine-tuning (SFT) using the TRL and Unsloth libraries.
Training Details
- Framework: PyTorch
- Base Model: unsloth/Qwen3-0.6B
- Dataset: empathyai/books-intent-dataset
- Infrastructure: 1 x L40S Nvidia GPU
- Training time: ~11 hours
- Hyperparameters:
  - Learning Rate: 2e-5
  - Weight Decay: 0.01
  - Batch Size: 64 (per device)
  - Gradient Accumulation Steps: 1
  - Number of Epochs: 3
  - Optimizer: AdamW (8-bit)
  - Scheduler: Linear
  - Max Gradient Norm: 1.0
  - Seed: 3407
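As a rough illustration, the hyperparameters listed above can be written as keyword arguments using the field names of Hugging Face `TrainingArguments` (which TRL's `SFTConfig` extends). This is a sketch, not the authors' training script: the dictionary name `sft_kwargs` is hypothetical, and the mapping of "AdamW (8-bit)" to `optim="adamw_8bit"` is an assumption.

```python
# Hyperparameters from the list above, keyed by TrainingArguments field names.
sft_kwargs = {
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 64,
    "gradient_accumulation_steps": 1,
    "num_train_epochs": 3,
    "optim": "adamw_8bit",          # assumption: "AdamW (8-bit)" maps to this value
    "lr_scheduler_type": "linear",
    "max_grad_norm": 1.0,
    "seed": 3407,
}

# Effective batch size = per-device batch x accumulation steps x number of GPUs.
num_gpus = 1  # single L40S, per the infrastructure entry
effective_batch_size = (
    sft_kwargs["per_device_train_batch_size"]
    * sft_kwargs["gradient_accumulation_steps"]
    * num_gpus
)
print(effective_batch_size)  # prints: 64

# With TRL installed, these could be passed along the lines of:
#   from trl import SFTConfig
#   config = SFTConfig(output_dir="outputs", **sft_kwargs)
```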
LoRA Configuration
- LoRA Rank (r): 64
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA Alpha: 64
- LoRA Dropout: 0
- Bias: None
- Gradient Checkpointing: Disabled
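The LoRA settings above determine how many parameters are trained: each adapted linear layer with input size `d_in` and output size `d_out` gains two low-rank matrices with `r * (d_in + d_out)` weights in total. The sketch below works through that count. The layer dimensions (hidden size 1024, MLP size 3072, 16 query heads and 8 key/value heads of dimension 128, 28 layers) are assumptions taken from the base Qwen3-0.6B configuration and are not stated in this card.

```python
R = 64  # LoRA rank from the configuration above

# Assumed base-model dimensions (Qwen3-0.6B config), not stated in this card.
HIDDEN = 1024
INTERMEDIATE = 3072
Q_OUT = 16 * 128   # query heads x head dim
KV_OUT = 8 * 128   # key/value heads x head dim
NUM_LAYERS = 28

# (d_in, d_out) of every targeted projection in one transformer layer.
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x d_in) and B (d_out x r) per adapted layer.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total_trainable = per_layer * NUM_LAYERS

print(per_layer)        # prints: 1441792
print(total_trainable)  # prints: 40370176
```

Under these assumptions the result equals the "Trainable parameters" figure reported in the training log.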
Log details
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 421,353 | Num Epochs = 3 | Total steps = 19,752
O^O/ \_/ \ Batch size per device = 64 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (64 x 1 x 1) = 64
"-____-" Trainable parameters = 40,370,176/636,420,096 (6.34% trained)
Peak reserved memory = 4.881 GB.
Peak reserved memory for training = 3.453 GB.
Peak reserved memory % of max memory = 10.963 %.
Peak reserved memory for training % of max memory = 7.756 %.
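The figures in this log can be cross-checked with a little arithmetic. The sketch below assumes the trainer keeps the final partial batch of each epoch (hence the ceiling); all input numbers are copied from the log above.

```python
import math

num_examples = 421_353
batch_size = 64
epochs = 3

# Steps per epoch, assuming the last incomplete batch is not dropped.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # prints: 6584 19752

# Share of parameters updated by LoRA.
trainable_pct = 100 * 40_370_176 / 636_420_096
print(round(trainable_pct, 2))  # prints: 6.34

# GPU memory implied by "4.881 GB is 10.963 % of max".
max_memory_gb = 4.881 / 0.10963
training_pct = 100 * 3.453 / max_memory_gb
print(round(max_memory_gb, 1), round(training_pct, 3))  # prints: 44.5 7.756
```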
Metrics
The following are metrics on a sample of the test split. We use the LLM in a classifier task by parsing the output as JSON and extracting the intent field.
| Intent | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| general_questions | 1.0 | 1.0 | 1.0 | 205 |
| novelties | 1.0 | 1.0 | 1.0 | 49 |
| out_of_domain | 1.0 | 1.0 | 1.0 | 56 |
| recommendation | 1.0 | 1.0 | 1.0 | 211 |
| search_author | 1.0 | 0.9915 | 0.9957 | 118 |
| search_book | 0.9956 | 1.0 | 0.9978 | 228 |
| search_category | 1.0 | 1.0 | 1.0 | 133 |
| accuracy | | | 0.999 | 1000 |
| macro avg | 0.9994 | 0.9988 | 0.9991 | 1000 |
| weighted avg | 0.9990 | 0.9990 | 0.9990 | 1000 |
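To make the table easier to read, here is a small sketch of how per-class precision, recall, and F1 follow from raw counts. The counts are an inference, not reported data: the table is consistent with a single error in 1,000 samples, a `search_author` query predicted as `search_book`, and that is the assumption used below.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumed counts: 118 search_author samples, one of them predicted as search_book.
author = precision_recall_f1(tp=117, fp=0, fn=1)
# All 228 search_book samples are found, plus the one misrouted author query.
book = precision_recall_f1(tp=228, fp=1, fn=0)

print([round(x, 4) for x in author])  # prints: [1.0, 0.9915, 0.9957]
print([round(x, 4) for x in book])    # prints: [0.9956, 1.0, 0.9978]

# Accuracy: 999 correct predictions out of 1000 samples.
print(999 / 1000)  # prints: 0.999
```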
Framework versions
- TRL: 0.15.2
- Transformers: 4.51.3
- PyTorch: 2.7.0
- Datasets: 3.5.1
- Tokenizers: 0.21.1
Model Usage
This model is designed for intent classification in the Project Gutenberg domain. As such, it may not perform well on broader domains or other tasks.
Limitations
The model may not generalize well to tasks outside its training domain. See the dataset notes on bias and limitations.
Citations
Project Gutenberg. (n.d.). Retrieved May, 2025, from www.gutenberg.org.
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
