Instructions for using faresfawzi/ToolACE-2-8B-SCRIBE with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use faresfawzi/ToolACE-2-8B-SCRIBE with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="faresfawzi/ToolACE-2-8B-SCRIBE")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("faresfawzi/ToolACE-2-8B-SCRIBE")
model = AutoModelForCausalLM.from_pretrained("faresfawzi/ToolACE-2-8B-SCRIBE")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use faresfawzi/ToolACE-2-8B-SCRIBE with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "faresfawzi/ToolACE-2-8B-SCRIBE"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faresfawzi/ToolACE-2-8B-SCRIBE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/faresfawzi/ToolACE-2-8B-SCRIBE
```
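Once the server is running, the same OpenAI-compatible endpoint can be called from Python instead of curl. A minimal stdlib sketch, assuming the default port 8000; the helper names (`build_payload`, `ask`) are ours, not part of vLLM:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"
MODEL = "faresfawzi/ToolACE-2-8B-SCRIBE"

def build_payload(question: str) -> dict:
    """Build the JSON body expected by /v1/chat/completions."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": question}],
    }

def ask(question: str) -> str:
    """POST the question to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# usage (requires a running server):
# print(ask("What is the capital of France?"))
```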
- SGLang
How to use faresfawzi/ToolACE-2-8B-SCRIBE with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "faresfawzi/ToolACE-2-8B-SCRIBE" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faresfawzi/ToolACE-2-8B-SCRIBE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "faresfawzi/ToolACE-2-8B-SCRIBE" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "faresfawzi/ToolACE-2-8B-SCRIBE",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use faresfawzi/ToolACE-2-8B-SCRIBE with Docker Model Runner:
```shell
docker model run hf.co/faresfawzi/ToolACE-2-8B-SCRIBE
```
Model Card for faresfawzi/ToolACE-2-8B-SCRIBE
Abstract
Language models can be used to provide interactive, personalized student feedback in educational settings. However, real-world deployment faces three key challenges: privacy concerns, limited computational resources, and the need for pedagogically valid responses. These constraints require small, open-source models that can run locally and reliably ground their outputs in correct information. We introduce SCRIBE, a framework for multi-hop, tool-augmented reasoning designed to generate valid responses to student questions about feedback reports. SCRIBE combines domain-specific tools with a self-reflective inference pipeline that supports iterative reasoning, tool use, and error recovery. We distil these capabilities into 3B and 8B models via two-stage LoRA fine-tuning on synthetic GPT-4o-generated data. Evaluation with a human-aligned GPT-Judge and a user study with 108 students shows that 8B-SCRIBE models achieve comparable or superior quality to much larger models in key dimensions such as relevance and actionability, while being perceived on par with GPT-4o and Llama-3.3 70B by students. These findings demonstrate the viability of SCRIBE for low-resource, privacy-sensitive educational applications.
Model Description
ToolACE-2-8B-SCRIBE is a fine-tuned large language model for interactive educational feedback.
It builds on Team-ACE/ToolACE-2.5-Llama-3.1-8B and incorporates the SCRIBE framework: structured chain reasoning with multi-hop tool calling and self-reflection, enabling small models to deliver pedagogically valid, actionable, and context-grounded explanations to student questions.
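The closed-loop idea can be sketched in a few lines: the model alternates between reasoning and tool calls, and errors from tool execution are fed back so the model can reflect and retry. Everything below (`call_model`, `TOOLS`, the message format) is a hypothetical stand-in for illustration, not the released SCRIBE API:

```python
# Mock tool registry; real SCRIBE tools query feedback-report data.
TOOLS = {
    "get_weekly_scores": lambda weeks: {w: 0.8 for w in weeks},
}

def call_model(messages):
    """Stand-in for the fine-tuned model: first requests a tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_weekly_scores", "arguments": {"weeks": [2, 3]}}
    return {"answer": "Your scores stayed stable across weeks 2 and 3."}

def run_loop(question, max_hops=4):
    """Multi-hop loop: reason, call tools, recover from errors, answer."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_hops):
        step = call_model(messages)
        if "answer" in step:  # reasoning finished, return grounded answer
            return step["answer"]
        try:  # execute the requested tool and feed the result back
            result = TOOLS[step["tool"]](**step["arguments"])
            messages.append({"role": "tool", "content": result})
        except Exception as err:  # self-reflection: surface the error for retry
            messages.append({"role": "tool", "content": f"error: {err}"})
    return "Sorry, I could not ground an answer."
```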
- Developed by: EPFL (Machine Learning for Education Lab)
- Paper: SCRIBE: Structured Chain Reasoning for Interactive Behavior Explanations using Tool Calling
- Authors: Fares Fawzi, Vinitra Swamy, Dominik Glandorf, Tanya Nazaretsky, Tanja Käser
- Model type: Tool-augmented 8B LLM fine-tuned with two-stage LoRA
- Languages: English
- License: Apache 2.0
- Fine-tuned from: Team-ACE/ToolACE-2.5-Llama-3.1-8B
Uses
Direct Use
The model is designed to:
- Provide personalized, interactive feedback to students in MOOCs.
- Generate pedagogically grounded explanations using multi-step reasoning with tool calls.
- Support privacy-sensitive deployments by running locally.
Notes
- Built on a function-calling base model, enabling robust integration with external tools/APIs.
- Ideal for education-focused assistants that need both reasoning and grounded outputs.
Training Details
Training Data
- Base data: Feedback reports from MOOCs (DSP, GEO, VA, LNV)
- Synthetic data: ~7,000 student-like questions generated with GPT-4o, including reasoning traces, tool calls, and final responses
- Real data: 75 student-authored questions annotated into pedagogical categories
Training Procedure
- Two-stage LoRA fine-tuning:
- Stage 1: Initial reasoning + tool selection
- Stage 2: Multi-hop reasoning + final answer generation
- Inference: Closed-loop tool-calling with self-reflection and error recovery
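The two-stage split above can be pictured as deriving two training targets from each synthetic trace: stage 1 supervises initial reasoning and tool selection, stage 2 supervises the final answer given tool results. The trace schema below is a hypothetical illustration, not the paper's actual data format:

```python
# Hypothetical synthetic trace (GPT-4o-generated in the real pipeline).
trace = {
    "question": "Why did my quiz score drop in week 3?",
    "reasoning": "Need the student's weekly scores, then compare weeks 2 and 3.",
    "tool_calls": [{"name": "get_weekly_scores", "arguments": {"weeks": [2, 3]}}],
    "tool_results": [{"week2": 0.85, "week3": 0.60}],
    "answer": "Your week-3 score dropped after you skipped two practice sets.",
}

def stage1_example(t):
    """Stage 1 target: initial reasoning plus the first tool selection."""
    return {
        "input": t["question"],
        "target": {"reasoning": t["reasoning"], "tool_calls": t["tool_calls"]},
    }

def stage2_example(t):
    """Stage 2 target: final answer conditioned on the tool results."""
    return {
        "input": {"question": t["question"], "tool_results": t["tool_results"]},
        "target": t["answer"],
    }
```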
Training Hyperparameters
- Regime: bf16 mixed precision
- LoRA rank: 256 for 8B variant
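For intuition, LoRA keeps the base weight frozen and learns a low-rank additive update. A minimal numpy sketch of the idea, with toy dimensions rather than the actual rank-256 adapters:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r, alpha = 8, 2, 16          # toy sizes; the 8B variant uses rank 256
W = rng.normal(size=(d, d))     # frozen base weight
A = rng.normal(size=(r, d))     # trainable down-projection
B = np.zeros((d, r))            # trainable up-projection, zero-initialized

delta = (alpha / r) * (B @ A)   # low-rank update added to the frozen weight
W_adapted = W + delta

# With B initialized to zero, the adapter starts as a no-op update,
# and the learned delta can never exceed rank r.
assert np.allclose(W_adapted, W)
assert np.linalg.matrix_rank(B @ A) <= r
```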
Citation
If you use this model, please cite:
BibTeX
```bibtex
@inproceedings{2025-EMNLP-Scribe,
  author    = {Fares Fawzi and Vinitra Swamy and Dominik Glandorf and Tanya Nazaretsky and Tanja K{\"a}ser},
  booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  title     = {SCRIBE: Structured Chain Reasoning for Interactive Behavior Explanations using Tool Calling},
  year      = {2025}
}
```