Instructions for using apple/sage-ft-mixtral-8x7b with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use apple/sage-ft-mixtral-8x7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="apple/sage-ft-mixtral-8x7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("apple/sage-ft-mixtral-8x7b")
model = AutoModelForCausalLM.from_pretrained("apple/sage-ft-mixtral-8x7b")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use apple/sage-ft-mixtral-8x7b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "apple/sage-ft-mixtral-8x7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/apple/sage-ft-mixtral-8x7b
```
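The OpenAI-compatible endpoint above can also be called from Python. A minimal sketch using only the standard library, assuming the vLLM server from the previous step is running on localhost:8000 (the request is skipped gracefully if the server is not reachable):

```python
import json
import urllib.error
import urllib.request

# Build the same chat-completions payload as the curl example above.
payload = {
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
except (urllib.error.URLError, OSError):
    # Server not running; the payload above still shows the request shape.
    print("vLLM server not reachable")
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at the local base URL) works the same way.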
- SGLang
How to use apple/sage-ft-mixtral-8x7b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "apple/sage-ft-mixtral-8x7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "apple/sage-ft-mixtral-8x7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use apple/sage-ft-mixtral-8x7b with Docker Model Runner:
```shell
docker model run hf.co/apple/sage-ft-mixtral-8x7b
```
SAGE Dialogue Gen 🌱
Authors: Yizhe Zhang, Navdeep Jaitly (Apple)
Model Information
- Language: English
- License: Apache 2.0
- Base Model: mistralai/Mixtral-8x7B-Instruct-v0.1
- Library: transformers
- Tags: dialog-generation, conversational-ai, state-action-model
- Dataset: ShareGPT
- Metrics: Custom emotional-intelligence evaluation
Citation
@misc{zhang2025sage,
  title        = {SAGE: Steering and Refining Dialogue Generation with State-Action Augmentation},
  author       = {Zhang, Yizhe and Jaitly, Navdeep},
  year         = {2025},
  howpublished = {arXiv preprint},
  note         = {arXiv:2503.03040}
}
📄 Paper: Available on arXiv and Papers with Code
Model Description
SAGE introduces latent state-action variables between dialogue turns, enabling:
- Structured Control: Precise management of emotional tone and conversational strategy
- Enhanced Emotional Intelligence: Explicit state planning for more empathetic responses
- Self-Improving Pipeline: Comprehensive training approach including:
- Data augmentation
- Dialogue-tree search
- Reward modeling
- Fine-tuning optimization
This approach allows for more nuanced and contextually appropriate dialogue generation compared to traditional methods.
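The state-action idea above can be sketched as a simple data structure. This is a hypothetical illustration only, not Apple's API: before each assistant turn, the model infers a latent user state and chooses an action (conversational strategy), and the visible reply is conditioned on that pair.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str          # "user" or "assistant"
    text: str
    state: str = ""    # latent: inferred emotional state of the user
    action: str = ""   # latent: chosen conversational strategy

def add_assistant_turn(history, state, action, reply):
    """Append an assistant turn annotated with its latent state-action pair."""
    return history + [Turn("assistant", reply, state=state, action=action)]

history = [Turn("user", "I'm feeling overwhelmed with work lately.")]
history = add_assistant_turn(
    history,
    state="stressed",
    action="validate-then-advise",
    reply="That sounds exhausting. What part of your workload feels heaviest?",
)
print(history[-1].action)  # the strategy that steered this reply
```

In SAGE the state-action annotation is predicted by the model itself rather than supplied by hand; the sketch only shows where it sits between turns.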
Intended Uses
✅ Recommended Applications
- Emotional or empathetic chatbots
- Long-horizon, strategy-aware conversation systems
- Research on structured latent-variable dialogue control
- Educational conversational AI systems
- Customer service applications requiring emotional intelligence
⚠️ Important Limitations
- Not suitable for high-stakes, safety-critical deployment without further evaluation
- Requires additional testing for production environments
- May need domain-specific fine-tuning for specialized applications
Training Details
Base Model: Mixtral-8x7B-Instruct
Training Pipeline:
- Data Preparation: ShareGPT-style JSON formatting
- Supervised Fine-Tuning (SFT): Initial model adaptation
- Dialogue-Tree Search: Exploration of conversation paths
- Preference Learning: Reward model training
- Comparative Evaluation: Performance assessment and inference optimization
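Step 1 (ShareGPT-style JSON formatting) can be illustrated with a minimal record. The `conversations`/`from`/`value` keys follow the common community convention for ShareGPT data; the exact schema used for SAGE training is an assumption.

```python
import json

# One ShareGPT-style training record: a conversation as alternating turns.
record = {
    "id": "example-0001",
    "conversations": [
        {"from": "human", "value": "I'm feeling overwhelmed with work lately."},
        {"from": "gpt", "value": "That sounds stressful. Which part of work weighs on you most?"},
    ],
}
print(json.dumps(record, indent=2))
```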
Performance
SAGE demonstrates significant improvements on emotional-intelligence metrics compared to baseline models while maintaining generative flexibility and coherence. The model shows particular strength in:
- Emotional tone consistency
- Contextual appropriateness
- Long-term conversation planning
- Empathetic response generation
Usage
Quick Start
```shell
git clone https://github.com/apple/ml-sage-dialog-gen
cd ml-sage-dialog-gen
bash setup.sh
```
Basic Implementation
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model
tokenizer = AutoTokenizer.from_pretrained("apple/sage-ft-mixtral-8x7b")
model = AutoModelForCausalLM.from_pretrained("apple/sage-ft-mixtral-8x7b")

# Generate dialogue
input_text = "I'm feeling overwhelmed with work lately."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Requirements
- Python 3.8+
- PyTorch 1.12+
- Transformers 4.21+
- Additional dependencies listed in requirements.txt
Contributing
Contributions are welcome! Please see our contributing guidelines and code of conduct before submitting pull requests.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Acknowledgments
- Built upon the Mixtral-8x7B-Instruct foundation model
- Trained using ShareGPT dataset
- Developed by the Apple Machine Learning Research team
Contact
For questions or issues, please open a GitHub issue or contact the development team through the official Apple ML research channels.