Instructions for using apple/sage-ft-mixtral-8x7b with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use apple/sage-ft-mixtral-8x7b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="apple/sage-ft-mixtral-8x7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("apple/sage-ft-mixtral-8x7b")
model = AutoModelForCausalLM.from_pretrained("apple/sage-ft-mixtral-8x7b")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use apple/sage-ft-mixtral-8x7b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "apple/sage-ft-mixtral-8x7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/apple/sage-ft-mixtral-8x7b
```
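The OpenAI-compatible endpoint above can also be called from Python. A minimal sketch using only the standard library, assuming the vLLM server from the previous step is running on localhost:8000 (the request is skipped gracefully if the server is not reachable):

```python
import json
import urllib.error
import urllib.request

# Build the same chat-completions payload as the curl example above.
payload = {
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
except (urllib.error.URLError, OSError):
    # Server not running; the payload above still shows the request shape.
    print("vLLM server not reachable")
```

Any OpenAI-compatible client (e.g. the `openai` Python package pointed at the local base URL) works the same way.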
- SGLang
How to use apple/sage-ft-mixtral-8x7b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "apple/sage-ft-mixtral-8x7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "apple/sage-ft-mixtral-8x7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "apple/sage-ft-mixtral-8x7b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use apple/sage-ft-mixtral-8x7b with Docker Model Runner:
```shell
docker model run hf.co/apple/sage-ft-mixtral-8x7b
```
SAGE Dialogue Gen 🌱
Authors: Yizhe Zhang, Navdeep Jaitly (Apple)
Model Information
- Language: English
- License: Apache 2.0
- Base Model: mistralai/Mixtral-8x7B-Instruct-v0.1
- Library: transformers
- Tags: dialog-generation, conversational-ai, state-action-model
- Dataset: ShareGPT
- Metrics: Custom emotional-intelligence evaluation
Citation
@misc{zhang2025sage,
  title        = {SAGE: Steering and Refining Dialogue Generation with State-Action Augmentation},
  author       = {Zhang, Yizhe and Jaitly, Navdeep},
  year         = {2025},
  howpublished = {arXiv preprint},
  note         = {arXiv:2503.03040}
}
📄 Paper: Available on arXiv and Papers with Code
Model Description
SAGE introduces latent state-action variables between dialogue turns, enabling:
- Structured Control: Precise management of emotional tone and conversational strategy
- Enhanced Emotional Intelligence: Explicit state planning for more empathetic responses
- Self-Improving Pipeline: Comprehensive training approach including:
- Data augmentation
- Dialogue-tree search
- Reward modeling
- Fine-tuning optimization
This approach allows for more nuanced and contextually appropriate dialogue generation compared to traditional methods.
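The state-action idea above can be sketched as a simple data structure. This is a hypothetical illustration only, not Apple's API: before each assistant turn, the model infers a latent user state and chooses an action (conversational strategy), and the visible reply is conditioned on that pair.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str          # "user" or "assistant"
    text: str
    state: str = ""    # latent: inferred emotional state of the user
    action: str = ""   # latent: chosen conversational strategy

def add_assistant_turn(history, state, action, reply):
    """Append an assistant turn annotated with its latent state-action pair."""
    return history + [Turn("assistant", reply, state=state, action=action)]

history = [Turn("user", "I'm feeling overwhelmed with work lately.")]
history = add_assistant_turn(
    history,
    state="stressed",
    action="validate-then-advise",
    reply="That sounds exhausting. What part of your workload feels heaviest?",
)
print(history[-1].action)  # the strategy that steered this reply
```

In SAGE the state-action annotation is predicted by the model itself rather than supplied by hand; the sketch only shows where it sits between turns.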
Intended Uses
✅ Recommended Applications
- Emotional or empathetic chatbots
- Long-horizon, strategy-aware conversation systems
- Research on structured latent-variable dialogue control
- Educational conversational AI systems
- Customer service applications requiring emotional intelligence
⚠️ Important Limitations
- Not suitable for high-stakes, safety-critical deployment without further evaluation
- Requires additional testing for production environments
- May need domain-specific fine-tuning for specialized applications
Training Details
Base Model: Mixtral-8x7B-Instruct
Training Pipeline:
- Data Preparation: ShareGPT-style JSON formatting
- Supervised Fine-Tuning (SFT): Initial model adaptation
- Dialogue-Tree Search: Exploration of conversation paths
- Preference Learning: Reward model training
- Comparative Evaluation: Performance assessment and inference optimization
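Step 1 (ShareGPT-style JSON formatting) can be illustrated with a minimal record. The `conversations`/`from`/`value` keys follow the common community convention for ShareGPT data; the exact schema used for SAGE training is an assumption.

```python
import json

# One ShareGPT-style training record: a conversation as alternating turns.
record = {
    "id": "example-0001",
    "conversations": [
        {"from": "human", "value": "I'm feeling overwhelmed with work lately."},
        {"from": "gpt", "value": "That sounds stressful. Which part of work weighs on you most?"},
    ],
}
print(json.dumps(record, indent=2))
```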
Performance
SAGE demonstrates significant improvements on emotional-intelligence metrics compared to baseline models while maintaining generative flexibility and coherence. The model shows particular strength in:
- Emotional tone consistency
- Contextual appropriateness
- Long-term conversation planning
- Empathetic response generation
Usage
Quick Start
```shell
git clone https://github.com/apple/ml-sage-dialog-gen
cd ml-sage-dialog-gen
bash setup.sh
```
Basic Implementation
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model
tokenizer = AutoTokenizer.from_pretrained("apple/sage-ft-mixtral-8x7b")
model = AutoModelForCausalLM.from_pretrained("apple/sage-ft-mixtral-8x7b")

# Generate dialogue
input_text = "I'm feeling overwhelmed with work lately."
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Requirements
- Python 3.8+
- PyTorch 1.12+
- Transformers 4.21+
- Additional dependencies listed in requirements.txt
Contributing
Contributions are welcome! Please see our contributing guidelines and code of conduct before submitting pull requests.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Acknowledgments
- Built upon the Mixtral-8x7B-Instruct foundation model
- Trained using ShareGPT dataset
- Developed by the Apple Machine Learning Research team
Contact
For questions or issues, please open a GitHub issue or contact the development team through the official Apple ML research channels.