How to use koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX")
model = AutoModelForCausalLM.from_pretrained("koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX")

This model merges the MLX QLoRA adapter into its base model, Meta Llama 3.1 8B Instruct. It was trained to answer questions and provide guidance on MLX, Apple's machine learning framework. Fine-tuning used the LoRA (Low-Rank Adaptation) method on a custom dataset of question-answer pairs derived from the MLX documentation.
The model was fine-tuned for a single epoch on the Apple MLX QA dataset (koyeb/Apple-MLX-QA).
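To make the "merge" step concrete, here is a toy sketch of how a LoRA update can be folded into a base weight matrix, assuming the standard LoRA formulation W' = W + (alpha / r) * B * A. The matrices, `alpha`, and `rank` values below are made up for illustration and are not taken from this model's adapter; the helper names are hypothetical.

```python
# Toy illustration of merging a LoRA update into a base weight matrix.
# All shapes and values are invented; a real adapter holds one (A, B) pair
# per adapted layer, and the actual merge is done by the training library.


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Base weight: 2 outputs x 3 inputs.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
# Rank-1 adapter: A is (1 x 3), B is (2 x 1).
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-0.5]]

merged = merge_lora(W, A, B, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged checkpoint can be loaded like any other causal language model.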
To use the model, you need to install the required dependencies:
pip install peft transformers torch accelerate jinja2==3.1.0
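As an optional sanity check after installing, the sketch below reports which packages cannot be found in the current environment. The `missing_packages` helper is hypothetical, and the package list is an assumption based on the install command and the imports used in this card.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list: packages named in the install command, plus torch,
# which the sample code imports.
required = ["peft", "transformers", "torch", "jinja2"]
missing = missing_packages(required)

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

This only checks that each package can be located, not that its version matches the pinned one.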
Here’s a sample code snippet to load and interact with the model:
import torch
import transformers

# Load the merged, fine-tuned checkpoint rather than the base Llama model.
model_id = "koyeb/Meta-Llama-3.1-8B-Instruct-Apple-MLX"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style input: a system prompt followed by the user's message.
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)

# The last entry in "generated_text" is the assistant's reply.
print(outputs[0]["generated_text"][-1])
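Since the model is tuned for MLX questions, a prompt closer to its intended use might look like the sketch below. The helper names, the system prompt wording, and the sample question are all assumptions for illustration. The stand-in output mimics the structure the snippet above indexes into (a list of chat messages under "generated_text"), so the sketch runs without downloading the model.

```python
# Hypothetical system prompt; adjust the wording to taste.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant that answers questions about Apple's MLX framework."


def build_messages(question, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Build a chat-format message list for the text-generation pipeline."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def extract_reply(outputs):
    """Return only the text of the final (assistant) message."""
    return outputs[0]["generated_text"][-1]["content"]


messages = build_messages("How do I create an array in MLX?")

# With the pipeline from the snippet above, one would call:
#   outputs = pipeline(messages, max_new_tokens=256)
# Stand-in output with a placeholder reply, used here instead of the model.
fake_outputs = [
    {"generated_text": messages + [{"role": "assistant", "content": "(model reply)"}]}
]
print(extract_reply(fake_outputs))
```

Extracting the "content" field returns just the reply text instead of the whole message dictionary.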
Base model
meta-llama/Llama-3.1-8B