Does your fine-tuning process overfit?
Thanks for your contribution.
After fine-tuning LLaMA-13B on OpenOrca or SlimOrca, I have two questions.
- What is your training configuration? For example, the number of GPUs, learning rate, fine-tuning strategy, and number of epochs.
- Does your fine-tuning process overfit? When I start the second epoch, the training loss drops significantly. Is this normal? Do you have any suggestions to avoid this problem?
For the compute config, we used 8x A6000 GPUs rented from runpod.io. To prevent overfitting we use packing, which also speeds up training considerably. The trainer we use is called axolotl, and you can find it here: https://github.com/OpenAccess-AI-Collective/axolotl. For the learning rate and all other config options, the configs folder of each model contains a YAML file that details every option axolotl uses.
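For readers unfamiliar with packing, here is a minimal sketch of the general idea: several short tokenized examples are concatenated into one sequence of bounded length, so fewer tokens are wasted on padding. This is not axolotl's implementation. The function name `pack_sequences`, the greedy strategy, and the toy token ids are assumptions made for illustration, and a real trainer would also need to handle attention masks and position ids across example boundaries.

```python
from typing import Iterable, List


def pack_sequences(
    tokenized: Iterable[List[int]], max_len: int, eos_id: int
) -> List[List[int]]:
    """Greedily concatenate tokenized examples into blocks of at most max_len tokens.

    Hypothetical helper for illustration only; not taken from axolotl.
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for ids in tokenized:
        # Truncate overly long examples and terminate each one with EOS.
        ids = ids[: max_len - 1] + [eos_id]
        # Start a new block when the next example would not fit.
        if len(current) + len(ids) > max_len:
            packed.append(current)
            current = []
        current.extend(ids)
    if current:
        packed.append(current)
    return packed


# Toy token ids, with 0 standing in for the EOS token.
examples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
blocks = pack_sequences(examples, max_len=8, eos_id=0)
print(blocks)
```

With these toy inputs, four examples end up in two packed blocks, so the model sees two near-full sequences per pass instead of four mostly padded ones.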
Hope that helps!
Thanks for storing the axolotl config! I suggest you add this to the model card so that people know where to find it :] just my 2c