Instructions for using HirCoir/MiniChat-1.5-3B-Sorah with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use HirCoir/MiniChat-1.5-3B-Sorah with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="HirCoir/MiniChat-1.5-3B-Sorah")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("HirCoir/MiniChat-1.5-3B-Sorah")
model = AutoModelForCausalLM.from_pretrained("HirCoir/MiniChat-1.5-3B-Sorah")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use HirCoir/MiniChat-1.5-3B-Sorah with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "HirCoir/MiniChat-1.5-3B-Sorah"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HirCoir/MiniChat-1.5-3B-Sorah",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/HirCoir/MiniChat-1.5-3B-Sorah
```
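The vLLM server's OpenAI-compatible endpoint shown above can also be called from Python. Below is a minimal sketch using only the standard library; the helper names (`build_completion_request`, `complete`) are hypothetical, and the address assumes the `vllm serve` command from the previous step is running on `localhost:8000`:

```python
import json
import urllib.request


def build_completion_request(model, prompt, max_tokens=512, temperature=0.5):
    """Build the JSON body for an OpenAI-compatible /v1/completions call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(base_url, body):
    """POST the body to the completions endpoint and return the parsed reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example (requires the server to be running):
# body = build_completion_request("HirCoir/MiniChat-1.5-3B-Sorah", "Once upon a time,")
# reply = complete("http://localhost:8000", body)
# print(reply["choices"][0]["text"])
```

The same sketch works unchanged against the SGLang server below, since both expose the same OpenAI-compatible completions API (only the port differs).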
- SGLang
How to use HirCoir/MiniChat-1.5-3B-Sorah with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "HirCoir/MiniChat-1.5-3B-Sorah" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HirCoir/MiniChat-1.5-3B-Sorah",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "HirCoir/MiniChat-1.5-3B-Sorah" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "HirCoir/MiniChat-1.5-3B-Sorah",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Unsloth Studio
How to use HirCoir/MiniChat-1.5-3B-Sorah with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
# Install Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for HirCoir/MiniChat-1.5-3B-Sorah to start chatting
```
Install Unsloth Studio (Windows)
```powershell
# Install Unsloth Studio:
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for HirCoir/MiniChat-1.5-3B-Sorah to start chatting
```
Use Hugging Face Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for HirCoir/MiniChat-1.5-3B-Sorah to start chatting.
Load model with FastModel
```shell
pip install unsloth
```
```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="HirCoir/MiniChat-1.5-3B-Sorah",
    max_seq_length=2048,
)
```
- Docker Model Runner
How to use HirCoir/MiniChat-1.5-3B-Sorah with Docker Model Runner:
```shell
docker model run hf.co/HirCoir/MiniChat-1.5-3B-Sorah
```
MiniChat-2-3B-Sorah
MiniChat-2-3B-Sorah is a language model based on MiniChat-1.5-3B and fine-tuned on instruction and preference data.

The MiniChat-1.5-3B-Sorah model complements the Sorah voice model, which was created with the Piper project. The Sorah model is still being trained and improved, so it is not publicly available, but you can find more information in the Sorah Neuronal repository.

MiniChat-2-3B-Sorah outperforms Vicuna-7B and approaches LLaMA-2-Chat-7B on MT-Bench.

Below is a code example for using Sorah based on MiniChat-2-3B:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# `conversation` is provided alongside the model code (MiniMA/MiniChat), not on PyPI
from conversation import get_default_conv_template

# Load MiniChat-2-3B-Sorah
tokenizer = AutoTokenizer.from_pretrained("HirCoir/minichat-3b-sorah", use_fast=False)

# GPU configuration
model = AutoModelForCausalLM.from_pretrained(
    "HirCoir/minichat-3b-sorah",
    use_cache=True,
    device_map="auto",
    torch_dtype=torch.float16,
).eval()

# CPU configuration (optional)
# model = AutoModelForCausalLM.from_pretrained(
#     "HirCoir/minichat-3b-sorah",
#     use_cache=True,
#     device_map="cpu",
#     torch_dtype=torch.float32,  # float16 is poorly supported on CPU
# ).eval()

# Create a conversation
conv = get_default_conv_template("minichat")

# Example question (in Spanish)
question = "¿Cómo te llamas?"

# Add the question to the conversation
conv.append_message(conv.roles[0], question)
conv.append_message(conv.roles[1], None)

# Build the prompt
prompt = conv.get_prompt()
input_ids = tokenizer([prompt]).input_ids

# Generate a response
output_ids = model.generate(
    torch.as_tensor(input_ids).to(model.device),  # works for both GPU and CPU setups
    do_sample=True,
    temperature=0.7,
    max_new_tokens=1024,
)
output_ids = output_ids[0][len(input_ids[0]):]
output = tokenizer.decode(output_ids, skip_special_tokens=True).strip()

# The response is stored in `output`.
# A multi-turn conversation can be carried out by continuously appending questions to `conv`.
```
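The `get_default_conv_template` helper ships with the model's upstream code (MiniMA/MiniChat) rather than on PyPI. To illustrate the kind of prompt such a template produces, here is a hypothetical minimal re-implementation of a two-role template; the tags and separator are assumptions for illustration, and the model's actual template may differ, so prefer the original `conversation.py`:

```python
class MiniConv:
    """Hypothetical stand-in for the MiniChat conversation template.

    Each message is wrapped with its role tag; a trailing assistant tag
    with no content cues the model to produce the next reply.
    """

    roles = ("[|User|]", "[|Assistant|]")

    def __init__(self, system="", sep="</s>"):
        self.system = system
        self.sep = sep
        self.messages = []

    def append_message(self, role, content):
        self.messages.append((role, content))

    def get_prompt(self):
        parts = [self.system] if self.system else []
        for role, content in self.messages:
            if content is None:
                parts.append(role)  # open slot for the model's reply
            else:
                parts.append(f"{role} {content}{self.sep}")
        return "".join(parts)


conv = MiniConv()
conv.append_message(conv.roles[0], "¿Cómo te llamas?")
conv.append_message(conv.roles[1], None)
print(conv.get_prompt())  # [|User|] ¿Cómo te llamas?</s>[|Assistant|]
```

The useful property of this shape is that multi-turn chat is just repeated `append_message` calls followed by `get_prompt()`, which is exactly how the example above extends to longer conversations.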
