Files are missing
The model.safetensors.index.json lists three .safetensors files, but there is only one in the repository.
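For anyone who wants to confirm the mismatch on a local copy, here is a minimal sketch. It assumes the index follows the usual sharded-checkpoint layout, with a `weight_map` object mapping tensor names to shard filenames, and that the repository files sit in one local directory. The function names `referenced_shards` and `missing_shards` are made up for this example and are not part of any library.

```python
import json
from pathlib import Path


def referenced_shards(index_path):
    """Return the sorted, de-duplicated shard filenames named in the index.

    Assumes the index JSON has a top-level "weight_map" object whose
    values are shard filenames (the usual sharded safetensors layout).
    """
    with open(index_path, encoding="utf-8") as f:
        index = json.load(f)
    return sorted(set(index["weight_map"].values()))


def missing_shards(index_path, model_dir):
    """Return the shard filenames listed in the index but absent from model_dir."""
    model_dir = Path(model_dir)
    return [
        name
        for name in referenced_shards(index_path)
        if not (model_dir / name).is_file()
    ]
```

Calling `missing_shards("model.safetensors.index.json", ".")` from inside the downloaded folder would list whichever shard names the index mentions that are not on disk. It only compares filenames; it says nothing about whether a given loader actually needs those files.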
Since the model is quantized, that file isn't needed. You shouldn't have any problems running it with the instructions in the README.
I'm trying to run it on a server using the CPU. I know the ideal would be to have the model in GGUF, but I don't have enough RAM to convert the regular model to GGUF either, so I tried using this one with a script based on transformers.
ExLlama is GPU-only, which is why I'm looking for other options.
We are working on the GGUF version.
Perfect then, I'll keep an eye on Twitter. Thank you very much.