Merged vLLM Model

This folder contains a merged vLLM model export produced by the project's merging workflow.

Contents typically include:

One or more .safetensors shard files (e.g. model-00001-of-00004.safetensors)
model.safetensors.index.json (index for shards)
Tokenizer files: tokenizer.json, tokenizer_config.json, special_tokens_map.json
config.json and generation_config.json
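
Before uploading, it can help to confirm that every shard named in the index is actually present in the folder. The following is a minimal sketch, assuming the index follows the usual sharded-checkpoint layout in which a "weight_map" object maps tensor names to shard filenames; the function name missing_shards is hypothetical and not part of this project.

```python
import json
from pathlib import Path


def missing_shards(folder):
    """Return shard filenames named in the index but absent from `folder`."""
    folder = Path(folder)
    index_path = folder / "model.safetensors.index.json"
    with index_path.open(encoding="utf-8") as f:
        index = json.load(f)
    # Assumption: "weight_map" maps each tensor name to the shard file storing it.
    expected = set(index["weight_map"].values())
    return sorted(name for name in expected if not (folder / name).is_file())
```

An empty list means every shard referenced by the index was found; any returned names point to shards that still need to be copied or re-exported.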

How to push this folder to Hugging Face Hub

Install dependencies:

pip install huggingface-hub

From the repository root, run the helper script:

python scripts/push_to_hf.py --repo-id YOUR_USERNAME/YOUR_MODEL_NAME
# or with env token:
HF_TOKEN=xxx python scripts/push_to_hf.py --repo-id YOUR_USERNAME/YOUR_MODEL_NAME --private
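
The helper script itself is not reproduced here. As a rough illustration of what such a script might do, the sketch below uses the huggingface_hub HfApi client to create the repository and upload a folder. The --repo-id, --private, and --token flags and the HF_TOKEN variable come from this README; the --folder flag and the function names are assumptions made for the example, and the real scripts/push_to_hf.py may differ.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(description="Upload a model folder to the Hugging Face Hub.")
    parser.add_argument("--repo-id", required=True, help="e.g. YOUR_USERNAME/YOUR_MODEL_NAME")
    # Hypothetical flag: the real helper script may locate the folder differently.
    parser.add_argument("--folder", default=".", help="folder to upload")
    parser.add_argument("--private", action="store_true", help="create the repo as private")
    parser.add_argument("--token", default=None, help="access token; falls back to HF_TOKEN")
    return parser


def resolve_token(cli_token, env=None):
    """Prefer an explicit --token value, otherwise read HF_TOKEN from the environment."""
    env = os.environ if env is None else env
    return cli_token or env.get("HF_TOKEN")


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Imported here so the argument handling above works without the dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=resolve_token(args.token))
    # exist_ok=True makes the call a no-op when the repo already exists.
    api.create_repo(repo_id=args.repo_id, private=args.private, exist_ok=True)
    api.upload_folder(folder_path=args.folder, repo_id=args.repo_id, repo_type="model")
```

Calling main() at the bottom of a file would turn this into a runnable script. Keeping token resolution in its own function makes the precedence (command-line flag first, then environment) easy to check without touching the network.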

Loading the model

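With vLLM (recommended for serving inference):

from vllm import LLM, SamplingParams
llm = LLM(model="YOUR_USERNAME/YOUR_MODEL_NAME")
# generate a completion for each prompt in the batch
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)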

With Hugging Face Transformers (if applicable):

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
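# trust_remote_code=True executes Python code from the repo; drop it unless the model ships custom modeling code
model = AutoModelForCausalLM.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME", trust_remote_code=True)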

Notes

For large .safetensors shards, ensure you have sufficient bandwidth and storage.
If you plan to host the model publicly, review license and privacy requirements.
If uploads fail with authentication or permission errors, generate an access token with write access and pass it via HF_TOKEN or --token.
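
To put the storage note in concrete terms, the sketch below estimates the size of the weights, assuming roughly 8 billion parameters stored as BF16 (2 bytes each) as reported in this repository's model stats, and sums the shard files actually on disk for comparison. Both function names are hypothetical, and the estimate ignores tokenizer and config files, which are small by comparison.

```python
from pathlib import Path


def estimated_weight_bytes(num_params, bytes_per_param=2):
    """Approximate weight size; BF16 uses 2 bytes per parameter."""
    return num_params * bytes_per_param


def shard_bytes_on_disk(folder):
    """Total size of the .safetensors files directly inside `folder`."""
    return sum(p.stat().st_size for p in Path(folder).glob("*.safetensors"))


size = estimated_weight_bytes(8_000_000_000)
print(f"{size / 1e9:.1f} GB")  # prints "16.0 GB"
```

Plan for at least this much free disk space on both the uploading machine and any machine that downloads the model, plus some headroom for caches.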



Safetensors model size: 8B params; tensor type: BF16.
