Instructions to use leveldevai/MarcBeagle-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use leveldevai/MarcBeagle-7B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="leveldevai/MarcBeagle-7B")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("leveldevai/MarcBeagle-7B")
model = AutoModelForCausalLM.from_pretrained("leveldevai/MarcBeagle-7B")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use leveldevai/MarcBeagle-7B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "leveldevai/MarcBeagle-7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "leveldevai/MarcBeagle-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker
docker model run hf.co/leveldevai/MarcBeagle-7B
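To call the server from Python instead of curl, here is a minimal standard-library sketch of the same `/v1/completions` request. It assumes the server started with `vllm serve` is listening on `localhost:8000`, as in the curl example. The helper names `build_completion_request` and `complete` are hypothetical. Reading the text from `choices[0]["text"]` assumes the OpenAI-style completions response format.

```python
import json
import urllib.request

# Assumption: `vllm serve "leveldevai/MarcBeagle-7B"` is running on its default port 8000.
BASE_URL = "http://localhost:8000"
MODEL_ID = "leveldevai/MarcBeagle-7B"


def build_completion_request(prompt, max_tokens=512, temperature=0.5,
                             base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI-style completions responses carry the generated text in choices[0]["text"].
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,", max_tokens=64)` should return the continuation. The SGLang curl examples below target the same OpenAI-compatible path on port 30000, so passing `base_url="http://localhost:30000"` should presumably work against that server as well.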
- SGLang
How to use leveldevai/MarcBeagle-7B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "leveldevai/MarcBeagle-7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "leveldevai/MarcBeagle-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "leveldevai/MarcBeagle-7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "leveldevai/MarcBeagle-7B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
- Docker Model Runner
How to use leveldevai/MarcBeagle-7B with Docker Model Runner:
docker model run hf.co/leveldevai/MarcBeagle-7B
added_tokens_decoder Seem to Cause Index Errors
I think the <|im_start|> and <|im_end|> tokens were added after another user mentioned they weren't natively in the model's vocab. But the current config throws indexing errors whenever it is used. The code below increases the vocab size by 2 and makes the model run without errors. However, the loss is very high (~10, versus ~3 for other models on my data), presumably because the new tokens are just noise. Still, the indexing errors are gone. I'm not sure how to remove the tokens from the tokenizer once it has been instantiated.
Shouldn't these tokens be left out if the model wasn't trained with them and hasn't learned them? Am I missing something? Does it work out of the box for others?
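import torch
from transformers import AutoConfig, AutoModelForCausalLM
model_id = 'leveldevai/MarcBeagle-7B'
# Enlarge the vocabulary by 2 to make room for <|im_start|> and <|im_end|>
config = AutoConfig.from_pretrained(model_id)
config.vocab_size += 2
generator = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    ignore_mismatched_sizes=True,  # skip checkpoint weights whose shapes no longer match
    torch_dtype=torch.bfloat16,
    attn_implementation='flash_attention_2',  # requires the flash-attn package
    trust_remote_code=True,
)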
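The following is a likely explanation for the ~10 loss. It is an assumption, not a confirmed diagnosis. With `ignore_mismatched_sizes=True`, Transformers does not load checkpoint weights whose shapes disagree with the config. Those parameters (here the input embedding matrix and the `lm_head` output matrix) keep their fresh random initialization in full, not just the two new rows. A model with untrained input and output embeddings predicts roughly uniformly over the vocabulary, and the cross-entropy of a uniform prediction over V tokens is ln V. The sketch below computes that baseline, assuming a Mistral-style base vocabulary of 32,000 tokens (check `config.vocab_size` for the real value). The result is about 10.37 nats for 32,002 tokens, close to the reported ~10.

```python
import math


def uniform_cross_entropy(vocab_size: int) -> float:
    """Cross-entropy in nats when every token gets probability 1 / vocab_size."""
    # -log(1 / V) == log(V): the loss of a model that has learned nothing about the vocabulary.
    return math.log(vocab_size)


# Assumption: a 32,000-token base vocabulary plus the two added ChatML tokens.
base_vocab = 32000
baseline = uniform_cross_entropy(base_vocab + 2)
print(f"uniform-prediction loss over {base_vocab + 2} tokens: {baseline:.2f} nats")
```

If the goal is to keep the pretrained weights, a commonly used alternative is to load the checkpoint without changing `vocab_size` and then call `model.resize_token_embeddings(len(tokenizer))`. This keeps the existing embedding rows and only appends rows for the new tokens. Those new rows would still need training before the tokens carry any meaning.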
Thanks for noticing.
This file seems to come from one of the models used in the merge. I updated a few things, and it appears to be working well for me. Please let me know if you see anything else.
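A quick way to check whether an updated tokenizer config is consistent with the model is to compare the IDs in `added_tokens_decoder` against `vocab_size` in `config.json`. Any ID at or beyond `vocab_size` indexes past the end of the embedding matrix, which is the kind of index error described above. The sketch below works on the two files after they have been parsed into dicts. The helper name `out_of_range_added_tokens` is hypothetical. The example data is also hypothetical: its token IDs mirror a 32,000-token base vocabulary with `<|im_start|>` and `<|im_end|>` appended, and each entry is trimmed to its `content` field.

```python
import json


def out_of_range_added_tokens(tokenizer_config, model_config):
    """Return (id, content) pairs for added tokens the embedding matrix cannot index.

    `tokenizer_config` is the parsed tokenizer_config.json and `model_config`
    the parsed config.json. Any ID >= vocab_size would fail the embedding lookup.
    """
    vocab_size = model_config["vocab_size"]
    decoder = tokenizer_config.get("added_tokens_decoder", {})
    # JSON object keys are strings, so convert each token ID back to int.
    return sorted(
        (int(token_id), entry["content"])
        for token_id, entry in decoder.items()
        if int(token_id) >= vocab_size
    )


# Hypothetical, trimmed example data (real entries carry more fields).
example_tokenizer_config = json.loads("""
{
  "added_tokens_decoder": {
    "0": {"content": "<unk>"},
    "1": {"content": "<s>"},
    "2": {"content": "</s>"},
    "32000": {"content": "<|im_start|>"},
    "32001": {"content": "<|im_end|>"}
  }
}
""")
example_model_config = {"vocab_size": 32000}
print(out_of_range_added_tokens(example_tokenizer_config, example_model_config))
```

For this example data, the call reports both ChatML tokens as out of range. Against a real checkout, you would `json.load` the repository's `tokenizer_config.json` and `config.json` instead. An empty list means every added token has a row in the embedding matrix.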