How to use from vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
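# Start the vLLM server (a 70B model usually needs several GPUs; to shard it,
# add --tensor-parallel-size N, where N is the number of GPUs):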
vllm serve "sequelbox/Llama2-70B-SharpBalance"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sequelbox/Llama2-70B-SharpBalance",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
Use Docker
docker model run hf.co/sequelbox/Llama2-70B-SharpBalance
Sharp Balance is a general capability upgrade to Llama 2 70B.

It no longer has any practical use and is available for legacy and reference purposes only. See our profile for our latest models.

The original upload of Sharp Balance saved its weights incorrectly; this has since been fixed. Other issues and bugs may remain, and no support is available. Use at your own discretion.

Format: Safetensors
Model size: 69B params
Tensor type: F32
