Enhancement Request: Model Sharding for ToolLLaMA-2-7b-v2 for Better Accessibility

by Firejowl - opened Nov 8, 2023

Hello ToolBench Community,

I hope this message finds you well. I am reaching out with a suggestion that could significantly improve the accessibility of the ToolLLaMA-2-7b-v2 model for a broader audience. As it stands, running such large models requires high-spec hardware, which may not be accessible to all users.

To address this, I propose sharding the ToolLLaMA-2-7b-v2 model. Sharding would allow users with lower-spec PCs to run the model by dividing it into smaller, more manageable pieces that could be processed in parallel or sequentially with less strain on their systems.
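To make the idea concrete, here is a minimal sketch of how a checkpoint could be split into size-capped shards. The function name `plan_shards` and the tensor names and byte sizes are hypothetical and chosen only for illustration; this is not the ToolBench authors' procedure. In practice, the Transformers `save_pretrained` method accepts a `max_shard_size` argument that does this kind of splitting when saving a model.

```python
from typing import Dict, List


def plan_shards(param_sizes: Dict[str, int], max_shard_bytes: int) -> List[List[str]]:
    """Greedily group tensors into shards no larger than max_shard_bytes.

    A tensor that is larger than the cap on its own still gets a shard
    to itself, since a single tensor is not split here.
    """
    shards: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in param_sizes.items():
        # Close the current shard if adding this tensor would exceed the cap.
        if current and current_bytes + size > max_shard_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# Hypothetical tensor names and sizes in bytes, for illustration only.
example_sizes = {"embed": 4, "layer0": 4, "layer1": 4, "lm_head": 10}
example_plan = plan_shards(example_sizes, max_shard_bytes=8)
```

Note that splitting a checkpoint this way mainly lowers the peak memory needed while loading, because shards can be read one at a time; the fully loaded model still occupies the same amount of memory.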

Moreover, considering the growing popularity of cloud-based platforms like Google Colab and Kaggle, which provide limited but free access to powerful computational resources, model sharding could also enhance the user experience on these platforms. Users could leverage the distributed nature of sharded models to run experiments and larger workloads without encountering resource limitations that often come with free tiers.
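For a rough sense of the resource limits involved, the sketch below estimates the memory needed just to hold the weights at different precisions. It assumes roughly 7 billion parameters, inferred from the "7b" in the model name rather than from a measured count, and it ignores activations, the KV cache, and framework overhead. The helper name `weight_gib` is hypothetical.

```python
def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: about 7e9 parameters, taken from the "7b" in the model name.
ASSUMED_PARAMS = 7e9

estimates = {
    "float32": weight_gib(ASSUMED_PARAMS, 4),
    "float16": weight_gib(ASSUMED_PARAMS, 2),
    "int8": weight_gib(ASSUMED_PARAMS, 1),
}

for dtype, gib in estimates.items():
    print(f"{dtype}: about {gib:.1f} GiB for weights only")
```

Comparing these figures with the RAM and GPU memory a given free tier offers indicates whether sharding alone is likely to be enough or whether a lower-precision load would also be needed.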

By enabling model sharding, we could democratize access to state-of-the-art models and foster greater experimentation and inclusivity within the community.

I would love to hear your thoughts on this proposal or any alternative solutions that could facilitate running large models on less powerful machines or within the resource constraints of popular cloud services.

Thank you for considering this enhancement.
