Instructions for using tiiuae/falcon-40b-instruct with libraries and local apps.

Libraries

How to use tiiuae/falcon-40b-instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tiiuae/falcon-40b-instruct", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b-instruct", trust_remote_code=True)
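
Once the pipeline is constructed, generating text amounts to calling it with a prompt. The following is a minimal sketch, not the model authors' recommended setup: the helper names `build_generation_kwargs` and `generate`, the sampling values, and the use of `torch_dtype=torch.bfloat16` with `device_map="auto"` (which assumes the `accelerate` package is installed and enough GPU memory is available) are all assumptions made for illustration.

```python
def build_generation_kwargs(max_new_tokens=64, temperature=0.5):
    """Collect sampling settings for the pipeline call.

    The default values are illustrative assumptions, not tuned recommendations.
    """
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
    }


def generate(prompt, model_id="tiiuae/falcon-40b-instruct"):
    """Load the model through a text-generation pipeline and complete `prompt`."""
    # Imported lazily so the helper above can be used without the heavy dependencies.
    import torch
    from transformers import pipeline

    pipe = pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.bfloat16,  # assumption: hardware with bfloat16 support
        device_map="auto",           # assumption: accelerate is installed
    )
    outputs = pipe(prompt, **build_generation_kwargs())
    # The pipeline returns a list of dicts, one per generated sequence.
    return outputs[0]["generated_text"]


# Example call (downloads the full checkpoint, so it is left commented out):
# print(generate("Once upon a time,"))
```

Keeping the sampling settings in a separate function makes it possible to adjust or inspect them without loading the model.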

Local Apps

vLLM

How to use tiiuae/falcon-40b-instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tiiuae/falcon-40b-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tiiuae/falcon-40b-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
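
The same request can be sent from Python. The sketch below mirrors the curl call above using only the standard library; the function names `build_completion_request` and `complete` are hypothetical, and the default `base_url` assumes the server is listening on localhost port 8000 as in the curl example. The response parsing assumes the usual OpenAI-style completions shape, where the generated text sits under `choices[0]["text"]`.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    base_url="http://localhost:8000",
    model="tiiuae/falcon-40b-instruct",
    max_tokens=512,
    temperature=0.5,
):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response with the text under choices[0]["text"].
    return body["choices"][0]["text"]


# Example call (requires the server to be running):
# print(complete("Once upon a time,"))
```

Passing a different `base_url`, for example `http://localhost:30000`, would point the same helper at a server started on another port.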

Use Docker

docker model run hf.co/tiiuae/falcon-40b-instruct

SGLang

How to use tiiuae/falcon-40b-instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tiiuae/falcon-40b-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tiiuae/falcon-40b-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tiiuae/falcon-40b-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tiiuae/falcon-40b-instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use tiiuae/falcon-40b-instruct with Docker Model Runner:
```
docker model run hf.co/tiiuae/falcon-40b-instruct
```

falcon-40b-instruct

Commit History

Move to in-library model (for real this time) (#86)

ecb78d9

Rocketknight1 HF Staff committed on Sep 29, 2023

Revert in-library PR (#67)

ca78eac

sgugger committed on Jul 13, 2023

Move to in-library checkpoint (#60)

7475ff8

FalconLLM committed on Jul 12, 2023

Update README.md: Update Model Description to reference Falcon-40B as the base model for falcon-40b-instruct (#17)

1e7fdcc

FalconLLM

AliSab committed on Jun 9, 2023

Add usage recommendations

6f1c467

FalconLLM committed on Jun 9, 2023

Update citation info

4e8f82c

FalconLLM committed on Jun 5, 2023

Add hf endpoint handler.py (#30)

b55d7ec

FalconLLM

olivierdehaene committed on Jun 5, 2023

Remove TII Falcon LLM license

5b94094

FalconLLM committed on May 31, 2023

Update license information to Apache 2.0

7205e7e

FalconLLM committed on May 31, 2023

Update README.md (#10)

8fac8c1

victor HF Staff committed on May 31, 2023

license: tii-falcon-llm (#5)

93e2373

FalconLLM

julien-c HF Staff committed on May 30, 2023

Update config.json

662a9a4

FalconLLM committed on May 30, 2023

Update README.md

de92598

FalconLLM committed on May 30, 2023

Update modelling_RW.py

357dc3f

FalconLLM committed on May 30, 2023

Add model card

38377d0

slippylolo committed on May 26, 2023

Add license

74aef5b

slippylolo committed on May 26, 2023

Update modelling_RW.py

74e0515

Daniel Hesslow committed on May 25, 2023

Upload tokenizer

7a666cb

Daniel Hesslow committed on May 25, 2023

Upload RWForCausalLM

e22aa38

Daniel Hesslow committed on May 25, 2023

initial commit

61aa3b4

DanielHesslow committed on May 25, 2023
