configuration_RW.py missing in latest commit
Could not locate the configuration_RW.py inside tiiuae/falcon-7b.
HTTPError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py in hf_raise_for_status(response, endpoint_name)
260 try:
--> 261 response.raise_for_status()
262 except HTTPError as e:
12 frames
HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/tiiuae/falcon-7b/resolve/main/configuration_RW.py
The above exception was the direct cause of the following exception:
EntryNotFoundError Traceback (most recent call last)
EntryNotFoundError: 404 Client Error. (Request ID: Root=1-64af215f-17743c4c17dd4b075a71a76c;cc383af2-79fb-445b-b791-6819c7f1439f)
Entry Not Found for url: https://huggingface.co/tiiuae/falcon-7b/resolve/main/configuration_RW.py.
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
461 if revision is None:
462 revision = "main"
--> 463 raise EnvironmentError(
464 f"{path_or_repo_id} does not appear to have a file named {full_filename}. Checkout "
465 f"'https://huggingface.co/{path_or_repo_id}/{revision}' for available files."
OSError: tiiuae/falcon-7b does not appear to have a file named configuration_RW.py. Checkout 'https://huggingface.co/tiiuae/falcon-7b/main' for available files.
I'm getting the same error.
This is causing all the models based on this one (e.g. ybelkada/falcon-7b-sharded-bf16) to stop loading.
I'm facing the same issue. This commit is very disappointing. I can think of two possible workarounds:
A. Could we load the missing file from local storage, where the script is running?
B. Alternatively, if we can load a previous revision (commit) of the model in the code above, I think we can bypass this.
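Workaround B can be sketched as follows. This is a minimal sketch, not a confirmed fix: it assumes that some earlier commit of the repository still contains configuration_RW.py, and it relies on the `revision` argument of `from_pretrained`, which accepts a branch name, tag, or commit id. The helper names `pinned_kwargs` and `load_pinned` are hypothetical, and the commit id must be looked up in the repository history; none is given in this thread.

```python
def pinned_kwargs(revision: str) -> dict:
    """Build from_pretrained keyword arguments that pin one revision.

    Hypothetical helper. "main" is rejected on purpose, because a moving
    branch is what broke loading in the first place.
    """
    if not revision or revision == "main":
        raise ValueError("pass a specific commit id or tag, not 'main'")
    return {"revision": revision, "trust_remote_code": True}


def load_pinned(repo_id: str, revision: str):
    """Load tokenizer and model from a fixed revision of a Hub repo."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = pinned_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
    return tokenizer, model
```

Calling `load_pinned("tiiuae/falcon-7b", "<commit id>")` with a commit that predates the file removal would then avoid whatever changes later on the main branch. For workaround A, `from_pretrained` also accepts a path to a local directory in place of the repository name.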
My code also refers to this repo. Thanks for pointing it out.
Using FalconForCausalLM seems to work as the other configuration file was removed.
Yes @ditchtech, that worked for me too. Thank you :)
May I know how you used FalconForCausalLM? I can't seem to find how to import it.
from transformers import FalconForCausalLM
@ditchtech thanks a million :) !
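from transformers import AutoTokenizer, FalconForCausalLM

# Repository to load (value as given in the original post)
model_name_or_path = "huggingface/falcon-40b-gptq"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
# FalconForCausalLM is the class built into transformers, so the removed
# configuration_RW.py should not be needed
model = FalconForCausalLM.from_pretrained(
    model_name_or_path,
    device_map="auto",
    trust_remote_code=True,
    revision="main",
)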
It works. Thanks @ditchtech
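For anyone who cannot import FalconForCausalLM, one possible cause is an installed transformers version that predates the built-in Falcon classes. A minimal sketch of a guarded lookup is shown below; `resolve_class` and `load_falcon` are hypothetical helper names, and falling back to AutoModelForCausalLM is an assumption about what is acceptable, not something confirmed in this thread.

```python
import importlib


def resolve_class(module_name: str, preferred: str, fallback: str):
    """Return `preferred` from the module if it exists, else `fallback`."""
    module = importlib.import_module(module_name)
    cls = getattr(module, preferred, None)
    if cls is None:
        # The preferred name is missing, e.g. in an older library version.
        cls = getattr(module, fallback)
    return cls


def load_falcon(repo_id: str):
    """Load a Falcon checkpoint with the built-in class when available."""
    model_cls = resolve_class(
        "transformers", "FalconForCausalLM", "AutoModelForCausalLM"
    )
    return model_cls.from_pretrained(repo_id, trust_remote_code=True)
```

If the fallback is taken, loading goes through the repository's remote code again, so the original error can still appear; upgrading transformers is the more direct route in that case.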