Instructions to use tiiuae/falcon-40b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use tiiuae/falcon-40b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tiiuae/falcon-40b", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-40b", trust_remote_code=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tiiuae/falcon-40b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "tiiuae/falcon-40b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-40b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/tiiuae/falcon-40b
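The curl call to the vLLM server can also be made from Python. Below is a minimal sketch using only the standard library. It assumes the server is listening on vLLM's default `localhost:8000`. `build_completion_request` is a hypothetical helper name. The request is only built here, and sending it is left commented out because that step needs a running server.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same request as the curl example (hypothetical local vLLM server on port 8000).
req = build_completion_request("http://localhost:8000", "tiiuae/falcon-40b", "Once upon a time,")

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(req) as resp:
#     body = json.loads(resp.read())
```

Because the SGLang server in the next section speaks the same API, pointing `base_url` at `http://localhost:30000` should produce the equivalent request for it.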
- SGLang
How to use tiiuae/falcon-40b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "tiiuae/falcon-40b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-40b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "tiiuae/falcon-40b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-40b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use tiiuae/falcon-40b with Docker Model Runner:
docker model run hf.co/tiiuae/falcon-40b
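The vLLM and SGLang servers above both expose an OpenAI-compatible completions API. In that API's JSON response, the generated text is found under `choices[i].text`. Below is a minimal sketch for extracting it, assuming the standard OpenAI completions response shape. `extract_completion_text` is a hypothetical helper. The sample body is made up for illustration and is not real Falcon output.

```python
import json
from typing import List


def extract_completion_text(raw_body: str) -> List[str]:
    """Return the generated text of every choice in an OpenAI-style completions response."""
    body = json.loads(raw_body)
    # Each choice carries its own 'index'; sort on it so the output order is stable.
    choices = sorted(body.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"] for choice in choices]


# Hypothetical response body, shaped like an OpenAI completions response.
sample = json.dumps({
    "object": "text_completion",
    "model": "tiiuae/falcon-40b",
    "choices": [
        {"index": 0, "text": " there was a falcon.", "finish_reason": "length"},
    ],
})

texts = extract_completion_text(sample)
print(texts)
```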
Additional Languages - Turkish
Hello, congratulations on this amazing work.
Do you have any plans to incorporate the Turkish language into this model? The Turkish language is widely studied in academia, and there is a significant community of individuals developing commercial applications with natural language processing (NLP). Additionally, it is worth noting that the government supports an annual competition specifically focused on Turkish NLP.
Not at this time; this is primarily an English-only model. We've added some related European languages, which should not incur too much of a performance penalty.
High quality multilingual models are an interesting topic which I'm sure we will get back to at some point though.
I was excited to hear that a model was coming from an institution based in the UAE. I rushed here expecting it to be versatile in Arabic, but was quite disappointed to find that it wasn't trained on Arabic at all. Should we expect an upcoming version, in the near future, trained extensively on Arabic sources?
Does it support Swedish as well as OpenAI's models do?
Hi Kemal and Hatem,
For this model, we focused on English first and foremost, and added European languages for which we could gather enough data in our web crawl. To avoid issues with tokenization, we only included European languages that use the Latin alphabet.
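Here is a rough illustration of why script choice matters for tokenization. A byte-level BPE tokenizer that has seen little text in a given script falls back to UTF-8 byte pieces. Basic Latin letters are one byte in UTF-8; Turkish letters such as ü and ş, and all Arabic letters, are two bytes. The sketch below only counts UTF-8 bytes per character as a crude worst-case proxy for token cost. It assumes byte-level fallback and does not use Falcon's actual tokenizer, whose real token counts will differ.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per non-space character.

    Under a byte-level fallback, an unseen character can cost up to this many
    tokens, so this is a rough worst-case proxy, not a real token count.
    """
    chars = [c for c in text if not c.isspace()]
    total_bytes = sum(len(c.encode("utf-8")) for c in chars)
    return total_bytes / len(chars)


samples = {
    "English": "hello",
    "Turkish": "güneş",  # ü and ş are 2 bytes each in UTF-8
    "Arabic": "مرحبا",   # every Arabic letter here is 2 bytes in UTF-8
}

for name, text in samples.items():
    print(f"{name}: {utf8_bytes_per_char(text):.1f} bytes/char")
```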
We have also been working on state-of-the-art Arabic language models, and hopefully you will hear about them soon 🤗.
@hassanback, we do not have good evaluation coverage for Swedish, so this is difficult to answer. We'd be happy to hear back from you if you end up testing it!
First of all, thank you very much for this model.
Turkish is a European language written in the Latin alphabet, and Turkey and its culture are very different from Arab countries (by far): it is secular, uses the Latin alphabet, has no Islamic rule, and is free and governed by law. It is also far more democratic than most Western countries, although that is still not enough for its citizens, which is why some people see it as anti-democratic (relatively speaking, it is not).
So I've been trying to fine-tune it on 15% of the Stanford Alpaca instruction set, translated into Turkish. It seems promising. Would fine-tuning it afterwards with QLoRA differ from pretraining it?
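Much of the difference between QLoRA and pretraining comes down to how many weights get updated. LoRA, which QLoRA applies on top of a frozen 4-bit-quantized base model, keeps a weight matrix W of shape d×k frozen. It trains only a low-rank update B·A, where B is d×r and A is r×k, so it adds r·(d + k) trainable parameters instead of d·k. Below is a back-of-the-envelope sketch. The 8192×8192 matrix and rank 16 are illustrative assumptions, not Falcon-40B's exact layer shapes or a recommended setting.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters added by a rank-r LoRA update B @ A for a d x k weight.

    B has shape (d, r) and A has shape (r, k), so the update adds r * (d + k) weights.
    """
    return r * (d + k)


def full_params(d: int, k: int) -> int:
    """Parameters updated when the whole d x k weight matrix is trained."""
    return d * k


# Illustrative sizes only: a square 8192 x 8192 projection with LoRA rank 16.
d, k, r = 8192, 8192, 16
lora = lora_trainable_params(d, k, r)
full = full_params(d, k)
print(f"LoRA: {lora:,} params, full: {full:,} params, ratio: {lora / full:.4%}")
```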
I am using an instruction-based JSON dataset. Would it make sense to feed it plain text, such as Turkish Wikipedia, before the instruction-based data?
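With an instruction-based JSON dataset, each record is usually rendered into a single training string. Below is a minimal sketch assuming Stanford Alpaca's `instruction` / `input` / `output` fields and a prompt template modeled on the one in the Stanford Alpaca repository. The template is kept in English here; a translated dataset might translate it too. `format_alpaca_record` is a hypothetical helper name, and the Turkish record is made up for illustration.

```python
import json

# Modeled on the Stanford Alpaca prompt template (English wording).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_alpaca_record(record: dict) -> str:
    """Render one Alpaca-style record (instruction/input/output) as a training string."""
    # Use the with-input template only when the record has a non-empty input.
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    prompt = template.format(instruction=record["instruction"], input=record.get("input", ""))
    return prompt + record["output"]


# Hypothetical Turkish record ("What is the capital of Türkiye?").
raw = """[{"instruction": "Türkiye'nin başkenti neresidir?", "input": "", "output": "Ankara."}]"""
records = json.loads(raw)
texts = [format_alpaca_record(r) for r in records]
print(texts[0])
```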
By the way, here is how it is going:
Saving model checkpoint to ./falcon-40b-instruct-4bit-alpaca/checkpoint-5300
Trainer.model is not a PreTrainedModel, only saving its state dict.
Deleting older checkpoint [falcon-40b-instruct-4bit-alpaca/checkpoint-5150] due to args.save_total_limit
{'loss': 0.7811, 'learning_rate': 9.39177797950979e-05, 'epoch': 2.06}
{'loss': 0.9781, 'learning_rate': 9.372325249643366e-05, 'epoch': 2.06}
{'loss': 0.9802, 'learning_rate': 9.35287251977694e-05, 'epoch': 2.07}
{'loss': 0.7647, 'learning_rate': 9.333419789910517e-05, 'epoch': 2.07}
{'loss': 0.8621, 'learning_rate': 9.313967060044092e-05, 'epoch': 2.07}
{'loss': 1.0175, 'learning_rate': 9.294514330177668e-05, 'epoch': 2.07}
{'loss': 0.8003, 'learning_rate': 9.275061600311242e-05, 'epoch': 2.07}
{'loss': 0.9179, 'learning_rate': 9.255608870444818e-05, 'epoch': 2.08}
{'loss': 0.9157, 'learning_rate': 9.236156140578393e-05, 'epoch': 2.08}
{'loss': 0.9958, 'learning_rate': 9.216703410711969e-05, 'epoch': 2.08}
69%|███████████████ ...
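In the training log, each logged learning rate is lower than the previous one by the same amount, about 1.945e-7. A linear decay schedule produces exactly this pattern, and `linear` is also the default `lr_scheduler_type` of the Hugging Face `Trainer`. The small sketch below checks that spacing using the values copied from the log. The final extrapolation assumes the schedule stays linear and the logging interval stays constant; it counts remaining log entries, not training steps.

```python
# Learning rates copied from the training log.
logged_lrs = [
    9.39177797950979e-05,
    9.372325249643366e-05,
    9.35287251977694e-05,
    9.333419789910517e-05,
    9.313967060044092e-05,
    9.294514330177668e-05,
    9.275061600311242e-05,
    9.255608870444818e-05,
    9.236156140578393e-05,
    9.216703410711969e-05,
]

# Decrease between consecutive logged values.
decrements = [a - b for a, b in zip(logged_lrs, logged_lrs[1:])]
mean_step = sum(decrements) / len(decrements)

# A linear schedule gives (numerically) identical decrements.
is_linear = all(abs(s - mean_step) < 1e-6 * mean_step for s in decrements)

# If the decay stays linear, roughly this many more log entries remain until the LR hits zero.
entries_left = logged_lrs[-1] / mean_step

print(f"decrement per entry: {mean_step:.4e}, linear: {is_linear}, entries left: {entries_left:.0f}")
```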
Reminder of my questions: pretraining vs. QLoRA? And should I train on plain text first, then on instruction-based data using QLoRA?
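In the "plain text first, then instructions" option, the plain-text stage is ordinary causal-language-model training. Documents are tokenized, concatenated, and cut into fixed-length blocks. Below is a minimal sketch of that packing step, in the spirit of the `group_texts` function in the Transformers `run_clm.py` example script. It works on token-ID lists with a toy block size instead of a real tokenizer. Appending an EOS token after each document is an assumption of this sketch, and `pack_into_blocks` is a hypothetical name.

```python
from itertools import chain
from typing import List


def pack_into_blocks(tokenized_docs: List[List[int]], block_size: int, eos_id: int) -> List[List[int]]:
    """Concatenate tokenized documents (each followed by eos_id) and split into equal blocks.

    The trailing remainder shorter than block_size is dropped, as is common for
    causal-LM pretraining data.
    """
    stream = list(chain.from_iterable(doc + [eos_id] for doc in tokenized_docs))
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Toy token IDs standing in for tokenized Turkish Wikipedia paragraphs.
docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
blocks = pack_into_blocks(docs, block_size=4, eos_id=0)
print(blocks)
```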
I have finished that run. The results are promising; the entire Stanford Alpaca dataset should take about a day on an A100 40GB with Falcon-40B.
Great job! Can we try it somewhere if you train on the entire dataset?
I don't plan to train on the entire dataset. However, it would be much wiser to have the model generate its most probable answers to the instructions (tuning top_p, top_k, and temperature), then use the gpt-3.5-turbo API to translate them into Turkish and feed them back into the model. This much smaller dataset gave far better output than Stanford Alpaca: only 4k instructions outperformed my previous 12k-instruction Stanford Alpaca fine-tuning, and the model answered many questions coherently. If I try this again, I will do it that way with around 20k instructions, and then I will share the result.
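The data-generation step above samples answers under `top_p`, `top_k`, and `temperature` settings. Below is a minimal pure-Python sketch of one common way to combine those three settings on a single step's logits. The order is: temperature scaling, then top-k, then top-p (nucleus) filtering over the renormalized survivors, then sampling. This illustrates the general technique, not the exact implementation in Transformers or any serving engine. The toy logits are made up, and the translation step is not shown.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index from raw logits with temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    if rng is None:
        rng = random.Random()

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose renormalized cumulative probability reaches top_p.
    kept_mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / kept_mass
        if cumulative >= top_p:
            break

    # random.choices normalizes the weights, so no explicit renormalization is needed.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0]
token = sample_next_token(toy_logits, temperature=0.7, top_k=3, top_p=0.9, rng=random.Random(0))
print(token)
```

Lower temperature and smaller `top_k`/`top_p` push the draw toward the single most probable token, which matches the "most probable answers" the poster was aiming for.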
Bard says yes, via fine-tuning. Can LLMs be fine-tuned to add new languages?