---
library_name: transformers
tags:
- falcon-h1
- edge
license: other
license_name: falcon-llm-license
license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
---
<img src="https://cdn-uploads.huggingface.co/production/uploads/62441d1d9fdefb55a0b7d12c/l1du02RjuAZJcksI5tQ-F.png" alt="drawing" width="800"/>
# Table of Contents
0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Training Details](#training-details)
3. [Usage](#usage)
4. [Evaluation](#evaluation)
5. [Citation](#citation)
# TL;DR
# Model Details
## Model Description
- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Hybrid Transformers + Mamba architecture
- **Language(s) (NLP):** English
- **Number of Parameters:** 90M
- **License:** Falcon-LLM License
# Training Details
For more details about the training protocol of this model, please refer to the [Falcon-H1-Tiny technical blogpost](https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost).
# Usage
You can currently run this model with Hugging Face `transformers`, `vLLM`, `sglang`, `llama.cpp`, `ollama`, or the `mlx` library.
## Inference
### 🤗 transformers
Refer to the snippet below to run H1 models using 🤗 transformers:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1-Tiny-R-90M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat prompt and generate a response
messages = [{"role": "user", "content": "What are state-space models?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
or
```bash
transformers serve tiiuae/Falcon-H1-Tiny-R-90M
```
### `llama.cpp`
You can find all GGUF files compatible with `llama.cpp` under [our official collection]() - an example setup could be:
```bash
brew install llama.cpp
pip install huggingface_hub
hf download tiiuae/Falcon-H1-Tiny-R-90M-GGUF Falcon-H1-Tiny-R-90M-Q8_0.gguf --local-dir ./
llama-cli -m ./Falcon-H1-Tiny-R-90M-Q8_0.gguf -cnv
```
### `ollama`
```bash
ollama run hf.co/tiiuae/Falcon-H1-Tiny-R-90M:Q8_0
```
### Apple `mlx`
```bash
mlx_lm.chat --model tiiuae/Falcon-H1-Tiny-R-90M
```
### vLLM
For vLLM, simply start a server by executing the command below:
```bash
# pip install "vllm>=0.9.0"
vllm serve tiiuae/Falcon-H1-Tiny-R-90M --tensor-parallel-size 2 --data-parallel-size 1
```
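Once the server is running, it exposes an OpenAI-compatible API (on `http://localhost:8000` by default). Below is a minimal client sketch; the host, port, and prompt are illustrative assumptions, while the payload follows the standard chat-completions format:

```python
import json
import urllib.request

# OpenAI-compatible chat-completions payload (endpoint defaults assumed)
payload = {
    "model": "tiiuae/Falcon-H1-Tiny-R-90M",
    "messages": [
        {"role": "user", "content": "Explain state-space models in one sentence."}
    ],
    "max_tokens": 128,
}

def query(url="http://localhost:8000/v1/chat/completions"):
    # POST the JSON payload to the locally running server
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With a server up, the generated text is at:
# query()["choices"][0]["message"]["content"]
```

The same request shape works against any of the OpenAI-compatible servers above (`vllm serve`, `transformers serve`, or `sglang`), with only the port changed.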
### sglang
```bash
python -m sglang.launch_server \
--model-path tiiuae/Falcon-H1-Tiny-R-90M \
--tensor-parallel-size 1
```
# Evaluation
For a detailed evaluation of the Falcon-H1-Tiny series, please refer to our [technical blogpost](https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost).
# Useful links
- View [our release blogpost](https://huggingface.co/spaces/tiiuae/tiny-h1-blogpost).
- Feel free to join [our discord server](https://discord.gg/trwMYP9PYm) if you have any questions or to interact with our researchers and developers.
# Citation
If the Falcon-H1-Tiny family of models was helpful to your work, please consider citing it.
```bibtex
@misc{falcon_h1_tiny,
title={Falcon-H1-Tiny: A series of extremely small, yet powerful language models redefining capabilities at small scale},
author={Falcon-LLM Team},
year={2026},
}
```