Training dataset: TheFinAI/Fino1_Reasoning_Path_FinQA
How to use khazarai/Fino1-4B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="khazarai/Fino1-4B")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("khazarai/Fino1-4B")
model = AutoModelForCausalLM.from_pretrained("khazarai/Fino1-4B")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use khazarai/Fino1-4B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "khazarai/Fino1-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khazarai/Fino1-4B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
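Since the vLLM server exposes an OpenAI-compatible API, the same request can be made from Python. A minimal sketch using only the standard library (the helper names and the `urllib` approach are illustrative, not part of the model card):

```python
import json
import urllib.request


def build_chat_request(model: str, user_content: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }


def ask(base_url: str, payload: dict) -> dict:
    """POST the payload to an OpenAI-compatible /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request("khazarai/Fino1-4B", "What is the capital of France?")
    print(json.dumps(payload, indent=2))
    # With the vLLM server running on the default port:
    # reply = ask("http://localhost:8000", payload)
    # print(reply["choices"][0]["message"]["content"])
```

The same sketch works against the SGLang server below by pointing `base_url` at port 30000 instead.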
How to use khazarai/Fino1-4B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "khazarai/Fino1-4B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khazarai/Fino1-4B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

# Or run the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "khazarai/Fino1-4B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khazarai/Fino1-4B",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'

How to use khazarai/Fino1-4B with Unsloth Studio:
# Install Unsloth Studio (Linux/macOS):
curl -fsSL https://unsloth.ai/install.sh | sh
# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for khazarai/Fino1-4B to start chatting

# Install Unsloth Studio (Windows PowerShell):
irm https://unsloth.ai/install.ps1 | iex
# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for khazarai/Fino1-4B to start chatting

# Or use the hosted version (no setup required):
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for khazarai/Fino1-4B to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="khazarai/Fino1-4B",
max_seq_length=2048,
)

How to use khazarai/Fino1-4B with Docker Model Runner:
docker model run hf.co/khazarai/Fino1-4B
Fino1-4B is a fine-tuned version of Qwen3-4B adapted for financial reasoning and question answering. The model was trained with parameter-efficient LoRA fine-tuning on a curated dataset derived from FinQA and enriched with GPT-4o-generated reasoning paths, enabling more structured and explainable answers to financial questions.
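Because the base model is Qwen3-4B, completions generated with thinking enabled typically wrap the reasoning path in `<think>...</think>` tags before the final answer. A small helper for separating the two (the function name, and the assumption that this tag convention carries over to the fine-tune, are not stated on the card):

```python
import re


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a Qwen3-style completion into (reasoning, answer).

    Assumes the reasoning path is wrapped in a single <think>...</think>
    block; if no block is present, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Example on a synthetic completion:
demo = "<think>Service cost rose from 110 to 136.</think>The increase is about 23.6%."
reasoning, answer = split_reasoning(demo)
```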
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("khazarai/Fino1-4B")
model = AutoModelForCausalLM.from_pretrained(
"khazarai/Fino1-4B",
device_map={"": 0}
)
question = """
Please answer the given financial question based on the context.
Context: note 9 2014 benefit plans the company has defined benefit pension plans covering certain employees in the united states and certain international locations . postretirement healthcare and life insurance benefits provided to qualifying domestic retirees as well as other postretirement benefit plans in international countries are not material . the measurement date used for the company 2019s employee benefit plans is september 30 . effective january 1 , 2018 , the legacy u.s . pension plan was frozen to limit the participation of employees who are hired or re-hired by the company , or who transfer employment to the company , on or after january 1 , net pension cost for the years ended september 30 included the following components: .
|( millions of dollars )|pension plans 2019|pension plans 2018|pension plans 2017|
|service cost|$ 134|$ 136|$ 110|
|interest cost|107|90|61|
|expected return on plan assets|( 180 )|( 154 )|( 112 )|
|amortization of prior service credit|( 13 )|( 13 )|( 14 )|
|amortization of loss|78|78|92|
|settlements|10|2|2014|
|net pension cost|$ 135|$ 137|$ 138|
|net pension cost included in the preceding table that is attributable to international plans|$ 32|$ 34|$ 43|
net pension cost included in the preceding table that is attributable to international plans $ 32 $ 34 $ 43 the amounts provided above for amortization of prior service credit and amortization of loss represent the reclassifications of prior service credits and net actuarial losses that were recognized in accumulated other comprehensive income ( loss ) in prior periods . the settlement losses recorded in 2019 and 2018 primarily included lump sum benefit payments associated with the company 2019s u.s . supplemental pension plan . the company recognizes pension settlements when payments from the supplemental plan exceed the sum of service and interest cost components of net periodic pension cost associated with this plan for the fiscal year . as further discussed in note 2 , upon adopting an accounting standard update on october 1 , 2018 , all components of the company 2019s net periodic pension and postretirement benefit costs , aside from service cost , are recorded to other income ( expense ) , net on its consolidated statements of income , for all periods presented . notes to consolidated financial statements 2014 ( continued ) becton , dickinson and company .
Question: what is the percentage increase in service costs from 2017 to 2018?
Answer:
"""
messages = [
{"role" : "user", "content" : question}
]
text = tokenizer.apply_chat_template(
messages,
tokenize = False,
add_generation_prompt = True,
enable_thinking = True,
)
from transformers import TextStreamer
_ = model.generate(
**tokenizer(text, return_tensors = "pt").to("cuda"),
max_new_tokens = 4000,
temperature = 0.6,
top_p = 0.95,
top_k = 20,
streamer = TextStreamer(tokenizer, skip_prompt = True),
)
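For reference, the expected answer to the example question can be checked by hand: the context table gives service cost of $110 million in 2017 and $136 million in 2018, so the percentage increase is (136 − 110) / 110 ≈ 23.6%. As a sanity check:

```python
# Values taken from the context table above (millions of dollars)
service_cost_2017 = 110
service_cost_2018 = 136

pct_increase = (service_cost_2018 - service_cost_2017) / service_cost_2017 * 100
print(f"{pct_increase:.2f}%")  # -> 23.64%
```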
@article{chen2021finqa,
title={Finqa: A dataset of numerical reasoning over financial data},
author={Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and others},
journal={arXiv preprint arXiv:2109.00122},
year={2021}
}
@article{qian2025fino1,
title={Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance},
author={Qian, Lingfei and Zhou, Weipeng and Wang, Yan and Peng, Xueqing and Huang, Jimin and Xie, Qianqian},
journal={arXiv preprint arXiv:2502.08127},
year={2025}
}