---
license: apache-2.0
library_name: transformers
datasets:
- rLLM/rLLM-FinQA-Dataset
language:
- en
base_model:
- Qwen/Qwen3-4B-Instruct-2507
pipeline_tag: text-generation
tags:
- finance
- tool-use
- agent
---
<div align="center">
<span style="font-family: default; font-size: 1.5em;">FinQA</span>
<div>
Training Financial Agents with Reinforcement Learning
</div>
</div>
<br>
<div align="center" style="line-height: 1;">
  <a href="https://github.com/rllm-org/rllm" style="margin: 2px;">
    <img alt="Code" src="https://img.shields.io/badge/FinQA-000000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://rllm-project.com/post.html?post=finqa.md" target="_blank" style="margin: 2px;">
    <img alt="Blog" src="https://img.shields.io/badge/Blog-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://x.com/rllm_project" style="margin: 2px;">
    <img alt="X" src="https://img.shields.io/badge/rLLM-white?style=for-the-badge&logo=X&logoColor=000&color=000&labelColor=white" style="display: inline-block; vertical-align: middle;"/>
  </a>
  <a href="https://huggingface.co/rLLM" style="margin: 2px;">
    <img alt="Hugging Face" src="https://img.shields.io/badge/rLLM-fcd022?style=for-the-badge&logo=huggingface&logoColor=000&labelColor" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## FinQA Overview

FinQA is a financial question-answering agent fine-tuned from Qwen3-4B-Instruct-2507 with reinforcement learning (RL). The model answers questions about SEC 10-K financial statements using specialized tools (SQL queries, table lookups, a calculator), achieving 59.70% accuracy on the Snorkel Finance Benchmark and 26.60% on Snorkel Finance Reasoning.

## Data

Our training dataset is built from SEC 10-K filings and consists of 5,110 question-answer pairs covering:

- **207 companies** spanning multiple sectors
- **6,923 financial tables** extracted from 10-K filings
- **Single-table questions**: direct lookups and calculations on individual tables
- **Multi-table questions**: cross-table reasoning that requires combining data from multiple sources

The dataset is available on [Hugging Face](https://huggingface.co/datasets/rLLM/rLLM-FinQA-Dataset).

## Tools

The agent uses four specialized tools for financial analysis:

| Tool | Description |
|------|-------------|
| `get_table_names` | List the available tables for a given company |
| `get_table_info` | Get a table's metadata: columns, dtypes, and sample values |
| `sql_query` | Execute SQL queries on financial tables (SQLite) |
| `calculator` | Evaluate mathematical expressions |

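To make the tool interface concrete, here is a minimal sketch of how `get_table_names`, `get_table_info`, and `sql_query` could be backed by an in-memory SQLite database. This is an illustration, not the repository's actual implementation; the `income_statement` table and its figures are made up for the example.

```python
# Illustrative sketch of the SQLite-backed tools; the table name, schema,
# and values below are hypothetical, not part of the FinQA dataset.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income_statement (fiscal_year INTEGER, revenue REAL, net_income REAL)"
)
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?, ?)",
    [(2022, 394.3, 99.8), (2023, 383.3, 97.0)],
)

def get_table_names() -> list[str]:
    """List the tables currently available to the agent."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    return [r[0] for r in rows]

def get_table_info(table: str) -> list[tuple[str, str]]:
    """Return (column, declared dtype) pairs for a table."""
    # Note: naive string interpolation here is fine for a sketch, but a real
    # tool would validate the table name before querying.
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]

def sql_query(query: str) -> list[tuple]:
    """Execute a SQL query and return the result rows."""
    return conn.execute(query).fetchall()

print(get_table_names())  # ['income_statement']
print(sql_query("SELECT revenue FROM income_statement WHERE fiscal_year = 2023"))  # [(383.3,)]
```

In the actual agent loop these functions are exposed to the model as tool calls, so the model first inspects the schema with `get_table_info` before issuing a `sql_query`.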
## Training

We fine-tune Qwen3-4B-Instruct-2507 using GRPO with an LLM-as-judge reward for correctness. A more detailed description of the training recipe can be found in our [documentation](https://rllm-project.readthedocs.io/en/latest/projects/finqa/).

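The core idea of GRPO is that each question gets a group of sampled rollouts, each rollout is scored by the reward (here, an LLM judge), and advantages are computed relative to the group rather than from a learned value function. A minimal sketch of that group-relative normalization, with made-up judge scores:

```python
# Sketch of GRPO's group-relative advantage: rewards for a group of rollouts
# on one question are normalized by the group mean and std. The reward values
# are invented for illustration (1 = judged correct, 0 = judged incorrect).
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Judge scores for 4 rollouts of a single FinQA question.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)
print(advantages)  # correct rollouts get positive advantage, incorrect negative
```

Rollouts judged correct are pushed up and incorrect ones pushed down, and the advantages in each group sum to roughly zero; the details of the actual recipe (group size, KL handling, judge prompt) are in the linked documentation.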
## Evaluation

| Model | FinQA | FinQA Reasoning |
|-------|-------|-----------------|
| Qwen3-4B-Instruct-2507 (Base) | 27.90% | 13.90% |
| gpt-5-nano-2025-08-07 | 50.00% | 26.60% |
| Qwen3-235B-A22B | 51.37% | 18.90% |
| **rLLM-FinQA-4B (Ours)** | **59.70%** | **26.60%** |
| Gemini-2.5-Pro-Preview | 60.60% | 34.60% |
| GPT-4.1-2025-04-14 | 62.70% | 37.90% |
| o3-mini-2025-01-31 | 63.79% | 30.37% |

## Serving FinQA

Start a vLLM server, then run the agent:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model rLLM/rLLM-FinQA-4B \
  --host 0.0.0.0 \
  --port 30000 \
  --dtype bfloat16

python -m projects.finqa.run_finqa
```

For detailed setup instructions, see the [project README](https://github.com/rllm-org/rllm/tree/main/projects/finqa).

## Acknowledgements

- This work is a collaboration between the [rLLM](https://github.com/rllm-org/rllm) team at UC Berkeley and [Snorkel AI](https://snorkel.ai/).
- Our model is trained on top of [`Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
- This work was done as part of the [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/).

## Citation

```bibtex
@misc{rllm2026finqa,
  title={FinQA: Training Financial Agents with Reinforcement Learning},
  author={Manan Roongta and Sijun Tan and Bhavishya Pohani and Charles Dickens and Christopher Glaze},
  year={2026},
  howpublished={\url{https://rllm-project.com/post.html?post=finqa.md}},
  note={Blog Post}
}
```