
Bhavishya Pohani

Fix base_path to / to match Gradio root mount

055d215 28 days ago


metadata

title: FinQA Environment Server
emoji: 📊
colorFrom: blue
colorTo: gray
sdk: docker
pinned: false
app_port: 8000
base_path: /
datasets:
  - snorkelai/finqa-data
tags:
  - openenv

FinQA Environment

A financial question-answering environment for RL training. Evaluates LLMs on their ability to answer complex financial questions using tool calls on SEC 10-K filing data.

Based on FinQABenchmark from Snorkel AI.

Overview

FinQA tests an agent's ability to:

Explore available financial tables for a company
Query table metadata and execute SQL queries
Perform calculations on extracted data
Submit final answers to financial questions

Dataset: 290 questions from SEC 10-K filings across multiple companies (Alphabet, Amazon, Apple, AT&T, etc.)

Reward: Binary (1.0 for a correct answer, 0.0 for an incorrect one), determined by fuzzy numerical matching with a 1% relative tolerance or a 0.01 absolute tolerance.

Note: This dataset is for evaluation only. Do not train on it.

Quick Start

Using Docker

# Build the image (from OpenEnv repo root)
docker build -t finqa-env:latest -f envs/finqa_env/server/Dockerfile .

# Run the server
docker run -p 8000:8000 finqa-env:latest

# Run the evaluation script (example model: gpt-5)
API_BASE_URL=https://api.openai.com/v1 API_KEY=$OPENAI_API_KEY MODEL=gpt-5 python examples/finqa_inference.py

Local Development

# Install dependencies
uv pip install pandas

# Download data from HuggingFace
cd envs/finqa_env
./download_data.sh

Using the Client

The client speaks MCP (Model Context Protocol) and is async by default:

import asyncio
from envs.finqa_env import FinQAEnv, CallToolAction

async def main():
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        # Reset to get a question
        obs = await env.reset()
        question = obs.metadata["question"]
        company = obs.metadata["company"]
        print(f"Question: {question}")
        print(f"Company: {company}")

        # Discover available tools
        tools = await env.list_tools()
        print([t.name for t in tools])

        # Use tools via call_tool (convenience method)
        result = await env.call_tool("get_descriptions", company_name=company)
        print(f"Available tables: {result}")

        # Or use step() with CallToolAction for full observation access
        step_result = await env.step(CallToolAction(
            tool_name="sql_query",
            arguments={
                "company_name": "alphabet",
                "table_name": "us_gaap_ScheduleOfIncomeBeforeIncomeTaxDomesticAndForeignTableTextBlock",
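                # SELECT * is rejected (see Tool Constraints). The column names
                # below are illustrative; use get_table_info to find the real ones.
                "query": "SELECT income_domestic, income_foreign FROM data WHERE year = '2022'"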
            }
        ))
        print(f"Done: {step_result.done}, Reward: {step_result.reward}")

        # Submit answer
        result = await env.call_tool("submit_answer", answer="6.118")

asyncio.run(main())

Available Tools

Tools are auto-discovered via MCP. Call `await env.list_tools()` to list every available tool at runtime.

| Tool | Description | Arguments |
|------|-------------|-----------|
| `get_descriptions` | Get list of available table names for a company | `company_name: str` |
| `get_table_info` | Get table metadata (columns, dtypes, unique values) | `company_name: str, table_name: str` |
| `sql_query` | Execute SQL query on a table (requires filters) | `company_name: str, table_name: str, query: str` |
| `submit_answer` | Submit final answer (ends episode) | `answer: str` |
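A typical episode calls the four tools in the order the table lists them. The sketch below illustrates that order only. `FakeEnv` is a hypothetical stand-in that records calls so the example runs without a server, and `answer_question`, the table name, and the query are illustrative; against a real server you would pass a `FinQAEnv` instance instead.

```python
import asyncio


class FakeEnv:
    """Hypothetical stand-in for FinQAEnv; it only records the tool calls."""

    def __init__(self):
        self.calls = []

    async def call_tool(self, name, **arguments):
        self.calls.append((name, arguments))
        return f"<{name} result>"


async def answer_question(env, company, table, query, answer):
    """Explore, inspect, query, then submit, in that order."""
    tables = await env.call_tool("get_descriptions", company_name=company)
    info = await env.call_tool(
        "get_table_info", company_name=company, table_name=table
    )
    rows = await env.call_tool(
        "sql_query", company_name=company, table_name=table, query=query
    )
    # submit_answer ends the episode, so it has to be the last call.
    final = await env.call_tool("submit_answer", answer=answer)
    return tables, info, rows, final


env = FakeEnv()
asyncio.run(
    answer_question(
        env,
        company="alphabet",
        table="some_table",  # placeholder table name
        query="SELECT revenue FROM data WHERE year = '2022'",  # placeholder query
        answer="6.118",
    )
)
call_order = [name for name, _ in env.calls]
print(call_order)
```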

Tool Constraints

`sql_query`: The query must include a filter clause (WHERE, HAVING, etc.), and `SELECT *` is not allowed.
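An agent can avoid wasting a step by checking a query against these constraints before sending it. The function below is a minimal client-side sketch, not the server's validation (which lives in `server/tools.py` and may differ). `check_query` is a hypothetical helper, and it recognises only WHERE and HAVING, since the other filters covered by "etc." are not listed here.

```python
import re


def check_query(query: str) -> list[str]:
    """Return a list of problems; an empty list means the pre-check passed."""
    problems = []
    # A naive textual check: "SELECT *" in any letter case.
    if re.search(r"\bselect\s+\*", query, flags=re.IGNORECASE):
        problems.append("SELECT * is not allowed; name the columns explicitly")
    # Assumption: only WHERE and HAVING are treated as filters in this sketch.
    if not re.search(r"\b(where|having)\b", query, flags=re.IGNORECASE):
        problems.append("query has no filter (WHERE or HAVING)")
    return problems
```

Because the check is purely textual, it can be fooled by keywords inside string literals. Treat it as a cheap guard, and rely on the server's response as the final word.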

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `FINQA_DATA_PATH` | `/app/env/data` | Path to data directory |
| `FINQA_MAX_STEPS` | `50` | Maximum tool calls per episode |
| `FINQA_TASK` | `finqa` | Task name |
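To make the defaults concrete, here is a sketch of how a process could read these three variables. The variable names and default values come from the table above; `FinQASettings` and `load_settings` are hypothetical names, and the server's own loading code may look different.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FinQASettings:
    data_path: str
    max_steps: int
    task: str


def load_settings(environ=os.environ) -> FinQASettings:
    """Read the FINQA_* variables, falling back to the documented defaults."""
    return FinQASettings(
        data_path=environ.get("FINQA_DATA_PATH", "/app/env/data"),
        # Environment values are strings, so the step limit is converted to int.
        max_steps=int(environ.get("FINQA_MAX_STEPS", "50")),
        task=environ.get("FINQA_TASK", "finqa"),
    )
```

Passing a plain dict as `environ` makes the function easy to exercise without touching the real environment, for example `load_settings({"FINQA_MAX_STEPS": "10"})`.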

Reward Computation

Rewards use fuzzy numerical matching:

Extracts numbers from \boxed{...} format
Handles percentages, fractions, and decimals
1% relative tolerance or 0.01 absolute tolerance
Returns 1.0 for correct, 0.0 for incorrect
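The rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `server/rewards.py`: the function names are hypothetical, a percentage such as `12.5%` is assumed to mean `0.125`, and the relative tolerance is assumed to be measured against the reference answer.

```python
import re
from fractions import Fraction


def extract_boxed(text: str) -> str:
    # Contents of the last \boxed{...} group, or the whole text if there is none.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else text


def parse_number(text: str):
    """Parse a decimal, a fraction such as 3/4, or a percentage; None on failure."""
    cleaned = text.strip().replace(",", "").replace("$", "")
    is_percent = cleaned.endswith("%")
    if is_percent:
        cleaned = cleaned[:-1].strip()
    try:
        value = float(Fraction(cleaned))
    except (ValueError, ZeroDivisionError):
        return None
    # Assumption: "12.5%" is interpreted as 0.125.
    return value / 100 if is_percent else value


def compute_reward(prediction: str, truth: str,
                   rel_tol: float = 0.01, abs_tol: float = 0.01) -> float:
    """Binary reward: 1.0 if within either tolerance, else 0.0."""
    pred = parse_number(extract_boxed(prediction))
    gold = parse_number(extract_boxed(truth))
    if pred is None or gold is None:
        return 0.0
    diff = abs(pred - gold)
    return 1.0 if diff <= abs_tol or diff <= rel_tol * abs(gold) else 0.0
```

The absolute tolerance matters mostly for reference answers near zero, where a 1% relative band would be too narrow to be useful.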

Local Development

# From OpenEnv repo root
cd envs/finqa_env

# Run server locally
FINQA_DATA_PATH=./data uvicorn server.app:app --reload --port 8000

# Test with curl
curl http://localhost:8000/health
curl -X POST http://localhost:8000/reset

Integration with RL Frameworks

TRL (GRPO)

import asyncio
from trl import GRPOTrainer
from envs.finqa_env import FinQAEnv

async def rollout_func(prompts, trainer):
    async with FinQAEnv(base_url="http://localhost:8000") as env:
        obs = await env.reset()
        # Your agent logic here using await env.call_tool(...).
        # Placeholder: replace with the text your agent generated.
        completion = ""
        return {"reward": obs.reward, "completion": completion}

trainer = GRPOTrainer(
    model=model,  # your model, defined elsewhere
    rollout_func=rollout_func,
    ...
)

Project Structure

finqa_env/
├── __init__.py           # Exports FinQAEnv, CallToolAction, ListToolsAction
├── models.py             # FinQAState and tool name constants
├── client.py             # MCP client (subclasses MCPToolClient)
├── pyproject.toml        # Dependencies
├── README.md             # This file
├── data/                 # Benchmark data (run download_data.sh)
│   ├── benchmark_questions/
│   │   └── finqa.csv
│   └── input_companies/
│       └── [company folders]
├── download_data.sh      # Downloads data from HuggingFace
└── server/
    ├── __init__.py
    ├── finqa_environment.py  # MCPEnvironment subclass with @mcp.tool decorators
    ├── tools.py              # Tool implementations
    ├── rewards.py            # Reward computation
    ├── app.py                # FastAPI server
    └── Dockerfile
