---
license: mit
language:
- en
tags:
- finance
- text-generation
- mixture-of-experts
- continual-learning
- financial-nlp
- custom-architecture
library_name: transformers
pipeline_tag: text-generation
---
# Meridian.AI – Finance Language Model

Meridian.AI is a custom sparse Mixture-of-Experts (MoE) language model continually trained on finance data. It is designed to run on commodity CPU hardware (including GitHub Actions free runners) and improves automatically via scheduled training runs.

> **Not financial advice.** This is an experimental research model.
## Model Details
| Property | Value |
|---|---|
| Architecture | Custom SMoE + GQA + RoPE + SwiGLU + Numeracy Encoding |
| Total parameters | ~479M (tied embeddings) |
| Unique parameters | ~283M |
| Experts | 8 total, top-2 active per token |
| Tokenizer | Qwen/Qwen2.5-0.5B (151k vocab) |
| Context length | 2048 tokens |
| Training method | Continual learning with EWC (Elastic Weight Consolidation) |
| License | MIT |
## Architecture
Meridian.AI is a fully custom transformer built from scratch with the following components:
- **Sparse MoE FFN** – 8 experts per MoE layer, top-2 routing. Only 2 of 8 experts activate per token, keeping compute low while retaining capacity. MoE layers alternate with dense layers every 2nd transformer layer.
- **Grouped Query Attention (GQA)** – 12 query heads, 4 key/value heads. Reduces memory bandwidth during inference.
- **Rotary Position Embeddings (RoPE)** – `rope_theta=500000` for length generalisation.
- **SwiGLU FFN** – activation function used in dense layers and expert FFNs.
- **RMSNorm** – replaces LayerNorm for faster normalisation.
- **Financial Numeracy Encoding** – a learned 64-dim embedding for numeric tokens that improves precision on quantitative finance tasks.
- **Elastic Weight Consolidation (EWC)** – prevents catastrophic forgetting across continual training runs.
- **Tied word embeddings** – the input embeddings and `lm_head` share weights, saving ~197M parameters.
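The top-2 routing described above can be sketched in plain PyTorch. This is an illustrative sketch only: the dimensions are placeholders and the experts are simple SiLU MLPs rather than the model's actual SwiGLU experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Sketch of top-2 sparse MoE routing (dims are illustrative)."""

    def __init__(self, d_model=512, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalise over the 2 chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):            # only the selected experts run
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 512)
y = Top2MoE()(x)
print(y.shape)  # torch.Size([4, 512])
```

Each token pays for two expert forward passes regardless of the total expert count, which is why capacity can grow without a matching growth in per-token compute.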
## How to Use

The model weights are stored under the `checkpoint/` subfolder in this repo.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meridianal/FinAI"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="checkpoint")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="checkpoint",
    trust_remote_code=True,
    torch_dtype=torch.float32,
    low_cpu_mem_usage=True,
)
model.eval()

prompt = """### Instruction:
What does a high price-to-earnings ratio indicate about a stock?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.92,
        repetition_penalty=1.3,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```
### Prompt format

All training examples use this instruction/response format:

```
### Instruction:
<your question or task>
### Response:
<answer>
```
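A small helper can build this template programmatically; the function name is illustrative, not part of the repo:

```python
def build_prompt(instruction: str) -> str:
    """Wrap a task in the model's ### Instruction / ### Response template."""
    return f"### Instruction:\n{instruction}\n### Response:\n"

print(build_prompt("Define EBITDA in one sentence."))
```

Leaving the prompt open after `### Response:` matches the training format, so the model continues with the answer rather than restating the question.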
Classification tasks are also formatted this way with a short label-only response.
### Generation tips

Continual training can introduce mild repetition. Recommended settings:

| Parameter | Range |
|---|---|
| `temperature` | 0.7 – 0.95 |
| `top_p` | 0.85 – 0.95 |
| `repetition_penalty` | 1.2 – 1.4 |
| `no_repeat_ngram_size` | 3 |
If you see repeated phrases, increase `repetition_penalty` and lower `temperature`.
## Training Data
Training streams finance datasets from the FinanceMTEB family:
- Financial sentiment analysis (FinancialPhraseBank, etc.)
- ESG and sustainability classification
- FOMC statement analysis
- Fraud and financial complaint datasets
- Financial QA pairs
- Earnings call and filing excerpts
Datasets are loaded in streaming mode with a 15 MB-per-source cap to stay within GitHub Actions memory limits.
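One way a per-source byte cap like this could be enforced is with a simple budget over each stream. This is a hedged sketch under that assumption, not the pipeline's actual code:

```python
def capped_stream(records, max_bytes=15 * 1024 * 1024):
    """Yield text records until the per-source byte budget is exhausted."""
    used = 0
    for text in records:
        size = len(text.encode("utf-8"))
        if used + size > max_bytes:
            break  # stop this source; move on to the next dataset
        used += size
        yield text

# Toy source: a 25-byte budget admits the first two 10-byte records only.
docs = ["a" * 10, "b" * 10, "c" * 10]
print(list(capped_stream(docs, max_bytes=25)))  # ['aaaaaaaaaa', 'bbbbbbbbbb']
```

Because the generator stops consuming the underlying iterator at the cap, a streaming dataset never needs to be fully downloaded or held in memory.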
## Continual Learning
The model trains automatically via GitHub Actions on a scheduled hourly cron. Key features:
- **EWC regularisation** – a Fisher information matrix computed from recent data protects previously learned weights from being overwritten.
- **RAM-safe checkpointing** – training halts and saves before hitting memory limits (`MAX_RAM_GB=13`).
- **Optimizer-free saves** – AdaFactor optimizer state is discarded before upload to keep checkpoint size small.
- **Auto-recovery** – each run pulls the latest checkpoint from this repo before training, resuming from where the last run left off.
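EWC adds a quadratic penalty, lam/2 * sum_i F_i * (theta_i - theta*_i)^2, that anchors weights the Fisher information marks as important for earlier data. A minimal sketch (the function name and toy model are illustrative, not the repo's code):

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """lam/2 * sum_i F_i * (theta_i - theta*_i)^2 over protected params."""
    total = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:  # only penalise parameters with stored Fisher values
            total = total + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * total

# Toy check: one weight set to 3.0, anchored at 0 with unit Fisher information.
m = torch.nn.Linear(1, 1, bias=False)
with torch.no_grad():
    m.weight.fill_(3.0)
fisher = {"weight": torch.ones_like(m.weight)}
anchor = {"weight": torch.zeros_like(m.weight)}
print(ewc_penalty(m, fisher, anchor, lam=2.0).item())  # 9.0
```

In training this term is simply added to the task loss, so gradient descent trades off new-data fit against drift from the anchored weights.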
## Limitations

- Experimental model – outputs may be incorrect, hallucinated, or outdated.
- Not intended for production financial applications.
- Continual training without human evaluation gates means quality can regress between runs.
- Numeric reasoning is improved by the numeracy encoder but not guaranteed to be accurate.
## Source Code

Training pipeline, architecture, and CI workflows: [github.com/MeridianAlgo/FinAI](https://github.com/MeridianAlgo/FinAI)