---
license: mit
language:
- en
tags:
- finance
- text-generation
- mixture-of-experts
- continual-learning
- financial-nlp
- custom-architecture
library_name: transformers
pipeline_tag: text-generation
---

# Meridian.AI – Finance Language Model

Meridian.AI is a custom sparse Mixture-of-Experts (MoE) language model continually trained on finance data. It is designed to run on commodity CPU hardware (including free GitHub Actions runners) and improves automatically via scheduled training runs.

> **Not financial advice.** This is an experimental research model.

---

## Model Details

| Property | Value |
|---|---|
| Architecture | Custom SMoE + GQA + RoPE + SwiGLU + Numeracy Encoding |
| Total parameters | ~479M (tied embeddings) |
| Unique parameters | ~283M |
| Experts | 8 total, top-2 active per token |
| Tokenizer | `Qwen/Qwen2.5-0.5B` (151k vocab) |
| Context length | 2048 tokens |
| Training method | Continual learning with EWC (Elastic Weight Consolidation) |
| License | MIT |


---

## Architecture

Meridian.AI is a fully custom transformer built from scratch with the following components:

- **Sparse MoE FFN** – 8 experts per MoE layer with top-2 routing: only 2 of the 8 experts activate per token, keeping compute low while retaining capacity. MoE layers occupy every second transformer layer.
- **Grouped Query Attention (GQA)** – 12 query heads, 4 key/value heads; reduces memory bandwidth during inference.
- **Rotary Position Embeddings (RoPE)** – `rope_theta=500000` for length generalisation.
- **SwiGLU FFN** – gated activation used in the dense layers and expert FFNs.
- **RMSNorm** – replaces LayerNorm for faster normalisation.
- **Financial Numeracy Encoding** – a learned 64-dimensional embedding for numeric tokens that improves precision on quantitative finance tasks.
- **Elastic Weight Consolidation (EWC)** – prevents catastrophic forgetting across continual training runs.
- **Tied word embeddings** – input embeddings and `lm_head` share weights, saving ~197M parameters.
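
The top-2 routing above can be sketched in a few lines. This is an illustrative toy, not the repo's implementation: the class names, layer sizes, and loop-based dispatch are assumptions made for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated FFN used by each expert: down(silu(gate(x)) * up(x))."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class Top2MoE(nn.Module):
    """Sparse MoE FFN: each token is routed to 2 of 8 SwiGLU experts."""
    def __init__(self, d_model: int = 64, d_ff: int = 128, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLU(d_model, d_ff) for _ in range(n_experts))

    def forward(self, x):  # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.top_k, dim=-1)  # pick top-2 experts per token
        weights = F.softmax(scores, dim=-1)                    # renormalise over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                       # tokens whose slot chose expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

Only the two selected experts run for each token, so per-token compute scales with `top_k` while parameter capacity scales with `n_experts`.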

---

## How to Use

> The model weights are stored under the `checkpoint/` subfolder in this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meridianal/FinAI"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder="checkpoint")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder="checkpoint",
    trust_remote_code=True,
    torch_dtype=torch.float32,
    low_cpu_mem_usage=True,
)
model.eval()

prompt = """### Instruction:
What does a high price-to-earnings ratio indicate about a stock?

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.8,
        top_p=0.92,
        repetition_penalty=1.3,
        no_repeat_ngram_size=3,
        pad_token_id=tokenizer.pad_token_id or tokenizer.eos_token_id,  # fall back if no pad token is set
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

### Prompt format

All training examples use this instruction/response format:

```
### Instruction:
<your question or task>

### Response:
<answer>
```

Classification tasks use the same format, with a short label-only response.
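
A tiny helper keeps prompts aligned with this template; `build_prompt` is our own illustrative name, not part of the repo:

```python
def build_prompt(instruction: str) -> str:
    """Wrap a task in the instruction/response template the model was trained on."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

# A classification example with a label-only expected response:
prompt = build_prompt("Classify the sentiment of: 'Shares fell 8% after earnings.'")
```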

### Generation tips

Continual training can introduce mild repetition. Recommended settings:

| Parameter | Range |
|---|---|
| `temperature` | 0.7–0.95 |
| `top_p` | 0.85–0.95 |
| `repetition_penalty` | 1.2–1.4 |
| `no_repeat_ngram_size` | 3 |


If you see repeated phrases, increase `repetition_penalty` and lower `temperature`.
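
One way to apply these tips is to keep a kwargs dict at the strict end of the ranges and splat it into `generate`; both the values and the `anti_repeat` name are our illustrative choices:

```python
# Conservative anti-repetition profile within the recommended ranges.
anti_repeat = dict(
    do_sample=True,
    temperature=0.7,         # low end of 0.7-0.95
    top_p=0.9,
    repetition_penalty=1.4,  # high end of 1.2-1.4
    no_repeat_ngram_size=3,
)
# usage: model.generate(**inputs, max_new_tokens=200, **anti_repeat)
```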

---

## Training Data

Training streams finance datasets from the FinanceMTEB family:

- Financial sentiment analysis (FinancialPhraseBank, etc.)
- ESG and sustainability classification
- FOMC statement analysis
- Fraud and financial complaint datasets
- Financial QA pairs
- Earnings call and filing excerpts

Datasets are loaded in streaming mode with a 15 MB-per-source cap to stay within GitHub Actions memory limits.
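
The byte cap can be enforced with a small generator wrapped around any streaming iterator (for example, one returned by `datasets.load_dataset(..., streaming=True)`). A sketch under that assumption; the pipeline's actual cap logic may differ:

```python
MAX_BYTES = 15 * 1024 * 1024  # 15 MB budget per source

def capped(texts, max_bytes: int = MAX_BYTES):
    """Yield texts until the cumulative UTF-8 size exceeds the budget."""
    seen = 0
    for text in texts:
        seen += len(text.encode("utf-8"))
        if seen > max_bytes:
            return  # budget spent: stop consuming this source
        yield text
```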

---

## Continual Learning

The model trains automatically via GitHub Actions on an hourly cron schedule. Key features:

- **EWC regularisation** – a Fisher information matrix computed from recent data protects previously learned weights from being overwritten.
- **RAM-safe checkpointing** – training halts and saves before hitting memory limits (`MAX_RAM_GB=13`).
- **Optimizer-free saves** – the AdaFactor optimizer state is discarded before upload to keep checkpoints small.
- **Auto-recovery** – each run pulls the latest checkpoint from this repo before training, resuming where the last run left off.
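
The EWC penalty in its standard form is `λ/2 · Σᵢ Fᵢ (θᵢ − θᵢ*)²`, where `F` is a diagonal Fisher estimate and `θ*` are the weights saved after the previous run. A minimal sketch (the function name and `lam` default are illustrative; the repo's implementation may differ):

```python
import torch

def ewc_penalty(model, fisher, anchor, lam: float = 0.4):
    """lam/2 * sum_i F_i * (theta_i - theta*_i)^2 over protected parameters.

    `fisher` and `anchor` map parameter names to tensors saved after the
    previous run: the diagonal Fisher estimate and the old weights.
    """
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            loss = loss + (fisher[name] * (p - anchor[name]) ** 2).sum()
    return 0.5 * lam * loss

# Each training step adds it to the task loss:
#   total = task_loss + ewc_penalty(model, fisher, anchor)
```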

---

## Limitations

- Experimental model – outputs may be incorrect, hallucinated, or outdated.
- Not intended for production financial applications.
- Continual training without human evaluation gates means quality can regress between runs.
- Numeric reasoning is improved by the numeracy encoder but not guaranteed accurate.


---

## Source Code

Training pipeline, architecture, and CI workflows:
[github.com/MeridianAlgo/FinAI](https://github.com/MeridianAlgo/FinAI)
|