---
license: mit
tags:
- pytorch
- gpt2
- text-generation
- fin-ai
- experimental
- in-training
- from-scratch
- automated-training
language:
- en
datasets:
- wikitext
- roneneldan/TinyStories
- openai/gsm8k
- squad
- imdb
- ag_news
- yelp_review_full
- cnn_dailymail
- billsum
- commonsense_qa
- hellaswag
- winogrande
- boolq
- race
- stanfordnlp/coqa
- allenai/c4
- Skylion007/openwebtext
- trivia_qa
- hotpot_qa
- microsoft/ms_marco
- duorc
- amazon_polarity
- zeroshot/twitter-financial-news-sentiment
- sciq
- quail
- wiki_qa
- paws
- medical_questions_pairs
- app_reviews
- rotten_tomatoes
metrics:
- perplexity
library_name: pytorch
pipeline_tag: text-generation
---
|
|
<style>
.container {
  font-size: 2em; /* Relative to parent font size */
  display: flex;
  align-items: center;
  justify-content: center;
}
</style>

<div align="center">

<div class="container">
🚀 Fin.AI v2.0
</div>

![Training Status](https://img.shields.io/badge/status-training-yellow)
![Parameters](https://img.shields.io/badge/params-30M-blue)
![Context](https://img.shields.io/badge/context-512-green)
![Layers](https://img.shields.io/badge/layers-6-orange)

**⚠️ EXPERIMENTAL MODEL - Training from scratch**

[GitHub](https://github.com/MeridianAlgo/FinAI) • [Training Logs](https://wandb.ai/meridianalgo-meridianalgo/fin-ai) • [Report Issue](https://github.com/MeridianAlgo/FinAI/issues)

</div>
|
|
|
|
|
---

## 🚨 Important Notice

**This model is training from scratch, and outputs will be gibberish initially.**

- 🔴 **Brand new model** - Starting from random weights
- ⏳ **Training time needed**: 2-4 weeks for basic coherence
- 🤖 **Automated training**: Every 1 hour 10 minutes via GitHub Actions
- 📉 **Current quality**: Expect complete nonsense initially
- 🎯 **Purpose**: Research/experimental continuous learning

---
|
|
|
|
|
## 📊 Model Overview
|
|
|
|
|
| Specification | Value |
|--------------|-------|
| **Architecture** | GPT-2 style Transformer |
| **Parameters** | 30,142,848 (~30M) |
| **Layers** | 6 |
| **Attention Heads** | 6 |
| **Embedding Dimension** | 384 |
| **Feed-Forward Dimension** | 1,536 |
| **Max Sequence Length** | 512 tokens |
| **Vocabulary Size** | 50,257 (GPT-2 tokenizer) |
| **Position Encoding** | Rotary (RoPE) |
| **Activation** | GELU |
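
As a sanity check, the total parameter count can be estimated from the hyperparameters above. A minimal sketch (biases omitted; the exact FinAI module layout may differ slightly, which accounts for the small gap to the reported 30,142,848):

```python
# Back-of-the-envelope parameter count for a GPT-2-style transformer
# with the hyperparameters listed above (RoPE, so no learned positions).
vocab_size, embed_dim, n_layers, ff_dim = 50257, 384, 6, 1536

token_embedding = vocab_size * embed_dim       # 19,298,688
attention = 4 * embed_dim * embed_dim          # Q, K, V and output projections
feed_forward = 2 * embed_dim * ff_dim          # up- and down-projection
layer_norms = 2 * 2 * embed_dim                # two LayerNorms (weight + bias)
per_layer = attention + feed_forward + layer_norms

total = token_embedding + n_layers * per_layer + 2 * embed_dim  # final LayerNorm
print(f"{total:,}")  # 29,925,504 -- within ~1% of the reported figure
```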
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Training Details
|
|
|
|
|
### Training Schedule
- **Frequency**: Every 1 hour 10 minutes (~20 cycles/day; the arithmetic is sketched below)
- **Steps per cycle**: 800 steps
- **Daily steps**: ~16,500 steps
- **Weekly steps**: ~115,200 steps
- **Batch size**: 8 (effective: 32 with gradient accumulation)
- **Learning rate**: 3e-4 with cosine decay
- **Warmup steps**: 100
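
These step counts follow directly from the cadence; a quick derivation in plain Python:

```python
# Derive step counts from the 70-minute training cadence.
CYCLE_MINUTES = 70
STEPS_PER_CYCLE = 800

cycles_per_day = 24 * 60 / CYCLE_MINUTES          # ~20.57 cycles
steps_per_day = cycles_per_day * STEPS_PER_CYCLE  # ~16,457 steps
steps_per_week = steps_per_day * 7                # exactly 115,200 steps

print(f"{cycles_per_day:.2f} cycles/day -> {steps_per_day:,.0f} steps/day, "
      f"{steps_per_week:,.0f} steps/week")
```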
|
|
|
|
|
### Training Infrastructure
- **Platform**: GitHub Actions (free tier)
- **Hardware**: CPU only
- **Training time**: ~15-20 minutes per cycle
- **Automatic upload**: To Hugging Face after each cycle
|
|
|
|
|
### Datasets (30 total, rotating each cycle)

The model trains on a diverse set of 30 datasets, cycling through one per training cycle (one possible rotation scheme is sketched after the list):
|
|
|
|
|
**📚 Knowledge & Reference**
- WikiText-2, OpenWebText, C4

**✍️ Creative Writing**
- TinyStories

**📰 News & Articles**
- CNN/DailyMail, AG News, BillSum

**❓ Question Answering**
- SQuAD, CoQA, TriviaQA, HotpotQA, MS MARCO, WikiQA, QuAIL

**🧠 Reasoning & Logic**
- GSM8K (math), CommonsenseQA, HellaSwag, WinoGrande, BoolQ

**📖 Reading Comprehension**
- RACE, DuoRC

**💬 Reviews & Sentiment**
- IMDB, Yelp, Amazon Polarity, Rotten Tomatoes, App Reviews

**🔬 Scientific & Medical**
- SciQ, Medical Questions Pairs

**💰 Financial**
- Twitter Financial News

**🔄 Paraphrase & Similarity**
- PAWS
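
The rotation itself is simple round-robin scheduling. A hypothetical sketch (the real selection logic lives in the FinAI training scripts; `DATASETS` is truncated here and `dataset_for_cycle` is an illustrative name, not the project's API):

```python
import time

# Illustrative only: round-robin over the dataset list, one per cycle.
DATASETS = ["wikitext", "roneneldan/TinyStories", "openai/gsm8k"]  # ... 30 total
CYCLE_SECONDS = 70 * 60  # one training cycle every 1 h 10 min

def dataset_for_cycle(now=None):
    """Pick the dataset for the current training cycle."""
    cycle_index = int((now if now is not None else time.time()) // CYCLE_SECONDS)
    return DATASETS[cycle_index % len(DATASETS)]

print(dataset_for_cycle())  # e.g. "openai/gsm8k"
```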
|
|
|
|
|
--- |
|
|
|
|
|
## 📈 Training Progress
|
|
|
|
|
### Current Status
- **Version**: v2.0.0
- **Training started**: December 28, 2024
- **Model type**: fresh_init
- **Total parameters**: 30,142,848
|
|
|
|
|
### Expected Timeline

| Week | Expected Quality | Description |
|------|-----------------|-------------|
| 1 | 🔴 Gibberish | Random weights, no coherence |
| 2 | 🟠 Patterns | Some token patterns emerging |
| 3-4 | 🟡 Basic | Simple word sequences |
| 5-8 | 🟢 Improving | Short coherent phrases |
| 9-12 | 🔵 Decent | Usable for simple tasks |
|
|
|
|
|
### Monitoring
- **GitHub Actions**: [View Training Runs](https://github.com/MeridianAlgo/FinAI/actions)
- **Wandb Dashboard**: [View Metrics](https://wandb.ai/meridianalgo-meridianalgo/fin-ai)
- **Model Updates**: This page updates automatically
|
|
|
|
|
--- |
|
|
|
|
|
## 💻 Usage
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
pip install torch transformers huggingface-hub
```
|
|
|
|
|
### Download Model |
|
|
|
|
|
```python
from huggingface_hub import hf_hub_download
import os

# Create a local directory for the checkpoint
os.makedirs("./fin_ai_model", exist_ok=True)

# Download the model weights and configuration
hf_hub_download("MeridianAlgo/Fin.AI", "model.pt", local_dir="./fin_ai_model")
hf_hub_download("MeridianAlgo/Fin.AI", "config.json", local_dir="./fin_ai_model")
```
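
A quick sanity check after downloading; this assumes `model.pt` is a standard PyTorch checkpoint (a state dict, or a dict wrapping one), which may not match the repository's exact format:

```python
import json
import torch

# Confirm the config downloaded intact
with open("./fin_ai_model/config.json") as f:
    print(json.load(f))

# Peek at the checkpoint without constructing the model;
# map_location="cpu" avoids needing a GPU just to inspect it.
checkpoint = torch.load("./fin_ai_model/model.pt", map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint)[:5])  # first few keys / tensor names
```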
|
|
|
|
|
### Generate Text (Experimental) |
|
|
|
|
|
```python
from fin_ai.model import FinAIModel
import torch
from transformers import AutoTokenizer

# Load the model from the downloaded checkpoint
model = FinAIModel.from_pretrained("./fin_ai_model")
model.eval()

# The model uses the standard GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate text (expect poor quality initially)
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=50,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
    )

generated_text = tokenizer.decode(output[0])
print(generated_text)

# Note: Output quality is poor initially and improves over weeks
```
|
|
|
|
|
--- |
|
|
|
|
|
## 🔬 Technical Details
|
|
|
|
|
### Architecture Improvements (v2.0) |
|
|
|
|
|
Compared to v1.x:
- ✅ **3x more parameters** (10M → 30M)
- ✅ **Deeper architecture** (4 layers → 6 layers)
- ✅ **Larger embeddings** (256 → 384 dimensions)
- ✅ **More attention heads** (4 → 6 heads)
- ✅ **Longer training cycles** (600 → 800 steps/cycle)
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
```yaml
model:
  size_preset: "small"
  n_layers: 6
  n_heads: 6
  embed_dim: 384
  ff_dim: 1536
  max_seq_len: 512

training:
  batch_size: 8
  gradient_accumulation_steps: 4
  learning_rate: 3.0e-4
  weight_decay: 0.01
  warmup_steps: 100
  max_steps: 800
```
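
To make the effective batch size concrete: gradients from 4 consecutive micro-batches of 8 sequences are accumulated before each optimizer step, giving 32 sequences per update. A self-contained sketch with toy stand-ins (the real loop lives in the FinAI repository):

```python
import torch
from torch import nn

# Toy stand-ins so the sketch runs on its own; the real model and data
# pipeline live in the FinAI training scripts.
model = nn.Linear(16, 4)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

ACCUM_STEPS = 4  # effective batch = 8 x 4 = 32 samples per optimizer step
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(batches):
    loss = loss_fn(model(inputs), targets) / ACCUM_STEPS  # scale to average grads
    loss.backward()                                        # grads accumulate in .grad
    if (i + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # stability
        optimizer.step()
        optimizer.zero_grad()
```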
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Evaluation
|
|
|
|
|
### Metrics Tracked
- **Training Loss**: Cross-entropy loss
- **Perplexity**: exp(loss); see the sketch below
- **Tokens/Second**: Training throughput
- **Learning Rate**: Cosine schedule with warmup
- **Gradient Norm**: For stability monitoring
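
Perplexity is simply the exponential of the mean cross-entropy loss, so the two metrics move together. A self-contained illustration:

```python
import math
import torch
from torch import nn

# Perplexity = exp(mean cross-entropy): the effective number of
# equally likely next tokens the model is choosing between.
logits = torch.randn(8, 50257)            # toy batch: 8 positions, GPT-2 vocab
targets = torch.randint(0, 50257, (8,))
loss = nn.CrossEntropyLoss()(logits, targets)

perplexity = math.exp(loss.item())
print(f"loss={loss.item():.3f}, perplexity={perplexity:,.1f}")
# A freshly initialized model sits near ln(50257) ≈ 10.8 loss,
# i.e. perplexity ≈ 50,257 (uniform over the vocabulary).
```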
|
|
|
|
|
### Benchmarks (Coming Soon)
Once the model reaches basic coherence, we'll evaluate on:
- HellaSwag (common sense)
- LAMBADA (long-range word prediction)
- WikiText perplexity
- Custom generation quality tests
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Limitations
|
|
|
|
|
1. **Early Training**: Model is in very early training stages
2. **Output Quality**: Expect gibberish for several weeks
3. **CPU Training**: Slower than GPU training
4. **Small Model**: 30M parameters is relatively small
5. **Limited Context**: 512 token context window
6. **No Fine-tuning**: Base model only, not instruction-tuned
7. **English Only**: Trained primarily on English text
|
|
|
|
|
--- |
|
|
|
|
|
## 🤝 Contributing
|
|
|
|
|
This is an open research project! Contributions welcome:

- **Code**: [GitHub Repository](https://github.com/MeridianAlgo/FinAI)
- **Issues**: [Report Problems](https://github.com/MeridianAlgo/FinAI/issues)
- **Discussions**: [Join Discussion](https://github.com/MeridianAlgo/FinAI/discussions)
|
|
|
|
|
--- |
|
|
|
|
|
## 📄 License
|
|
|
|
|
MIT License - See [LICENSE](https://github.com/MeridianAlgo/FinAI/blob/main/LICENSE) |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## 🔗 Links
|
|
|
|
|
- **Repository**: https://github.com/MeridianAlgo/FinAI
- **Training Logs**: https://wandb.ai/meridianalgo-meridianalgo/fin-ai
- **GitHub Actions**: https://github.com/MeridianAlgo/FinAI/actions
- **Issues**: https://github.com/MeridianAlgo/FinAI/issues
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center">

**Last Updated**: 2025-12-28 17:54 UTC

**Status**: 🔴 Training from Scratch

**Quality**: ⚠️ Expect Gibberish (2-4 weeks needed)

</div>
|
|
|