---
license: mit
tags:
- pytorch
- gpt2
- text-generation
- fin-ai
- experimental
- in-training
- from-scratch
- automated-training
language:
- en
datasets:
- wikitext
- roneneldan/TinyStories
- openai/gsm8k
- squad
- imdb
- ag_news
- yelp_review_full
- cnn_dailymail
- billsum
- commonsense_qa
- hellaswag
- winogrande
- boolq
- race
- stanfordnlp/coqa
- allenai/c4
- Skylion007/openwebtext
- trivia_qa
- hotpot_qa
- microsoft/ms_marco
- duorc
- amazon_polarity
- zeroshot/twitter-financial-news-sentiment
- sciq
- quail
- wiki_qa
- paws
- medical_questions_pairs
- app_reviews
- rotten_tomatoes
metrics:
- perplexity
library_name: pytorch
pipeline_tag: text-generation
---
|
|
<style>
.container {
  font-size: 2em; /* Relative to parent font size */
  display: flex;
  align-items: center;
  justify-content: center;
}
</style>

<div align="center">

<div class="container">
🚀 Fin.AI v2.0
</div>

![Training Status](https://img.shields.io/badge/status-training-yellow)
![Parameters](https://img.shields.io/badge/params-30M-blue)
![Context](https://img.shields.io/badge/context-512-green)
![Layers](https://img.shields.io/badge/layers-6-orange)

**⚠️ EXPERIMENTAL MODEL - Training from scratch**

[GitHub](https://github.com/MeridianAlgo/FinAI) • [Training Logs](https://wandb.ai/meridianalgo-meridianalgo/fin-ai) • [Report Issue](https://github.com/MeridianAlgo/FinAI/issues)

</div>
|
|
|
|
|
---

## 🚨 Important Notice

**This model is training from scratch, and outputs will be gibberish initially.**

- 🔴 **Brand new model** - Starting from random weights
- ⏳ **Training time needed**: 2-4 weeks for basic coherence
- 🤖 **Automated training**: Every 1 hour 10 minutes via GitHub Actions
- 📉 **Current quality**: Expect complete nonsense initially
- 🎯 **Purpose**: Research/experimental continuous learning

---
|
|
|
|
|
## 📊 Model Overview
|
|
|
|
|
| Specification | Value |
|--------------|-------|
| **Architecture** | GPT-2 style Transformer |
| **Parameters** | 30,142,848 (~30M) |
| **Layers** | 6 |
| **Attention Heads** | 6 |
| **Embedding Dimension** | 384 |
| **Feed-Forward Dimension** | 1,536 |
| **Max Sequence Length** | 512 tokens |
| **Vocabulary Size** | 50,257 (GPT-2 tokenizer) |
| **Position Encoding** | Rotary (RoPE) |
| **Activation** | GELU |
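
As a sanity check, the total parameter count can be estimated from the hyperparameters above. A minimal sketch (biases omitted; the exact FinAI module layout may differ slightly, which accounts for the small gap to the reported 30,142,848):

```python
# Back-of-the-envelope parameter count for a GPT-2-style transformer
# with the hyperparameters listed above (RoPE, so no learned positions).
vocab_size, embed_dim, n_layers, ff_dim = 50257, 384, 6, 1536

token_embedding = vocab_size * embed_dim       # 19,298,688
attention = 4 * embed_dim * embed_dim          # Q, K, V and output projections
feed_forward = 2 * embed_dim * ff_dim          # up- and down-projection
layer_norms = 2 * 2 * embed_dim                # two LayerNorms (weight + bias)
per_layer = attention + feed_forward + layer_norms

total = token_embedding + n_layers * per_layer + 2 * embed_dim  # final LayerNorm
print(f"{total:,}")  # 29,925,504 -- within ~1% of the reported figure
```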
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 Training Details
|
|
|
|
|
### Training Schedule
- **Frequency**: Every 1 hour 10 minutes (~20 cycles/day; the arithmetic is sketched below)
- **Steps per cycle**: 800 steps
- **Daily steps**: ~16,500 steps
- **Weekly steps**: ~115,200 steps
- **Batch size**: 8 (effective: 32 with gradient accumulation)
- **Learning rate**: 3e-4 with cosine decay
- **Warmup steps**: 100
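
These step counts follow directly from the cadence; a quick derivation in plain Python:

```python
# Derive step counts from the 70-minute training cadence.
CYCLE_MINUTES = 70
STEPS_PER_CYCLE = 800

cycles_per_day = 24 * 60 / CYCLE_MINUTES          # ~20.57 cycles
steps_per_day = cycles_per_day * STEPS_PER_CYCLE  # ~16,457 steps
steps_per_week = steps_per_day * 7                # exactly 115,200 steps

print(f"{cycles_per_day:.2f} cycles/day -> {steps_per_day:,.0f} steps/day, "
      f"{steps_per_week:,.0f} steps/week")
```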
|
|
|
|
|
### Training Infrastructure
- **Platform**: GitHub Actions (free tier)
- **Hardware**: CPU only
- **Training time**: ~15-20 minutes per cycle
- **Automatic upload**: To Hugging Face after each cycle
|
|
|
|
|
### Datasets (30 total, rotating each cycle)

The model trains on a diverse set of 30 datasets, cycling through one per training cycle (one possible rotation scheme is sketched after the list):
|
|
|
|
|
**📚 Knowledge & Reference**
- WikiText-2, OpenWebText, C4

**✍️ Creative Writing**
- TinyStories

**📰 News & Articles**
- CNN/DailyMail, AG News, BillSum

**❓ Question Answering**
- SQuAD, CoQA, TriviaQA, HotpotQA, MS MARCO, WikiQA, QuAIL

**🧠 Reasoning & Logic**
- GSM8K (math), CommonsenseQA, HellaSwag, WinoGrande, BoolQ

**📖 Reading Comprehension**
- RACE, DuoRC

**💬 Reviews & Sentiment**
- IMDB, Yelp, Amazon Polarity, Rotten Tomatoes, App Reviews

**🔬 Scientific & Medical**
- SciQ, Medical Questions Pairs

**💰 Financial**
- Twitter Financial News

**🔄 Paraphrase & Similarity**
- PAWS
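
The rotation itself is simple round-robin scheduling. A hypothetical sketch (the real selection logic lives in the FinAI training scripts; `DATASETS` is truncated here and `dataset_for_cycle` is an illustrative name, not the project's API):

```python
import time

# Illustrative only: round-robin over the dataset list, one per cycle.
DATASETS = ["wikitext", "roneneldan/TinyStories", "openai/gsm8k"]  # ... 30 total
CYCLE_SECONDS = 70 * 60  # one training cycle every 1 h 10 min

def dataset_for_cycle(now=None):
    """Pick the dataset for the current training cycle."""
    cycle_index = int((now if now is not None else time.time()) // CYCLE_SECONDS)
    return DATASETS[cycle_index % len(DATASETS)]

print(dataset_for_cycle())  # e.g. "openai/gsm8k"
```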
|
|
|
|
|
--- |
|
|
|
|
|
## 📈 Training Progress
|
|
|
|
|
### Current Status
- **Version**: v2.0.0
- **Training started**: December 28, 2024
- **Model type**: fresh_init
- **Total parameters**: 30,142,848
|
|
|
|
|
### Expected Timeline

| Week | Expected Quality | Description |
|------|-----------------|-------------|
| 1 | 🔴 Gibberish | Random weights, no coherence |
| 2 | 🟠 Patterns | Some token patterns emerging |
| 3-4 | 🟡 Basic | Simple word sequences |
| 5-8 | 🟢 Improving | Short coherent phrases |
| 9-12 | 🔵 Decent | Usable for simple tasks |
|
|
|
|
|
### Monitoring
- **GitHub Actions**: [View Training Runs](https://github.com/MeridianAlgo/FinAI/actions)
- **Wandb Dashboard**: [View Metrics](https://wandb.ai/meridianalgo-meridianalgo/fin-ai)
- **Model Updates**: This page updates automatically
|
|
|
|
|
--- |
|
|
|
|
|
## 💻 Usage
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash
pip install torch transformers huggingface-hub
```
|
|
|
|
|
### Download Model |
|
|
|
|
|
```python
from huggingface_hub import hf_hub_download
import os

# Create a local directory for the checkpoint
os.makedirs("./fin_ai_model", exist_ok=True)

# Download the model weights and configuration
hf_hub_download("MeridianAlgo/Fin.AI", "model.pt", local_dir="./fin_ai_model")
hf_hub_download("MeridianAlgo/Fin.AI", "config.json", local_dir="./fin_ai_model")
```
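
A quick sanity check after downloading; this assumes `model.pt` is a standard PyTorch checkpoint (a state dict, or a dict wrapping one), which may not match the repository's exact format:

```python
import json
import torch

# Confirm the config downloaded intact
with open("./fin_ai_model/config.json") as f:
    print(json.load(f))

# Peek at the checkpoint without constructing the model;
# map_location="cpu" avoids needing a GPU just to inspect it.
checkpoint = torch.load("./fin_ai_model/model.pt", map_location="cpu")
if isinstance(checkpoint, dict):
    print(list(checkpoint)[:5])  # first few keys / tensor names
```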
|
|
|
|
|
### Generate Text (Experimental) |
|
|
|
|
|
```python
from fin_ai.model import FinAIModel
import torch
from transformers import AutoTokenizer

# Load the model from the downloaded checkpoint
model = FinAIModel.from_pretrained("./fin_ai_model")
model.eval()

# The model uses the standard GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate text (expect poor quality initially)
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=50,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
    )

generated_text = tokenizer.decode(output[0])
print(generated_text)

# Note: Output quality is poor initially and improves over weeks
```
|
|
|
|
|
--- |
|
|
|
|
|
## 🔬 Technical Details
|
|
|
|
|
### Architecture Improvements (v2.0) |
|
|
|
|
|
Compared to v1.x:
- ✅ **3x more parameters** (10M → 30M)
- ✅ **Deeper architecture** (4 layers → 6 layers)
- ✅ **Larger embeddings** (256 → 384 dimensions)
- ✅ **More attention heads** (4 → 6 heads)
- ✅ **Longer training cycles** (600 → 800 steps/cycle)
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
```yaml
model:
  size_preset: "small"
  n_layers: 6
  n_heads: 6
  embed_dim: 384
  ff_dim: 1536
  max_seq_len: 512

training:
  batch_size: 8
  gradient_accumulation_steps: 4
  learning_rate: 3.0e-4
  weight_decay: 0.01
  warmup_steps: 100
  max_steps: 800
```
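
To make the effective batch size concrete: gradients from 4 consecutive micro-batches of 8 sequences are accumulated before each optimizer step, giving 32 sequences per update. A self-contained sketch with toy stand-ins (the real loop lives in the FinAI repository):

```python
import torch
from torch import nn

# Toy stand-ins so the sketch runs on its own; the real model and data
# pipeline live in the FinAI training scripts.
model = nn.Linear(16, 4)
loss_fn = nn.CrossEntropyLoss()
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(8)]

ACCUM_STEPS = 4  # effective batch = 8 x 4 = 32 samples per optimizer step
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

optimizer.zero_grad()
for i, (inputs, targets) in enumerate(batches):
    loss = loss_fn(model(inputs), targets) / ACCUM_STEPS  # scale to average grads
    loss.backward()                                        # grads accumulate in .grad
    if (i + 1) % ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # stability
        optimizer.step()
        optimizer.zero_grad()
```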
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Evaluation
|
|
|
|
|
### Metrics Tracked
- **Training Loss**: Cross-entropy loss
- **Perplexity**: exp(loss); see the sketch below
- **Tokens/Second**: Training throughput
- **Learning Rate**: Cosine schedule with warmup
- **Gradient Norm**: For stability monitoring
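
Perplexity is simply the exponential of the mean cross-entropy loss, so the two metrics move together. A self-contained illustration:

```python
import math
import torch
from torch import nn

# Perplexity = exp(mean cross-entropy): the effective number of
# equally likely next tokens the model is choosing between.
logits = torch.randn(8, 50257)            # toy batch: 8 positions, GPT-2 vocab
targets = torch.randint(0, 50257, (8,))
loss = nn.CrossEntropyLoss()(logits, targets)

perplexity = math.exp(loss.item())
print(f"loss={loss.item():.3f}, perplexity={perplexity:,.1f}")
# A freshly initialized model sits near ln(50257) ≈ 10.8 loss,
# i.e. perplexity ≈ 50,257 (uniform over the vocabulary).
```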
|
|
|
|
|
### Benchmarks (Coming Soon)
Once the model reaches basic coherence, we'll evaluate on:
- HellaSwag (common sense)
- LAMBADA (long-range word prediction)
- WikiText perplexity
- Custom generation quality tests
|
|
|
|
|
--- |
|
|
|
|
|
## ⚠️ Limitations
|
|
|
|
|
1. **Early Training**: Model is in very early training stages
2. **Output Quality**: Expect gibberish for several weeks
3. **CPU Training**: Slower than GPU training
4. **Small Model**: 30M parameters is relatively small
5. **Limited Context**: 512 token context window
6. **No Fine-tuning**: Base model only, not instruction-tuned
7. **English Only**: Trained primarily on English text
|
|
|
|
|
--- |
|
|
|
|
|
## 🤝 Contributing
|
|
|
|
|
This is an open research project! Contributions welcome:

- **Code**: [GitHub Repository](https://github.com/MeridianAlgo/FinAI)
- **Issues**: [Report Problems](https://github.com/MeridianAlgo/FinAI/issues)
- **Discussions**: [Join Discussion](https://github.com/MeridianAlgo/FinAI/discussions)
|
|
|
|
|
--- |
|
|
|
|
|
## 📄 License
|
|
|
|
|
MIT License - See [LICENSE](https://github.com/MeridianAlgo/FinAI/blob/main/LICENSE) |
|
|
|
|
|
|
|
|
--- |
|
|
|
|
|
## 🔗 Links
|
|
|
|
|
- **Repository**: https://github.com/MeridianAlgo/FinAI
- **Training Logs**: https://wandb.ai/meridianalgo-meridianalgo/fin-ai
- **GitHub Actions**: https://github.com/MeridianAlgo/FinAI/actions
- **Issues**: https://github.com/MeridianAlgo/FinAI/issues
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center">

**Last Updated**: 2025-12-28 17:54 UTC

**Status**: 🔴 Training from Scratch

**Quality**: ⚠️ Expect Gibberish (2-4 weeks needed)

</div>
|
|
|