---
license: mit
tags:
  - pytorch
  - gpt2
  - text-generation
  - fin-ai
  - experimental
  - in-training
  - from-scratch
  - automated-training
language:
  - en
datasets:
  - wikitext
  - roneneldan/TinyStories
  - openai/gsm8k
  - squad
  - imdb
  - ag_news
  - yelp_review_full
  - cnn_dailymail
  - billsum
  - commonsense_qa
  - hellaswag
  - winogrande
  - boolq
  - race
  - stanfordnlp/coqa
  - allenai/c4
  - Skylion007/openwebtext
  - trivia_qa
  - hotpot_qa
  - microsoft/ms_marco
  - duorc
  - amazon_polarity
  - zeroshot/twitter-financial-news-sentiment
  - sciq
  - quail
  - wiki_qa
  - paws
  - medical_questions_pairs
  - app_reviews
  - rotten_tomatoes
metrics:
  - perplexity
library_name: pytorch
pipeline_tag: text-generation
---
# 🤖 Fin.AI v2.0


⚠️ EXPERIMENTAL MODEL - Training from scratch

GitHub • Training Logs • Report Issue


## 🚨 Important Notice

This model is being trained from scratch; its outputs will be gibberish at first.

- 🔴 Brand new model: starting from random weights
- ⏳ Training time needed: 2-4 weeks for basic coherence
- 🤖 Automated training: every 1 hour 10 minutes via GitHub Actions
- 📊 Current quality: expect complete nonsense initially
- 🎯 Purpose: research/experimental continuous learning

## 📊 Model Overview

| Specification | Value |
|---|---|
| Architecture | GPT-2 style Transformer |
| Parameters | 30,142,848 (~30M) |
| Layers | 6 |
| Attention Heads | 6 |
| Embedding Dimension | 384 |
| Feed-Forward Dimension | 1,536 |
| Max Sequence Length | 512 tokens |
| Vocabulary Size | 50,257 (GPT-2 tokenizer) |
| Position Encoding | Rotary (RoPE) |
| Activation | GELU |
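
Rotary position encoding is the main departure from the vanilla GPT-2 recipe here. Below is a minimal sketch of RoPE for a single attention head, assuming the common rotate-half formulation and the 64-dimensional heads implied by 384 / 6; the project's actual implementation may differ in layout details.

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (seq_len, head_dim)."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    # One rotation frequency per channel pair, geometrically spaced.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Queries for a full 512-token context with head_dim 64 (= 384 / 6 heads).
q = torch.randn(512, 64)
print(rope(q).shape)  # torch.Size([512, 64])
```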

## 🎯 Training Details

### Training Schedule

  • Frequency: Every 1 hour 10 minutes (6 cycles/hour)
  • Steps per cycle: 800 steps
  • Daily steps: ~115,200 steps
  • Weekly steps: ~806,400 steps
  • Batch size: 8 (effective: 32 with gradient accumulation)
  • Learning rate: 3e-4 with cosine decay
  • Warmup steps: 100

### Training Infrastructure

  • Platform: GitHub Actions (free tier)
  • Hardware: CPU only
  • Training time: ~15-20 minutes per cycle
  • Automatic upload: To Hugging Face after each cycle

### Datasets (30 total, rotating each cycle)

The model trains on a diverse set of 30 datasets, cycling through one per training cycle (a rotation sketch follows the category list):

**📚 Knowledge & Reference**

- WikiText-2, OpenWebText, C4

**✍️ Creative Writing**

- TinyStories

**📰 News & Articles**

- CNN/DailyMail, AG News, BillSum

**❓ Question Answering**

- SQuAD, CoQA, TriviaQA, HotpotQA, MS MARCO, WikiQA, QuAIL

**🧠 Reasoning & Logic**

- GSM8K (math), CommonsenseQA, HellaSwag, WinoGrande, BoolQ

**📖 Reading Comprehension**

- RACE, DuoRC

**💬 Reviews & Sentiment**

- IMDB, Yelp, Amazon Polarity, Rotten Tomatoes, App Reviews

**🔬 Scientific & Medical**

- SciQ, Medical Questions Pairs

**💰 Financial**

- Twitter Financial News

**🔄 Paraphrase & Similarity**

- PAWS

## 📈 Training Progress

### Current Status

- Version: v2.0.0
- Training started: December 28, 2024
- Model type: fresh_init
- Total parameters: 30,142,848

### Expected Timeline

| Week | Expected Quality | Description |
|------|------------------|-------------|
| 1    | 🔴 Gibberish | Random weights, no coherence |
| 2    | 🟠 Patterns | Some token patterns emerging |
| 3-4  | 🟡 Basic | Simple word sequences |
| 5-8  | 🟢 Improving | Short coherent phrases |
| 9-12 | 🔵 Decent | Usable for simple tasks |



## 💻 Usage

### Installation

```bash
pip install torch transformers huggingface-hub
```

### Download Model

```python
from huggingface_hub import hf_hub_download
import os

# Create a local directory for the checkpoint
os.makedirs("./fin_ai_model", exist_ok=True)

# Download the model weights and config
hf_hub_download("MeridianAlgo/Fin.AI", "model.pt", local_dir="./fin_ai_model")
hf_hub_download("MeridianAlgo/Fin.AI", "config.json", local_dir="./fin_ai_model")
```

### Generate Text (Experimental)

```python
import torch
from transformers import AutoTokenizer

from fin_ai.model import FinAIModel  # the project's own model class

# Load the downloaded checkpoint
model = FinAIModel.from_pretrained("./fin_ai_model")
model.eval()

# The model uses the standard GPT-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Generate text (expect poor quality initially)
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=50,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
    )

generated_text = tokenizer.decode(output[0])
print(generated_text)

# Note: output quality is poor initially and improves over weeks
```

## 🔬 Technical Details

### Architecture Improvements (v2.0)

Compared to v1.x:

- ✅ 3x more parameters (10M → 30M)
- ✅ Deeper network (4 → 6 layers)
- ✅ Larger embeddings (256 → 384 dimensions)
- ✅ More attention heads (4 → 6)
- ✅ Longer training cycles (600 → 800 steps per cycle)

### Training Configuration

```yaml
model:
  size_preset: "small"
  n_layers: 6
  n_heads: 6
  embed_dim: 384
  ff_dim: 1536
  max_seq_len: 512

training:
  batch_size: 8
  gradient_accumulation_steps: 4
  learning_rate: 3.0e-4
  weight_decay: 0.01
  warmup_steps: 100
  max_steps: 800
```
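
As a sanity check, the big pieces of that config multiply out to roughly the reported parameter count. This is a back-of-the-envelope sketch that assumes a tied output head (which the ~30M total implies) and ignores biases, layer norms, and position parameters, which is why it lands slightly under 30,142,848.

```python
vocab, d, layers, ff = 50257, 384, 6, 1536

embedding = vocab * d  # 19,298,688 token-embedding weights (tied output head)
attention = 4 * d * d  # Q, K, V, and output projections per layer
ffn = 2 * d * ff       # up- and down-projections per layer

total = embedding + layers * (attention + ffn)
print(f"{total:,}")    # 29,915,520 -> ~30M
```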

## 📊 Evaluation

### Metrics Tracked

- Training loss: cross-entropy loss
- Perplexity: exp(loss)
- Tokens/second: training throughput
- Learning rate: cosine schedule with warmup
- Gradient norm: for stability monitoring

### Benchmarks (Coming Soon)

Once the model reaches basic coherence, we'll evaluate on:

- HellaSwag (common-sense inference)
- LAMBADA (long-range word prediction)
- WikiText perplexity
- Custom generation quality tests

## ⚠️ Limitations

  1. Early Training: Model is in very early training stages
  2. Output Quality: Expect gibberish for several weeks
  3. CPU Training: Slower than GPU training
  4. Small Model: 30M parameters is relatively small
  5. Limited Context: 512 token context window
  6. No Fine-tuning: Base model only, not instruction-tuned
  7. English Only: Trained primarily on English text

## 🤝 Contributing

This is an open research project, and contributions are welcome.


## 📜 License

MIT License - see the LICENSE file.




Last Updated: 2025-12-28 17:54 UTC

Status: 🔴 Training from Scratch

Quality: ⚠️ Expect Gibberish (2-4 weeks needed)