Seriguela - Symbolic Regression with Large Language Models

A research project exploring the application of GPT-2 and other LLMs to symbolic regression through fine-tuning and reinforcement learning.

🏆 Key Results (Model Scaling Study - Feb 2025)

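| Model  | Parameters | Valid Rate (Quality) | Valid Rate (Nguyen) | Avg R²    | Max R²      |
|--------|------------|----------------------|---------------------|-----------|-------------|
| Base   | 124M       | 99.4%                | 62.5%               | 0.9190    | 0.9994      |
| Medium | 355M       | 99.2%                | 75.2%               | 0.9812    | 0.9999      |
| Large  | 774M       | 100% 🏆              | 89.0% 🏆            | 0.9852 🏆 | 1.0000 🏆⭐ |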

Breakthrough: the Large model (774M) is the first model to achieve a 100% valid rate (Quality) and a perfect fit (R² = 1.0) on Nguyen-8.
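
The valid-rate and R² figures can be illustrated with a small script. This is a minimal sketch, not the project's evaluation code: it assumes an expression counts as valid when it parses and evaluates to finite values at every sample point, and it samples the Nguyen-8 target sqrt(x) on an even grid over [0, 4]. The names `ALLOWED`, `evaluate`, `r2_score`, and `valid_rate` are hypothetical.

```python
import math

# Hypothetical whitelist of names an expression may use; the project's real
# grammar and operator set may differ.
ALLOWED = {
    "sin": math.sin, "cos": math.cos, "exp": math.exp, "log": math.log,
    "sqrt": math.sqrt, "abs": abs, "pi": math.pi,
}


def evaluate(expr, xs):
    """Evaluate an expression string at each x; return None if it is invalid."""
    try:
        # eval is used for illustration only; it is not safe for untrusted input.
        code = compile(expr, "<expr>", "eval")
        ys = [float(eval(code, {"__builtins__": {}}, {**ALLOWED, "x": x}))
              for x in xs]
    except Exception:
        # Syntax errors, unknown names, and domain errors all mark it invalid.
        return None
    return ys if all(math.isfinite(y) for y in ys) else None


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def valid_rate(expressions, xs):
    """Fraction of expressions that evaluate to finite values on xs."""
    ok = sum(evaluate(e, xs) is not None for e in expressions)
    return ok / len(expressions)


# Nguyen-8 target: f(x) = sqrt(x); the even 20-point grid is an assumption.
xs = [4.0 * i / 19 for i in range(20)]
ys = [math.sqrt(x) for x in xs]

candidates = ["sqrt(x)", "x / 2", "sqrt(x", "log(x - 10)"]
print("valid rate:", valid_rate(candidates, xs))
for expr in candidates:
    y_hat = evaluate(expr, xs)
    if y_hat is not None:
        print(expr, "R2 =", r2_score(ys, y_hat))
```

With these four candidates, two fail (one does not parse, one leaves the domain of `log`), so the valid rate is 0.5, and the exact target `sqrt(x)` scores R² = 1.0.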


📂 Project Structure

Organized by research phases for systematic experimentation:

seriguela/
├── 1_data/                 # PHASE 1: Data preparation
├── 2_training/             # PHASE 2: Training and fine-tuning
├── 3_evaluation/           # PHASE 3: Evaluation
├── 4_analysis/             # PHASE 4: Analysis and visualization
├── models/                 # Trained models
├── results/                # Experiment results
├── docs/                   # Complete documentation
├── src/                    # Source code (package)
└── scripts/                # Helper scripts

Each directory contains a README.md with detailed documentation.


🚀 Quick Start

# 1. Clone and setup
git clone https://github.com/augustocsc/seriguela.git
cd seriguela
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Train model
cd 2_training/supervised
python train_with_json.py --model_size gpt2-medium

# 3. Evaluate
cd ../../3_evaluation/benchmarks
python run_all_nguyen_benchmarks.py --model_path ../../models/gpt2/medium_700k_json

# 4. Analyze
cd ../../4_analysis/visualization
python create_visualizations.py

📊 Complete Documentation


🎓 Citation

@misc{seriguela2025,
  title={Scaling Laws for Symbolic Regression with LLMs},
  author={Augusto Cesar},
  year={2025},
  note={First 100% valid rate + R²=1.0 achieved}
}

Status: ✅ Production-ready | 📊 Publication-ready | Last Updated: 2026-02-04
