---
language:
- en
license: llama3
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
tags:
- data-management
- sql
- migration
- grpo
- reinforcement-learning
---
# Agentic Data 1 — GRPO-Trained

A specialized 8B-parameter model for data management, migration, and SQL tasks.
## Training Pipeline
- **Base:** DeepSeek-R1-Distill-Llama-8B
- **SFT:** fine-tuned on 1,000+ data-management examples (Oracle→Postgres, DB2→Snowflake, ETL, data quality)
- **GRPO:** 500 steps of Group Relative Policy Optimization on an H100, with reward functions for:
  - Code parsability (SQL validation)
  - Reasoning quality (step-by-step thinking)
  - Answer accuracy
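As an illustration of the code-parsability signal above, here is a minimal, hypothetical sketch of such a reward function (the `parsability_reward` helper and its scoring thresholds are assumptions for illustration; the actual reward functions used in training are not published here):

```python
import re

# Keywords a well-formed SQL statement is expected to start with.
SQL_KEYWORDS = ("SELECT", "INSERT", "UPDATE", "DELETE",
                "CREATE", "ALTER", "DROP", "MERGE", "WITH")

def parsability_reward(completion: str) -> float:
    """Toy parsability reward: 1.0 if the completion contains a fenced
    ```sql block that begins with a recognised SQL keyword, 0.5 if a
    fence exists but its contents look malformed, 0.0 otherwise."""
    match = re.search(r"```sql\s*(.*?)```", completion,
                      re.DOTALL | re.IGNORECASE)
    if not match:
        return 0.0
    sql = match.group(1).strip()
    return 1.0 if sql.upper().startswith(SQL_KEYWORDS) else 0.5
```

In GRPO, a reward like this is computed per sampled completion and normalized within each group before the policy update.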
## Training Metrics (GRPO)
| Metric | Start | End |
|---|---|---|
| Reward | 0.43 | 0.49 |
| Code Parsability | 0.15 | 0.21 |
| KL Divergence | 0.0005 | 0.0014 |
| Grad Norm | 0.295 | 0.210 |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("DataManagement-AI/Agentic-Data-1")
tokenizer = AutoTokenizer.from_pretrained("DataManagement-AI/Agentic-Data-1")
```
## Capabilities
- Oracle → PostgreSQL migration
- DB2 → Snowflake conversion
- SQL generation and validation
- ETL pipeline design
- Data quality assessment
- Schema analysis and optimization
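The migration tasks above involve systematic dialect rewrites. A minimal sketch of the kind of Oracle→PostgreSQL substitutions the model is trained to produce (the `naive_translate` helper and its mapping table are illustrative assumptions, not the model's internal logic; real migrations need a proper SQL parser, not regexes):

```python
import re

# A few well-known Oracle -> PostgreSQL equivalences (illustrative subset).
ORACLE_TO_PG = {
    r"\bNVL\(": "COALESCE(",          # Oracle NVL maps to standard COALESCE
    r"\bSYSDATE\b": "CURRENT_TIMESTAMP",
    r"\bVARCHAR2\b": "VARCHAR",
}

def naive_translate(sql: str) -> str:
    """Apply keyword-level Oracle -> PostgreSQL rewrites."""
    for pattern, replacement in ORACLE_TO_PG.items():
        sql = re.sub(pattern, replacement, sql, flags=re.IGNORECASE)
    return sql
```

A regex pass like this handles only surface syntax; constructs such as `CONNECT BY`, sequences, and PL/SQL bodies require semantic rewriting, which is where a model-driven approach is aimed.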