---
title: CodeCraftLab
emoji: 👁
colorFrom: pink
colorTo: purple
sdk: streamlit
sdk_version: 1.57.0
app_file: app.py
pinned: false
license: mit
short_description: A fine-tuning platform
datasets:
- angie-chen55/python-github-code
- sdiazlor/python-reasoning-dataset
- MatrixStudio/Codeforces-Python-Submissions
---

# CodeCraftLab
A production-grade platform for fine-tuning, evaluating, and serving code generation models. Built on FastAPI + React with a hardened training pipeline, structured logging, and HuggingFace Hub integration.

---
## What It Does
| Capability | Detail |
| --- | --- |
| Dataset management | Upload, validate, and preprocess Python code datasets via REST API |
| Fine-tuning | Configure and run training jobs with Pydantic-validated configs |
| Evaluation | Automated eval hooks: pass@k, BLEU, execution accuracy |
| Model serving | Authenticated inference endpoints for trained models |
| HF Hub sync | Push/pull models and datasets to/from HuggingFace Hub |

---
## Quick Start
Requirements: Python 3.11+, Docker, CUDA-capable GPU (optional, CPU fallback available)
```bash
git clone https://github.com/your-org/codecraftlab.git
cd codecraftlab

# Copy and configure environment
cp .env.example .env
# Edit .env: set HF_TOKEN, SECRET_KEY, DATABASE_URL

# Start with Docker Compose
docker compose up --build

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```
### Without Docker
```bash
pip install uv
uv sync
uv run uvicorn app:app --reload --port 8000
```
---
## API Overview
All endpoints other than `POST /auth/token` require a Bearer token; request one from that endpoint first.
```bash
# Authenticate
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your-password"}'

# Upload a dataset
curl -X POST http://localhost:8000/datasets/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@data/train.jsonl"

# Launch a training job
curl -X POST http://localhost:8000/training/jobs \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d @configs/example_job.json

# Check job status
curl http://localhost:8000/training/jobs/{job_id} \
  -H "Authorization: Bearer <token>"
```
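For scripted use, the same calls can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoints and payloads shown in the curl examples above; the helper names `build_request` and `send` are hypothetical, and because the shape of the token response is not documented here, extracting the token from it is left to the caller.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_request(path, token=None, payload=None):
    """Build (but do not send) a request for the API.

    If `payload` is given it is sent as a JSON body via POST;
    otherwise the request is a GET.
    """
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request, timeout=30):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Requests mirroring the curl calls above; nothing is sent until send() is called.
auth_request = build_request(
    "/auth/token", payload={"username": "admin", "password": "your-password"}
)
# TOKEN and JOB_ID are placeholders for values returned by the API.
status_request = build_request("/training/jobs/JOB_ID", token="TOKEN")
```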
Full interactive docs: `http://localhost:8000/docs`
---
## Training Configuration
Jobs are defined as JSON and validated against Pydantic v2 schemas:
```json
{
  "job_name": "codegen-finetune-v1",
  "base_model": "Salesforce/codegen-350M-mono",
  "dataset_id": "ds_abc123",
  "training": {
    "num_epochs": 3,
    "batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "max_seq_length": 1024,
    "gradient_accumulation_steps": 4
  },
  "evaluation": {
    "enabled": true,
    "strategy": "epoch",
    "metrics": ["pass_at_1", "pass_at_10", "bleu"]
  },
  "hub": {
    "push_to_hub": true,
    "repo_id": "your-org/codegen-finetune-v1"
  }
}
```
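The batch settings in this config interact: with gradient accumulation, each optimizer update sees `batch_size * gradient_accumulation_steps` examples. The sketch below works through that arithmetic for the example job. It assumes a single device and a hypothetical dataset of 10,000 examples (no dataset size is stated here), and it uses one common rounding convention for step counts and warmup; the actual pipeline may count differently.

```python
import json
import math

# The "training" block from the example job above.
training = json.loads("""
{
  "num_epochs": 3,
  "batch_size": 8,
  "learning_rate": 2e-5,
  "warmup_ratio": 0.1,
  "max_seq_length": 1024,
  "gradient_accumulation_steps": 4
}
""")

NUM_EXAMPLES = 10_000  # hypothetical dataset size, for illustration only

# Examples consumed per optimizer update.
effective_batch = training["batch_size"] * training["gradient_accumulation_steps"]
# Optimizer updates per pass over the data (last partial batch rounded up).
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * training["num_epochs"]
# Updates spent ramping the learning rate up to its peak.
warmup_steps = math.ceil(total_steps * training["warmup_ratio"])

print(f"effective batch size:  {effective_batch}")   # 32
print(f"optimizer steps/epoch: {steps_per_epoch}")   # 313
print(f"total optimizer steps: {total_steps}")       # 939
print(f"warmup steps:          {warmup_steps}")      # 94
```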
---
## Evaluation Metrics
| Metric | Description |
| --- | --- |
| `pass@k` | Fraction of problems solved by at least 1 of k samples |
| `BLEU` | N-gram overlap against reference completions |
| `execution_accuracy` | Fraction of generated code that runs without error |
| `exact_match` | Exact string match against reference outputs |

Eval results are logged to structured JSON and optionally pushed to HF Hub model cards.

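
For reference, `pass@k` is commonly computed with the unbiased estimator popularized by the HumanEval benchmark: draw `n >= k` samples per problem, count the `c` that pass, and estimate `1 - C(n-c, k) / C(n, k)`. The sketch below illustrates that estimator with the standard library only. Whether `training/evaluators.py` uses this exact form is an assumption, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random size-k
    # subset contains no correct sample. math.comb returns 0 when
    # k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three problems, 10 samples each, with 3, 0, and 10 correct samples.
results = [(10, 3), (10, 0), (10, 10)]
print(round(mean_pass_at_k(results, k=1), 4))   # 0.4333
print(round(mean_pass_at_k(results, k=10), 4))  # 0.6667
```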
---
## Architecture
```
codecraftlab/
├── app.py                  # FastAPI entrypoint
├── routers/
│   ├── auth.py             # JWT auth
│   ├── datasets.py         # Upload, validate, preprocess
│   ├── training.py         # Job management
│   └── inference.py        # Model serving
├── training/
│   ├── config.py           # Pydantic v2 training configs
│   ├── pipeline.py         # Fine-tuning pipeline + eval hooks
│   └── evaluators.py       # Metric implementations
├── models/                 # SQLAlchemy ORM models
├── core/
│   ├── auth.py             # JWT utils
│   ├── logging.py          # structlog setup
│   └── settings.py         # Pydantic settings
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
```
---
### HuggingFace Space Config — Audit Notes
The original Space was configured as `sdk: streamlit`. This repo now runs on FastAPI via Docker:

| Field | Before | After | Reason |
| --- | --- | --- | --- |
| `sdk` | `streamlit` | `docker` | FastAPI served via Uvicorn |
| `sdk_version` | `1.57.0` | (removed) | Not applicable for Docker SDK |
| `app_port` | (missing) | `8000` | Required for Docker SDK |
| `pinned` | `false` | `true` | Production Space, should persist |
| `short_description` | Generic | Specific | Better discoverability on HF Hub |
| `tags` | (missing) | Added | Enables HF search indexing |
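
One way to confirm that a README's front matter matches the "After" column is a small check script. The following is a rough sketch using only the standard library. It reads just the flat `key: value` lines of the front matter (it is not a full YAML parser), the function names are hypothetical, and it skips `short_description` and `tags` because their target values are not given here.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` delimiters.

    Deliberately minimal: list items and nested values are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith(("-", " ")) or ":" not in line:
            continue  # skip list items, nested values, and blank lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def audit_docker_space(fields: dict[str, str]) -> list[str]:
    """Return the audit findings that still apply to this front matter."""
    problems = []
    if fields.get("sdk") != "docker":
        problems.append("sdk should be 'docker'")
    if "sdk_version" in fields:
        problems.append("sdk_version does not apply to the Docker SDK")
    if fields.get("app_port") != "8000":
        problems.append("app_port should be 8000")
    if fields.get("pinned") != "true":
        problems.append("pinned should be true")
    return problems


before = "---\ntitle: CodeCraftLab\nsdk: streamlit\nsdk_version: 1.57.0\npinned: false\n---\n"
after = "---\ntitle: CodeCraftLab\nsdk: docker\napp_port: 8000\npinned: true\n---\n"
print(audit_docker_space(parse_front_matter(before)))  # four findings
print(audit_docker_space(parse_front_matter(after)))   # []
```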
---
## Development
```bash
# Run tests
uv run pytest tests/ -v --cov=. --cov-report=term-missing

# Lint
uv run ruff check .
uv run mypy . --strict

# Format
uv run ruff format .
```
Test a training run locally (CPU, minimal config):
```bash
uv run python -m training.pipeline \
  --config configs/smoke_test.json \
  --dry-run
```
---
### Environment Variables
| Variable | Required | Description |
| --- | --- | --- |
| `SECRET_KEY` | Yes | JWT signing secret (min 32 chars) |
| `HF_TOKEN` | Yes | HuggingFace token with write access |
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `LOG_LEVEL` | No | `DEBUG`/`INFO`/`WARNING` (default: `INFO`) |
| `MAX_CONCURRENT_JOBS` | No | Max parallel training jobs (default: `2`) |
| `MODEL_CACHE_DIR` | No | Local model cache path (default: `./cache`) |
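
These variables lend themselves to fail-fast validation at startup. The sketch below is a standard-library stand-in for what `core/settings.py` might do. The project itself uses Pydantic settings, so this is not its actual code: it only illustrates reading the variables, applying the documented defaults, and rejecting a missing or too-short `SECRET_KEY`. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass

REQUIRED = ("SECRET_KEY", "HF_TOKEN", "DATABASE_URL")


@dataclass(frozen=True)
class Settings:
    secret_key: str
    hf_token: str
    database_url: str
    log_level: str = "INFO"
    max_concurrent_jobs: int = 2
    model_cache_dir: str = "./cache"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default), failing fast."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    if len(env["SECRET_KEY"]) < 32:
        raise ValueError("SECRET_KEY must be at least 32 characters")
    return Settings(
        secret_key=env["SECRET_KEY"],
        hf_token=env["HF_TOKEN"],
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
        max_concurrent_jobs=int(env.get("MAX_CONCURRENT_JOBS", "2")),
        model_cache_dir=env.get("MODEL_CACHE_DIR", "./cache"),
    )


# Example with placeholder values (not real credentials).
example = load_settings({
    "SECRET_KEY": "x" * 32,
    "HF_TOKEN": "hf_placeholder",
    "DATABASE_URL": "postgresql://user:pass@localhost:5432/codecraftlab",
})
print(example.log_level, example.max_concurrent_jobs, example.model_cache_dir)
```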
---
## License
MIT — see LICENSE