metadata
title: CodeCraftLab
emoji: π
colorFrom: pink
colorTo: purple
sdk: streamlit
sdk_version: 1.57.0
app_file: app.py
pinned: false
license: mit
short_description: A fine-tuning platform
datasets:
- angie-chen55/python-github-code
- sdiazlor/python-reasoning-dataset
- MatrixStudio/Codeforces-Python-Submissions
CodeCraftLab
A production-grade platform for fine-tuning, evaluating, and serving code generation models. Built on FastAPI + React with a hardened training pipeline, structured logging, and HuggingFace Hub integration.
What It Does
| Capability | Detail |
| --- | --- |
| Dataset management | Upload, validate, and preprocess Python code datasets via REST API |
| Fine-tuning | Configure and run training jobs with Pydantic-validated configs |
| Evaluation | Automated eval hooks: pass@k, BLEU, execution accuracy |
| Model serving | Authenticated inference endpoints for trained models |
| HF Hub sync | Push/pull models and datasets to/from HuggingFace Hub |
Quick Start
Requirements: Python 3.11+, Docker, CUDA-capable GPU (optional, CPU fallback available)
git clone https://github.com/your-org/codecraftlab.git
cd codecraftlab
# Copy and configure environment
cp .env.example .env
# Edit .env: set HF_TOKEN, SECRET_KEY, DATABASE_URL
# Start with Docker Compose
docker compose up --build
# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
Without Docker:
pip install uv
uv sync
uv run uvicorn app:app --reload --port 8000
API Overview
All endpoints except POST /auth/token require a Bearer token. Obtain one from POST /auth/token and pass it in the Authorization header.
# Authenticate
curl -X POST http://localhost:8000/auth/token \
-H "Content-Type: application/json" \
-d '{"username": "admin", "password": "your-password"}'
# Upload a dataset
curl -X POST http://localhost:8000/datasets/upload \
-H "Authorization: Bearer <token>" \
-F "file=@data/train.jsonl"
# Launch a training job
curl -X POST http://localhost:8000/training/jobs \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d @configs/example_job.json
# Check job status
curl "http://localhost:8000/training/jobs/<job_id>" \
-H "Authorization: Bearer <token>"
Full interactive docs: http://localhost:8000/docs
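The curl calls above can also be scripted from Python. The following is a minimal sketch using only the standard library; the helper names (`build_token_request`, `build_job_request`, `send`, `launch_job`) and the `access_token` response field are assumptions for illustration, not confirmed parts of the API.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from Quick Start


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build POST /auth/token with JSON credentials (mirrors the first curl call)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_job_request(token: str, job_config: dict) -> urllib.request.Request:
    """Build POST /training/jobs with a Bearer token and a JSON job config."""
    return urllib.request.Request(
        f"{BASE_URL}/training/jobs",
        data=json.dumps(job_config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def launch_job(username: str, password: str, config_path: str) -> dict:
    """Authenticate, then submit the job config stored at config_path."""
    # Assumption: the token endpoint returns {"access_token": "..."}.
    token = send(build_token_request(username, password))["access_token"]
    with open(config_path, encoding="utf-8") as fh:
        return send(build_job_request(token, json.load(fh)))
```

With the server running, `launch_job("admin", "your-password", "configs/example_job.json")` would perform the authenticate and launch steps in one call. Check the interactive docs for the actual response schema before relying on the field names used here.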
Training Configuration
Jobs are defined as JSON and validated against Pydantic v2 schemas:
{
  "job_name": "codegen-finetune-v1",
  "base_model": "Salesforce/codegen-350M-mono",
  "dataset_id": "ds_abc123",
  "training": {
    "num_epochs": 3,
    "batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "max_seq_length": 1024,
    "gradient_accumulation_steps": 4
  },
  "evaluation": {
    "enabled": true,
    "strategy": "epoch",
    "metrics": ["pass_at_1", "pass_at_10", "bleu"]
  },
  "hub": {
    "push_to_hub": true,
    "repo_id": "your-org/codegen-finetune-v1"
  }
}
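To see how the training fields interact, the sketch below derives the effective batch size and step counts from the example values. It assumes a single-device run and the common convention that gradient accumulation multiplies the batch size and that warmup covers `warmup_ratio` of all optimizer steps; the pipeline's actual arithmetic may differ. `schedule_summary` and the dataset size of 10,000 examples are hypothetical.

```python
import math

# Values copied from the "training" block of the example job above.
training = {
    "num_epochs": 3,
    "batch_size": 8,
    "warmup_ratio": 0.1,
    "gradient_accumulation_steps": 4,
}


def schedule_summary(training: dict, num_examples: int) -> dict:
    """Derive step counts from a training config (single device assumed)."""
    # Each optimizer step consumes batch_size * accumulation steps examples.
    effective_batch = training["batch_size"] * training["gradient_accumulation_steps"]
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * training["num_epochs"]
    warmup_steps = math.ceil(total_steps * training["warmup_ratio"])
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Hypothetical dataset size, chosen only to make the arithmetic concrete.
summary = schedule_summary(training, num_examples=10_000)
print(summary)
```

For these inputs the effective batch size is 8 × 4 = 32, which gives 313 optimizer steps per epoch, 939 steps in total, and 94 warmup steps.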
Evaluation Metrics
| Metric | Description |
| --- | --- |
| pass@k | Fraction of problems solved by at least 1 of k samples |
| BLEU | N-gram overlap against reference completions |
| execution_accuracy | Fraction of generated code that runs without error |
| exact_match | Exact string match against reference outputs |
Eval results are logged to structured JSON and optionally pushed to HF Hub model cards.
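As an illustration of two of these metrics, here is a minimal sketch of pass@k and exact_match. The pass@k function uses the unbiased estimator popularized by the HumanEval benchmark, 1 - C(n-c, k) / C(n, k) for n samples of which c are correct; whether training/evaluators.py uses this estimator is an assumption, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so any draw of k samples must contain a correct one.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each result is (n_samples, n_correct)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal their reference string exactly."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)
```

For example, a problem with 10 samples of which 3 pass has pass@1 of about 0.3, and averaging that with a problem where none of 10 samples pass gives about 0.15.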
Architecture
codecraftlab/
├── app.py                 # FastAPI entrypoint
├── routers/
│   ├── auth.py            # JWT auth
│   ├── datasets.py        # Upload, validate, preprocess
│   ├── training.py        # Job management
│   └── inference.py       # Model serving
├── training/
│   ├── config.py          # Pydantic v2 training configs
│   ├── pipeline.py        # Fine-tuning pipeline + eval hooks
│   └── evaluators.py      # Metric implementations
├── models/                # SQLAlchemy ORM models
├── core/
│   ├── auth.py            # JWT utils
│   ├── logging.py         # structlog setup
│   └── settings.py        # Pydantic settings
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
HuggingFace Space Config: Audit Notes
The original Space was configured with sdk: streamlit. The app now runs on FastAPI in a Docker container, so the Space metadata changes as follows:
| Field | Before | After | Reason |
| --- | --- | --- | --- |
| sdk | streamlit | docker | FastAPI served via Uvicorn |
| sdk_version | 1.57.0 | (removed) | Not applicable for Docker SDK |
| app_port | (missing) | 8000 | Docker SDK defaults to port 7860; the app listens on 8000 |
| pinned | false | true | Production Space, should persist |
| short_description | Generic | Specific | Better discoverability on HF Hub |
| tags | (missing) | Added | Enables HF search indexing |
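The rules in this table can be expressed as a small consistency check on the Space metadata. The sketch below is illustrative only: `check_space_metadata` is a hypothetical helper, and the `tags` value in the example is a placeholder rather than the Space's real tag list.

```python
def check_space_metadata(meta: dict) -> list[str]:
    """Return problems found for a Docker-based Space; an empty list means consistent."""
    problems = []
    if meta.get("sdk") != "docker":
        problems.append(f"sdk is {meta.get('sdk')!r}, expected 'docker'")
    if "sdk_version" in meta:
        problems.append("sdk_version does not apply to the Docker SDK")
    if meta.get("app_port") != 8000:
        problems.append("app_port should be 8000 to match the Uvicorn port")
    if not meta.get("tags"):
        problems.append("tags are missing")
    return problems


# Metadata before the audit, as described in the table.
before = {"sdk": "streamlit", "sdk_version": "1.57.0", "pinned": False}
# Metadata after the audit; the tag value is a placeholder.
after = {"sdk": "docker", "app_port": 8000, "pinned": True, "tags": ["code-generation"]}

for problem in check_space_metadata(before):
    print(problem)
```

Run against the `before` metadata, the check reports four problems (sdk, sdk_version, app_port, tags); the `after` metadata passes with none.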
Development
# Run tests
uv run pytest tests/ -v --cov=. --cov-report=term-missing
# Lint
uv run ruff check .
uv run mypy . --strict
# Format
uv run ruff format .
Test a training run locally (CPU, minimal config):
uv run python -m training.pipeline \
  --config configs/smoke_test.json \
  --dry-run
Environment Variables
| Variable | Required | Description |
| --- | --- | --- |
| SECRET_KEY | Yes | JWT signing secret (min 32 chars) |
| HF_TOKEN | Yes | HuggingFace token with write access |
| DATABASE_URL | Yes | PostgreSQL connection string |
| LOG_LEVEL | No | DEBUG/INFO/WARNING (default: INFO) |
| MAX_CONCURRENT_JOBS | No | Max parallel training jobs (default: 2) |
| MODEL_CACHE_DIR | No | Local model cache path (default: ./cache) |
License
MIT. See LICENSE.