title: CodeCraftLab
emoji: 👁
colorFrom: pink
colorTo: purple
sdk: streamlit
sdk_version: 1.57.0
app_file: app.py
pinned: false
license: mit
short_description: A fine-tuning platform
datasets:
  - angie-chen55/python-github-code
  - sdiazlor/python-reasoning-dataset
  - MatrixStudio/Codeforces-Python-Submissions

CodeCraftLab

A production-grade platform for fine-tuning, evaluating, and serving code generation models. Built on FastAPI + React with a hardened training pipeline, structured logging, and HuggingFace Hub integration.

What It Does

| Capability | Detail |
|---|---|
| Dataset management | Upload, validate, and preprocess Python code datasets via REST API |
| Fine-tuning | Configure and run training jobs with Pydantic-validated configs |
| Evaluation | Automated eval hooks: pass@k, BLEU, execution accuracy |
| Model serving | Authenticated inference endpoints for trained models |
| HF Hub sync | Push/pull models and datasets to/from HuggingFace Hub |

Quick Start

Requirements: Python 3.11+, Docker, CUDA-capable GPU (optional, CPU fallback available)

git clone https://github.com/your-org/codecraftlab.git
cd codecraftlab

# Copy and configure environment
cp .env.example .env
# Edit .env: set HF_TOKEN, SECRET_KEY, DATABASE_URL

# Start with Docker Compose
docker compose up --build

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs

Without Docker:

pip install uv
uv sync
uv run uvicorn app:app --reload --port 8000

API Overview

All endpoints except POST /auth/token require a Bearer token. Obtain one from POST /auth/token.

# Authenticate
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your-password"}'

# Upload a dataset
curl -X POST http://localhost:8000/datasets/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@data/train.jsonl"

# Launch a training job
curl -X POST http://localhost:8000/training/jobs \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d @configs/example_job.json

# Check job status
curl "http://localhost:8000/training/jobs/<job_id>" \
  -H "Authorization: Bearer <token>"

Full interactive docs: `http://localhost:8000/docs`
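The same calls can be made from Python. The sketch below uses only the standard library and builds the requests without sending them, so it runs with no server. The helper names and the `job_123` identifier are illustrative assumptions, and the README does not specify the shape of the `/auth/token` response, so check `/docs` for the token field name. The multipart dataset upload is left out for brevity.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # matches the Quick Start default


def token_request(username: str, password: str) -> Request:
    """Build the POST /auth/token request (credentials sent as JSON)."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{BASE_URL}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def launch_job_request(token: str, job_config: dict) -> Request:
    """Build the POST /training/jobs request from a job config dict."""
    return Request(
        f"{BASE_URL}/training/jobs",
        data=json.dumps(job_config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def job_status_request(token: str, job_id: str) -> Request:
    """Build the GET /training/jobs/<job_id> request."""
    return Request(
        f"{BASE_URL}/training/jobs/{quote(job_id, safe='')}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical values, for illustration only.
status = job_status_request("example-token", "job_123")
print(status.get_method(), status.full_url)
```

To send a request, pass it to `urllib.request.urlopen` and decode the JSON body of the response.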

Training Configuration

Jobs are defined as JSON and validated against Pydantic v2 schemas:

{
  "job_name": "codegen-finetune-v1",
  "base_model": "Salesforce/codegen-350M-mono",
  "dataset_id": "ds_abc123",
  "training": {
    "num_epochs": 3,
    "batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "max_seq_length": 1024,
    "gradient_accumulation_steps": 4
  },
  "evaluation": {
    "enabled": true,
    "strategy": "epoch",
    "metrics": ["pass_at_1", "pass_at_10", "bleu"]
  },
  "hub": {
    "push_to_hub": true,
    "repo_id": "your-org/codegen-finetune-v1"
  }
}
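Several `training` fields interact: `batch_size` and `gradient_accumulation_steps` together set the effective batch size, and `warmup_ratio` is a fraction of the total optimizer steps. The sketch below works through that arithmetic for the example values. It assumes a single device, a hypothetical dataset of 10,000 examples, and that partial batches and warmup steps round up. The `schedule_summary` helper is illustrative and not part of CodeCraftLab.

```python
import math


def schedule_summary(
    num_examples: int,
    num_epochs: int,
    batch_size: int,
    gradient_accumulation_steps: int,
    warmup_ratio: float,
) -> dict:
    """Derive step counts from the training config (single-device assumption)."""
    # Examples consumed per optimizer update.
    effective_batch_size = batch_size * gradient_accumulation_steps
    # Assumption: a final partial batch still counts as one step.
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Assumption: warmup steps are rounded up.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch_size,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


summary = schedule_summary(
    num_examples=10_000,  # hypothetical dataset size
    num_epochs=3,
    batch_size=8,
    gradient_accumulation_steps=4,
    warmup_ratio=0.1,
)
print(summary)
```

With these numbers the effective batch size is 8 × 4 = 32, giving 313 steps per epoch, 939 steps in total, and 94 warmup steps.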

Evaluation Metrics

| Metric | Description |
|---|---|
| `pass@k` | Fraction of problems solved by at least 1 of k samples |
| `BLEU` | N-gram overlap against reference completions |
| `execution_accuracy` | Fraction of generated code that runs without error |
| `exact_match` | Exact string match against reference outputs |

Eval results are logged to structured JSON and optionally pushed to HF Hub model cards.
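The README does not say how `pass@k` is computed. A common choice is the unbiased estimator 1 − C(n−c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The sketch below implements that estimator as an assumption. It is not necessarily what `training/evaluators.py` does, and the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes,
    given n generated samples of which c passed."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (n, c) for one problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples generated, samples passing).
results = [(10, 3), (10, 0), (10, 10)]
print(mean_pass_at_k(results, k=1))
print(mean_pass_at_k(results, k=10))
```

For k = 1 the estimator reduces to c / n, so the first problem above contributes 0.3.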

Architecture

codecraftlab/
├── app.py                  # FastAPI entrypoint
├── routers/
│   ├── auth.py             # JWT auth
│   ├── datasets.py         # Upload, validate, preprocess
│   ├── training.py         # Job management
│   └── inference.py        # Model serving
├── training/
│   ├── config.py           # Pydantic v2 training configs
│   ├── pipeline.py         # Fine-tuning pipeline + eval hooks
│   └── evaluators.py       # Metric implementations
├── models/                 # SQLAlchemy ORM models
├── core/
│   ├── auth.py             # JWT utils
│   ├── logging.py          # structlog setup
│   └── settings.py         # Pydantic settings
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml

HuggingFace Space Config — Audit Notes

The original Space was configured as `sdk: streamlit`. This repo now runs on FastAPI via Docker:

| Field | Before | After | Reason |
|---|---|---|---|
| `sdk` | `streamlit` | `docker` | FastAPI served via Uvicorn |
| `sdk_version` | `1.57.0` | (removed) | Not applicable for Docker SDK |
| `app_port` | (missing) | `8000` | Required for Docker SDK |
| `pinned` | `false` | `true` | Production Space, should persist |
| `short_description` | Generic | Specific | Better discoverability on HF Hub |
| `tags` | (missing) | Added | Enables HF search indexing |
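The audit above can be expressed as a small check over the Space's front matter once it is parsed into a dict. This is a minimal sketch: the `audit_space_config` helper is hypothetical, it only covers the rows with concrete target values, and the tag value in the example is a placeholder because the README does not list the actual tags.

```python
def audit_space_config(config: dict) -> list[str]:
    """Return the audit findings that still apply to a parsed Space config."""
    problems = []
    if config.get("sdk") != "docker":
        problems.append("sdk should be 'docker'")
    if "sdk_version" in config:
        problems.append("sdk_version does not apply to the Docker SDK")
    if config.get("app_port") != 8000:
        problems.append("app_port should be 8000")
    if config.get("pinned") is not True:
        problems.append("pinned should be true")
    if not config.get("tags"):
        problems.append("tags are missing")
    return problems


# Front matter before the audit (subset of fields).
before = {"sdk": "streamlit", "sdk_version": "1.57.0", "pinned": False}
# Front matter after the audit; the tag is a placeholder, not the real value.
after = {"sdk": "docker", "app_port": 8000, "pinned": True, "tags": ["placeholder-tag"]}

for problem in audit_space_config(before):
    print(problem)
print(audit_space_config(after))
```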

Development

# Run tests
uv run pytest tests/ -v --cov=. --cov-report=term-missing

# Lint
uv run ruff check .
uv run mypy . --strict

# Format
uv run ruff format .

Test a training run locally (CPU, minimal config):

uv run python -m training.pipeline \
  --config configs/smoke_test.json \
  --dry-run

Environment Variables

| Variable | Required | Description |
|---|---|---|
| `SECRET_KEY` | Yes | JWT signing secret (min 32 chars) |
| `HF_TOKEN` | Yes | HuggingFace token with write access |
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `LOG_LEVEL` | No | `DEBUG`/`INFO`/`WARNING` (default: `INFO`) |
| `MAX_CONCURRENT_JOBS` | No | Max parallel training jobs (default: `2`) |
| `MODEL_CACHE_DIR` | No | Local model cache path (default: `./cache`) |

License

MIT — see LICENSE