---
title: CodeCraftLab
emoji: π
colorFrom: pink
colorTo: purple
sdk: streamlit
sdk_version: 1.57.0
app_file: app.py
pinned: false
license: mit
short_description: A fine-tuning platform
datasets:
- angie-chen55/python-github-code
- sdiazlor/python-reasoning-dataset
- MatrixStudio/Codeforces-Python-Submissions
---

# CodeCraftLab

A production-grade platform for fine-tuning, evaluating, and serving code generation models. It is built on FastAPI and React, with a hardened training pipeline, structured logging, and Hugging Face Hub integration.

---

## What It Does

| Capability | Detail |
| --- | --- |
| Dataset management | Upload, validate, and preprocess Python code datasets via REST API |
| Fine-tuning | Configure and run training jobs with Pydantic-validated configs |
| Evaluation | Automated eval hooks: pass@k, BLEU, execution accuracy |
| Model serving | Authenticated inference endpoints for trained models |
| HF Hub sync | Push/pull models and datasets to/from Hugging Face Hub |

---

## Quick Start

Requirements: Python 3.11+, Docker, and optionally a CUDA-capable GPU (a CPU fallback is available).

```bash
git clone https://github.com/your-org/codecraftlab.git
cd codecraftlab

# Copy and configure environment
cp .env.example .env
# Edit .env: set HF_TOKEN, SECRET_KEY, DATABASE_URL

# Start with Docker Compose
docker compose up --build

# API available at http://localhost:8000
# Docs at http://localhost:8000/docs
```

### Without Docker

```bash
pip install uv
uv sync
uv run uvicorn app:app --reload --port 8000
```

---

## API Overview

All endpoints other than `POST /auth/token` require a Bearer token. Obtain one from that endpoint first.

```bash
# Authenticate
curl -X POST http://localhost:8000/auth/token \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "your-password"}'

# Upload a dataset
curl -X POST http://localhost:8000/datasets/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@data/train.jsonl"

# Launch a training job
curl -X POST http://localhost:8000/training/jobs \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d @configs/example_job.json

# Check job status (replace <job_id>; the URL is quoted so the shell and curl leave it untouched)
curl "http://localhost:8000/training/jobs/<job_id>" \
  -H "Authorization: Bearer <token>"
```

Full interactive docs: `http://localhost:8000/docs`
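
The same calls can be scripted from Python. The sketch below only builds the requests shown in the curl examples, using the standard library. The helper names (`auth_request`, `launch_job_request`, `job_status_request`, `send`) are hypothetical and not part of CodeCraftLab, and the response bodies are not documented here, so `send` returns the parsed JSON without assuming any field names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # local address from the Quick Start


def auth_request(username: str, password: str) -> urllib.request.Request:
    """Build the POST /auth/token request with a JSON body."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def launch_job_request(token: str, job_config: dict) -> urllib.request.Request:
    """Build the POST /training/jobs request for a job config dict."""
    return urllib.request.Request(
        f"{BASE_URL}/training/jobs",
        data=json.dumps(job_config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def job_status_request(token: str, job_id: str) -> urllib.request.Request:
    """Build the GET /training/jobs/{job_id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/training/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def send(request: urllib.request.Request):
    """Send a request to a running server and return the decoded JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `send(auth_request("admin", "your-password"))` returns the token response; check `/docs` for the exact field that carries the token before passing it to the other helpers.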

---

## Training Configuration

Jobs are defined as JSON and validated against Pydantic v2 schemas:

```json
{
  "job_name": "codegen-finetune-v1",
  "base_model": "Salesforce/codegen-350M-mono",
  "dataset_id": "ds_abc123",
  "training": {
    "num_epochs": 3,
    "batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "max_seq_length": 1024,
    "gradient_accumulation_steps": 4
  },
  "evaluation": {
    "enabled": true,
    "strategy": "epoch",
    "metrics": ["pass_at_1", "pass_at_10", "bleu"]
  },
  "hub": {
    "push_to_hub": true,
    "repo_id": "your-org/codegen-finetune-v1"
  }
}
```
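
To see how the `training` values interact, the sketch below derives the effective batch size and step counts from the example job. It makes several assumptions: a single device, `batch_size` meaning the per-step batch, a hypothetical dataset of 10,000 examples, and rounding up for both steps per epoch and warmup steps. The actual pipeline may round differently, and `schedule_summary` is an illustrative helper, not part of the project.

```python
import math

# "training" block copied from the example job config.
training = {
    "num_epochs": 3,
    "batch_size": 8,
    "learning_rate": 2e-5,
    "warmup_ratio": 0.1,
    "max_seq_length": 1024,
    "gradient_accumulation_steps": 4,
}


def schedule_summary(cfg: dict, num_examples: int) -> dict:
    """Derive optimizer step counts from a training config (single device assumed)."""
    # Examples consumed per optimizer update.
    effective_batch = cfg["batch_size"] * cfg["gradient_accumulation_steps"]
    # Round up so a partial final batch still counts as a step (assumption).
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * cfg["num_epochs"]
    # Warmup covers the first warmup_ratio fraction of all steps (rounded up).
    warmup_steps = math.ceil(total_steps * cfg["warmup_ratio"])
    return {
        "effective_batch_size": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# 10,000 examples is a hypothetical dataset size chosen for illustration.
summary = schedule_summary(training, num_examples=10_000)
print(summary)
```

Under these assumptions each optimizer update sees 8 × 4 = 32 examples, so raising `gradient_accumulation_steps` is a way to grow the effective batch without increasing per-step memory.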

---

## Evaluation Metrics

| Metric | Description |
| --- | --- |
| `pass@k` | Fraction of problems solved by at least one of k samples |
| `BLEU` | N-gram overlap against reference completions |
| `execution_accuracy` | Fraction of generated programs that run without error |
| `exact_match` | Exact string match against reference outputs |

Eval results are logged as structured JSON and optionally pushed to Hugging Face Hub model cards.
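
As an illustration of `pass@k`, the sketch below uses the widely used unbiased estimator: draw n samples per problem, count the c that pass, and compute 1 − C(n−c, k) / C(n, k). Whether `training/evaluators.py` uses this exact estimator is an assumption, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Two problems with 10 samples each: 3 correct on the first, none on the second.
print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

For k = 1 the estimator reduces to the fraction of correct samples, c / n, which is a quick sanity check on any implementation.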

---

## Architecture

```
codecraftlab/
├── app.py                  # FastAPI entrypoint
├── routers/
│   ├── auth.py             # JWT auth
│   ├── datasets.py         # Upload, validate, preprocess
│   ├── training.py         # Job management
│   └── inference.py        # Model serving
├── training/
│   ├── config.py           # Pydantic v2 training configs
│   ├── pipeline.py         # Fine-tuning pipeline + eval hooks
│   └── evaluators.py       # Metric implementations
├── models/                 # SQLAlchemy ORM models
├── core/
│   ├── auth.py             # JWT utils
│   ├── logging.py          # structlog setup
│   └── settings.py         # Pydantic settings
├── Dockerfile
├── docker-compose.yml
└── pyproject.toml
```

---

## Hugging Face Space Config: Audit Notes

The original Space was configured as `sdk: streamlit`. This repo now runs on FastAPI via Docker, so the Space metadata changes as follows:

| Field | Before | After | Reason |
| --- | --- | --- | --- |
| `sdk` | `streamlit` | `docker` | FastAPI served via Uvicorn |
| `sdk_version` | `1.57.0` | (removed) | Not applicable for Docker SDK |
| `app_port` | (missing) | `8000` | Must match the Uvicorn port (the Docker SDK defaults to 7860) |
| `pinned` | `false` | `true` | Production Space, should persist |
| `short_description` | Generic | Specific | Better discoverability on HF Hub |
| `tags` | (missing) | Added | Enables HF search indexing |
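
The first three rows of the table can be checked mechanically. The sketch below is a hypothetical helper, not part of the repo, that flags Space metadata still carrying the Streamlit-era values. It covers only `sdk`, `sdk_version`, and `app_port`, because the audit does not list concrete values for `tags` or `short_description`.

```python
def audit_space_metadata(meta: dict) -> list[str]:
    """Return a list of problems for a Space that should run on the Docker SDK."""
    problems = []
    if meta.get("sdk") != "docker":
        problems.append(f"sdk is {meta.get('sdk')!r}, expected 'docker'")
    if "sdk_version" in meta:
        problems.append("sdk_version is set but does not apply to the Docker SDK")
    if meta.get("app_port") != 8000:
        problems.append("app_port should be 8000 to match the Uvicorn port")
    return problems


# Metadata before and after the audit, reduced to the fields checked here.
before = {"sdk": "streamlit", "sdk_version": "1.57.0", "app_file": "app.py", "pinned": False}
after = {"sdk": "docker", "app_port": 8000, "pinned": True}

for problem in audit_space_metadata(before):
    print(problem)
```

The `before` metadata produces one message per mismatched field, while the `after` metadata produces none.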

---

## Development

```bash
# Run tests
uv run pytest tests/ -v --cov=. --cov-report=term-missing

# Lint
uv run ruff check .
uv run mypy . --strict

# Format
uv run ruff format .
```

Test a training run locally (CPU, minimal config):

```bash
uv run python -m training.pipeline \
  --config configs/smoke_test.json \
  --dry-run
```

---

## Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| `SECRET_KEY` | Yes | JWT signing secret (min 32 chars) |
| `HF_TOKEN` | Yes | Hugging Face token with write access |
| `DATABASE_URL` | Yes | PostgreSQL connection string |
| `LOG_LEVEL` | No | `DEBUG`/`INFO`/`WARNING` (default: `INFO`) |
| `MAX_CONCURRENT_JOBS` | No | Max parallel training jobs (default: `2`) |
| `MODEL_CACHE_DIR` | No | Local model cache path (default: `./cache`) |
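
The rules in this table can be expressed as a small loader. The project keeps its real settings in `core/settings.py` using Pydantic settings. The sketch below swaps in a standard-library dataclass to stay dependency-free, and the `Settings` and `load_settings` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("SECRET_KEY", "HF_TOKEN", "DATABASE_URL")


@dataclass(frozen=True)
class Settings:
    secret_key: str
    hf_token: str
    database_url: str
    log_level: str = "INFO"
    max_concurrent_jobs: int = 2
    model_cache_dir: str = "./cache"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, applying the documented defaults."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    if len(env["SECRET_KEY"]) < 32:
        raise ValueError("SECRET_KEY must be at least 32 characters")
    return Settings(
        secret_key=env["SECRET_KEY"],
        hf_token=env["HF_TOKEN"],
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
        max_concurrent_jobs=int(env.get("MAX_CONCURRENT_JOBS", "2")),
        model_cache_dir=env.get("MODEL_CACHE_DIR", "./cache"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.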

---

## License

MIT. See `LICENSE`.