"""See full script at: https://huggingface.co/spaces/djordjebatic/sandbox-6c414f08
Self-contained synthetic data generation for FCA classification.
Runs on a GPU with a local Qwen2.5-7B-Instruct model.
Uses the transformers pipeline for generation and LLM-as-judge quality checks.

Usage:
    python generate_self_contained.py

Environment variables:
    TEACHER_MODEL      - Model ID (default: Qwen/Qwen2.5-7B-Instruct)
    PROMPTS_PER_LABEL  - Generation prompts per class (default: 50)
    OUTPUT_DIR         - Output directory (default: ./generated_data)
    DATASET_HUB_ID     - HF Hub dataset ID to push to
"""
# Full script available in the sandbox workspace
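
# The environment variables listed in the docstring could be read roughly as
# sketched below. This is a minimal illustration, not the code of the full
# script: the names GenerationConfig and load_config are hypothetical, and
# treating an unset DATASET_HUB_ID as "do not push" is an assumption, since
# the docstring gives no default for it.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GenerationConfig:
    """Settings documented in the module docstring (hypothetical container)."""

    teacher_model: str
    prompts_per_label: int
    output_dir: str
    dataset_hub_id: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> GenerationConfig:
    """Build a config from environment variables, using the documented defaults."""
    if env is None:
        env = os.environ

    # PROMPTS_PER_LABEL arrives as a string and must be a positive integer.
    prompts_per_label = int(env.get("PROMPTS_PER_LABEL", "50"))
    if prompts_per_label <= 0:
        raise ValueError("PROMPTS_PER_LABEL must be a positive integer")

    return GenerationConfig(
        teacher_model=env.get("TEACHER_MODEL", "Qwen/Qwen2.5-7B-Instruct"),
        prompts_per_label=prompts_per_label,
        output_dir=env.get("OUTPUT_DIR", "./generated_data"),
        # Assumption: an unset or empty DATASET_HUB_ID means no Hub upload.
        dataset_hub_id=env.get("DATASET_HUB_ID") or None,
    )


if __name__ == "__main__":
    print(load_config())
```

# Passing a plain dict instead of os.environ, for example
# load_config({"PROMPTS_PER_LABEL": "5"}), makes the parsing easy to check
# without touching the real environment.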