"""Self-contained synthetic data generation for FCA classification.

See the full script at: https://huggingface.co/spaces/djordjebatic/sandbox-6c414f08

Runs on a GPU with a local Qwen2.5-7B-Instruct model. Uses the transformers
pipeline for generation, plus LLM-as-judge quality checks.

Usage:
    python generate_self_contained.py

Environment variables:
    TEACHER_MODEL     - Model ID (default: Qwen/Qwen2.5-7B-Instruct)
    PROMPTS_PER_LABEL - Generation prompts per class (default: 50)
    OUTPUT_DIR        - Output directory (default: ./generated_data)
    DATASET_HUB_ID    - HF Hub dataset ID to push to
"""

# Full script available in the sandbox workspace.
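The docstring describes four environment variables and a generate-then-judge flow, but the code itself lives in the sandbox. Below is a minimal sketch of how those pieces could fit together. It assumes plain `os.environ` lookups with the documented defaults and a loop that keeps only samples the judge accepts. `Config`, `load_config`, `generate_and_filter`, and the `generate`/`judge` callables are hypothetical names, not taken from the full script. In practice `generate` might wrap a transformers `text-generation` pipeline loaded with `TEACHER_MODEL`, and `judge` might send a second, grading prompt to the same model.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the settings listed in the docstring."""
    teacher_model: str
    prompts_per_label: int
    output_dir: str
    dataset_hub_id: Optional[str]


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Config(
        teacher_model=env.get("TEACHER_MODEL", "Qwen/Qwen2.5-7B-Instruct"),
        prompts_per_label=int(env.get("PROMPTS_PER_LABEL", "50")),
        output_dir=env.get("OUTPUT_DIR", "./generated_data"),
        # No default is documented; an unset or empty value means "do not push".
        dataset_hub_id=env.get("DATASET_HUB_ID") or None,
    )


def generate_and_filter(
    labels: List[str],
    prompts_per_label: int,
    generate: Callable[[str, int], str],
    judge: Callable[[str, str], bool],
) -> List[Dict[str, str]]:
    """Generate prompts_per_label samples per class and keep those the judge accepts.

    generate(label, index) returns one synthetic text for the class;
    judge(text, label) returns True if the text passes the quality check.
    Both are placeholders for model-backed calls (assumption).
    """
    kept: List[Dict[str, str]] = []
    for label in labels:
        for i in range(prompts_per_label):
            text = generate(label, i)
            if judge(text, label):
                kept.append({"text": text, "label": label})
    return kept
```

Injecting `generate` and `judge` as callables keeps the loop testable without a GPU. For example, `generate_and_filter(["a", "b"], 2, lambda label, i: f"{label}-{i}", lambda text, label: text.endswith("0"))` uses stub functions in place of the model and keeps only the first sample for each label.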