```python
"""Self-contained synthetic data generation for FCA classification.

See the full script at: https://huggingface.co/spaces/djordjebatic/sandbox-6c414f08

Runs on a GPU with a locally loaded Qwen2.5-7B-Instruct model. Uses a
transformers pipeline for generation and LLM-as-judge quality checks.

Usage:
    python generate_self_contained.py

Environment variables:
    TEACHER_MODEL     - Teacher model ID (default: Qwen/Qwen2.5-7B-Instruct)
    PROMPTS_PER_LABEL - Generation prompts per class (default: 50)
    OUTPUT_DIR        - Output directory (default: ./generated_data)
    DATASET_HUB_ID    - Hugging Face Hub dataset ID to push the output to
"""
# Full script available in the sandbox workspace
```
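The docstring lists four environment variables and their defaults. Below is a minimal sketch of how such a script might read them, assuming `PROMPTS_PER_LABEL` is parsed as an integer and that an unset or empty `DATASET_HUB_ID` means "do not push to the Hub" (the docstring gives no default for it). The `GenerationConfig` class and `load_config` function are hypothetical names for illustration, not taken from the full script.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the settings documented in the docstring."""
    teacher_model: str
    prompts_per_label: int
    output_dir: str
    dataset_hub_id: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> GenerationConfig:
    """Read settings from environment variables, falling back to the documented defaults."""
    return GenerationConfig(
        teacher_model=env.get("TEACHER_MODEL", "Qwen/Qwen2.5-7B-Instruct"),
        # Environment values are strings, so convert the per-class prompt count.
        prompts_per_label=int(env.get("PROMPTS_PER_LABEL", "50")),
        output_dir=env.get("OUTPUT_DIR", "./generated_data"),
        # Assumption: unset or empty DATASET_HUB_ID means "skip pushing to the Hub".
        dataset_hub_id=env.get("DATASET_HUB_ID") or None,
    )


if __name__ == "__main__":
    print(load_config())
```

Passing the mapping as a parameter (defaulting to `os.environ`) keeps the function easy to test with a plain dict. From the shell, an override looks like `PROMPTS_PER_LABEL=10 python generate_self_contained.py`.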