"""See full script at: https://huggingface.co/spaces/djordjebatic/sandbox-6c414f08
Self-contained synthetic data generation for FCA classification.
Runs on a GPU with a locally loaded teacher model (Qwen2.5-7B-Instruct by default).
Uses a transformers pipeline for generation and an LLM-as-judge for quality checks.

Usage:
  python generate_self_contained.py

Environment variables:
  TEACHER_MODEL - Model ID (default: Qwen/Qwen2.5-7B-Instruct)
  PROMPTS_PER_LABEL - Generation prompts per class (default: 50)
  OUTPUT_DIR - Output directory (default: ./generated_data)
  DATASET_HUB_ID - HF Hub dataset ID to push to
"""
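# Hypothetical sketch (not the author's code): one way to read the environment
# variables documented above, falling back to the documented defaults.
# The GenerationConfig/load_config names and the "unset DATASET_HUB_ID means
# do not push" behaviour are assumptions for illustration only.
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GenerationConfig:
    teacher_model: str
    prompts_per_label: int
    output_dir: str
    dataset_hub_id: Optional[str]  # None -> skip the Hub push (assumption)


def load_config(env: Mapping[str, str] = os.environ) -> GenerationConfig:
    """Build a GenerationConfig from environment variables (sketch)."""
    prompts = int(env.get("PROMPTS_PER_LABEL", "50"))
    if prompts <= 0:
        raise ValueError("PROMPTS_PER_LABEL must be a positive integer")
    return GenerationConfig(
        teacher_model=env.get("TEACHER_MODEL", "Qwen/Qwen2.5-7B-Instruct"),
        prompts_per_label=prompts,
        output_dir=env.get("OUTPUT_DIR", "./generated_data"),
        # Treat an empty string the same as an unset variable.
        dataset_hub_id=env.get("DATASET_HUB_ID") or None,
    )

# Passing a plain dict instead of os.environ makes the defaults easy to check,
# e.g. load_config({}).prompts_per_label is 50.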
# Full script available in the sandbox workspace
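# Hypothetical sketch of an LLM-as-judge quality filter (not the author's code;
# the full script lives in the sandbox workspace). `judge` is any callable that
# maps a prompt string to the model's text reply. The prompt wording, the 1-5
# scale, and the default threshold of 4 are assumptions.
import re
from typing import Callable, Iterable, List, Optional

JUDGE_TEMPLATE = (
    "You are grading synthetic training data.\n"
    "Label: {label}\nText: {text}\n"
    "Does the text clearly belong to the label? "
    "Reply with a single integer score from 1 (no) to 5 (clearly yes)."
)


def parse_score(reply: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the judge's reply, or None."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None


def filter_with_judge(
    samples: Iterable[dict],
    judge: Callable[[str], str],
    min_score: int = 4,
) -> List[dict]:
    """Keep samples whose judge score is at least `min_score`.

    Samples whose reply contains no parseable score are dropped.
    """
    kept = []
    for sample in samples:
        prompt = JUDGE_TEMPLATE.format(label=sample["label"], text=sample["text"])
        score = parse_score(judge(prompt))
        if score is not None and score >= min_score:
            kept.append({**sample, "judge_score": score})
    return kept

# With transformers, `judge` could wrap a text-generation pipeline, e.g.
#   pipe = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")
#   judge = lambda p: pipe(p, max_new_tokens=8, return_full_text=False)[0]["generated_text"]
# Keeping the model behind a plain callable lets the filter be tested offline.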