"""See full script at: https://huggingface.co/spaces/djordjebatic/sandbox-6c414f08

Self-contained synthetic data generation for FCA classification.

Runs on a GPU with a local Qwen2.5-7B-Instruct model. Uses the transformers
pipeline for generation, plus LLM-as-judge quality checks.

Usage:
    python generate_self_contained.py

Environment variables:
    TEACHER_MODEL     - Model ID (default: Qwen/Qwen2.5-7B-Instruct)
    PROMPTS_PER_LABEL - Generation prompts per class (default: 50)
    OUTPUT_DIR        - Output directory (default: ./generated_data)
    DATASET_HUB_ID    - HF Hub dataset ID to push to (no default)
"""
# Full script available in the sandbox workspace
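
# The environment variables listed in the docstring could be read roughly as
# follows. This is a minimal sketch, not the code from the full script:
# GenerationConfig and load_config are hypothetical names, and treating an
# unset DATASET_HUB_ID as "do not push to the Hub" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical container for the settings named in the docstring."""

    teacher_model: str
    prompts_per_label: int
    output_dir: str
    dataset_hub_id: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> GenerationConfig:
    """Read settings from `env`, falling back to the documented defaults."""
    return GenerationConfig(
        teacher_model=env.get("TEACHER_MODEL", "Qwen/Qwen2.5-7B-Instruct"),
        # Environment values are strings, so the count is converted explicitly.
        prompts_per_label=int(env.get("PROMPTS_PER_LABEL", "50")),
        output_dir=env.get("OUTPUT_DIR", "./generated_data"),
        # Assumption: an unset or empty value means "skip the Hub push".
        dataset_hub_id=env.get("DATASET_HUB_ID") or None,
    )
```

# Passing a plain dict instead of os.environ, e.g.
# load_config({"PROMPTS_PER_LABEL": "5"}), makes the defaults easy to check
# without touching the real environment.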