
amarorn committed on Dec 24, 2025

Commit

d6c8680

1 Parent(s): 3f4afaf

Correct repository to beAnalytic/Training


Files changed (3)

Dockerfile +37 -0
README.md +57 -5
train.py +294 -0

Dockerfile ADDED

	@@ -0,0 +1,37 @@

+FROM huggingface/transformers-pytorch-gpu:latest
+WORKDIR /app
+# Install system dependencies
+# python-is-python3 automatically creates the python -> python3 symlink
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    python3 \
+    python3-pip \
+    python-is-python3 \
+    && rm -rf /var/lib/apt/lists/*
+# Verify that python is available (the NVIDIA entrypoint needs it)
+RUN python --version && \
+    python3 --version && \
+    echo "✅ Python available: $(which python)"
+# Install Python dependencies
+COPY requirements.txt .
+RUN python3 -m pip install --no-cache-dir --upgrade pip && \
+    python3 -m pip install --no-cache-dir -r requirements.txt
+# Copy training scripts
+COPY train.py /app/train.py
+COPY app.py /app/app.py
+# Set default environment variables (can be overridden)
+ENV MODEL_NAME=microsoft/Phi-3-mini-4k-instruct
+ENV DATASET_REPO=beAnalytic/eda-training-dataset
+ENV OUTPUT_REPO=beAnalytic/eda-llm-model
+# Run training
+# Use 'python' (the symlink to python3 created above)
+# The NVIDIA entrypoint expects 'python' to be available
+CMD ["python", "/app/app.py"]
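
The `ENV` defaults above can be overridden at runtime, for example through the Space settings. A minimal Python sketch of how a script might resolve them follows; `resolve_config` and `DEFAULTS` are hypothetical names used only for illustration and are not part of this commit.

```python
import os

# Defaults mirror the ENV lines in the Dockerfile above.
DEFAULTS = {
    "MODEL_NAME": "microsoft/Phi-3-mini-4k-instruct",
    "DATASET_REPO": "beAnalytic/eda-training-dataset",
    "OUTPUT_REPO": "beAnalytic/eda-llm-model",
}


def resolve_config(env=None):
    """Return the effective settings; values found in `env` win over defaults.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    # Print the settings the container would see with the current environment.
    for key, value in resolve_config().items():
        print(f"{key}={value}")
```

Passing a plain dictionary instead of `os.environ` makes the override behavior easy to check without touching the real environment.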

README.md CHANGED

@@ -1,10 +1,62 @@
 ---
-title: Training
-emoji: 🐠
-colorFrom: yellow
-colorTo: yellow
+title: EDA Model Training
+emoji: 🤖
+colorFrom: blue
+colorTo: purple
 sdk: docker
+sdk_version: "latest"
+app_file: app.py
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# EDA Model Training
+This Space contains the training script for the Exploratory Data Analysis (EDA) model.
+## Configuration
+### Required Environment Variables
+**⚠️ IMPORTANT**: Set the `HF_TOKEN` environment variable in the Space Settings to enable automatic pushing of checkpoints to the Hub.
+### Environment Variables
+Set the following environment variables in the Space Settings:
+- **`HF_TOKEN`** (REQUIRED): Your HuggingFace token with write permissions
+  - Generate one at: https://huggingface.co/settings/tokens
+  - Required permissions: `write`
+  - Without this token, training still runs, but checkpoints are not pushed to the Hub
+- `MODEL_NAME`: Base model (default: `microsoft/Phi-3-mini-4k-instruct`)
+- `DATASET_REPO`: Dataset ID (default: `beAnalytic/eda-training-dataset`)
+- `OUTPUT_REPO`: Output model ID (default: `beAnalytic/eda-llm-model`)
+### How to Set HF_TOKEN in the Space
+1. Go to: https://huggingface.co/spaces/beAnalytic/Training/settings
+2. Open the **"Repository secrets"** section
+3. Click **"New secret"**
+4. Name: `HF_TOKEN`
+5. Value: Paste your HuggingFace token
+6. Click **"Add secret"**
+**Note**: The script uses the token automatically during training.
+### Execution
+The `train.py` script runs automatically when the Space starts.
+## Structure
+- `train.py`: Main training script
+- `training_config.json`: Training settings
+- `requirements.txt`: Python dependencies
+## Monitoring
+Follow the training progress through the Space logs in the "Logs" tab.
+## Results
+The trained model is saved automatically to the HuggingFace Hub, in the repository specified by `OUTPUT_REPO`.
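
The README states that training still runs without `HF_TOKEN`, only without pushing to the Hub. A small sketch of that decision is shown here, mirroring the `HF_TOKEN and HF_TOKEN.strip()` test used in `train.py`; the function name `has_usable_token` is a hypothetical helper, not something defined in the commit.

```python
import os


def has_usable_token(env=None):
    """Return True when HF_TOKEN is set and is not only whitespace.

    Hypothetical helper; it does not contact the Hub, so a True result
    only means a token is present, not that it is valid.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    return bool(token and token.strip())


if __name__ == "__main__":
    if has_usable_token():
        print("HF_TOKEN found: push to Hub can be attempted")
    else:
        print("HF_TOKEN missing or empty: checkpoints stay local")
```

A secret that was saved as an empty or blank value is treated the same as a missing one, which matches the fallback branch in the script.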

train.py ADDED

	@@ -0,0 +1,294 @@

+#!/usr/bin/env python3
+"""
+Training script generated for the HuggingFace Training Platform.
+Run this script on HuggingFace Training or locally.
+"""
+from datasets import load_dataset
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    TrainingArguments,
+    Trainer,
+    DataCollatorForLanguageModeling,
+)
+from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+from transformers import BitsAndBytesConfig
+from huggingface_hub import login as hf_login, logout as hf_logout
+import torch
+import os
+# Configuration (can be overridden by environment variables)
+MODEL_NAME = os.getenv("MODEL_NAME", "microsoft/Phi-3-mini-4k-instruct")
+DATASET_REPO = os.getenv("DATASET_REPO", "beAnalytic/eda-training-dataset")
+OUTPUT_REPO = os.getenv("OUTPUT_REPO", "beAnalytic/eda-llm-model")
+HF_TOKEN = os.getenv("HF_TOKEN")
+# Authenticate with HuggingFace if a token is available
+# IMPORTANT: Clear any token from the environment if it is not explicitly configured
+push_to_hub_enabled = False
+# First, clear any existing authentication to ensure a clean state
+try:
+    hf_logout()
+except Exception:
+    pass
+# Clear alternative tokens from the environment (keeps HF_TOKEN, which is used later)
+tokens_to_remove = ["HUGGING_FACE_HUB_TOKEN", "HF_HUB_TOKEN", "HUGGINGFACE_HUB_TOKEN"]
+for token_var in tokens_to_remove:
+    if token_var in os.environ:
+        del os.environ[token_var]
+# Clear the authentication cache (hf_logout already does this)
+try:
+    hf_logout()
+except Exception:
+    pass
+if HF_TOKEN and HF_TOKEN.strip():
+    print("Authenticating with the HuggingFace Hub...")
+    try:
+        hf_login(token=HF_TOKEN, add_to_git_credential=False)
+        print("✅ Authentication successful!")
+        push_to_hub_enabled = True
+    except Exception as e:
+        print(f"⚠️ Warning: Failed to authenticate with HuggingFace: {e}")
+        print("Training will continue, but pushing to the Hub will be disabled.")
+        push_to_hub_enabled = False
+        # Clear again after the failure
+        try:
+            hf_logout()
+        except Exception:
+            pass
+else:
+    print("⚠️ Warning: HF_TOKEN not found or empty. Pushing to the Hub will be disabled.")
+    print("Set the HF_TOKEN environment variable in the Space to enable automatic push.")
+    push_to_hub_enabled = False
+# Load dataset
+print(f"Loading dataset: {DATASET_REPO}")
+dataset = load_dataset(DATASET_REPO)
+# Load model and tokenizer
+print(f"Loading model: {MODEL_NAME}")
+tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+tokenizer.pad_token = tokenizer.eos_token
+# Configure 4-bit quantization
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.float16,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_use_double_quant=True,
+)
+model = AutoModelForCausalLM.from_pretrained(
+    MODEL_NAME,
+    quantization_config=bnb_config,
+    device_map="auto",
+    trust_remote_code=True,
+)
+# Prepare the model for LoRA
+model = prepare_model_for_kbit_training(model)
+# Configure LoRA
+# Note: Phi-3 checkpoints expose a fused `qkv_proj` attention projection, so with the default
+# MODEL_NAME only `o_proj` in the list below is likely to match; check model.named_modules().
+peft_config = LoraConfig(
+    r=16,
+    lora_alpha=32,
+    target_modules=['q_proj', 'v_proj', 'k_proj', 'o_proj'],
+    lora_dropout=0.1,
+    bias="none",
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, peft_config)
+# Format prompts
+def format_prompt(example):
+    # The system prompt is kept in Portuguese. Roughly: "You are a senior data analyst
+    # performing an Exploratory Data Analysis (EDA) with statistical rigor, analytical
+    # honesty, and critical thinking. Your goal is not to produce insights at any cost,
+    # but to assess whether the data has informative structure, emergent behavior, or
+    # only trivial structural relationships."
+    system_prompt = (
+        "Você é um analista de dados sênior realizando uma Análise Exploratória de Dados (EDA) "
+        "com rigor estatístico, honestidade analítica e pensamento crítico.\n\n"
+        "Seu objetivo não é gerar insights a qualquer custo, mas avaliar se os dados possuem "
+        "estrutura informativa, comportamento emergente ou apenas relações estruturais triviais."
+    )
+    instruction = example.get("instruction", "")
+    input_text = example.get("input", "")
+    output_text = example.get("output", "")
+    prompt = f"<|system|>\n{system_prompt}\n<|user|>\n{instruction}\n\n{input_text}\n<|assistant|>\n{output_text}<|end|>"
+    return {"text": prompt}
+# Apply formatting
+train_dataset = dataset["train"].map(format_prompt, remove_columns=dataset["train"].column_names)
+eval_dataset = dataset["validation"].map(format_prompt, remove_columns=dataset["validation"].column_names)
+# Tokenize
+def tokenize_function(examples):
+    return tokenizer(
+        examples["text"],
+        truncation=True,
+        max_length=1024,
+        padding="max_length",
+    )
+train_dataset = train_dataset.map(tokenize_function, batched=True, remove_columns=["text"])
+eval_dataset = eval_dataset.map(tokenize_function, batched=True, remove_columns=["text"])
+# Configure training arguments
+# push_to_hub_enabled was already set above during authentication
+# Base training arguments
+training_args_dict = {
+    "output_dir": "./results",
+    "num_train_epochs": 3,
+    "per_device_train_batch_size": 4,
+    "per_device_eval_batch_size": 4,
+    "learning_rate": 3e-05,
+    "warmup_steps": 100,
+    "logging_steps": 10,
+    "save_steps": 500,
+    "eval_strategy": "steps",
+    "eval_steps": 500,
+    "save_total_limit": 3,
+    "load_best_model_at_end": True,
+    "fp16": True,
+    "gradient_accumulation_steps": 2,
+}
+# Add Hub parameters only if authenticated
+# IMPORTANT: Do not pass ANY Hub-related parameter when there is no token,
+# to keep the Trainer from trying to initialize the repository
+if push_to_hub_enabled:
+    training_args_dict.update({
+        "push_to_hub": True,
+        "hub_model_id": OUTPUT_REPO,
+        "hub_strategy": "checkpoint",
+    })
+else:
+    # Explicitly ensure that push_to_hub is disabled
+    # and that hub_model_id is None (omitting the parameter could make the Trainer use a default value)
+    training_args_dict["push_to_hub"] = False
+    training_args_dict["hub_model_id"] = None
+training_args = TrainingArguments(**training_args_dict)
+if push_to_hub_enabled:
+    print(f"✅ Push to Hub enabled: {OUTPUT_REPO}")
+else:
+    print("ℹ️ Push to Hub disabled (HF_TOKEN not configured)")
+    print("Checkpoints will be saved only locally in ./results")
+# Data collator
+data_collator = DataCollatorForLanguageModeling(
+    tokenizer=tokenizer,
+    mlm=False,
+)
+# Trainer
+# IMPORTANT: Ensure there is no token in the environment when push_to_hub is disabled,
+# to keep the Trainer from trying to initialize the repository during __init__
+if not push_to_hub_enabled:
+    # Clear all possible tokens from the environment
+    tokens_to_remove = ["HUGGING_FACE_HUB_TOKEN", "HF_HUB_TOKEN", "HUGGINGFACE_HUB_TOKEN"]
+    for token_var in tokens_to_remove:
+        if token_var in os.environ:
+            del os.environ[token_var]
+    # Log out to ensure there is no token in the cache
+    try:
+        hf_logout()
+    except Exception:
+        pass
+    # Safety check: ensure push_to_hub is False
+    if training_args.push_to_hub:
+        print("⚠️ WARNING: push_to_hub is True but there is no token! Forcing False...")
+        training_args.push_to_hub = False
+    if training_args.hub_model_id is not None:
+        print("⚠️ WARNING: hub_model_id is set but there is no token! Removing...")
+        training_args.hub_model_id = None
+print(f"🔍 Debug: push_to_hub={training_args.push_to_hub}, hub_model_id={training_args.hub_model_id}")
+print(f"🔍 Debug: push_to_hub_enabled={push_to_hub_enabled}")
+# Final check: if push_to_hub is False, ensure there is no token in the cache
+if not push_to_hub_enabled:
+    # Clear any leftover token from the cache
+    try:
+        hf_logout()
+    except Exception:
+        pass
+    # Final check of the arguments
+    if training_args.push_to_hub or training_args.hub_model_id:
+        print("❌ ERROR: push_to_hub or hub_model_id is still set! Fixing...")
+        training_args.push_to_hub = False
+        training_args.hub_model_id = None
+print(f"✅ Creating Trainer with push_to_hub={training_args.push_to_hub}, hub_model_id={training_args.hub_model_id}")
+# Create Trainer
+# If push_to_hub is False, ensure there is no token in the cache before creating it
+if not push_to_hub_enabled:
+    # Last check: clear any leftover token
+    try:
+        hf_logout()
+    except Exception:
+        pass
+try:
+    trainer = Trainer(
+        model=model,
+        args=training_args,
+        train_dataset=train_dataset,
+        eval_dataset=eval_dataset,
+        data_collator=data_collator,
+    )
+    print("✅ Trainer created successfully!")
+except Exception as e:
+    if "401" in str(e) or "Unauthorized" in str(e):
+        print("❌ ERROR: Trainer tried to authenticate without a valid token!")
+        print("This should not happen. Checking configuration...")
+        print(f"push_to_hub={training_args.push_to_hub}")
+        print(f"hub_model_id={training_args.hub_model_id}")
+        print(f"push_to_hub_enabled={push_to_hub_enabled}")
+        # Try again after clearing everything
+        try:
+            hf_logout()
+        except Exception:
+            pass
+        # Force push_to_hub=False again
+        training_args.push_to_hub = False
+        training_args.hub_model_id = None
+        print("Retrying Trainer creation with push_to_hub=False...")
+        trainer = Trainer(
+            model=model,
+            args=training_args,
+            train_dataset=train_dataset,
+            eval_dataset=eval_dataset,
+            data_collator=data_collator,
+        )
+    else:
+        raise
+# Train
+print("Starting training...")
+trainer.train()
+# Do the final push only if authenticated
+if push_to_hub_enabled:
+    print(f"Pushing the final model to {OUTPUT_REPO}")
+    try:
+        trainer.push_to_hub()
+        print("✅ Push to Hub completed!")
+    except Exception as e:
+        print(f"❌ Error pushing to Hub: {e}")
+        print("Checkpoints are saved locally in ./results")
+else:
+    print("ℹ️ Push to Hub skipped (HF_TOKEN not configured)")
+    print("Checkpoints are saved in ./results")
+print("✅ Training completed!")
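
`train.py` repeats the same token cleanup and Hub-flag checks several times before creating the Trainer. One possible consolidation is sketched below under stated assumptions: `clear_alt_tokens` and `build_training_kwargs` are hypothetical helpers that do not appear in the commit, and the sketch only covers building the argument dictionary, not the `hf_logout()` calls.

```python
# Hypothetical helpers; the names are illustrative and not part of the commit.
ALT_TOKEN_VARS = ("HUGGING_FACE_HUB_TOKEN", "HF_HUB_TOKEN", "HUGGINGFACE_HUB_TOKEN")


def clear_alt_tokens(env):
    """Remove alternative token variables from `env`, leaving HF_TOKEN alone."""
    for name in ALT_TOKEN_VARS:
        env.pop(name, None)


def build_training_kwargs(base, push_enabled, output_repo):
    """Return a copy of `base` with the Hub settings enabled or disabled."""
    kwargs = dict(base)  # copy so the caller's dictionary is not mutated
    if push_enabled:
        kwargs.update(
            push_to_hub=True,
            hub_model_id=output_repo,
            hub_strategy="checkpoint",
        )
    else:
        # Mirror the script: set both fields explicitly instead of omitting them.
        kwargs.update(push_to_hub=False, hub_model_id=None)
    return kwargs


if __name__ == "__main__":
    base = {"output_dir": "./results", "num_train_epochs": 3}
    print(build_training_kwargs(base, False, "beAnalytic/eda-llm-model"))
```

With helpers like these, a single call such as `TrainingArguments(**build_training_kwargs(training_args_dict, push_to_hub_enabled, OUTPUT_REPO))` could stand in for the branching above. Whether the additional `hf_logout()` calls are still needed is not something this sketch settles.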