Spaces:

histlearn
/

communitynotesbr


histlearn committed on Apr 24

Commit

a2ad1d2

verified ·

1 Parent(s): 5938c74

feat: initial FT-Solo endpoint (Qwen3-Embedding-4B + LoRA fold 01 + linear head)


Uploads the endpoint code (app.py, inference.py, config.py) and the FT-Solo artifacts (LoRA adapter + linear head of the best fold according to the manifest). Generated via Colab from the tar + zip on Drive.

Files changed (10)

.gitignore +25 -0
README.md +190 -7
app.py +364 -0
artifacts/fold_01_adapter/README.md +206 -0
artifacts/fold_01_adapter/adapter_config.json +49 -0
artifacts/fold_01_adapter/adapter_model.safetensors +3 -0
artifacts/fold_01_head.pt +3 -0
config.py +54 -0
inference.py +225 -0
requirements.txt +8 -0

.gitignore ADDED

	@@ -0,0 +1,25 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+.venv/
+.env
+# IDE / OS
+.DS_Store
+.idea/
+.vscode/
+*.swp
+# Jupyter
+.ipynb_checkpoints/
+# Optional/heavy artifacts: NOT committed to the Space
+# (the adapter and the head, which are required, are deliberately absent from this list)
+artifacts/embeddings_qwen3_4b_finetuned.npz
+artifacts/dataset.parquet
+artifacts/*.zip
+artifacts/_raw/
+# Logs
+*.log

README.md CHANGED

@@ -1,14 +1,197 @@
 ---
-title: Communitynotesbr
-emoji: 📊
-colorFrom: red
 colorTo: gray
 sdk: gradio
-sdk_version: 6.13.0
 app_file: app.py
 pinned: false
-license: other
-short_description: classificação e análise interpretável de Notas da Comunidade
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Notinhas Endpoint (FT-Solo)
+emoji: 📝
+colorFrom: green
 colorTo: gray
 sdk: gradio
+sdk_version: 5.50.0
+python_version: "3.10"
 app_file: app.py
 pinned: false
+short_description: Classificador de utilidade para community notes em PT-BR.
+models:
+  - Qwen/Qwen3-Embedding-4B
 ---
+# Notinhas: helpfulness endpoint (FT-Solo)
+Private endpoint for the project's **FT-Solo** model: given the text of a *community
+note* in Portuguese, it returns the probability that the note is classified as "helpful"
+("útil", `label_binary_strict = 1`), along with an optional readout of each word's
+contribution.
+Architecture: **Qwen3-Embedding-4B + LoRA + linear head**, identical to
+`predict_from_text` in the `explicabilidade_qwen4b_redesign` notebook in faithful mode
+(fold 01).
+## Repository structure
+```
+.
+├── app.py              # Gradio UI + API (tabs Prever / Explicar / Sobre)
+├── inference.py        # Loader + predict + explain (word-level occlusion)
+├── config.py           # Constants (model, prompt, paths, thresholds)
+├── requirements.txt    # Python dependencies
+├── README.md           # This file (with the YAML header read by HF)
+├── .gitignore
+└── artifacts/          # ← you populate this (see the Setup section)
+    ├── fold_01_adapter/   # LoRA adapter folder
+    │   ├── adapter_config.json
+    │   └── adapter_model.safetensors
+    └── fold_01_head.pt    # State dict of the nn.Linear(2560, 1)
+```
+## Setup: from zero to a running Space
+### 1. Create the Space (private)
+In the Hugging Face UI:
+1. **New Space**.
+2. SDK: **Gradio**.
+3. Hardware: **T4 small** (recommended: the model fits in memory in half precision
+   (bf16, or fp16 where bf16 is unavailable), with inference in ~0.5 s). **A10G small**
+   gives even lower latency. **ZeroGPU** works but has a longer cold start. **CPU**
+   runs, but each inference takes 20–40 s.
+4. Visibility: **Private**.
+### 2. Populate `artifacts/`
+The weights come from the project pipeline. The base zip on Drive (`artefatos_projeto.zip`)
+contains the `qwen4b_adapters/` and `qwen4b_heads/` folders. Run locally:
+```bash
+pip install gdown
+gdown "https://drive.google.com/uc?id=1_wCCxZG25tcGIVHgrdfOj54vI5Iw6MUF" \
+      -O artefatos_projeto.zip
+unzip -q artefatos_projeto.zip -d _raw/
+# Directory layout expected by the Space:
+mkdir -p artifacts
+cp -r _raw/qwen4b_adapters/fold_01_adapter artifacts/
+cp    _raw/qwen4b_heads/fold_01_head.pt    artifacts/
+```
+> **Which fold to use?** The notebook dynamically picks the "best fold" via
+> `qwen4b_ftsolo_manifest.json`. When serving in production, it makes sense to reuse
+> the same one. If the manifest points to another fold (say, `fold_03`), either rename
+> that fold's files to `fold_01_adapter/` and `fold_01_head.pt` **or** edit
+> `config.py` to point to the real names.
+### 3. Commit and push
+Spaces are git repositories hosted on HF. Inside the cloned Space folder:
+```bash
+git lfs install                 # safetensors files > 10 MB go through LFS
+git lfs track "*.safetensors"
+git lfs track "*.pt"
+git add .gitattributes artifacts/
+git add app.py inference.py config.py requirements.txt README.md .gitignore
+git commit -m "feat: endpoint inicial FT-Solo"
+git push
+```
+The size of a LoRA adapter for Qwen3-Embedding-4B depends on the rank and the
+target modules; the fold 01 adapter committed here (r=16, all seven attention and
+MLP projections) is about **132 MB**. The head is ~12 KB. Everything fits
+comfortably within the storage quota.
+### 4. (Optional) Secrets
+In **Settings → Variables and secrets**:
+- `HF_TOKEN`: only needed if `Qwen/Qwen3-Embedding-4B` becomes gated in the future.
+  The model is currently public, so you can skip this.
+### 5. First boot
+On first startup the Space:
+1. Installs `requirements.txt` (~1 min).
+2. Downloads `Qwen/Qwen3-Embedding-4B` from HF (~8 GB, ~2–3 min).
+3. Loads adapter + head (~5 s).
+4. Becomes ready. The model warm-up has already run by then, so the first request is fast.
+Follow progress in the Space's **Logs** tab.
+## Usage
+### Via the web UI
+Just open the Space's private URL. It has three tabs:
+- **Prever** (Predict): score + label + confidence band.
+- **Explicar** (Explain): the same, plus the text highlighted by per-word contribution
+  and a table of the top 5 words on each side.
+- **Sobre** (About): technical details and limitations.
+### Via `gradio_client` (Python)
+```python
+from gradio_client import Client
+client = Client("<seu-usuario>/<nome-do-space>", hf_token="hf_...")
+# Probability only
+card_html, payload = client.predict(
+    "Segundo o Ministério da Saúde, o número é falso. Fonte: https://...",
+    api_name="/predict",
+)
+print(payload)
+# {'proba_util': 0.87, 'label': 'Útil', 'confidence_band': 'Alta'}
+# With explanation
+card, highlight_html, tokens_html, full = client.predict(
+    "Essa nota é claramente desnecessária, opinião pessoal.",
+    api_name="/explain",
+)
+print(full["tokens"][:3], full["contributions"][:3])
+```
+### Via plain HTTP
+Gradio 5 exposes the routes at `/gradio_api/call/<api_name>`. See the API docs
+at `https://<seu-space>.hf.space/?view=api`: the Space itself generates the documentation
+and `curl` examples for both endpoints.
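The plain-HTTP route described above can be sketched with the standard library alone. This is a minimal sketch, not the Space's own client: the helper names `build_call_request` and `parse_sse_result` are hypothetical, and the request/response shapes (a JSON body of the form `{"data": [...]}` and a server-sent-events stream ending in a `complete` event) are assumptions based on Gradio's generated `curl` examples, so check them against the `?view=api` page of the running Space.

```python
import json
import urllib.request
from typing import Any, Optional


def build_call_request(
    base_url: str, api_name: str, text: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request that starts a call on /gradio_api/call/<api_name>."""
    url = f"{base_url.rstrip('/')}/gradio_api/call/{api_name.lstrip('/')}"
    headers = {"Content-Type": "application/json"}
    if token:
        # Private Spaces require a Hugging Face token as a bearer credential.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"data": [text]}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_sse_result(stream_text: str) -> Optional[Any]:
    """Return the JSON payload of the first 'complete' event, or None if absent."""
    event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            return json.loads(line[len("data:"):].strip())
    return None
```

In the flow Gradio documents, the POST response carries an `event_id`, and a follow-up GET on the same URL with `/<event_id>` appended streams the events that `parse_sse_result` reads. For `/predict`, the parsed list would hold the HTML card followed by the JSON payload, mirroring the two outputs of the handler.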
+## Architecture and decisions
+### Why this stack
+- **Gradio 5 instead of plain FastAPI**: delivers UI + HTTP API in one go, with
+  automatic docs. For a private MVP endpoint, doubling the usefulness without
+  doubling the code is the right trade-off.
+- **Occlusion instead of SHAP Partition**: the notebook spends 12–15 s/note on textual
+  SHAP, precisely because it explores combinations of subsets. For real-time
+  serving, word-level leave-one-out yields one `Δ` per word in N+1 forward
+  passes (N words plus the full text): 1 to 2 s for typical notes, with visually
+  comparable results.
+- **Single fold**: the notebook also used a single fold for textual SHAP. An ensemble
+  of the 5 folds is the natural extension (list `[(adapter_i, head_i) for i in 1..5]`,
+  average the sigmoid outputs), but it is not required for the MVP.
+### What changes if you want to scale
+- **Ensemble**: replace `load_model()` with `load_models()` returning a list
+  of `(encoder, head)` pairs. `predict_batch` iterates over them and takes the median
+  or mean of the probabilities. VRAM and latency grow with the number of folds
+  loaded, so it is only worth it when the marginal performance gain justifies the cost.
+- **Semantic neighbors** (as in section 5 of the notebook): requires shipping
+  `embeddings_qwen3_4b_finetuned.npz` (≈200 MB) and the master dataset to
+  retrieve text + label. It is a natural extension: create an `artifacts/knn_index/`
+  with FAISS and add a "Vizinhos" (Neighbors) tab to the Gradio app.
+- Dedicated **Inference Endpoint**: if the Space becomes a bottleneck, the same code
+  repository can be deployed as a paid HF **Inference Endpoint**, which
+  handles real parallelism and autoscales.
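The ensemble bullet above boils down to reducing per-fold probabilities to one number. Here is a minimal sketch of that reduction, assuming each fold is wrapped as a callable that maps a text to P(helpful); the name `ensemble_proba` and the callable interface are hypothetical and are not part of `inference.py`.

```python
from statistics import fmean, median
from typing import Callable, Sequence


def ensemble_proba(
    text: str,
    fold_predictors: Sequence[Callable[[str], float]],
    reduce: str = "mean",
) -> float:
    """Combine per-fold probabilities for one text with a mean or a median."""
    if not fold_predictors:
        raise ValueError("at least one fold predictor is required")
    # One forward pass per fold: latency scales with the number of folds.
    probs = [predict(text) for predict in fold_predictors]
    return median(probs) if reduce == "median" else fmean(probs)
```

The median is less sensitive to a single outlying fold, while the mean matches the "average of probabilities" wording used elsewhere in this commit; which one works better here is an empirical question the source does not settle.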
+## Limitations
+- The `helpful` label measures **bipartisan acceptability**, not editorial quality:
+  the notebook shows cases in which semantically identical neighbors receive
+  opposite labels for political rather than textual reasons.
+- Long texts are truncated at 256 tokens.
+- Predictions depend on the fold being served; the notebook observed small
+  but nonzero variation between folds.
+## Credits
+Based on the pipeline and the explainability notebook of the Notinhas project.
+The code here is the working prototype of the `predict_from_text` function turned into a service.

app.py ADDED

	@@ -0,0 +1,364 @@

+"""Gradio app: helpfulness endpoint for community notes in PT-BR.
+Exposes:
+  - Web UI with three tabs: Prever / Explicar / Sobre.
+  - HTTP API at /gradio_api/call/predict and /gradio_api/call/explain (generated
+    automatically by Gradio from the api_name values).
+For Python clients, use gradio_client (the /predict call returns the HTML card
+and the JSON payload):
+    from gradio_client import Client
+    c = Client("<user>/<space>", hf_token="hf_...")
+    card_html, payload = c.predict("texto da nota...", api_name="/predict")
+"""
+from __future__ import annotations
+import html
+import logging
+import os
+import traceback
+import gradio as gr
+from config import (
+    CONFIDENCE_BOUNDS_ALTA,
+    CONFIDENCE_BOUNDS_MEDIA,
+    THRESHOLD_UTIL,
+)
+from inference import DEVICE, explain_occlusion, predict_one, warmup
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+log = logging.getLogger("app")
+# ---------------------------------------------------------------------------
+# Eager warm-up: we don't want the first request to pay the cold start
+# ---------------------------------------------------------------------------
+MODEL_READY: bool
+MODEL_ERROR: str | None
+try:
+    warmup()
+    MODEL_READY = True
+    MODEL_ERROR = None
+    log.info("Modelo carregado no startup. Device=%s", DEVICE)
+except Exception as exc:  # noqa: BLE001 (we want to catch any loading failure)
+    MODEL_READY = False
+    MODEL_ERROR = f"{type(exc).__name__}: {exc}"
+    log.error("Falha ao carregar modelo no startup:\n%s", traceback.format_exc())
+# ---------------------------------------------------------------------------
+# Presentation helpers
+# ---------------------------------------------------------------------------
+def _confidence_band(p: float) -> str:
+    lo_a, hi_a = CONFIDENCE_BOUNDS_ALTA
+    lo_m, hi_m = CONFIDENCE_BOUNDS_MEDIA
+    if p <= lo_a or p >= hi_a:
+        return "Alta"
+    if p <= lo_m or p >= hi_m:
+        return "Média"
+    return "Baixa"
+def _label(p: float) -> str:
+    return "Útil" if p >= THRESHOLD_UTIL else "Não-útil"
+def _score_card_html(p: float) -> str:
+    """Main result card: label badge + confidence badge + probability."""
+    lbl = _label(p)
+    band = _confidence_band(p)
+    lbl_colors = {"Útil": ("#d8f3dc", "#1b4332"), "Não-útil": ("#fde2e4", "#9d0208")}
+    band_colors = {
+        "Alta": ("#d8f3dc", "#1b4332"),
+        "Média": ("#fff3bf", "#7c5c00"),
+        "Baixa": ("#e9ecef", "#495057"),
+    }
+    lbg, lfg = lbl_colors[lbl]
+    bbg, bfg = band_colors[band]
+    return f"""
+    <div style="background:#fff;border:1px solid #e9ecef;border-radius:16px;
+                padding:18px 22px;box-shadow:0 4px 14px rgba(0,0,0,0.04);
+                font-family:system-ui, -apple-system, sans-serif;">
+      <div style="display:flex;justify-content:space-between;align-items:center;
+                  gap:12px;flex-wrap:wrap;">
+        <div style="display:flex;gap:8px;flex-wrap:wrap;">
+          <span style="background:{lbg};color:{lfg};padding:4px 12px;
+                       border-radius:999px;font-size:13px;font-weight:700;">{lbl}</span>
+          <span style="background:{bbg};color:{bfg};padding:4px 12px;
+                       border-radius:999px;font-size:13px;font-weight:700;">
+            Confiança {band}
+          </span>
+        </div>
+        <div style="text-align:right;">
+          <div style="font-size:12px;color:#6c757d;">P(útil)</div>
+          <div style="font-size:32px;font-weight:800;color:#2b2d42;
+                      font-variant-numeric:tabular-nums;">{p:.3f}</div>
+        </div>
+      </div>
+    </div>
+    """
+def _contrib_color(v: float, v_max: float) -> str:
+    if v_max <= 0:
+        return "transparent"
+    intensity = min(1.0, abs(v) / v_max)
+    alpha = 0.15 + 0.65 * intensity  # 0.15 .. 0.80
+    if v > 0:
+        return f"rgba(95, 168, 143, {alpha:.3f})"  # green (PALETA['util'] in the notebook)
+    return f"rgba(224, 123, 107, {alpha:.3f})"  # coral (PALETA['nao_util'])
+def _highlighted_text_html(tokens: list[str], contribs: list[float]) -> str:
+    if not tokens:
+        return "<em>(sem palavras para destacar)</em>"
+    v_max = max((abs(c) for c in contribs), default=1e-9) or 1e-9
+    spans = []
+    for tok, c in zip(tokens, contribs):
+        bg = _contrib_color(c, v_max)
+        spans.append(
+            f'<span style="background:{bg};padding:2px 4px;border-radius:4px;'
+            f'margin:0 1px;" title="Δ={c:+.4f}">{html.escape(tok)}</span>'
+        )
+    return (
+        '<div style="font-size:15px;line-height:2;color:#212529;'
+        'font-family:system-ui, -apple-system, sans-serif;padding:4px;">'
+        + " ".join(spans)
+        + "</div>"
+    )
+def _top_tokens_table_html(
+    tokens: list[str], contribs: list[float], k: int = 5
+) -> str:
+    pairs = list(zip(tokens, contribs))
+    pos = sorted([p for p in pairs if p[1] > 0], key=lambda x: -x[1])[:k]
+    neg = sorted([p for p in pairs if p[1] < 0], key=lambda x: x[1])[:k]
+    def _row(tok: str, v: float, side: str) -> str:
+        color = "#1b4332" if side == "pos" else "#9d0208"
+        sign = "+" if v > 0 else ""
+        return (
+            f'<tr><td style="padding:5px 8px;color:{color};">'
+            f"{html.escape(tok)}</td>"
+            f'<td style="padding:5px 8px;text-align:right;color:{color};'
+            f'font-variant-numeric:tabular-nums;">{sign}{v:.4f}</td></tr>'
+        )
+    empty = '<tr><td colspan="2" style="padding:6px;color:#9aa1aa;"><em>—</em></td></tr>'
+    pos_rows = "".join(_row(t, v, "pos") for t, v in pos) or empty
+    neg_rows = "".join(_row(t, v, "neg") for t, v in neg) or empty
+    return f"""
+    <div style="display:grid;grid-template-columns:1fr 1fr;gap:14px;margin-top:12px;
+                font-family:system-ui, -apple-system, sans-serif;">
+      <div style="background:#fcfcfd;border:1px solid #eef2f7;border-radius:12px;padding:12px;">
+        <div style="font-size:13px;font-weight:700;color:#1b4332;margin-bottom:6px;">
+          Empurram para útil
+        </div>
+        <table style="width:100%;border-collapse:collapse;font-size:13px;">{pos_rows}</table>
+      </div>
+      <div style="background:#fcfcfd;border:1px solid #eef2f7;border-radius:12px;padding:12px;">
+        <div style="font-size:13px;font-weight:700;color:#9d0208;margin-bottom:6px;">
+          Empurram para não-útil
+        </div>
+        <table style="width:100%;border-collapse:collapse;font-size:13px;">{neg_rows}</table>
+      </div>
+    </div>
+    """
+# ---------------------------------------------------------------------------
+# Handlers: return HTML for the UI + JSON for the API
+# ---------------------------------------------------------------------------
+def handle_predict(text: str):
+    text = (text or "").strip()
+    if not text:
+        return "<em>Forneça um texto.</em>", {"error": "empty_input"}
+    if not MODEL_READY:
+        err = MODEL_ERROR or "modelo indisponível"
+        return (
+            f"<em>Modelo indisponível: {html.escape(err)}</em>",
+            {"error": "model_unavailable", "detail": err},
+        )
+    p = predict_one(text)
+    return (
+        _score_card_html(p),
+        {
+            "proba_util": p,
+            "label": _label(p),
+            "confidence_band": _confidence_band(p),
+        },
+    )
+def handle_explain(text: str):
+    text = (text or "").strip()
+    if not text:
+        return "<em>Forneça um texto.</em>", "", "", {"error": "empty_input"}
+    if not MODEL_READY:
+        err = MODEL_ERROR or "modelo indisponível"
+        return (
+            f"<em>Modelo indisponível: {html.escape(err)}</em>",
+            "",
+            "",
+            {"error": "model_unavailable", "detail": err},
+        )
+    result = explain_occlusion(text)
+    p = result["proba_full"]
+    tokens = result["tokens"]
+    contribs = result["contributions"]
+    return (
+        _score_card_html(p),
+        _highlighted_text_html(tokens, contribs),
+        _top_tokens_table_html(tokens, contribs),
+        {
+            "proba_util": p,
+            "label": _label(p),
+            "confidence_band": _confidence_band(p),
+            "tokens": tokens,
+            "contributions": contribs,
+        },
+    )
+# ---------------------------------------------------------------------------
+# UI
+# ---------------------------------------------------------------------------
+EXAMPLE_UTIL = (
+    "Segundo dados oficiais do Ministério da Saúde, o número citado no tweet é falso. "
+    "A fonte correta pode ser conferida no link: https://www.gov.br/saude/..."
+)
+EXAMPLE_NAO = "Essa nota é claramente desnecessária, é opinião pessoal do autor."
+INTRO_MD = """
+# Notinhas — endpoint de utilidade (FT-Solo)
+Classificador de utilidade para **community notes em português**, baseado em
+**Qwen3-Embedding-4B + LoRA + cabeça linear** (modo fiel do FT-Solo, fold 01).
+- **Prever** — score + label + faixa de confiança.
+- **Explicar** — o mesmo + contribuição de cada palavra via leave-one-out.
+- **Sobre** — detalhes técnicos e limitações.
+"""
+with gr.Blocks(
+    title="Notinhas — endpoint de utilidade (FT-Solo)",
+    theme=gr.themes.Soft(primary_hue="emerald", neutral_hue="slate"),
+) as demo:
+    gr.Markdown(INTRO_MD)
+    if not MODEL_READY:
+        gr.Markdown(
+            f"""
+> ⚠️ **Modelo não carregou.** Detalhe: `{html.escape(MODEL_ERROR or '')}`
+>
+> Verifique que `artifacts/fold_01_adapter/` e `artifacts/fold_01_head.pt` estão presentes
+> no repositório do Space. Se o modelo base exigir autenticação, configure `HF_TOKEN` em
+> **Settings → Variables and secrets**.
+"""
+        )
+    with gr.Tab("Prever"):
+        with gr.Row():
+            with gr.Column(scale=2):
+                inp_p = gr.Textbox(
+                    label="Texto da nota",
+                    placeholder="Cole aqui o texto em português...",
+                    lines=7,
+                    max_lines=25,
+                )
+                btn_p = gr.Button("Prever", variant="primary")
+                gr.Examples(examples=[[EXAMPLE_UTIL], [EXAMPLE_NAO]], inputs=[inp_p])
+            with gr.Column(scale=3):
+                out_card_p = gr.HTML(label="Resultado")
+                out_json_p = gr.JSON(label="Resposta da API")
+        btn_p.click(
+            handle_predict,
+            inputs=[inp_p],
+            outputs=[out_card_p, out_json_p],
+            api_name="predict",
+        )
+    with gr.Tab("Explicar"):
+        with gr.Row():
+            with gr.Column(scale=2):
+                inp_e = gr.Textbox(
+                    label="Texto da nota",
+                    placeholder="Cole aqui o texto em português...",
+                    lines=7,
+                    max_lines=25,
+                )
+                btn_e = gr.Button("Explicar", variant="primary")
+                gr.Examples(examples=[[EXAMPLE_UTIL], [EXAMPLE_NAO]], inputs=[inp_e])
+            with gr.Column(scale=3):
+                out_card_e = gr.HTML(label="Resultado")
+                out_hl = gr.HTML(label="Contribuição por palavra")
+                out_tbl = gr.HTML(label="Top tokens por lado")
+                out_json_e = gr.JSON(label="Resposta da API")
+        btn_e.click(
+            handle_explain,
+            inputs=[inp_e],
+            outputs=[out_card_e, out_hl, out_tbl, out_json_e],
+            api_name="explain",
+        )
+    with gr.Tab("Sobre"):
+        gr.Markdown(
+            f"""
+### Detalhes técnicos
+- **Modelo base**: `Qwen/Qwen3-Embedding-4B` (embedding, 2.560 dims, last-token pooling).
+- **Adaptação**: LoRA treinado com alvo `label_binary_strict` (recorte A do projeto).
+- **Cabeça**: `nn.Linear(2560, 1)` → sigmoid.
+- **Prompt de instrução** (idêntico ao treino):
+  > `Instruct: Represent the following Brazilian Portuguese community note for binary classification of helpfulness.`
+  > `Query: <texto>`
+- **max_length**: 256 tokens.
+- **Dispositivo atual**: `{DEVICE}`.
+- **Fold servido**: 01 (melhor fold segundo o manifesto do pipeline).
+### Método de explicação
+A aba **Explicar** usa **occlusion word-level** (leave-one-out): para cada palavra
+separada por espaço, calculamos `Δ = P(texto completo) − P(texto sem a palavra)`.
+- Δ positivo ⇒ palavra puxando para **útil** (verde).
+- Δ negativo ⇒ palavra puxando para **não-útil** (coral).
+É uma aproximação rápida do SHAP Partition usado no notebook de explicabilidade
+(~1–2 s vs ~12–15 s em GPU), com resultados visualmente comparáveis para notas curtas.
+### Limitações
+- O rótulo `helpful` mede **aceitabilidade bipartidária**, não qualidade editorial.
+  A galeria curada do notebook mostra casos onde vizinhos semânticos idênticos
+  recebem rótulos opostos por razões políticas.
+- Textos são truncados em 256 tokens.
+- Este endpoint serve um único fold. Para produção com ganho marginal de robustez,
+  subir para ensemble dos 5 folds (média de probabilidades).
+"""
+        )
+if __name__ == "__main__":
+    demo.queue(default_concurrency_limit=1).launch(
+        server_name="0.0.0.0",
+        server_port=int(os.environ.get("PORT", 7860)),
+        show_api=True,
+    )
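The "Método de explicação" text in the Sobre tab defines the word-level occlusion score as `Δ = P(full text) − P(text without the word)`. The sketch below shows that computation in isolation. It is not the `explain_occlusion` from `inference.py` (whose body is not part of this excerpt): `occlusion_contributions` and `toy_predict` are hypothetical names, and the toy scorer stands in for the real model so the example runs without a GPU.

```python
from typing import Callable, Dict, List, Union


def occlusion_contributions(
    text: str, predict: Callable[[str], float]
) -> Dict[str, Union[float, List[str], List[float]]]:
    """Leave-one-out over whitespace-separated words: N + 1 calls to `predict`."""
    words = text.split()
    p_full = predict(text)
    contributions = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        # Positive delta: removing the word lowers P(helpful), so it pushes toward "útil".
        contributions.append(p_full - predict(reduced))
    return {"proba_full": p_full, "tokens": words, "contributions": contributions}


def toy_predict(text: str) -> float:
    """Hypothetical stand-in scorer: 'fonte' raises the score, 'opinião' lowers it."""
    words = set(text.lower().split())
    return 0.5 + 0.2 * (("fonte" in words) - ("opinião" in words))


result = occlusion_contributions("fonte oficial opinião", toy_predict)
for word, delta in zip(result["tokens"], result["contributions"]):
    print(f"{word}: {delta:+.2f}")
```

The returned keys mirror the ones `handle_explain` reads (`proba_full`, `tokens`, `contributions`). In a real deployment the N reduced texts would normally be scored in batches rather than one call at a time.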

artifacts/fold_01_adapter/README.md ADDED

	@@ -0,0 +1,206 @@

+---
+base_model: Qwen/Qwen3-Embedding-4B
+library_name: peft
+tags:
+- base_model:adapter:Qwen/Qwen3-Embedding-4B
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.1

artifacts/fold_01_adapter/adapter_config.json ADDED

	@@ -0,0 +1,49 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "Qwen3Model",
+    "parent_library": "transformers.models.qwen3.modeling_qwen3"
+  },
+  "base_model_name_or_path": "Qwen/Qwen3-Embedding-4B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "down_proj",
+    "q_proj",
+    "up_proj",
+    "gate_proj",
+    "k_proj",
+    "o_proj",
+    "v_proj"
+  ],
+  "target_parameters": null,
+  "task_type": null,
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

artifacts/fold_01_adapter/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:326493c0cc026b088e80be86dc28fe61e21db919e52b602250e11abb6bac59b5
+size 132184864
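The LFS pointer size above can be cross-checked against `adapter_config.json` with a back-of-the-envelope parameter count. This is a rough sketch under stated assumptions: the layer dimensions are taken from the published Qwen3-4B configuration (hidden size 2560, MLP size 9728, 36 layers, 32 query heads and 8 key/value heads of dimension 128) and are not part of this commit, and the weights are assumed to be stored in fp32. Each adapted linear layer contributes `r * (in_features + out_features)` LoRA parameters.

```python
# Values from adapter_config.json and the LFS pointer in this commit.
R = 16
LFS_POINTER_SIZE = 132_184_864  # bytes

# Assumed Qwen3-4B dimensions (not stated in this commit).
HIDDEN, INTERMEDIATE, LAYERS = 2560, 9728, 36
Q_OUT = 32 * 128   # query heads * head_dim
KV_OUT = 8 * 128   # key/value heads * head_dim

# (in_features, out_features) of each target module.
shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per module.
params_per_layer = sum(R * (n_in + n_out) for n_in, n_out in shapes.values())
total_params = params_per_layer * LAYERS
fp32_bytes = total_params * 4

print(f"LoRA parameters: {total_params:,}")
print(f"fp32 payload:    {fp32_bytes:,} bytes")
print(f"file overhead:   {LFS_POINTER_SIZE - fp32_bytes:,} bytes")
```

Under these assumptions the payload comes to roughly 132.1 MB, leaving a few tens of kilobytes for the safetensors header, which is consistent with the pointer size recorded here.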

artifacts/fold_01_head.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7a66a6088bce2a00b93377ecc4f8243e061eccdc4679f4920fd691b35a0523ab
+size 12365

config.py ADDED

	@@ -0,0 +1,54 @@

+"""Constants shared across the Space.
+Everything lives in a single module to make swaps easy (e.g. replacing the
+selected fold, pointing to a different tokenizer while debugging, etc.).
+"""
+from __future__ import annotations
+import os
+from pathlib import Path
+# ---------------------------------------------------------------------------
+# Base model (downloaded from Hugging Face on the Space's first startup)
+# ---------------------------------------------------------------------------
+MODEL_NAME = "Qwen/Qwen3-Embedding-4B"
+# ---------------------------------------------------------------------------
+# Inference: parameters IDENTICAL to the notebook's (section 6, predict_from_text)
+# ---------------------------------------------------------------------------
+MAX_LENGTH = 256
+BATCH_SIZE = 8
+# This prompt is part of the model's contract: it was used during fine-tuning.
+# Changing it breaks the alignment between what the adapter saw and what it receives now.
+TASK_PROMPT = (
+    "Represent the following Brazilian Portuguese community note "
+    "for binary classification of helpfulness."
+)
+# ---------------------------------------------------------------------------
+# Artifact paths (resolved from the root of the Space repo)
+# ---------------------------------------------------------------------------
+ROOT = Path(__file__).resolve().parent
+ARTIFACTS_DIR = ROOT / "artifacts"
+# Required to serve predictions.
+ADAPTER_PATH = ARTIFACTS_DIR / "fold_01_adapter"
+HEAD_PATH = ARTIFACTS_DIR / "fold_01_head.pt"
+# ---------------------------------------------------------------------------
+# Classification (presentation thresholds; they do not affect the probability itself)
+# ---------------------------------------------------------------------------
+THRESHOLD_UTIL = 0.5
+# Confidence bands defined directly on p (avoids float imprecision in |p-0.5|):
+#   Alta  (high)   → p ≤ 0.10 or p ≥ 0.90
+#   Média (medium) → 0.10 < p ≤ 0.30 or 0.70 ≤ p < 0.90
+#   Baixa (low)    → 0.30 < p < 0.70
+CONFIDENCE_BOUNDS_ALTA = (0.10, 0.90)   # outside these bounds = Alta
+CONFIDENCE_BOUNDS_MEDIA = (0.30, 0.70)  # outside these bounds = Média
+# ---------------------------------------------------------------------------
+# Secrets (optional; set under Settings → Variables and secrets in the Space)
+# ---------------------------------------------------------------------------
+HF_TOKEN = os.environ.get("HF_TOKEN")  # only needed if the base model becomes gated

inference.py ADDED

	@@ -0,0 +1,225 @@

+"""Model loading and inference.
+Mirrors the 'fiel' (faithful) mode of FT-Solo in the explainability notebook:
+Qwen3-Embedding-4B base + fold 01 LoRA + linear head trained in the project.
+The notebook's `predict_from_text` function is reproduced here with the same
+tokenization, same pooling, same dtype and same prompt, so that the
+returned probabilities are numerically comparable to the saved OOF predictions.
+"""
+from __future__ import annotations
+import logging
+from functools import lru_cache
+from typing import Iterable
+import numpy as np
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from peft import PeftModel
+from transformers import AutoModel, AutoTokenizer
+from config import (
+    ADAPTER_PATH,
+    BATCH_SIZE,
+    HEAD_PATH,
+    HF_TOKEN,
+    MAX_LENGTH,
+    MODEL_NAME,
+    TASK_PROMPT,
+)
+logger = logging.getLogger(__name__)
+# ---------------------------------------------------------------------------
+# Device and dtype: logic taken directly from the notebook
+# ---------------------------------------------------------------------------
+DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+if DEVICE == "cuda":
+    AMP_DTYPE = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
+else:
+    # On CPU the weights are stored in float16 so they fit in 16 GB of RAM (fp32
+    # would need ~16 GB for the weights alone, leaving nothing for activations).
+    # Note: PyTorch does not upcast fp16 tensors automatically on CPU, so the
+    # encoder forward also runs in fp16 there; expect it to be slower and
+    # slightly less precise than on GPU.
+    # Autocast stays disabled on CPU (enabled=False below); fp16 autocast on CPU is unstable.
+    AMP_DTYPE = torch.float16
+# ---------------------------------------------------------------------------
+# Utilities: identical to the notebook (section 6)
+# ---------------------------------------------------------------------------
+def build_instruction_text(text: str) -> str:
+    """Format the text into the template expected by the fine-tuned model."""
+    if not isinstance(text, str):
+        text = ""
+    return f"Instruct: {TASK_PROMPT}\nQuery: {text}"
+def last_token_pool(
+    last_hidden_states: torch.Tensor, attention_mask: torch.Tensor
+) -> torch.Tensor:
+    """Extract the embedding of the last real token.
+    With the tokenizer set to padding_side='left', the last index (-1) is always
+    a real token for every element of the batch, so the shortcut applies.
+    The right-padding branch is kept as a safeguard.
+    """
+    left_padding = bool(
+        (attention_mask[:, -1].sum() == attention_mask.shape[0]).item()
+    )
+    if left_padding:
+        return last_hidden_states[:, -1]
+    sequence_lengths = attention_mask.sum(dim=1) - 1
+    return last_hidden_states[
+        torch.arange(last_hidden_states.shape[0], device=last_hidden_states.device),
+        sequence_lengths,
+    ]
+# ---------------------------------------------------------------------------
+# Lazy, cached loading
+# ---------------------------------------------------------------------------
+@lru_cache(maxsize=1)
+def load_model():
+    """Return (tokenizer, encoder, head). Loaded only once per process."""
+    if not ADAPTER_PATH.exists():
+        raise FileNotFoundError(
+            f"LoRA adapter not found at {ADAPTER_PATH}. "
+            "Upload the fold_01_adapter/ folder to artifacts/ before starting the Space."
+        )
+    if not HEAD_PATH.exists():
+        raise FileNotFoundError(
+            f"Classifier head not found at {HEAD_PATH}. "
+            "Upload fold_01_head.pt to artifacts/ before starting the Space."
+        )
+    logger.info("Loading tokenizer from %s", MODEL_NAME)
+    tokenizer = AutoTokenizer.from_pretrained(
+        MODEL_NAME, padding_side="left", token=HF_TOKEN
+    )
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+    logger.info(
+        "Loading base encoder %s (dtype=%s, device=%s)",
+        MODEL_NAME,
+        AMP_DTYPE,
+        DEVICE,
+    )
+    base_encoder = AutoModel.from_pretrained(
+        MODEL_NAME,
+        low_cpu_mem_usage=True,
+        torch_dtype=AMP_DTYPE,
+        token=HF_TOKEN,
+    ).to(DEVICE)
+    logger.info("Attaching LoRA adapter from %s", ADAPTER_PATH)
+    encoder = PeftModel.from_pretrained(
+        base_encoder, str(ADAPTER_PATH), is_trainable=False
+    ).to(DEVICE)
+    encoder.eval()
+    logger.info("Loading linear head from %s", HEAD_PATH)
+    head_payload = torch.load(HEAD_PATH, map_location="cpu")
+    # Supports both {"state_dict": {...}} and a bare state_dict.
+    head_state = (
+        head_payload["state_dict"]
+        if isinstance(head_payload, dict) and "state_dict" in head_payload
+        else head_payload
+    )
+    in_feat = int(head_state["weight"].shape[1])
+    head = nn.Linear(in_feat, 1)
+    head.load_state_dict(head_state)
+    head = head.to(DEVICE).eval()
+    logger.info("Model ready. Head in_features: %d", in_feat)
+    return tokenizer, encoder, head
+def warmup() -> None:
+    """Force loading now, so the first request does not pay the cold-start cost."""
+    load_model()
+# ---------------------------------------------------------------------------
+# Prediction: logic of the notebook's predict_from_text, preserved
+# ---------------------------------------------------------------------------
+@torch.no_grad()
+def predict_batch(
+    texts: Iterable[str], batch_size: int = BATCH_SIZE
+) -> np.ndarray:
+    """Probability of 'helpful' for each text. Returns an np.ndarray of shape (N,)."""
+    tokenizer, encoder, head = load_model()
+    if isinstance(texts, str):
+        texts = [texts]
+    texts = list(texts)
+    if not texts:
+        return np.zeros(0, dtype=np.float64)
+    preds = []
+    autocast_device = "cuda" if DEVICE == "cuda" else "cpu"
+    for i in range(0, len(texts), batch_size):
+        batch = texts[i : i + batch_size]
+        instr = [build_instruction_text(t) for t in batch]
+        toks = tokenizer(
+            instr,
+            padding=True,
+            truncation=True,
+            max_length=MAX_LENGTH,
+            return_tensors="pt",
+        ).to(DEVICE)
+        with torch.inference_mode(), torch.autocast(
+            device_type=autocast_device,
+            dtype=AMP_DTYPE,
+            enabled=(DEVICE == "cuda"),
+        ):
+            out = encoder(**toks)
+            emb = last_token_pool(out.last_hidden_state, toks["attention_mask"])
+            emb = F.normalize(emb, p=2, dim=1)
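+            # Cast to the head's dtype: on CPU (autocast off) the encoder emits
+            # fp16 while the head is fp32, which would otherwise raise a dtype error.
+            logits = head(emb.to(head.weight.dtype)).squeeze(-1)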
+            p = torch.sigmoid(logits).float().cpu().numpy()
+        preds.append(p)
+    # Clip to the same bounds used in the notebook (avoids probabilities of exactly 0 or 1).
+    return np.clip(np.concatenate(preds).astype(np.float64), 1e-6, 1 - 1e-6)
+def predict_one(text: str) -> float:
+    """Shortcut: return the scalar probability for a single text."""
+    return float(predict_batch([text])[0])
+# ---------------------------------------------------------------------------
+# Explanation: word-level occlusion (leave-one-out)
+# ---------------------------------------------------------------------------
+def explain_occlusion(text: str, batch_size: int = BATCH_SIZE) -> dict:
+    """Per-word importance via leave-one-out.
+    For each whitespace-separated word, computes Δ = P(text) − P(text without the word).
+        Δ > 0 → the word was pushing toward 'helpful'
+        Δ < 0 → the word was pushing toward 'not helpful'
+    Cost: (N + 1) forward passes, roughly half of the notebook's SHAP Partition,
+    with visually comparable results for short notes.
+    """
+    words = text.split()
+    if not words:
+        p = predict_one(text)
+        return {"proba_full": p, "tokens": [], "contributions": []}
+    variants = [" ".join(words[:i] + words[i + 1 :]) for i in range(len(words))]
+    all_texts = [text] + variants
+    probs = predict_batch(all_texts, batch_size=batch_size)
+    p_full = float(probs[0])
+    contribs = (p_full - probs[1:]).tolist()
+    return {
+        "proba_full": p_full,
+        "tokens": words,
+        "contributions": contribs,
+    }
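
The leave-one-out scheme in `explain_occlusion` can be tried without loading the 4B model. The sketch below repeats the same steps against a stand-in predictor; `occlusion_deltas`, `toy_predict`, and its keyword rule are hypothetical and exist only to make the arithmetic visible. They are not part of the commit.

```python
from typing import Callable, List


def occlusion_deltas(text: str, predict: Callable[[List[str]], List[float]]) -> dict:
    """Leave-one-out importance: delta_i = P(text) - P(text without word i)."""
    words = text.split()
    if not words:
        return {"proba_full": predict([text])[0], "tokens": [], "contributions": []}
    # One variant per word, each with that single word removed.
    variants = [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]
    probs = predict([text] + variants)  # N + 1 predictions in one call
    p_full = probs[0]
    return {
        "proba_full": p_full,
        "tokens": words,
        "contributions": [p_full - p for p in probs[1:]],
    }


def toy_predict(texts: List[str]) -> List[float]:
    """Hypothetical stand-in for predict_batch: a keyword rule, not a real model."""
    out = []
    for t in texts:
        ws = t.lower().split()
        out.append(0.5 + 0.25 * ("source" in ws) - 0.25 * ("fake" in ws))
    return out


result = occlusion_deltas("see the official source", toy_predict)
print(result["proba_full"])     # 0.75
print(result["contributions"])  # [0.0, 0.0, 0.0, 0.25]
```

Only "source" changes the toy score, so it is the only word with a nonzero Δ. With the real model every word typically receives a small nonzero contribution.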

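
The index selection in `last_token_pool` (inference.py above) can be checked on plain attention masks. This is a sketch in pure Python lists rather than tensors; the function name `last_token_indices` is hypothetical, and it returns only the position that would be pooled for each row.

```python
from typing import List


def last_token_indices(attention_mask: List[List[int]]) -> List[int]:
    """Position of the last real token per row, mirroring last_token_pool."""
    seq_len = len(attention_mask[0])
    # Left padding: every row ends in a real token, so index -1 is always valid.
    if all(row[-1] == 1 for row in attention_mask):
        return [seq_len - 1] * len(attention_mask)
    # Right padding: the last real token sits at (number of real tokens - 1).
    return [sum(row) - 1 for row in attention_mask]


left_padded = [[0, 0, 1, 1], [1, 1, 1, 1]]
right_padded = [[1, 1, 0, 0], [1, 1, 1, 1]]
print(last_token_indices(left_padded))   # [3, 3]
print(last_token_indices(right_padded))  # [1, 3]
```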
requirements.txt ADDED Viewed

	@@ -0,0 +1,8 @@

+# Gradio is managed by the Space SDK (sdk_version field in the README.md header).
+# Torch is provided by the HF Spaces runtime according to the hardware (CPU / T4 / A10G / ZeroGPU).
+# That is why neither of them is listed here.
+transformers>=4.51.0
+peft>=0.15.0
+accelerate>=0.34.0
+safetensors>=0.4.3