Tefifi / aim-dashboard

Tefifi committed on Mar 11

Commit 7adf02c (0 parents): deploy inicial ("initial deploy")

Files changed (19)

.gitignore +8 -0
Dockerfile +54 -0
Modelo_Pymes.pkl +0 -0
README.md +141 -0
app.py +126 -0
assets/style.css +431 -0
callbacks/__init__.py +0 -0
callbacks/navegacion.py +421 -0
data/__init__.py +0 -0
data/definitions.py +272 -0
docker-compose.yml +30 -0
layouts/__init__.py +0 -0
layouts/pages.py +268 -0
logic/__init__.py +1 -0
logic/extractor.py +267 -0
logic/modelo.py +348 -0
logic/venn.py +117 -0
render.yaml +9 -0
requirements.txt +0 -0

.gitignore ADDED

	@@ -0,0 +1,8 @@

+venv/
+__pycache__/
+*.pyc
+.env
+*.egg-info/
+dist/
+build/
+.DS_Store

Dockerfile ADDED

	@@ -0,0 +1,54 @@

+# ── AIM Dashboard — Dockerfile ──────────────────────────────────────────────
+# Base image: Python 3.11 slim, for a smaller image
+FROM python:3.11-slim
+# Metadata
+LABEL maintainer="tu-equipo@empresa.cl"
+LABEL description="AIM Dashboard — Perfil de ciberseguridad para PyMEs"
+# Environment variables (note: PORT is declared here, but the CMD below binds to 8050 explicitly)
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PORT=8050
+# Working directory inside the container
+WORKDIR /app
+# System packages needed to build torch and lxml
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    gcc \
+    g++ \
+    libxml2-dev \
+    libxslt-dev \
+    && rm -rf /var/lib/apt/lists/*
+# Copy only requirements.txt first so the dependency layer stays cached between code changes
+COPY requirements.txt .
+# Install Python dependencies
+# --no-cache-dir keeps pip's download cache out of the image
+RUN pip install --upgrade pip && \
+    pip install --no-cache-dir -r requirements.txt && \
+    pip install --no-cache-dir gunicorn
+# Copy the project source
+COPY . .
+# Download the required NLTK resources at build time
+RUN python -c "import nltk; nltk.download('punkt', quiet=True); nltk.download('punkt_tab', quiet=True)"
+# Port the app listens on
+EXPOSE 8050
+# Start with Gunicorn
+# - 2 workers (scale with available CPUs; the usual rule of thumb is 2 * num_cpus + 1).
+#   Caveat: analysis progress is kept in a module-level dict (callbacks/navegacion.py),
+#   so a polling request that reaches a different worker will not see the running analysis.
+# - timeout 300s because the NLP analysis can take several minutes
+# - Dash's WSGI object is named "server" inside app.py
+CMD ["gunicorn", \
+     "--workers", "2", \
+     "--timeout", "300", \
+     "--bind", "0.0.0.0:8050", \
+     "--log-level", "info", \
+     "--access-logfile", "-", \
+     "--error-logfile", "-", \
+     "app:server"]
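The Gunicorn flags in the `CMD` above could also live in a configuration file. Below is a minimal sketch of a hypothetical `gunicorn.conf.py` (it is not part of this commit). It assumes you want the bind port to follow the `PORT` variable the Dockerfile already declares, since the `CMD` as written hard-codes 8050. Gunicorn treats module-level names such as `bind`, `workers`, `threads`, `timeout`, `loglevel`, `accesslog` and `errorlog` as settings. The sketch uses one worker with several threads instead of two workers, on the assumption that the in-process progress state should stay reachable from every polling request.

```python
# gunicorn.conf.py (hypothetical; the commit passes these as CLI flags instead)
import os

# Follow the PORT variable declared in the Dockerfile; fall back to 8050.
port = os.environ.get("PORT", "8050")
bind = f"0.0.0.0:{port}"

# Progress lives in a module-level dict in callbacks/navegacion.py, so it is only
# visible inside the process that started the analysis. One worker with several
# threads keeps every polling request in that same process.
workers = 1
threads = 4  # with threads > 1 Gunicorn switches the default sync worker to gthread

# Same value as the --timeout 300 flag in the Dockerfile CMD.
timeout = 300

loglevel = "info"
accesslog = "-"  # "-" sends the access log to stdout
errorlog = "-"   # "-" sends the error log to stderr
```

It would be started with `gunicorn -c gunicorn.conf.py app:server`. The trade-off is that threads share one Python process (and its GIL), so heavy CPU work in the NLP thread competes with request handling. Moving the state out of process, for example to Redis, is what would allow more workers again.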

Modelo_Pymes.pkl ADDED

Binary file (1.81 kB)

README.md ADDED

	@@ -0,0 +1,141 @@

+# AIM Dashboard
+**Cybersecurity-profile classification tool for SMEs (PyMEs)**, based on the
+AIM Triad (Awareness, Infrastructure, Management).
+
+---
+## Running it on localhost
+### Option A: plain Python (quickest way to try it)
+**1. Unzip the project and enter the folder**
+```bash
+cd aim_dashboard
+```
+**2. Create a virtual environment**
+```bash
+python -m venv venv
+# macOS / Linux:
+source venv/bin/activate
+# Windows:
+venv\Scripts\activate
+```
+**3. Install the dependencies**
+```bash
+pip install -r requirements.txt
+```
+The first run takes several minutes: pip installs large packages such as PyTorch, and the BART, DeBERTa and MPNet models are fetched from Hugging Face the first time they are loaded (about 1.5 GB in total).
+**4. Make sure the model file is in the project root**
+```
+aim_dashboard/
+├── app.py
+├── Modelo_Pymes.pkl   ← must be here
+└── ...
+```
+**5. Run it**
+```bash
+python app.py
+```
+**6. Open it in the browser:**
+```
+http://localhost:8050
+```
+---
+### Option B: Docker (recommended for production)
+Requirements: Docker Desktop installed.
+```bash
+cd aim_dashboard
+docker-compose up --build
+```
+The first build takes about 5-10 minutes. Then open `http://localhost:8050`.
+
+To stop it: `docker-compose down`
+
+---
+### Option C: Gunicorn (production without Docker)
+```bash
+pip install gunicorn
+gunicorn --workers 2 --timeout 300 --bind 0.0.0.0:8050 app:server
+```
+---
+## Project structure
+```
+aim_dashboard/
+│
+├── app.py                   <- Main entry point
+├── Dockerfile               <- Docker image
+├── docker-compose.yml       <- Orchestration
+├── requirements.txt
+├── Modelo_Pymes.pkl         <- K-Means model (do NOT push to public repositories)
+│
+├── assets/
+│   └── style.css            <- Styles (Dash loads them automatically)
+│
+├── data/
+│   └── definitions.py       <- Constants and static text
+│
+├── layouts/
+│   └── pages.py             <- Home and profile screens
+│
+├── callbacks/
+│   └── navegacion.py        <- Background analysis, progress bar, routing
+│
+└── logic/
+    ├── extractor.py         <- Web scraping and semantic classification
+    ├── modelo.py            <- NLP vectorization + K-Means prediction
+    └── venn.py              <- Venn diagram
+```
+---
+## Application flow
+```
+URL entered
+    -> ExtractorMVD: scrapes the relevant pages                (Step 1)
+    -> clasificar_inteligente: classifies text as MISION/VISION/DESCRIPCION
+    -> ES->EN translation                                       (Step 2)
+    -> NLP vectorization: MPNet + DeBERTa + BART zero-shot      (Step 3)
+    -> KMeans.predict: cluster 0-4 -> Profile 1-5
+    -> Profile layout with strengths, weaknesses and Venn diagram
+```
+---
+## Common problems
+| Problem | Solution |
+|---|---|
+| `FileNotFoundError: Modelo_Pymes.pkl` | Put the `.pkl` next to `app.py` |
+| The first analysis is slow | Expected: the NLP models are loaded on the first analysis, not at startup |
+| Network error while analyzing a site | Check that the URL includes `https://` |
+| `Port 8050 already in use` | Change the port in `app.py` or kill the process: `lsof -ti:8050 \| xargs kill` |
+| Docker runs out of memory | Allocate at least 4 GB of RAM in Docker Desktop > Settings > Resources |
+
+---
+## Production notes
+- The NLP analysis runs in a **separate thread** so it does not block the UI. Its progress is
+  kept in a single module-level dictionary that every user shares, so with multiple
+  concurrent users consider **Celery + Redis** as a job queue.
+- Hugging Face models are cached in `~/.cache/huggingface/`. The Dockerfile only pre-downloads
+  the NLTK data at build time, so inside Docker the Hugging Face models are downloaded on the
+  first analysis unless that cache directory is persisted.
+- To expose the app on a real domain, uncomment the `nginx` service in `docker-compose.yml`
+  and configure your domain.
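The multi-user note above applies because `callbacks/navegacion.py` keeps a single `_estado` dict. A lighter intermediate step than Celery + Redis is to keep one progress entry per analysis, keyed by a job id that each browser session would hold (for example in a `dcc.Store`). The following is a minimal sketch under that assumption. `ProgressStore` and `fake_analysis` are hypothetical names, and the store still lives in one process, so it does not solve the multi-worker case.

```python
from __future__ import annotations

import threading
import uuid


class ProgressStore:
    """Thread-safe progress registry keyed by job id (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._jobs: dict[str, dict] = {}

    def create(self) -> str:
        job_id = uuid.uuid4().hex
        with self._lock:
            # Same fields as _estado in callbacks/navegacion.py.
            self._jobs[job_id] = {"paso": 0, "pct": 0, "perfil": None, "error": None}
        return job_id

    def update(self, job_id: str, **fields) -> None:
        # Raises KeyError for unknown ids instead of silently creating a job.
        with self._lock:
            self._jobs[job_id].update(fields)

    def snapshot(self, job_id: str) -> dict | None:
        # Return a copy so callers never read a dict another thread is mutating.
        with self._lock:
            job = self._jobs.get(job_id)
            return dict(job) if job is not None else None

    def discard(self, job_id: str) -> None:
        with self._lock:
            self._jobs.pop(job_id, None)


def fake_analysis(store: ProgressStore, job_id: str, perfil: int) -> None:
    # Stand-in for _correr_analisis: walks the same step numbers and percentages.
    for paso, pct in ((1, 0), (2, 30), (3, 50)):
        store.update(job_id, paso=paso, pct=pct)
    store.update(job_id, paso=4, pct=100, perfil=perfil)


if __name__ == "__main__":
    store = ProgressStore()
    a, b = store.create(), store.create()
    workers = [
        threading.Thread(target=fake_analysis, args=(store, a, 2)),
        threading.Thread(target=fake_analysis, args=(store, b, 5)),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(store.snapshot(a)["perfil"], store.snapshot(b)["perfil"])  # 2 5
```

Two analyses started at the same time no longer overwrite each other's progress. The polling callback would read `store.snapshot(job_id)` instead of copying the global dict.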

app.py ADDED

	@@ -0,0 +1,126 @@

+"""
+AIM Dashboard: main entry point.
+Run with: python app.py
+"""
+import logging
+import dash
+import dash_bootstrap_components as dbc
+from dash import dcc, html
+from layouts.pages import layout_home, all_layouts
+from callbacks.navegacion import registrar_callbacks
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+app = dash.Dash(
+    __name__,
+    suppress_callback_exceptions=True,
+    external_stylesheets=[dbc.themes.BOOTSTRAP],
+    meta_tags=[{"name": "viewport", "content": "width=device-width, initial-scale=1"}],
+)
+app.title = "AIM Dashboard"
+# ── Progress panel in the root layout ────────────────────────────────────────
+# It must live here (not inside page-content) so callbacks can always write
+# to it, whichever page is currently loaded.
+# It is rendered as a bar fixed to the bottom of the screen.
+panel_progreso = html.Div(
+    id="panel-progreso",
+    style={"display": "none"},
+    children=[
+        html.Div(
+            style={
+                "position": "fixed",
+                "bottom": "0", "left": "0", "right": "0",
+                "zIndex": "1000",
+                "background": "#ffffff",
+                "borderTop": "3px solid #1c3160",
+                "boxShadow": "0 -4px 24px rgba(28,49,96,0.15)",
+                "padding": "18px 36px 20px",
+            },
+            children=[
+                html.Div(
+                    style={"maxWidth": "860px", "margin": "0 auto"},
+                    children=[
+                        html.Div(
+                            style={"display": "flex", "alignItems": "center",
+                                   "justifyContent": "space-between", "marginBottom": "10px"},
+                            children=[
+                                html.Span("Analizando perfil AIM", style={
+                                    "fontSize": "0.78rem", "fontWeight": "700",
+                                    "letterSpacing": "1.2px", "textTransform": "uppercase",
+                                    "color": "#1c3160",
+                                }),
+                                html.Span(id="progreso-pct", style={
+                                    "fontSize": "1rem", "fontWeight": "700",
+                                    "color": "#b87a2a",
+                                }),
+                            ],
+                        ),
+                        html.Div(
+                            style={
+                                "background": "#e8ecf2",
+                                "borderRadius": "999px",
+                                "height": "14px",
+                                "marginBottom": "14px",
+                                "overflow": "hidden",
+                                "border": "2px solid #b0bacb",
+                                "boxShadow": "inset 0 1px 3px rgba(0,0,0,0.1)",
+                            },
+                            children=[
+                                html.Div(
+                                    id="progreso-bar",
+                                    style={
+                                        "height": "100%", "borderRadius": "999px",
+                                        "background": "linear-gradient(90deg, #b87a2a, #d4a843)",
+                                        "width": "0%",
+                                        "transition": "width 0.8s ease",
+                                        "boxShadow": "0 1px 4px rgba(184,122,42,0.4)",
+                                    },
+                                )
+                            ],
+                        ),
+                        html.Div(id="progreso-container"),
+                        html.Div(id="error-msg", children=""),
+                    ],
+                )
+            ],
+        )
+    ],
+)
+# ── Root layout ───────────────────────────────────────────────────────────────
+app.layout = html.Div([
+    dcc.Location(id="url", refresh=False),
+    dcc.Store(id="store_name", storage_type="session"),
+    dcc.Store(id="store_link", storage_type="session"),
+    dcc.Interval(id="interval_progress", interval=600, n_intervals=0, disabled=True),
+    html.Div(id="page-content"),
+    panel_progreso,
+])
+# ── Routing ───────────────────────────────────────────────────────────────────
+@app.callback(
+    dash.Output("page-content", "children"),
+    dash.Input("url", "pathname"),
+)
+def mostrar_pagina(pathname: str):
+    rutas = {
+        "/profile_1": all_layouts[0],
+        "/profile_2": all_layouts[1],
+        "/profile_3": all_layouts[2],
+        "/profile_4": all_layouts[3],
+        "/profile_5": all_layouts[4],
+    }
+    return rutas.get(pathname, layout_home)
+registrar_callbacks(app)
+server = app.server  # for Gunicorn: gunicorn app:server
+if __name__ == "__main__":
+    app.run(debug=False, host="0.0.0.0", port=8050)
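`mostrar_pagina` maps five hand-written paths to `all_layouts` and falls back to the home layout for anything else. The README flow says the K-Means cluster (0-4) becomes profile 1-5. Below is a small sketch of both mappings as plain functions, with string placeholders instead of Dash layouts so it runs without Dash. The names `profile_path`, `build_routes` and `resolve_route` are hypothetical.

```python
from __future__ import annotations

N_PROFILES = 5  # K-Means clusters 0-4 -> profiles 1-5


def profile_path(cluster: int) -> str:
    """Hypothetical helper: K-Means cluster index -> profile URL."""
    if not 0 <= cluster < N_PROFILES:
        raise ValueError(f"cluster out of range: {cluster}")
    return f"/profile_{cluster + 1}"


def build_routes(layouts: list) -> dict:
    # Same table as in mostrar_pagina, built from the list instead of written by hand.
    return {f"/profile_{i}": layout for i, layout in enumerate(layouts, start=1)}


def resolve_route(pathname: str | None, routes: dict, home):
    # Unknown or missing paths ("/", None on first load, typos) fall back to home.
    return routes.get(pathname, home)


if __name__ == "__main__":
    layouts = [f"profile layout {i}" for i in range(1, N_PROFILES + 1)]
    routes = build_routes(layouts)
    print(resolve_route(profile_path(2), routes, "home"))  # profile layout 3
```

Building the table from `all_layouts` keeps the route count in sync with the number of profiles. The explicit range check surfaces an unexpected cluster index rather than silently sending the user home.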

assets/style.css ADDED

	@@ -0,0 +1,431 @@

+/* ============================================================
+   AIM Dashboard: accessible redesign
+   Principles: high contrast, generous typography, ample spacing,
+   sober institutional palette (deep blue + warm white + amber)
+   ============================================================ */
+/* ---------- Readable body font and an elegant serif for headings ---------- */
+@import url('https://fonts.googleapis.com/css2?family=Source+Serif+4:wght@400;600;700&family=Source+Sans+3:wght@400;500;600;700&display=swap');
+*, *::before, *::after { box-sizing: border-box; }
+body {
+  margin: 0;
+  font-family: 'Source Sans 3', 'Segoe UI', Georgia, sans-serif;
+  background-color: #f4f1eb;   /* warm white, easy on the eyes */
+  color: #1c2233;              /* very dark blue rather than pure black */
+  min-height: 100vh;
+  font-size: 17px;             /* base size larger than the usual 16px default */
+  line-height: 1.7;
+}
+/* ---------- Institutional palette variables ---------- */
+:root {
+  --bg:           #f4f1eb;      /* main background: warm cream */
+  --bg-card:      #ffffff;      /* white cards */
+  --bg-card-alt:  #eef2f7;      /* secondary cards: very pale blue */
+  --border:       #c8d0df;      /* soft borders */
+  --border-dark:  #8a96aa;      /* emphasis borders */
+  --navy:         #1c3160;      /* deep navy: primary color */
+  --navy-mid:     #2d4f8a;      /* mid blue for hover states */
+  --navy-light:   #dde6f5;      /* very light blue for backgrounds */
+  --amber:        #b87a2a;      /* warm amber: positive accent */
+  --amber-light:  #fdf3e3;      /* soft amber background */
+  --success:      #1e6b45;      /* dark forest green */
+  --success-bg:   #e6f4ed;
+  --success-border: #7dc4a0;
+  --danger:       #8b1c1c;      /* dark brick red */
+  --danger-bg:    #fdeaea;
+  --danger-border: #d48585;
+  --text-primary: #1c2233;
+  --text-secondary: #3d4a60;
+  --text-muted:   #6b7a96;
+  --text-light:   #9aa3b5;
+  --radius:       10px;
+  --radius-lg:    14px;
+  --shadow-sm:    0 1px 4px rgba(28,49,96,0.08);
+  --shadow:       0 3px 16px rgba(28,49,96,0.12);
+  --shadow-lg:    0 8px 32px rgba(28,49,96,0.16);
+}
+/* ---------- Main container ---------- */
+.aim-container {
+  max-width: 1340px;
+  margin: 0 auto;
+  padding: 28px 36px;
+}
+/* ---------- Header ---------- */
+.aim-header {
+  text-align: center;
+  padding: 48px 0 40px;
+  border-bottom: 2px solid var(--border);
+  margin-bottom: 36px;
+}
+.aim-header h1 {
+  font-family: 'Source Serif 4', Georgia, serif;
+  font-size: 2.6rem;
+  font-weight: 700;
+  color: var(--navy);
+  letter-spacing: -0.3px;
+  margin: 0 0 10px;
+}
+.aim-header .subtitle {
+  color: var(--text-secondary);
+  font-size: 1.15rem;
+  margin: 0;
+  font-weight: 400;
+}
+/* ---------- Two-column layout ---------- */
+.aim-two-col {
+  display: flex;
+  gap: 28px;
+  align-items: flex-start;
+}
+.aim-two-col .col-left  { flex: 1; min-width: 0; }
+.aim-two-col .col-right { flex: 1; min-width: 0; text-align: center; }
+@media (max-width: 960px) {
+  .aim-two-col { flex-direction: column; }
+  .aim-container { padding: 20px 18px; }
+}
+/* ---------- Cards ---------- */
+.aim-card {
+  background: var(--bg-card);
+  border: 1.5px solid var(--border);
+  border-radius: var(--radius-lg);
+  padding: 28px 32px;
+  margin-bottom: 20px;
+  box-shadow: var(--shadow-sm);
+}
+.aim-card-label {
+  font-size: 0.78rem;
+  font-weight: 700;
+  letter-spacing: 1.4px;
+  text-transform: uppercase;
+  color: var(--navy);
+  margin-bottom: 10px;
+  display: flex;
+  align-items: center;
+  gap: 8px;
+}
+.aim-card-label::before {
+  content: '';
+  display: inline-block;
+  width: 4px;
+  height: 16px;
+  background: var(--amber);
+  border-radius: 2px;
+}
+/* ---------- Inputs ---------- */
+.aim-input {
+  width: 100%;
+  padding: 14px 18px;
+  background: #fafbfd;
+  border: 2px solid var(--border);
+  border-radius: var(--radius);
+  color: var(--text-primary);
+  font-size: 1.05rem;
+  font-family: inherit;
+  transition: border-color 0.2s, box-shadow 0.2s;
+  margin-bottom: 18px;
+}
+.aim-input:focus {
+  outline: none;
+  border-color: var(--navy-mid);
+  box-shadow: 0 0 0 3px rgba(45,79,138,0.15);
+  background: #fff;
+}
+.aim-input::placeholder { color: var(--text-light); }
+/* Form labels */
+label {
+  display: block;
+  font-weight: 600;
+  font-size: 1rem;
+  color: var(--text-secondary);
+  margin-bottom: 6px;
+}
+/* ---------- Primary button ---------- */
+.aim-btn-primary {
+  width: 100%;
+  padding: 16px 28px;
+  background: var(--navy);
+  color: #ffffff;
+  border: none;
+  border-radius: var(--radius);
+  font-size: 1.08rem;
+  font-weight: 700;
+  font-family: inherit;
+  letter-spacing: 0.3px;
+  cursor: pointer;
+  transition: background 0.2s, box-shadow 0.2s, transform 0.1s;
+  margin-top: 8px;
+}
+.aim-btn-primary:hover {
+  background: var(--navy-mid);
+  box-shadow: 0 4px 18px rgba(28,49,96,0.28);
+  transform: translateY(-1px);
+}
+.aim-btn-primary:active { transform: translateY(0); }
+/* Secondary button (back) */
+.aim-btn-secondary {
+  padding: 10px 22px;
+  background: var(--bg-card);
+  color: var(--navy);
+  border: 2px solid var(--navy);
+  border-radius: var(--radius);
+  font-size: 0.97rem;
+  font-weight: 700;
+  font-family: inherit;
+  cursor: pointer;
+  transition: background 0.2s;
+}
+.aim-btn-secondary:hover {
+  background: var(--navy-light);
+}
+/* ---------- Domain tags ---------- */
+.domain-tag {
+  display: inline-block;
+  padding: 5px 13px;
+  border-radius: 999px;
+  font-size: 0.85rem;
+  font-weight: 600;
+  margin: 3px 3px;
+}
+.domain-tag.strength {
+  background: var(--success-bg);
+  color: var(--success);
+  border: 1.5px solid var(--success-border);
+}
+.domain-tag.weakness {
+  background: var(--danger-bg);
+  color: var(--danger);
+  border: 1.5px solid var(--danger-border);
+}
+/* ---------- Definition text ---------- */
+.aim-definition {
+  color: var(--text-secondary);
+  font-size: 1rem;
+  line-height: 1.8;
+}
+.aim-definition p { margin: 0 0 12px; }
+.aim-definition strong { color: var(--navy); }
+/* ---------- Venn image ---------- */
+.venn-img {
+  width: 100%;
+  max-width: 100%;
+  height: auto;
+  border-radius: var(--radius);
+  box-shadow: var(--shadow);
+  border: 1.5px solid var(--border);
+  display: block;
+}
+/* ---------- Error alert ---------- */
+.aim-error {
+  background: var(--danger-bg);
+  border: 1.5px solid var(--danger-border);
+  border-radius: var(--radius);
+  padding: 14px 18px;
+  color: var(--danger);
+  font-size: 0.95rem;
+  margin-top: 12px;
+  font-weight: 500;
+}
+/* ---------- Floating navigation (back button) ---------- */
+.aim-nav-home {
+  position: fixed;
+  top: 18px;
+  right: 24px;
+  z-index: 999;
+}
+/* ---------- Progress panel ---------- */
+#panel-progreso .aim-card {
+  border-left: 4px solid var(--navy);
+  background: var(--bg-card);
+}
+/* ---------- Details/summary accordion ---------- */
+details {
+  border-radius: var(--radius);
+  margin-bottom: 10px;
+  overflow: hidden;
+  border: 1.5px solid var(--border);
+  transition: box-shadow 0.2s;
+}
+details:hover {
+  box-shadow: var(--shadow-sm);
+}
+details[open] {
+  box-shadow: var(--shadow);
+}
+details summary {
+  outline: none;
+  list-style: none;
+  padding: 14px 18px;
+  cursor: pointer;
+  font-weight: 600;
+  font-size: 0.98rem;
+  user-select: none;
+  transition: background 0.15s;
+}
+details summary::-webkit-details-marker { display: none; }
+details summary::marker { content: ""; }
+details summary:hover {
+  filter: brightness(0.97);
+}
+details[open] > div {
+  animation: fadeSlide 0.22s ease;
+}
+@keyframes fadeSlide {
+  from { opacity: 0; transform: translateY(-5px); }
+  to   { opacity: 1; transform: translateY(0); }
+}
+/* ---------- Domain column header ---------- */
+.aim-domain-col-header {
+  font-size: 0.78rem;
+  font-weight: 700;
+  letter-spacing: 1.3px;
+  text-transform: uppercase;
+  margin-bottom: 14px;
+  padding-bottom: 12px;
+  border-bottom: 2px solid;
+  display: flex;
+  align-items: center;
+  gap: 9px;
+}
+/* ---------- Responsive 3-column grid ---------- */
+@media (max-width: 1100px) {
+  .aim-profile-grid { grid-template-columns: 1fr 1fr !important; }
+}
+@media (max-width: 680px) {
+  .aim-profile-grid { grid-template-columns: 1fr !important; }
+  .aim-header h1 { font-size: 1.9rem; }
+}
+/* ---------- Softer scrollbar ---------- */
+::-webkit-scrollbar { width: 8px; }
+::-webkit-scrollbar-track { background: var(--bg); }
+::-webkit-scrollbar-thumb {
+  background: var(--border-dark);
+  border-radius: 4px;
+}
+::-webkit-scrollbar-thumb:hover { background: var(--navy-mid); }
+/* ---------- Text selection ---------- */
+::selection {
+  background: var(--navy-light);
+  color: var(--navy);
+}
+/* ---------- Venn diagram zoom modal ---------- */
+.venn-clickable {
+  transition: transform 0.2s, box-shadow 0.2s;
+}
+.venn-clickable:hover {
+  transform: scale(1.02);
+  box-shadow: 0 6px 24px rgba(28,49,96,0.2);
+}
+.venn-modal-overlay {
+  position: fixed;
+  inset: 0;
+  background: rgba(10, 15, 30, 0.75);
+  z-index: 9999;
+  display: flex !important;
+  align-items: center;
+  justify-content: center;
+  padding: 24px;
+  backdrop-filter: blur(3px);
+  animation: fadeIn 0.2s ease;
+}
+.venn-modal-overlay[style*="display: none"] {
+  display: none !important;
+}
+@keyframes fadeIn {
+  from { opacity: 0; }
+  to   { opacity: 1; }
+}
+.venn-modal-box {
+  background: #fff;
+  border-radius: 14px;
+  padding: 24px;
+  max-width: 820px;
+  width: 100%;
+  box-shadow: 0 20px 60px rgba(0,0,0,0.4);
+  position: relative;
+  animation: scaleIn 0.2s ease;
+}
+@keyframes scaleIn {
+  from { transform: scale(0.92); opacity: 0; }
+  to   { transform: scale(1);    opacity: 1; }
+}
+.venn-modal-img {
+  width: 100%;
+  height: auto;
+  border-radius: 8px;
+  display: block;
+}
+.venn-modal-close {
+  position: absolute;
+  top: 12px;
+  right: 12px;
+  background: var(--navy);
+  color: #fff;
+  border: none;
+  border-radius: 8px;
+  padding: 8px 16px;
+  font-size: 0.9rem;
+  font-weight: 700;
+  font-family: inherit;
+  cursor: pointer;
+  transition: background 0.2s;
+}
+.venn-modal-close:hover {
+  background: var(--navy-mid);
+}
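The stylesheet's header lists high contrast as a design principle. One way to check a palette pair is the WCAG 2.x contrast ratio: convert each sRGB color to relative luminance L, then compute (L_light + 0.05) / (L_dark + 0.05). WCAG AA asks for at least 4.5:1 for normal-size text. Below is a stdlib sketch; the function names are hypothetical, and the pairs checked are taken from the CSS variables and inline styles above.

```python
def _channel(c8: int) -> float:
    # 8-bit sRGB channel -> linear value, per the WCAG 2.x relative-luminance definition.
    c = c8 / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(fg: str, bg: str) -> float:
    # Order does not matter: the lighter luminance always goes in the numerator.
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    pairs = {
        "--text-primary on --bg": ("#1c2233", "#f4f1eb"),
        "--text-muted on --bg-card": ("#6b7a96", "#ffffff"),
        "--text-light placeholder on input": ("#9aa3b5", "#fafbfd"),
    }
    for name, (fg, bg) in pairs.items():
        ratio = contrast_ratio(fg, bg)
        verdict = "meets AA" if ratio >= 4.5 else "below AA for body text"
        print(f"{name}: {ratio:.2f}:1 ({verdict})")
```

Running this over the muted and placeholder colors is a quick way to see which secondary text colors, if any, fall short of the 4.5:1 threshold.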

callbacks/__init__.py ADDED

File without changes
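`_animar_porcentaje` in `callbacks/navegacion.py`, listed next, nudges the bar every 0.8 s by `max(1, int((techo - pct) * 0.07))`. It never passes the current step's ceiling from `_PCT_TECHO`. Below is a replay of that rule without threads or sleeps, showing how many ticks each step needs before the bar stalls at its ceiling. `simulate_ticks` is a hypothetical name, and the (base, ceiling) pairs are copied from `_PCT_BASE` and `_PCT_TECHO`.

```python
from __future__ import annotations


def simulate_ticks(start: int, ceiling: int, rate: float = 0.07) -> list[int]:
    """Replays the increment rule of _animar_porcentaje; returns the value after each tick."""
    values = []
    pct = start
    while pct < ceiling:
        increment = max(1, int((ceiling - pct) * rate))
        pct = min(pct + increment, ceiling)
        values.append(pct)
    return values


if __name__ == "__main__":
    for step, (base, ceiling) in {1: (0, 28), 2: (30, 48), 3: (50, 95)}.items():
        ticks = simulate_ticks(base, ceiling)
        print(f"step {step}: {len(ticks)} ticks (~{len(ticks) * 0.8:.1f} s) to reach {ceiling}%")
```

In steps 1 and 2 the gap to the ceiling never exceeds 28 points, so `int(gap * 0.07)` is at most 1 and the bar moves one point per tick. The "faster at first" behaviour only shows up in step 3, where the gap starts at 45 points.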

callbacks/navegacion.py ADDED

	@@ -0,0 +1,421 @@

+"""
+Callbacks for the AIM dashboard.
+- error-msg, progreso-container, progreso-bar and progreso-pct are defined in app.py,
+  inside the root-layout progress panel fixed to the bottom of the screen.
+- The panel is shown/hidden via Output("panel-progreso", "style").
+"""
+import logging
+import threading
+import time
+import dash
+from dash import ctx, html
+from dash.dependencies import Input, Output, State, ALL
+from logic.extractor import ExtractorMVD
+from logic.modelo import obtener_perfil
+logger = logging.getLogger(__name__)
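+# ── Shared state ──────────────────────────────────────────────────────────────
+# One module-level dict: every session reads the same progress, and it is only
+# visible inside this process (a single Gunicorn worker).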
+_estado: dict = {"paso": 0, "pct": 0, "perfil": None, "error": None}
+_lock = threading.Lock()
+# Base percentage at the start of each step
+_PCT_BASE  = {1: 0,  2: 30, 3: 50, 4: 100}
+# Highest percentage the timer alone can reach while a step is still running
+_PCT_TECHO = {1: 28, 2: 48, 3: 95,  4: 100}
+PASOS_LABELS  = ["Extrayendo", "Procesando", "Clasificando", "Listo"]
+PASOS_DETALLE = [
+    "Extrayendo contenido del sitio web...",
+    "Traduciendo y procesando texto...",
+    "Clasificando perfil con el modelo NLP...",
+    "¡Análisis completado!",
+]
+_PANEL_VISIBLE = {"display": "block"}
+_PANEL_OCULTO  = {"display": "none"}
+# ── Main analysis thread ──────────────────────────────────────────────────────
+def _correr_analisis(link: str, nombre: str) -> None:
+    global _estado
+    try:
+        # Step 1: web extraction
+        with _lock:
+            _estado = {"paso": 1, "pct": 0, "perfil": None, "error": None}
+        extractor = ExtractorMVD(url=link, nombre=nombre)
+        extractor.navegar_y_extraer()
+        texto = extractor.clasificar_inteligente()
+        if not texto:
+            with _lock:
+                _estado["error"] = {
+                    "tipo": "extraccion",
+                    "titulo": "No se encontró contenido relevante",
+                    "detalle": (
+                        "El sitio web no contiene texto relacionado con misión, visión "
+                        "o descripción organizacional que el modelo pueda analizar."
+                    ),
+                    "sugerencia": "Prueba con la URL de la página 'Quiénes somos' o 'Acerca de' de la empresa.",
+                }
+                _estado["paso"] = -1
+            return
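+        # Step 2: translation. Only the progress state is advanced here; execution
+        # moves straight on to step 3 without doing separate work in this function.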
+        with _lock:
+            _estado["paso"] = 2
+            _estado["pct"]  = _PCT_BASE[2]
+        # Step 3: NLP classification + K-Means
+        with _lock:
+            _estado["paso"] = 3
+            _estado["pct"]  = _PCT_BASE[3]
+        perfil = obtener_perfil(texto) + 1  # 0-4 → 1-5
+        # Step 4: done
+        with _lock:
+            _estado["paso"] = 4
+            _estado["pct"]  = 100
+            _estado["perfil"] = perfil
+        logger.info("Perfil: %d | '%s'", perfil, link)
+    except FileNotFoundError as e:
+        logger.error("Modelo no encontrado: %s", e)
+        with _lock:
+            _estado["error"] = {
+                "tipo": "modelo",
+                "titulo": "Modelo no encontrado",
+                "detalle": "El archivo Modelo_Pymes.pkl no está en la carpeta del proyecto.",
+                "sugerencia": "Verifica que Modelo_Pymes.pkl esté en la carpeta raíz junto a app.py.",
+            }
+            _estado["paso"]  = -1
+    except Exception as e:
+        logger.exception("Error procesando '%s'", link)
+        err_str = str(e)
+        # Classify the error by the text of its message
+        if "connection" in err_str.lower() or "timeout" in err_str.lower() or "urlopen" in err_str.lower():
+            tipo    = "red"
+            titulo  = "Error de conexión"
+            detalle = f"No se pudo conectar al sitio: {err_str}"
+            sugiere = "Verifica que la URL sea correcta y que el sitio esté accesible desde tu red."
+        elif "ssl" in err_str.lower() or "certificate" in err_str.lower():
+            tipo    = "ssl"
+            titulo  = "Error de certificado SSL"
+            detalle = "El sitio tiene un certificado de seguridad inválido o expirado."
+            sugiere = "Intenta con la versión http:// en lugar de https://, o prueba con otro sitio."
+        elif "403" in err_str or "401" in err_str or "forbidden" in err_str.lower():
+            tipo    = "acceso"
+            titulo  = "Acceso denegado por el sitio"
+            detalle = "El servidor rechazó la solicitud (error 403/401)."
+            sugiere = "Este sitio bloquea el acceso automatizado. Prueba con otro sitio o con la URL de una subpágina."
+        elif "404" in err_str or "not found" in err_str.lower():
+            tipo    = "url"
+            titulo  = "Página no encontrada"
+            detalle = f"La URL ingresada no existe o fue movida: {err_str}"
+            sugiere = "Verifica que la URL sea correcta e incluya https://"
+        elif "runtime" in err_str.lower() or "tensor" in err_str.lower():
+            tipo    = "modelo"
+            titulo  = "Error en el modelo NLP"
+            detalle = "Ocurrió un error interno al procesar el texto con los modelos de lenguaje."
+            sugiere = "Intenta de nuevo. Si el error persiste, el texto extraído puede ser demasiado corto o inusual."
+        else:
+            tipo    = "desconocido"
+            titulo  = "Error inesperado"
+            detalle = err_str
+            sugiere = "Intenta de nuevo o prueba con un sitio web diferente."
+        with _lock:
+            _estado["error"] = {
+                "tipo": tipo, "titulo": titulo,
+                "detalle": detalle, "sugerencia": sugiere,
+            }
+            _estado["paso"]  = -1
+# ── Percentage animation thread ───────────────────────────────────────────────
+def _animar_porcentaje() -> None:
+    """Gradually raises the percentage without passing the current step's ceiling."""
+    global _estado
+    while True:
+        time.sleep(0.8)
+        with _lock:
+            paso = _estado["paso"]
+            if paso <= 0 or paso == 4:
+                break
+            techo = _PCT_TECHO.get(paso, 95)
+            pct   = _estado["pct"]
+            if pct < techo:
+                # Bigger steps far from the ceiling, smaller ones as it gets close
+                incremento = max(1, int((techo - pct) * 0.07))
+                _estado["pct"] = min(pct + incremento, techo)
+# ── Step indicator component ──────────────────────────────────────────────────
+def _pasos_html(paso: int) -> html.Div:
+    items = []
+    for i, label in enumerate(PASOS_LABELS, start=1):
+        if paso == -1:
+            estado = "error" if i == 1 else "pendiente"
+        elif i < paso:
+            estado = "completado"
+        elif i == paso:
+            estado = "activo"
+        else:
+            estado = "pendiente"
+        color = {
+            "completado": "#1e6b45",   # forest green
+            "activo":     "#1c3160",   # navy
+            "pendiente":  "#6b7a96",   # mid grey, readable on light backgrounds
+            "error":      "#8b1c1c",   # brick red
+        }[estado]
+        icono = "✔" if estado == "completado" else ("✖" if estado == "error" else str(i))
+        items.append(html.Div(
+            style={"display": "flex", "flexDirection": "column",
+                   "alignItems": "center", "gap": "5px", "flex": "1"},
+            children=[
+                html.Div(icono, style={
+                    "width": "32px", "height": "32px", "borderRadius": "50%",
+                    "border": f"2px solid {color}", "color": color,
+                    "display": "flex", "alignItems": "center", "justifyContent": "center",
+                    "fontWeight": "700", "fontSize": "0.8rem",
+                    "background": "rgba(28,49,96,0.1)" if estado == "activo" else "transparent",
+                    "boxShadow": "0 0 0 4px rgba(28,49,96,0.12)" if estado == "activo" else "none",
+                }),
+                html.Span(label, style={
+                    "fontSize": "0.68rem", "color": color, "fontWeight": "600",
+                }),
+            ],
+        ))
+        if i < len(PASOS_LABELS):
+            items.append(html.Div(style={
+                "flex": "2", "height": "2px", "marginTop": "-17px",
+                "background": "#1e6b45" if i < paso else "#b0bacb",
+                "transition": "background 0.5s",
+            }))
+    desc_color = "#f87171" if paso == -1 else "#8892a4"
+    desc = ("⚠ Error durante el análisis" if paso == -1
+            else PASOS_DETALLE[paso - 1] if 1 <= paso <= 4
+            else "Iniciando...")
+    return html.Div([
+        html.Div(items, style={
+            "display": "flex", "alignItems": "center",
+            "padding": "4px 0 10px", "gap": "2px",
+        }),
+        html.P(desc, style={
+            "color": desc_color, "fontSize": "0.75rem",
+            "margin": "0", "textAlign": "center",
+        }),
+    ])
+# ── Callback registration ─────────────────────────────────────────────────────
+def registrar_callbacks(app: dash.Dash) -> None:
+    # 1. Analyze button: starts the worker threads and shows the panel
+    @app.callback(
+        Output("interval_progress", "disabled"),
+        Output("panel-progreso", "style"),
+        Output("error-msg", "children"),
+        Input({"type": "btn", "index": ALL}, "n_clicks"),
+        State("store_name", "data"),
+        State("store_link", "data"),
+        prevent_initial_call=True,
+    )
+    def iniciar_analisis(n_clicks, nombre, link):
+        global _estado
+        if not ctx.triggered_id or not ctx.triggered[0]["value"]:
+            return dash.no_update, dash.no_update, ""
+        tid = ctx.triggered_id
+        if not (isinstance(tid, dict) and tid.get("type") == "btn"):
+            return dash.no_update, dash.no_update, ""
+        if tid.get("index") != 0:
+            return True, _PANEL_OCULTO, ""
+        if not link:
+            return True, _PANEL_VISIBLE, html.Div(
+                "⚠ Por favor ingresa el link del sitio web.",
+                style={"color": "#f87171", "fontSize": "0.82rem", "marginTop": "8px"},
+            )
+        with _lock:
+            _estado = {"paso": 0, "pct": 0, "perfil": None, "error": None}
+        threading.Thread(target=_correr_analisis,
+                         args=(link, nombre or "Organización"), daemon=True).start()
+        threading.Thread(target=_animar_porcentaje, daemon=True).start()
+        return False, _PANEL_VISIBLE, ""
+    # 2. "← Inicio" buttons on profile pages (index != 0) → back to "/"
+    @app.callback(
+        Output("url", "pathname"),
+        Input({"type": "btn", "index": ALL}, "n_clicks"),
+        prevent_initial_call=True,
+    )
+    def volver_inicio(n_clicks):
+        if not ctx.triggered_id or not ctx.triggered[0]["value"]:
+            return dash.no_update
+        tid = ctx.triggered_id
+        if isinstance(tid, dict) and tid.get("type") == "btn" and tid.get("index") != 0:
+            return "/"
+        return dash.no_update
+    # 3. Polling → updates the % bar and step indicators, and redirects when done
+    @app.callback(
+        Output("progreso-container", "children"),
+        Output("progreso-bar", "style"),
+        Output("progreso-pct", "children"),
+        Output("url", "pathname", allow_duplicate=True),
+        Output("interval_progress", "disabled", allow_duplicate=True),
+        Output("panel-progreso", "style", allow_duplicate=True),
+        Input("interval_progress", "n_intervals"),
+        prevent_initial_call=True,
+    )
+    def actualizar_progreso(n):
+        with _lock:
+            e = dict(_estado)
+        paso = e["paso"]
+        pct  = e.get("pct", 0)
+        bar_style = {
+            "height": "100%", "borderRadius": "999px",
+            "background": "linear-gradient(90deg, #b87a2a, #d4a843)",
+            "width": f"{pct}%",
+            "transition": "width 0.8s ease",
+        }
+        pct_txt = f"{pct}%"
+        # Error: show a descriptive panel
+        if paso == -1:
+            err = e.get("error") or {}
+            if isinstance(err, dict):
+                titulo   = err.get("titulo",   "Error durante el análisis")
+                detalle  = err.get("detalle",  "Ocurrió un problema inesperado.")
+                sugiere  = err.get("sugerencia", "")
+                tipo     = err.get("tipo", "desconocido")
+            else:
+                titulo, detalle, sugiere, tipo = str(err), "", "", "desconocido"
+            iconos = {
+                "red": "🌐", "ssl": "🔒", "acceso": "🚫",
+                "url": "🔗", "modelo": "🤖", "extraccion": "📄",
+                "desconocido": "⚠",
+            }
+            icono = iconos.get(tipo, "⚠")
+            panel_error = html.Div(
+                style={
+                    "marginTop": "12px",
+                    "background": "#fdeaea",
+                    "border": "1.5px solid #d48585",
+                    "borderRadius": "8px",
+                    "padding": "14px 16px",
+                },
+                children=[
+                    html.Div(
+                        style={"display": "flex", "alignItems": "center",
+                               "gap": "8px", "marginBottom": "6px"},
+                        children=[
+                            html.Span(icono, style={"fontSize": "1.1rem"}),
+                            html.Span(titulo, style={
+                                "fontWeight": "700", "color": "#8b1c1c",
+                                "fontSize": "0.95rem",
+                            }),
+                        ],
+                    ),
+                    html.P(detalle, style={
+                        "color": "#6b2020", "fontSize": "0.9rem",
+                        "margin": "0 0 6px", "lineHeight": "1.5",
+                    }),
+                    html.P(f"💡 {sugiere}", style={
+                        "color": "#3d4a60", "fontSize": "0.88rem",
+                        "margin": "0", "fontStyle": "italic",
+                    }) if sugiere else None,
+                    html.Button(
+                        "Intentar de nuevo",
+                        style={
+                            "marginTop": "12px", "padding": "10px 20px",
+                            "background": "#fff", "border": "2px solid #8b1c1c",
+                            "borderRadius": "8px", "color": "#8b1c1c",
+                            "cursor": "pointer", "fontSize": "0.95rem",
+                            "fontWeight": "700", "fontFamily": "inherit",
+                        },
+                        id={"type": "btn", "index": 0},
+                        n_clicks=0,
+                    ),
+                ],
+            )
+            return (
+                html.Div([_pasos_html(-1), panel_error]),
+                {**bar_style, "background": "#8b1c1c", "width": "100%", "boxShadow": "0 1px 4px rgba(139,28,28,0.4)"},
+                "Error",
+                dash.no_update, True, _PANEL_VISIBLE,
+            )
+        # Finished
+        if paso == 4 and e["perfil"]:
+            return (
+                _pasos_html(4),
+                {**bar_style, "width": "100%"},
+                "100%",
+                f"/profile_{e['perfil']}", True, _PANEL_OCULTO,
+            )
+        # In progress
+        return _pasos_html(paso), bar_style, pct_txt, dash.no_update, dash.no_update, dash.no_update
+    # 4. Mirror the form inputs into the session stores
+    @app.callback(
+        Output("store_name", "data"),
+        Output("store_link", "data"),
+        Input("Nombre_org", "value"),
+        Input("Link_org", "value"),
+        prevent_initial_call=True,
+    )
+    def guardar_datos(nombre, link):
+        return nombre, link
+    # 5. Open the Venn modal when its thumbnail is clicked
+    @app.callback(
+        Output({"type": "venn-modal", "index": ALL}, "style"),
+        Input({"type": "venn-thumb", "index": ALL}, "n_clicks"),
+        Input({"type": "venn-close", "index": ALL}, "n_clicks"),
+        prevent_initial_call=True,
+    )
+    def toggle_venn_modal(thumb_clicks, close_clicks):
+        _ABIERTO = {"display": "flex"}
+        _CERRADO = {"display": "none"}
+        # ALL only matches the modals currently in the layout (a page built by
+        # crear_layout_perfil contains one), so match on each output's own
+        # index instead of assuming five modals in list positions 0..4.
+        ids_modales = [salida["id"] for salida in ctx.outputs_list]
+        tid = ctx.triggered_id
+        if not tid:
+            return [_CERRADO] * len(ids_modales)
+        # A thumbnail click opens the modal with the same index (profile 1-5);
+        # a "venn-close" click leaves every modal closed.
+        abrir = tid.get("type") == "venn-thumb"
+        result = [
+            _ABIERTO if abrir and id_modal["index"] == tid.get("index") else _CERRADO
+            for id_modal in ids_modales
+        ]
+        return result

data/__init__.py ADDED

File without changes

data/definitions.py ADDED

	@@ -0,0 +1,272 @@

+"""
+Definitions, constants and static data for the AIM Dashboard application.
+Kept separate from the logic code to ease maintenance and localization.
+"""
+DEFINICION_TRIADA_AIM = """
+**La Tríada AIM (Awareness, Infrastructure, Management)** no es un nuevo modelo de madurez
+en ciberseguridad, sino una estrategia de priorización diseñada para guiar a las organizaciones.
+A través de nuestro servicio obtenemos el perfil correspondiente a su empresa a partir del
+link de su sitio web, para dar indicaciones acerca de qué estrategias de ciberseguridad
+serían adecuadas para su organización.
+"""
+DEFINICIONES_PERFILES = [
+    # ── Profile 1 ── Formalized management, limited technical visibility
+    """
+**Perfil 1 — El Gestor Formalizado**
+Su organización ha construido una base sólida en la dimensión de **Gestión**: cuenta con
+políticas documentadas, conciencia sobre obligaciones legales y una estructura básica para
+administrar activos y personal de seguridad. Entiende *qué* debe proteger y *quién* es
+responsable de hacerlo.
+Sin embargo, la brecha crítica está en la **visibilidad técnica**: carece de capacidad para
+detectar amenazas en tiempo real, su arquitectura de red no está diseñada pensando en
+seguridad y los vectores de ataque técnicos permanecen sin monitoreo activo.
+**¿Qué significa esto en la práctica?**
+Su empresa sabe que existe el riesgo, tiene los papeles en orden, pero no vería un ataque
+en curso hasta que ya causara daño. Es como tener un buen seguro pero ninguna alarma.
+**Próximos pasos recomendados:**
+Priorizar la implementación de herramientas de monitoreo (SIEM básico o EDR), revisar la
+segmentación de red y realizar un primer ejercicio de análisis de vulnerabilidades.
+    """,
+    # ── Profile 2 ── High technical and awareness maturity, management still developing
+    """
+**Perfil 2 — El Técnico Avanzado**
+Su organización demuestra un nivel de madurez **excepcionalmente alto** en las dimensiones
+técnicas: tiene tecnología de seguridad bien implementada, arquitectura defensiva estructurada
+y una visibilidad del entorno de amenazas muy por encima del promedio para una PyME.
+Además posee una cultura de seguridad activa y capacidad de detección y respuesta ante
+incidentes. En términos del modelo AIM, cubre prácticamente todo el espectro de
+**Awareness** e **Infrastructure**.
+La oportunidad de mejora está en la **formalización de la gestión**: los procesos existen
+pero dependen de personas clave, la gestión del talento de seguridad no está sistematizada
+y el cumplimiento normativo puede estar rezagado frente al nivel técnico alcanzado.
+**Próximos pasos recomendados:**
+Documentar y transferir el conocimiento tácito a procesos formales, revisar brechas de
+cumplimiento regulatorio y estructurar un plan de sucesión para roles críticos de seguridad.
+    """,
+    # ── Profile 3 ── Robust operations, strategy and culture still to be developed
+    """
+**Perfil 3 — El Operador Robusto**
+Su organización ha madurado en la ejecución operativa de la seguridad: administra
+correctamente sus activos, mantiene su fuerza laboral alineada con las responsabilidades
+de seguridad y ha desarrollado capacidades para identificar y mitigar vulnerabilidades técnicas.
+Existe un programa de seguridad que funciona en el día a día. La gestión del riesgo tiene
+presencia y la gestión del conocimiento es un punto diferenciador positivo.
+Las brechas se concentran en la **dirección estratégica y la cultura**: no existe una
+política de seguridad que unifique los esfuerzos, la seguridad no está integrada como
+valor organizacional y el cumplimiento normativo no ha sido abordado formalmente.
+**Próximos pasos recomendados:**
+Desarrollar una política de seguridad corporativa aprobada por la dirección, iniciar un
+programa de cultura de seguridad para el personal no técnico y mapear los requisitos
+regulatorios aplicables al sector.
+    """,
+    # ── Profile 4 ── Developed culture and awareness, lagging infrastructure
+    """
+**Perfil 4 — El Consciente Estratégico**
+Su organización tiene algo valioso y difícil de construir: una **cultura de seguridad genuina**
+y capacidad para detectar y responder a incidentes. El personal entiende los riesgos,
+existe conciencia situacional y la gestión del conocimiento en seguridad es un activo real.
+Esto la ubica por delante de la mayoría de las PyMEs, donde la mayor vulnerabilidad
+es precisamente el factor humano.
+La brecha está en la **infraestructura técnica**: la arquitectura no refleja el nivel de
+madurez cultural alcanzado, los vectores de ataque técnico no están completamente
+mitigados y el marco normativo-legal no ha sido formalizado.
+**Próximos pasos recomendados:**
+Traducir la cultura de seguridad existente en controles técnicos concretos: segmentación
+de red, gestión de vulnerabilidades y hardening de sistemas. Aprovechar la conciencia
+del equipo para acelerar la adopción de nuevas herramientas.
+    """,
+    # ── Profile 5 ── Solid infrastructure and management, incipient detection and culture
+    """
+**Perfil 5 — El Arquitecto Estructurado**
+Su organización ha invertido en construir una **base técnica y de gestión coherente**:
+la arquitectura de seguridad está diseñada defensivamente, los activos están bajo control,
+existe una estrategia de seguridad formal y se gestiona el riesgo con criterios definidos.
+Es una organización que "construyó bien": su infraestructura digital refleja decisiones
+de diseño seguro y los procesos de gestión respaldan esas decisiones.
+Las áreas de mejora están en la **capacidad de detección activa y el factor humano**:
+aún no se ha desarrollado plenamente la capacidad para detectar y responder a incidentes
+en tiempo real, y la seguridad como valor cultural todavía no está arraigada en el
+comportamiento cotidiano del personal.
+**Próximos pasos recomendados:**
+Implementar capacidades de detección y respuesta (SOC básico o servicio MDR), desarrollar
+un programa de concientización continua para el personal y establecer ejercicios de
+simulación de incidentes (tabletop exercises).
+    """]
+DOMINIOS_DEFINICIONES = {
+    "Cultura y Sociedad": (
+        "Refleja los valores, creencias y comportamientos del equipo frente a la seguridad. "
+        "Una cultura madura convierte a cada persona en un sensor activo de riesgos, donde "
+        "reportar incidentes se ve como una responsabilidad compartida y no como una amenaza."
+    ),
+    "Conciencia Situacional": (
+        "Capacidad de detectar, correlacionar y comprender eventos de seguridad en tiempo real. "
+        "Incluye monitoreo centralizado (SIEM/XDR), inteligencia de amenazas y dashboards que "
+        "traducen datos técnicos en información útil para la toma de decisiones."
+    ),
+    "Estándares y Tecnología": (
+        "Adopción y operación disciplinada de estándares (NIST, ISO 27001, CIS) y herramientas "
+        "de seguridad. No basta con comprar tecnología: este dominio mide si está correctamente "
+        "configurada, integrada y mantenida."
+    ),
+    "Arquitectura": (
+        "Diseño estructural de la infraestructura digital pensado para minimizar el impacto de "
+        "una brecha. Incluye segmentación de red, modelos Zero Trust, zonas desmilitarizadas (DMZ) "
+        "y principios de mínimo privilegio aplicados desde el diseño."
+    ),
+    "Amenazas y Vulnerabilidades": (
+        "Ciclo de vida completo de identificación y mitigación de vulnerabilidades: desde "
+        "escaneos automáticos y pruebas de penetración hasta la gestión priorizada de parches "
+        "según el riesgo real para el negocio."
+    ),
+    "Programa": (
+        "Existencia de un programa formal de ciberseguridad con objetivos, presupuesto, "
+        "métricas y hoja de ruta. Asegura que las iniciativas de seguridad sean planificadas "
+        "y ejecutadas como un esfuerzo coherente y sostenido."
+    ),
+    "Capital Humano": (
+        "Gestión del talento de seguridad: roles definidos, responsabilidades claras, "
+        "capacitación continua y planificación de sucesión. Cubre tanto al equipo técnico "
+        "especializado como al personal general con obligaciones de seguridad."
+    ),
+    "Activos y Configuración": (
+        "Inventario actualizado de todos los activos digitales y físicos, con control de "
+        "cambios que previene modificaciones no autorizadas o inseguras. Sin saber qué "
+        "activos existen, es imposible protegerlos."
+    ),
+    "Marco Legal y Regulatorio": (
+        "Cumplimiento de leyes, regulaciones sectoriales y contratos que imponen obligaciones "
+        "de seguridad (ej. Ley 21.663 en Chile, GDPR si hay datos europeos). Implica mapear "
+        "los requisitos aplicables y traducirlos en controles operacionales concretos y "
+        "auditables."
+    ),
+    "Detección y Respuesta": (
+        "Capacidad de detectar, contener, erradicar y recuperarse de incidentes de seguridad "
+        "de forma estructurada. Incluye playbooks de respuesta, equipos con roles definidos "
+        "y ejercicios de simulación que validan la preparación real."
+    ),
+    "Política y Estrategia": (
+        "Marco de políticas formales aprobadas por la dirección que definen el comportamiento "
+        "esperado, los controles requeridos y la estrategia de seguridad a mediano y largo "
+        "plazo. Proporciona el 'norte' que orienta todas las demás decisiones."
+    ),
+    "Conocimiento y Capacidades": (
+        "Base de conocimiento institucional en ciberseguridad: inteligencia de amenazas, "
+        "lecciones aprendidas, competencias técnicas especializadas y capacidad de investigación. "
+        "Permite anticipar tendencias y no solo reaccionar a ellas."
+    ),
+    "Riesgo": (
+        "Proceso sistemático para identificar, evaluar y priorizar riesgos según su probabilidad "
+        "e impacto en el negocio. Un registro de riesgos activo permite tomar decisiones de "
+        "inversión en seguridad basadas en evidencia, no en intuición."
+    ),
+}
+# Weak domains (areas to improve) per profile (index 0 = Profile 1)
+CONSEJOS_PERFILES_NEGATIVOS = [
+    ["Conciencia Situacional", "Arquitectura", "Amenazas y Vulnerabilidades"],
+    ["Marco Legal y Regulatorio", "Capital Humano", "Activos y Configuración"],
+    ["Política y Estrategia", "Cultura y Sociedad", "Marco Legal y Regulatorio"],
+    ["Arquitectura", "Amenazas y Vulnerabilidades", "Marco Legal y Regulatorio"],
+    ["Detección y Respuesta", "Cultura y Sociedad", "Conciencia Situacional"],
+]
+# Strong domains (strengths) per profile
+CONSEJOS_PERFILES_POSITIVO = [
+    ["Marco Legal y Regulatorio", "Capital Humano", "Activos y Configuración"],
+    ["Estándares y Tecnología", "Conciencia Situacional", "Arquitectura"],
+    ["Programa", "Amenazas y Vulnerabilidades", "Capital Humano"],
+    ["Detección y Respuesta", "Cultura y Sociedad", "Conciencia Situacional"],
+    ["Arquitectura", "Amenazas y Vulnerabilidades", "Activos y Configuración"],
+]
+# Full list of strengths per profile (used by the Venn diagram)
+FORTALEZAS_PERFIL = [
+    ["Política y Estrategia", "Riesgo", "Programa", "Detección y Respuesta",
+     "Marco Legal y Regulatorio", "Activos y Configuración", "Capital Humano"],
+    ["Política y Estrategia", "Conocimiento y Capacidades", "Detección y Respuesta",
+     "Estándares y Tecnología", "Arquitectura", "Amenazas y Vulnerabilidades",
+     "Cultura y Sociedad", "Conciencia Situacional"],
+    ["Conocimiento y Capacidades", "Riesgo", "Programa", "Activos y Configuración",
+     "Capital Humano", "Amenazas y Vulnerabilidades"],
+    ["Conocimiento y Capacidades", "Riesgo", "Detección y Respuesta",
+     "Cultura y Sociedad", "Conciencia Situacional"],
+    ["Política y Estrategia", "Riesgo", "Programa", "Marco Legal y Regulatorio",
+     "Activos y Configuración", "Arquitectura", "Amenazas y Vulnerabilidades"],
+]
+# --- Venn diagram ---
+COLORES_BASE = {
+    "Concientizacion":  "#e06060",
+    "Infraestructura":  "#4a8c5c",
+    "Gestion":          "#3a6ab0",
+    "Bridge":           "#c87e30",
+    "Core":             "#8a6d3b",
+    "Desactivado":      "#c8d0df",
+}
+# Zone keys are bitstrings in A-I-M order, e.g. "110" = Awareness ∩ Infrastructure
+VENN_SECTIONS = {
+    "100": "Concientizacion",
+    "010": "Infraestructura",
+    "001": "Gestion",
+    "110": "Bridge",
+    "101": "Bridge",
+    "011": "Bridge",
+    "111": "Core",
+}
+SUBCATEGORIAS = {
+    "A":     ["Cultura y Sociedad", "Conciencia Situacional"],
+    "I":     ["Arquitectura", "Amenazas y Vulnerabilidades"],
+    "M":     ["Marco Legal y Regulatorio", "Activos y Configuración", "Capital Humano"],
+    "A-I":   ["Estándares y Tecnología"],
+    "A-M":   ["Detección y Respuesta"],
+    "I-M":   ["Programa"],
+    "A-I-M": ["Política y Estrategia", "Conocimiento y Capacidades", "Riesgo"],
+}
+DOMINIOS_POR_ETAPA = {
+    "Core":           SUBCATEGORIAS["A-I-M"],
+    "Bridge":         SUBCATEGORIAS["A-I"] + SUBCATEGORIAS["A-M"] + SUBCATEGORIAS["I-M"],
+    "Concientización": SUBCATEGORIAS["A"],
+    "Infraestructura": SUBCATEGORIAS["I"],
+    "Gestión":        SUBCATEGORIAS["M"],
+}
+# Mapping from Venn zones to subcategories
+VENN_ID_TO_SUBCATEGORIA = {
+    "100": SUBCATEGORIAS["A"],
+    "010": SUBCATEGORIAS["I"],
+    "001": SUBCATEGORIAS["M"],
+    "110": SUBCATEGORIAS["A-I"],
+    "101": SUBCATEGORIAS["A-M"],
+    "011": SUBCATEGORIAS["I-M"],
+    "111": SUBCATEGORIAS["A-I-M"],
+}
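Two kinds of silent mistakes are easy to make in a data module like this one. First, a key repeated inside a dict literal overwrites the earlier value without any error. Second, a domain name misspelled in `CONSEJOS_PERFILES_*`, `FORTALEZAS_PERFIL` or `SUBCATEGORIAS` just renders with an empty description, because `_tarjeta_dominio` looks it up with `DOMINIOS_DEFINICIONES.get(nombre, "")`. A check along these lines could run in a test suite. It is a sketch using only the standard-library `ast` and `collections` modules; `claves_duplicadas` and `dominios_sin_definicion` are hypothetical helpers, not part of the project.

```python
import ast
from collections import Counter


def claves_duplicadas(codigo_fuente: str) -> list[str]:
    """String keys that appear more than once in the same dict literal."""
    duplicadas: list[str] = []
    for nodo in ast.walk(ast.parse(codigo_fuente)):
        if isinstance(nodo, ast.Dict):
            # nodo.keys holds None for "**spread" entries; skip those.
            claves = [k.value for k in nodo.keys
                      if isinstance(k, ast.Constant) and isinstance(k.value, str)]
            duplicadas += [k for k, n in Counter(claves).items() if n > 1]
    return duplicadas


def dominios_sin_definicion(definiciones: dict, *grupos) -> set[str]:
    """Domain names used in list-of-lists or dict-of-lists groups but
    missing from `definiciones`."""
    faltantes: set[str] = set()
    for grupo in grupos:
        listas = grupo.values() if isinstance(grupo, dict) else grupo
        for lista in listas:
            faltantes.update(d for d in lista if d not in definiciones)
    return faltantes


if __name__ == "__main__":
    fuente = 'D = {"Riesgo": "a", "Programa": "b", "Riesgo": "c"}'
    print(claves_duplicadas(fuente))  # ['Riesgo']
    print(dominios_sin_definicion({"Riesgo": "..."}, [["Riesgo", "Programa"]]))  # {'Programa'}
```

Against the real module, you could pass `inspect.getsource(data.definitions)` to `claves_duplicadas`. You could call `dominios_sin_definicion(DOMINIOS_DEFINICIONES, CONSEJOS_PERFILES_NEGATIVOS, CONSEJOS_PERFILES_POSITIVO, FORTALEZAS_PERFIL, SUBCATEGORIAS)` for the name check. An empty list and an empty set mean the data is consistent.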

docker-compose.yml ADDED

	@@ -0,0 +1,30 @@

+version: "3.9"
+services:
+  aim-dashboard:
+    build: .
+    container_name: aim_dashboard
+    ports:
+      - "8050:8050"        # Available at http://localhost:8050
+    volumes:
+      # Mount the model file from the host so the image does not need to be
+      # rebuilt when the K-Means model is updated
+      - ./Modelo_Pymes.pkl:/app/Modelo_Pymes.pkl:ro
+    environment:
+      - PYTHONUNBUFFERED=1
+    restart: unless-stopped
+    # Recommended memory limit (the NLP models are heavy)
+    mem_limit: 4g
+  # ── Optional: Nginx as a reverse proxy (uncomment for production) ──────
+  # nginx:
+  #   image: nginx:alpine
+  #   container_name: aim_nginx
+  #   ports:
+  #     - "80:80"
+  #     - "443:443"
+  #   volumes:
+  #     - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
+  #   depends_on:
+  #     - aim-dashboard
+  #   restart: unless-stopped

layouts/__init__.py ADDED

File without changes

layouts/pages.py ADDED

	@@ -0,0 +1,268 @@

+"""
+Layouts for the AIM dashboard: the home screen and one page per profile.
+"""
+from dash import dcc, html
+from data.definitions import (
+    DEFINICION_TRIADA_AIM,
+    DEFINICIONES_PERFILES,
+    DOMINIOS_DEFINICIONES,
+    CONSEJOS_PERFILES_NEGATIVOS,
+    CONSEJOS_PERFILES_POSITIVO,
+    FORTALEZAS_PERFIL,
+)
+from logic.venn import VENN_IMG_COMPLETO, generar_venn_base
+# ---------- Home screen -------------------------------------------------------
+def crear_layout_home() -> html.Div:
+    return html.Div(
+        className="aim-container",
+        children=[
+            html.Div(className="aim-header", children=[
+                html.H1("Tríada AIM"),
+                html.P("Perfil de ciberseguridad para PyMEs", className="subtitle"),
+            ]),
+            html.Div(className="aim-two-col", children=[
+                # ---- Left column ----
+                html.Div(className="col-left", children=[
+                    html.Div(className="aim-card", children=[
+                        html.Div("Ingrese sus datos", className="aim-card-label"),
+                        html.Label("Nombre de la organización",
+                                   style={"fontWeight": "600", "marginBottom": "4px"}),
+                        dcc.Input(
+                            id="Nombre_org", type="text",
+                            placeholder="Ej: Mi Empresa S.A.",
+                            className="aim-input",
+                        ),
+                        html.Label("Sitio web de la organización",
+                                   style={"fontWeight": "600", "marginBottom": "4px"}),
+                        dcc.Input(
+                            id="Link_org", type="text",
+                            placeholder="https://www.miempresa.cl",
+                            className="aim-input",
+                        ),
+                        html.Button(
+                            "Analizar perfil →",
+                            id={"type": "btn", "index": 0},
+                            n_clicks=0,
+                            className="aim-btn-primary",
+                        ),
+                    ]),
+                    # AIM triad description
+                    html.Div(className="aim-card", children=[
+                        html.Div("¿Qué es la Tríada AIM?", className="aim-card-label"),
+                        dcc.Markdown(DEFINICION_TRIADA_AIM, className="aim-definition"),
+                    ]),
+                ]),
+                # ---- Right column ----
+                html.Div(className="col-right", children=[
+                    html.Div(className="aim-card", children=[
+                        html.Div("Modelo de referencia", className="aim-card-label"),
+                        html.Img(src=VENN_IMG_COMPLETO, className="venn-img"),
+                    ]),
+                ]),
+            ]),
+        ],
+    )
+# ---------- Profile screen ----------------------------------------------------
+def _tarjeta_dominio(nombre: str, tipo: str) -> html.Div:
+    """Accordion card that reveals the full domain description when expanded."""
+    is_strength = tipo == "strength"
+    color       = "var(--success)"    if is_strength else "var(--danger)"
+    bg_color    = "var(--success-bg)" if is_strength else "var(--danger-bg)"
+    border_col  = "var(--success-border)" if is_strength else "var(--danger-border)"
+    icono       = "✦" if is_strength else "◈"
+    desc        = DOMINIOS_DEFINICIONES.get(nombre, "")
+    return html.Details(
+        style={
+            "background": bg_color,
+            "border": f"1.5px solid {border_col}",
+            "borderRadius": "8px",
+            "marginBottom": "10px",
+            "overflow": "hidden",
+        },
+        children=[
+            html.Summary(
+                style={
+                    "padding": "14px 18px",
+                    "cursor": "pointer",
+                    "display": "flex",
+                    "alignItems": "center",
+                    "gap": "10px",
+                    "fontWeight": "700",
+                    "fontSize": "1rem",
+                    "color": color,
+                    "listStyle": "none",
+                    "userSelect": "none",
+                },
+                children=[
+                    html.Span(icono, style={"fontSize": "0.9rem", "flexShrink": "0"}),
+                    html.Span(nombre, style={"flex": "1"}),
+                    html.Span("▾  ver más", style={
+                        "fontSize": "0.78rem",
+                        "color": "var(--text-muted)",
+                        "fontWeight": "500",
+                        "whiteSpace": "nowrap",
+                    }),
+                ],
+            ),
+            html.Div(
+                style={
+                    "padding": "12px 18px 16px",
+                    "borderTop": f"1.5px solid {border_col}",
+                    "background": "#ffffff",
+                },
+                children=[
+                    html.P(
+                        desc,
+                        style={
+                            "margin": "0",
+                            "fontSize": "0.98rem",
+                            "color": "var(--text-secondary)",
+                            "lineHeight": "1.75",
+                        },
+                    )
+                ],
+            ),
+        ],
+    )
+def _columna_dominios(titulo: str, icono: str, color: str,
+                      dominios: list, tipo: str) -> html.Div:
+    return html.Div([
+        html.Div(
+            className="aim-domain-col-header",
+            style={"color": color, "borderBottomColor": color, "fontSize": "0.82rem"},
+            children=[
+                html.Span(icono, style={"fontSize": "1rem"}),
+                html.Span(titulo),
+            ],
+        ),
+        *[_tarjeta_dominio(d, tipo) for d in dominios],
+    ])
+def crear_layout_perfil(i: int) -> html.Div:
+    idx         = i - 1
+    venn_img    = f"data:image/png;base64,{generar_venn_base('Reporte', FORTALEZAS_PERFIL[idx])}"
+    fortalezas  = CONSEJOS_PERFILES_POSITIVO[idx]
+    debilidades = CONSEJOS_PERFILES_NEGATIVOS[idx]
+    return html.Div(
+        className="aim-container",
+        children=[
+            # Back button
+            html.Div(className="aim-nav-home", children=[
+                html.Button(
+                    "← Inicio",
+                    id={"type": "btn", "index": i},
+                    n_clicks=0,
+                    className="aim-btn-secondary",
+                    style={"width": "auto", "padding": "8px 18px"},
+                )
+            ]),
+            # Header
+            html.Div(className="aim-header", children=[
+                html.H1(f"Perfil {i}"),
+                html.P("Resultado del análisis de ciberseguridad AIM", className="subtitle"),
+            ]),
+            # Profile description (full width)
+            html.Div(className="aim-card", style={"marginBottom": "20px"}, children=[
+                html.Div(f"Perfil {i} — Descripción", className="aim-card-label"),
+                dcc.Markdown(DEFINICIONES_PERFILES[idx], className="aim-definition"),
+            ]),
+            # Grid: strengths | areas to improve | Venn (Venn column is wider)
+            html.Div(
+                style={
+                    "display": "grid",
+                    "gridTemplateColumns": "1fr 1fr 1.4fr",
+                    "gap": "20px",
+                    "alignItems": "start",
+                },
+                children=[
+                    # Strengths
+                    html.Div(className="aim-card", children=[
+                        _columna_dominios(
+                            "Fortalezas identificadas", "✦", "#34d399",
+                            fortalezas, "strength",
+                        ),
+                    ]),
+                    # Areas to improve
+                    html.Div(className="aim-card", children=[
+                        _columna_dominios(
+                            "Áreas de mejora", "◈", "#f87171",
+                            debilidades, "weakness",
+                        ),
+                    ]),
+                    # Venn diagram with a zoom modal
+                    html.Div(className="aim-card", style={"textAlign": "center"}, children=[
+                        html.Div("Cobertura AIM", className="aim-card-label"),
+                        # Clickable thumbnail
+                        html.Div(
+                            style={"position": "relative", "cursor": "zoom-in"},
+                            children=[
+                                html.Img(
+                                    src=venn_img,
+                                    className="venn-img venn-clickable",
+                                    id={"type": "venn-thumb", "index": i},
+                                    n_clicks=0,
+                                ),
+                                html.Div(
+                                    "🔍 Clic para ampliar",
+                                    style={
+                                        "position": "absolute", "bottom": "10px",
+                                        "right": "10px", "background": "rgba(28,49,96,0.75)",
+                                        "color": "#fff", "fontSize": "0.75rem",
+                                        "padding": "4px 10px", "borderRadius": "999px",
+                                        "fontWeight": "600", "pointerEvents": "none",
+                                    }
+                                ),
+                            ],
+                        ),
+                        html.P(
+                            "✔ Dominios cubiertos   ✗ Por desarrollar",
+                            style={"color": "var(--text-muted)", "fontSize": "0.92rem",
+                                   "marginTop": "12px", "fontWeight": "500"},
+                        ),
+                        # Modal overlay
+                        html.Div(
+                            id={"type": "venn-modal", "index": i},
+                            style={"display": "none"},
+                            className="venn-modal-overlay",
+                            children=[
+                                html.Div(className="venn-modal-box", children=[
+                                    html.Button(
+                                        "✕ Cerrar",
+                                        id={"type": "venn-close", "index": i},
+                                        className="venn-modal-close",
+                                        n_clicks=0,
+                                    ),
+                                    html.Img(src=venn_img, className="venn-modal-img"),
+                                ]),
+                            ],
+                        ),
+                    ]),
+                ],
+            ),
+        ],
+    )
+layout_home = crear_layout_home()
+all_layouts = [crear_layout_perfil(i) for i in range(1, 6)]

logic/__init__.py ADDED

	@@ -0,0 +1 @@


+ # logic package

logic/extractor.py ADDED

	@@ -0,0 +1,267 @@

+"""
+Text extraction and classification for websites.
+The ExtractorMVD class crawls the site, extracts relevant content, and
+classifies sentences into the MISION (mission), VISION, and DESCRIPCION (description) categories.
+"""
+import re
+import logging
+import requests
+from bs4 import BeautifulSoup
+import trafilatura
+import nltk
+from urllib.parse import urljoin
+logger = logging.getLogger(__name__)
+# Download NLTK resources only if they are not already available
+def _asegurar_nltk():
+    for recurso in ("punkt", "punkt_tab"):
+        try:
+            nltk.data.find(f"tokenizers/{recurso}")
+        except LookupError:
+            nltk.download(recurso, quiet=True)
+_asegurar_nltk()
+class ExtractorMVD:
+    """
+    Extracts and classifies the text content of a corporate website.
+    Usage:
+        extractor = ExtractorMVD(url="https://empresa.cl", nombre="Empresa S.A.")
+        extractor.navegar_y_extraer()
+        resultado = extractor.clasificar_inteligente()
+    """
+    # Fallback paths to try when no relevant navigation links are found
+    RUTAS_FALLBACK = [
+        "/nosotros", "/pages/nosotros", "/quienes-somos", "/somos",
+        "/about", "/mision", "/conocenos", "/about-us",
+        "/empresa.htm", "/nosotros-2", "/nuestra-empresa",
+    ]
+    def __init__(self, url: str, nombre: str):
+        self.url_base = url
+        self.nombre = nombre
+        self.paginas_candidatas: list[str] = []
+        self.datos_crudos: list[dict] = []
+        # Keywords used to detect "About us" pages in the site navigation.
+        # Matching is by substring, so short entries such as "we" and "us" also
+        # match inside longer words (e.g. "/web", "/business").
+        self._nav_keywords = [
+            "desarrollo", "ubicada", "nosotros", "nosotras", "quienes",
+            "misión", "mision", "visión", "historia", "origen", "equipo", "about",
+            "propósito", "manifiesto", "impacto", "sobre",
+            "we", "us", "who", "mission", "vision", "history", "origin",
+            "team", "purpose", "manifesto", "impact",
+        ]
+        # Semantic classification patterns ("fuerte" = strong cue, "debil" = weak cue)
+        self._patrones = {
+            "MISION": {
+                "fuerte": [
+                    r"nos enfocamos", r"buscamos hacer", r"nuestra misión",
+                    r"nuestro propósito", r"nuestro objetivo", r"la misión es",
+                    r"razón de ser", r"por qué existimos", r"nos dedicamos a",
+                    r"nuestro compromiso es", r"our mission", r"our purpose",
+                    r"our objective", r"mission is", r"we exist",
+                    r"we are dedicated to", r"our commitment is",
+                ],
+                "debil": [
+                    r"trabajamos para", r"buscamos", r"ayudar a", r"solucionar",
+                    r"dar una alternativa", r"reemplazar", r"permitir", r"entregar",
+                    r"proveer", r"facilitar", r"impulsar", r"fomentar", r"promover",
+                    r"asegurar", r"garantizar", r"contribuir", r"aportar",
+                    r"generar valor", r"we work to", r"we seek", r"help",
+                    r"solve", r"give an alternative", r"replace", r"allow",
+                    r"deliver", r"provide", r"facilitate", r"encourage",
+                    r"assure", r"guarantee", r"contribute", r"create value",
+                ],
+            },
+            "VISION": {
+                "fuerte": [
+                    r"nuestra visión", r"queremos ser", r"seremos", r"proyectamos",
+                    r"hacia el futuro", r"nuestro sueño", r"soñamos con",
+                    r"aspiramos a", r"horizonte", r"queremos llevar",
+                    r"our vision", r"we want to be", r"we will be", r"we project",
+                    r"into the future", r"our dream", r"we dream of",
+                    r"we aspire to", r"horizon", r"we want to take",
+                ],
+                "debil": [
+                    r"convertirnos en", r"liderar", r"referente", r"mundo",
+                    r"global", r"internacional", r"latinoamérica", r"impacto real",
+                    r"cambio es necesario", r"otra forma de vida", r"revolucionar",
+                    r"transformar", r"redefinir", r"innovación constante",
+                    r"vanguardia", r"consolidarnos", r"reconocidos por",
+                    r"largo plazo", r"transform", r"become", r"lead", r"world",
+                    r"international", r"latin america", r"real impact",
+                    r"necessary change", r"another way of life", r"revolutionize",
+                    r"redefine", r"constant innovation", r"vanguard",
+                    r"consolidate", r"recognized by", r"long term",
+                ],
+            },
+            "DESCRIPCION": {
+                "fuerte": [
+                    r"somos", r"fundada en", r"experiencia", r"historia comienza",
+                    r"nació en", r"comenzó como", r"trayectoria", r"nuestros inicios",
+                    r"quienes somos", r"equipo de",
+                    r"as in our name", r"we are", r"founded in", r"experience",
+                    r"history start", r"born in", r"started as", r"trajectory",
+                    r"our beginnings", r"who we are", r"team of",
+                ],
+                "debil": [
+                    r"empresa", r"compañía", r"consultora", r"organización",
+                    r"startup", r"agencia", r"firma", r"ofrecemos", r"servicios",
+                    r"productos", r"plataforma", r"soluciones", r"herramientas",
+                    r"ubicados en", r"especialistas en", r"expertos en",
+                    r"más de \d+ años", r"experiencia en", r"presencia en",
+                    r"company", r"consultant", r"organization", r"agency",
+                    r"\bfirm\b", r"we offer", r"services", r"products", r"platform",
+                    r"solutions", r"tools", r"located in", r"specialists in",
+                    r"experts in", r"more than \d+ years", r"experience in",
+                    r"presence in",
+                ],
+            },
+        }
+        self._blacklisted_words = [
+            "leer más", "read more", "ver más", "cookies", "derechos reservados",
+            "copyright", "iniciar sesión", "carrito", "despachos", "envíos",
+            "vacaciones", "feriado", "horario de atención", "subscribe", "boletín",
+            "plastic free july", "sumate", "síguenos", "formulario", "censura",
+            "see more", "all rights reserved", "fifa", "concurso", "incumplimiento",
+            "teléfono", "link", "horario", "log in", "shopping cart", "shipping",
+            "deliveries", "holidays", "opening hours", "newsletter", "join us",
+            "follow us", "cookie",
+        ]
+    # ------------------------------------------------------------------
+    # Navigation and extraction
+    # ------------------------------------------------------------------
+    def navegar_y_extraer(self) -> None:
+        """Crawl the website and store the text extracted from each candidate page."""
+        try:
+            headers = {"User-Agent": "Mozilla/5.0"}
+            response = requests.get(self.url_base, headers=headers, timeout=20)
+            response.raise_for_status()
+            soup = BeautifulSoup(response.content, "html.parser")
+            self._encontrar_paginas_candidatas(soup)
+        except requests.RequestException as e:
+            logger.warning("Error al acceder a %s: %s", self.url_base, e)
+        # Always include the root page
+        if self.url_base not in self.paginas_candidatas:
+            self.paginas_candidatas.append(self.url_base)
+        self._extraer_textos()
+    def _encontrar_paginas_candidatas(self, soup: BeautifulSoup) -> None:
+        found_links: set[str] = set()
+        for link in soup.find_all("a", href=True):
+            href = link["href"]
+            texto_link = link.get_text().lower()
+            es_relevante = any(kw in texto_link for kw in self._nav_keywords) or \
+                           any(kw in href.lower() for kw in self._nav_keywords)
+            if es_relevante:
+                full_url = urljoin(self.url_base, href)
+                if self.url_base in full_url and full_url not in found_links:
+                    found_links.add(full_url)
+                    self.paginas_candidatas.append(full_url)
+        if not self.paginas_candidatas:
+            for ruta in self.RUTAS_FALLBACK:
+                self.paginas_candidatas.append(urljoin(self.url_base, ruta))
+    def _extraer_textos(self) -> None:
+        for url in self.paginas_candidatas:
+            try:
+                downloaded = trafilatura.fetch_url(url)
+                if downloaded:
+                    texto = trafilatura.extract(
+                        downloaded,
+                        include_comments=False,
+                        include_tables=False,
+                        include_links=False,
+                        favor_precision=True,
+                    )
+                    if texto:
+                        self.datos_crudos.append({"url": url, "texto": texto})
+            except Exception as e:
+                logger.debug("No se pudo extraer texto de %s: %s", url, e)
+    # ------------------------------------------------------------------
+    # Cleaning and validation
+    # ------------------------------------------------------------------
+    def _limpiar_texto(self, oracion: str) -> str:
+        oracion = re.sub(r"(?i)(leer\s+m[áa]s|read\s+more|ver\s+m[áa]s|ver\s+detalle)\.*", "", oracion)
+        oracion = re.sub(r"\.{2,}", "", oracion)
+        return oracion.strip()
+    def _validar_integridad(self, oracion: str) -> bool:
+        if not re.search(r'[.!?"]$', oracion):
+            return False
+        letras = [c for c in oracion if c.isalpha()]
+        if len(letras) > 10:
+            mayusculas = [c for c in letras if c.isupper()]
+            if len(mayusculas) / len(letras) > 0.6:
+                return False
+        return True
+    # ------------------------------------------------------------------
+    # Semantic classification
+    # ------------------------------------------------------------------
+    def clasificar_inteligente(self) -> dict | None:
+        """
+        Classify the extracted text into the MISION, VISION, and DESCRIPCION categories.
+        Returns:
+            Dict {company_name: {"MISION": [...], "VISION": [...], "DESCRIPCION": [...]}},
+            or None if there is no data.
+        """
+        if not self.datos_crudos:
+            logger.warning("No se encontraron datos para clasificar en %s", self.url_base)
+            return None
+        resultados: dict[str, list] = {"MISION": [], "VISION": [], "DESCRIPCION": []}
+        oraciones_procesadas: set[str] = set()
+        for fuente in self.datos_crudos:
+            texto_raw = fuente["texto"]
+            texto_limpio = re.sub(r"\s+", " ", texto_raw)
+            oraciones = nltk.sent_tokenize(texto_limpio)
+            for oracion in oraciones:
+                oracion = self._limpiar_texto(oracion)
+                if len(oracion) < 25 or oracion in oraciones_procesadas:
+                    continue
+                oracion_lower = oracion.lower()
+                if any(bad in oracion_lower for bad in self._blacklisted_words):
+                    continue
+                if not self._validar_integridad(oracion):
+                    continue
+                mejor_cat = None
+                max_puntaje = 0
+                for categoria, tipos in self._patrones.items():
+                    puntaje = sum(
+                        10 for p in tipos["fuerte"] if re.search(p, oracion_lower)
+                    ) + sum(
+                        3 for p in tipos["debil"] if re.search(p, oracion_lower)
+                    )
+                    if puntaje > max_puntaje:
+                        max_puntaje = puntaje
+                        mejor_cat = categoria
+                if mejor_cat and max_puntaje >= 3:
+                    resultados[mejor_cat].append(oracion)
+                    oraciones_procesadas.add(oracion)
+        return {self.nombre: resultados}
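The weighted keyword vote in `clasificar_inteligente` can be pulled out into a small pure function. This makes the scoring rule easy to test: 10 points per strong match, 3 per weak match, and a minimum score of 3, so one weak match is enough. The sketch below is a minimal illustration. The trimmed pattern lists and the name `puntuar_oracion` are hypothetical and not part of the project.

```python
import re

# Illustrative subset of ExtractorMVD's patterns (hypothetical, not the full lists).
PATRONES = {
    "MISION": {"fuerte": [r"nuestra misión", r"our mission"], "debil": [r"buscamos", r"help"]},
    "VISION": {"fuerte": [r"nuestra visión", r"our vision"], "debil": [r"liderar", r"world"]},
    "DESCRIPCION": {"fuerte": [r"somos", r"we are"], "debil": [r"empresa", r"company"]},
}

PESO_FUERTE = 10     # points per strong-pattern match
PESO_DEBIL = 3       # points per weak-pattern match
PUNTAJE_MINIMO = 3   # a single weak match is enough to classify


def puntuar_oracion(oracion: str, patrones=PATRONES):
    """Return (best_category, score), or (None, 0) if no category reaches the minimum."""
    texto = oracion.lower()
    mejor_cat, max_puntaje = None, 0
    for categoria, tipos in patrones.items():
        puntaje = sum(PESO_FUERTE for p in tipos["fuerte"] if re.search(p, texto))
        puntaje += sum(PESO_DEBIL for p in tipos["debil"] if re.search(p, texto))
        # Strict ">" means a tie keeps the category that appears first in the dict.
        if puntaje > max_puntaje:
            mejor_cat, max_puntaje = categoria, puntaje
    if mejor_cat and max_puntaje >= PUNTAJE_MINIMO:
        return mejor_cat, max_puntaje
    return None, 0


print(puntuar_oracion("Nuestra misión es ayudar a las pymes."))  # ('MISION', 10)
print(puntuar_oracion("We are a company founded in 2010."))     # ('DESCRIPCION', 13)
print(puntuar_oracion("Contacto y horarios."))                  # (None, 0)
```

As in the module, patterns are matched as regex substrings of the lowercased sentence. A short weak pattern such as `help` therefore also fires inside longer words.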

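`_encontrar_paginas_candidatas` keeps a link only when `self.url_base` appears as a substring of the resolved URL. With `url_base = "https://empresa.cl"`, that check accepts `https://otra.com/?ref=https://empresa.cl`. With a trailing slash (`"https://empresa.cl/"`), it rejects `http://www.empresa.cl/about`.

One alternative is to compare host names with `urllib.parse.urlparse`, ignoring a leading `www.`. The sketch below shows that approach under stated assumptions; it is not the project's code, and the helper name `es_mismo_sitio` is hypothetical. It uses `str.removeprefix`, which requires Python 3.9 or later.

```python
from urllib.parse import urljoin, urlparse


def es_mismo_sitio(url_base: str, href: str) -> bool:
    """True if href, resolved against url_base, is an http(s) URL on the same host."""
    completa = urljoin(url_base, href)
    partes = urlparse(completa)
    # Normalize hosts: lowercase and drop a leading "www." so both variants match.
    host_base = urlparse(url_base).netloc.lower().removeprefix("www.")
    host_link = partes.netloc.lower().removeprefix("www.")
    return partes.scheme in ("http", "https") and host_link == host_base


print(es_mismo_sitio("https://empresa.cl", "/nosotros"))                    # True
print(es_mismo_sitio("https://empresa.cl/", "http://www.empresa.cl/about")) # True
print(es_mismo_sitio("https://empresa.cl", "https://otra.com/?ref=https://empresa.cl"))  # False
print(es_mismo_sitio("https://empresa.cl", "mailto:info@empresa.cl"))       # False
```

Checking the scheme as well filters out `mailto:` and `tel:` links, which the keyword filter could otherwise let through.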
logic/modelo.py ADDED

	@@ -0,0 +1,348 @@

+"""
+Classification using NLP models and K-Means.
+Contains the semantic vectorization logic and the AIM profile prediction.
+"""
+import logging
+from pathlib import Path
+import numpy as np
+import pandas as pd
+import torch
+import joblib
+from sentence_transformers import SentenceTransformer, CrossEncoder, util
+from transformers import pipeline
+from deep_translator import GoogleTranslator
+logger = logging.getLogger(__name__)
+# Path to the K-Means model, resolved from this file's location (project root, one level above logic/)
+MODEL_PATH = Path(__file__).parent.parent / "Modelo_Pymes.pkl"
+# Weights of the expert committee (model ensemble); they sum to 1.0
+W_BART   = 0.50
+W_MPNET  = 0.30
+W_NLI    = 0.20
+# Cut-off thresholds
+SCORE_MINIMO  = 0.15
+UMBRAL_CORTE  = 0.08
+DOMINIOS_NUCLEO = ["Risk", "Policy and Strategy", "Knowledge and Capabilities"]
+# AIM inheritance: which pillars influence each domain.
+# Keys must match BASE_DOMINIOS_AMPLIADOS exactly; otherwise HERENCIA_AIM.get()
+# silently returns [] and the domain receives no inherited score.
+HERENCIA_AIM = {
+    "Risk":                          ["AWARENESS", "INFRASTRUCTURE", "MANAGEMENT"],
+    "Policy and Strategy":           ["AWARENESS", "INFRASTRUCTURE", "MANAGEMENT"],
+    "Knowledge and Capabilities":    ["AWARENESS", "INFRASTRUCTURE", "MANAGEMENT"],
+    "Incident Detection and Response": ["AWARENESS", "MANAGEMENT"],
+    "Program":                       ["MANAGEMENT", "INFRASTRUCTURE"],
+    "Standards and Technology":      ["AWARENESS", "INFRASTRUCTURE"],
+    "Culture and Society":           ["AWARENESS"],
+    "Situational Awareness":         ["AWARENESS"],
+    "Architecture":                  ["INFRASTRUCTURE"],
+    "Threat and Vulnerability":      ["INFRASTRUCTURE"],
+    "Legal and Regulatory Framework": ["MANAGEMENT"],
+    "Workforce":                     ["MANAGEMENT"],
+    "Asset, Change and Configuration": ["MANAGEMENT"],
+}
+# ----------------------------------------------------------------------------------
+# Domain and pillar definitions (long reference texts used for the embeddings).
+# Taken from the original file without content changes.
+# ----------------------------------------------------------------------------------
+BASE_DOMINIOS_AMPLIADOS = {
+    "Culture and Society": """
+### DOMAIN: CULTURE AND SOCIETY
+This domain encapsulates the collective set of values, beliefs, perceptions, and behavioral norms
+that determine how an institution and its stakeholders approach the protection of information assets.
+It functions as the organization's informal operating system, governing the unwritten rules of conduct
+that dictate whether official security directives are internalized as a shared responsibility or viewed
+as bureaucratic impediments. Unlike technical controls that enforce limitations, this dimension focuses
+on the willingness of human actors to adhere to safe practices even in the absence of direct supervision.
+""",
+    "Situational Awareness": """
+### DOMAIN: SITUATIONAL AWARENESS
+This domain defines the organization's dynamic capacity to perceive, synthesize, and interpret the
+status of its security environment in real-time. It bridges the semantic gap between technical anomalies
+and business context, aggregating fragmented telemetry from disparate sources to construct a unified
+Common Operating Picture. It answers: What is happening now? Who is the adversary? Which critical
+functions are implicated?
+""",
+    "Standards and Technology": """
+### DOMAIN: STANDARDS AND TECHNOLOGY
+This domain constitutes the technical realization of cybersecurity: the rigorous selection, implementation,
+and maintenance of the hardware, software, and configuration frameworks that enforce protection.
+Standards refer to externally validated frameworks (NIST CSF, ISO 27001, CIS Benchmarks).
+Technology refers to the specific operational tools deployed to execute those standards.
+""",
+    "Architecture": """
+### DOMAIN: ARCHITECTURE
+This domain defines the structural design, organization, and interconnection of an institution's digital
+ecosystem. It translates abstract security principles such as defense-in-depth, least privilege, and
+resilience into concrete, enforceable topologies. The fundamental objective is to limit the blast radius
+of a potential compromise through network segmentation, Zero Trust models, and cloud landing zones.
+""",
+    "Threat and Vulnerability": """
+### DOMAIN: THREAT AND VULNERABILITY
+This domain encapsulates the organization's dynamic capability to proactively identify, evaluate, and
+mitigate security weaknesses before they can be exploited. It governs the operational lifecycle of a flaw:
+from detection (scanning/reporting) to assessment (scoring based on exploitability and asset criticality)
+and finally to remediation or compensating controls.
+""",
+    "Program": """
+### DOMAIN: PROGRAM
+This domain refers to the strategic planning and execution of cybersecurity as a formal organizational
+program. It ensures that security initiatives are funded, staffed, sequenced, and tracked as a coherent
+portfolio of work aligned with business objectives and risk tolerance.
+""",
+    "Workforce": """
+### DOMAIN: WORKFORCE
+This domain encompasses the people dimension of cybersecurity: recruiting, retaining, and developing
+security talent; defining roles and responsibilities; and ensuring that all staff have the skills and
+authority required to execute their security functions effectively.
+""",
+    "Asset, Change and Configuration": """
+### DOMAIN: ASSET, CHANGE AND CONFIGURATION
+This domain refers to the governance and control of the organization's digital and physical assets,
+including inventory management, configuration baselines, and change control processes that prevent
+unauthorized or insecure modifications to the technology estate.
+""",
+    "Legal and Regulatory Framework": """
+### DOMAIN: LEGAL AND REGULATORY FRAMEWORK
+This domain refers to the laws, regulations, contractual obligations, and industry standards that govern
+the organization's security posture. It ensures that the organization meets its compliance obligations
+while translating external mandates into internal controls and policies.
+""",
+    "Incident Detection and Response": """
+### DOMAIN: INCIDENT DETECTION AND RESPONSE
+This domain refers to the organization's capability to detect, analyze, contain, eradicate, and recover
+from security incidents in a timely and effective manner. It encompasses the people, processes, and
+technology that form the incident lifecycle, from initial alert triage to post-incident review.
+""",
+    "Policy and Strategy": """
+### DOMAIN: POLICY AND STRATEGY
+This domain refers to the capacity of an organization to establish formal policies, standards, and a
+coherent security strategy that aligns protection investments with business objectives and risk appetite.
+It provides the governing framework within which all other security activities operate.
+""",
+    "Knowledge and Capabilities": """
+### DOMAIN: KNOWLEDGE AND CAPABILITIES
+This domain refers to the organization's institutional knowledge base and the specialized competencies
+required to execute its security strategy. It encompasses threat intelligence, security research, and
+the continuous development of skills that keep the organization ahead of the evolving threat landscape.
+""",
+    "Risk": """
+### DOMAIN: RISK
+This domain refers to the systematic process of identifying, assessing, prioritizing, and managing
+threats to the organization's information assets. It provides the analytical framework for converting
+technical vulnerabilities and threat intelligence into business impact language, enabling
+defensible resource allocation decisions.
+""",
+}
+BASE_PILARES = {
+    "AWARENESS": """
+### PILLAR: AWARENESS
+Awareness constitutes the cognitive and behavioral layer of the organization's cybersecurity posture.
+It represents the internalization of risk management into the daily heuristics of the workforce,
+transforming the human element from a potential vulnerability into a sophisticated sensor network.
+It includes security champions, phishing simulations, role-based training, and reporting mechanisms.
+""",
+    "INFRASTRUCTURE": """
+### PILLAR: INFRASTRUCTURE
+Infrastructure represents the tangible, operative reality of cybersecurity: the collection of hardware,
+software, networks, and architectural mechanisms that materially enforce protection. It encompasses
+network segmentation, endpoint detection, hardening baselines, encryption, and resilience testing.
+It ensures that Defense in Depth is an operational fact rather than a theoretical concept.
+""",
+    "MANAGEMENT": """
+### PILLAR: MANAGEMENT
+Management constitutes the executive and strategic brain of the cybersecurity ecosystem. It encompasses
+governance structures, risk registers, security budgets, policy frameworks, and executive accountability
+mechanisms that ensure security is managed as a critical business function aligned with fiduciary duties.
+""",
+}
+# ----------------------------------------------------------------------------------
+# Translation and vectorization functions
+# ----------------------------------------------------------------------------------
+def _traducir_texto_largo(texto: dict) -> str:
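+    """
+    Translate the classified-text dictionary into English.
+    Each value is converted with str() before translation. Values longer than
+    4,000 characters are split into fixed-size chunks, which may cut a word in two.
+    If a translation call fails, the original text is kept.
+    """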
+    translator = GoogleTranslator(source="es", target="en")
+    limite = 4_000
+    partes_traducidas = []
+    for llave, texto_original in texto.items():
+        texto_original = str(texto_original)
+        if len(texto_original) <= limite:
+            try:
+                partes_traducidas.append(translator.translate(texto_original))
+            except Exception:
+                partes_traducidas.append(texto_original)
+        else:
+            for i in range(0, len(texto_original), limite):
+                chunk = texto_original[i : i + limite]
+                try:
+                    partes_traducidas.append(translator.translate(chunk))
+                except Exception:
+                    partes_traducidas.append(chunk)
+    return " ".join(partes_traducidas).strip()
+def _calcular_similitud(
+    texto: str,
+    nombres: list,
+    definiciones: list,
+    embeddings_ref,
+    model_A,
+    model_B,
+    model_C,
+) -> dict:
+    """Compute a fused similarity score per label using MPNet + DeBERTa + BART."""
+    # MPNet (semantic): cosine similarity, min-max normalized across the labels
+    emb_texto = model_A.encode(texto, convert_to_tensor=True)
+    scores_A = util.cos_sim(emb_texto, embeddings_ref)[0].cpu().numpy()
+    if scores_A.max() > scores_A.min():
+        scores_A = (scores_A - scores_A.min()) / (scores_A.max() - scores_A.min())
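+    # DeBERTa (logical / NLI). For multi-class logits, the row maximum is taken
+    # across all NLI classes, including contradiction; then min-max normalized.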
+    pares = [[texto, d] for d in definiciones]
+    scores_B_logits = model_B.predict(pares)
+    scores_B = np.max(scores_B_logits, axis=1) if len(scores_B_logits.shape) > 1 else scores_B_logits
+    if scores_B.max() > scores_B.min():
+        scores_B = (scores_B - scores_B.min()) / (scores_B.max() - scores_B.min())
+    # BART (zero-shot contextual)
+    res_C = model_C(texto, nombres, multi_label=True)
+    mapa_C = dict(zip(res_C["labels"], res_C["scores"]))
+    scores_C = np.array([mapa_C[n] for n in nombres])
+    finales = (scores_C * W_BART) + (scores_A * W_MPNET) + (scores_B * W_NLI)
+    return {
+        nombre: {
+            "final": finales[i],
+            "bart":  scores_C[i],
+            "mpnet": scores_A[i],
+            "nli":   scores_B[i],
+        }
+        for i, nombre in enumerate(nombres)
+    }
+def _vectorizar(texto: dict) -> pd.DataFrame:
+    """
+    Vectorize the classified text and return a DataFrame with per-domain scores.
+    The NLP models are loaded lazily, on every call to this function.
+    """
+    device = 0 if torch.cuda.is_available() else -1
+    logger.info("Cargando modelos NLP...")
+    model_A = SentenceTransformer("all-mpnet-base-v2")
+    model_B = CrossEncoder("cross-encoder/nli-deberta-v3-base")
+    model_C = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=device)
+    nombres_dominios   = list(BASE_DOMINIOS_AMPLIADOS.keys())
+    defs_dominios      = list(BASE_DOMINIOS_AMPLIADOS.values())
+    emb_dominios       = model_A.encode(defs_dominios, convert_to_tensor=True)
+    nombres_pilares    = list(BASE_PILARES.keys())
+    defs_pilares       = list(BASE_PILARES.values())
+    emb_pilares        = model_A.encode(defs_pilares, convert_to_tensor=True)
+    texto_clean = _traducir_texto_largo(texto)
+    scores_dominios = _calcular_similitud(
+        texto_clean, nombres_dominios, defs_dominios, emb_dominios,
+        model_A, model_B, model_C,
+    )
+    scores_pilares = _calcular_similitud(
+        texto_clean, nombres_pilares, defs_pilares, emb_pilares,
+        model_A, model_B, model_C,
+    )
+    scores_pilares_simple = {k: v["final"] for k, v in scores_pilares.items()}
+    P_BASE     = 0.60
+    P_HERENCIA = 0.40
+    datos_tabla = []
+    for dominio, detalle in scores_dominios.items():
+        padres = HERENCIA_AIM.get(dominio, [])
+        score_herencia = (
+            sum(scores_pilares_simple[p] for p in padres) / len(padres)
+            if padres else 0.0
+        )
+        bono = 0.10 if len(padres) == 3 else 0.0
+        score_final = (detalle["final"] * P_BASE) + (score_herencia * P_HERENCIA) + bono
+        datos_tabla.append({
+            "Categoría": dominio,
+            "Final":     score_final,
+            "BART":      detalle["bart"],
+            "MPNet":     detalle["mpnet"],
+            "NLI":       detalle["nli"],
+            "Base":      detalle["final"],
+            "Herencia":  score_herencia,
+        })
+    df = (
+        pd.DataFrame(datos_tabla)
+        .sort_values(by="Final", ascending=False)
+        .reset_index(drop=True)
+    )
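+    # Compute the gap to the next-ranked domain for the cut-off criterion.
+    # Note: indice_corte below is computed but not used; the full table is returned.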
+    df["Salto"] = df["Final"].diff(periods=-1).fillna(0)
+    indice_corte = len(df)
+    for idx, row in df.iterrows():
+        siguiente_dominio = df.iloc[idx + 1]["Categoría"] if idx + 1 < len(df) else ""
+        score_siguiente   = df.iloc[idx + 1]["Final"]     if idx + 1 < len(df) else 0
+        if row["Final"] < SCORE_MINIMO:
+            indice_corte = idx
+            break
+        if row["Salto"] > UMBRAL_CORTE:
+            nucleo_rescatable = (
+                siguiente_dominio in DOMINIOS_NUCLEO and score_siguiente >= SCORE_MINIMO
+            )
+            if not nucleo_rescatable:
+                indice_corte = idx + 1
+                break
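+    # Convert numeric columns to percentages (multiplying the whole frame would
+    # also repeat each "Categoría" string 100 times)
+    columnas_num = df.select_dtypes(include="number").columns
+    df[columnas_num] = df[columnas_num] * 100
+    return df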
+def obtener_perfil(texto: dict) -> int:
+    """
+    Classify the extracted text and return the profile index (0-4).
+    Args:
+        texto: Dictionary {company_name: {MISION: [...], VISION: [...], DESCRIPCION: [...]}}.
+    Returns:
+        Integer between 0 and 4 (K-Means cluster index).
+    Raises:
+        FileNotFoundError: If the model file cannot be found.
+        ValueError: If the text is None or empty.
+    """
+    if not texto:
+        raise ValueError("El texto de entrada está vacío o es None.")
+    if not MODEL_PATH.exists():
+        raise FileNotFoundError(
+            f"No se encontró el modelo en: {MODEL_PATH}\n"
+            "Asegúrate de que 'Modelo_Pymes.pkl' esté en la carpeta raíz del proyecto."
+        )
+    vector = _vectorizar(texto)
+    kmeans = joblib.load(MODEL_PATH)
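+    # Column 1 is "Final". Rows are sorted by score, so the feature order follows the
+    # ranking, not a fixed domain order; this must match how the model was trained.
+    perfil = kmeans.predict(vector.iloc[:, 1].values.reshape(1, -1))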
+    return int(perfil[0])
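`_calcular_similitud` first min-max normalizes the MPNet and NLI scores to [0, 1] across the candidate labels. As in the module, the scores are left unchanged when all values are equal. It then takes a weighted sum with the BART zero-shot scores, using weights of 0.50 (BART), 0.30 (MPNet), and 0.20 (NLI).

The sketch below reproduces only that fusion step, in plain Python with made-up scores standing in for model outputs. The names `min_max` and `fusionar` are hypothetical.

```python
W_BART, W_MPNET, W_NLI = 0.50, 0.30, 0.20


def min_max(valores):
    """Rescale to [0, 1]; return the values unchanged if they are all equal."""
    lo, hi = min(valores), max(valores)
    if hi > lo:
        return [(v - lo) / (hi - lo) for v in valores]
    return list(valores)


def fusionar(nombres, bart, mpnet_raw, nli_raw):
    """Weighted fusion of the three expert scores, one value per label."""
    mpnet = min_max(mpnet_raw)
    nli = min_max(nli_raw)
    # BART multi-label zero-shot scores are already in [0, 1], so they are not rescaled.
    return {
        n: W_BART * b + W_MPNET * m + W_NLI * x
        for n, b, m, x in zip(nombres, bart, mpnet, nli)
    }


# Hypothetical scores for two labels (not real model output).
print(fusionar(["Risk", "Workforce"], [0.8, 0.2], [0.6, 0.4], [2.0, -1.0]))
```

Because min-max scaling is relative to the candidate set, the top label always receives 1.0 from MPNet and NLI. Those two experts therefore express only a ranking among the labels, not an absolute level of similarity.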

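`_vectorizar` computes each domain's final score as 0.60 × its own fused score plus 0.40 × the mean score of its parent pillars. It adds a 0.10 bonus for domains that inherit from all three pillars, which are the three nucleus domains.

The ranked list is then cut at the first score below `SCORE_MINIMO`, or at the first gap larger than `UMBRAL_CORTE`. A large gap is skipped when the next domain is a nucleus domain that still clears the minimum. In the module, the resulting cut index is computed but not applied.

The sketch below reproduces both steps with hypothetical scores. The function names `score_final` and `indice_de_corte` are illustrative only.

```python
SCORE_MINIMO = 0.15
UMBRAL_CORTE = 0.08
P_BASE, P_HERENCIA, BONO_NUCLEO = 0.60, 0.40, 0.10
DOMINIOS_NUCLEO = ["Risk", "Policy and Strategy", "Knowledge and Capabilities"]


def score_final(base, padres, scores_pilares):
    """Blend a domain's own score with the mean score of its parent pillars."""
    herencia = sum(scores_pilares[p] for p in padres) / len(padres) if padres else 0.0
    bono = BONO_NUCLEO if len(padres) == 3 else 0.0
    return base * P_BASE + herencia * P_HERENCIA + bono


def indice_de_corte(ranking):
    """ranking: (domain, score) pairs sorted by score, descending. Returns how many to keep."""
    for i, (_, score) in enumerate(ranking):
        if score < SCORE_MINIMO:
            return i  # this domain and everything below it are dropped
        hay_siguiente = i + 1 < len(ranking)
        nombre_sig, score_sig = ranking[i + 1] if hay_siguiente else ("", 0.0)
        salto = score - score_sig if hay_siguiente else 0.0
        if salto > UMBRAL_CORTE:
            rescatable = nombre_sig in DOMINIOS_NUCLEO and score_sig >= SCORE_MINIMO
            if not rescatable:
                return i + 1  # keep this domain, cut after it
    return len(ranking)


pilares = {"AWARENESS": 0.5, "INFRASTRUCTURE": 0.3, "MANAGEMENT": 0.4}
print(score_final(0.5, ["AWARENESS", "INFRASTRUCTURE", "MANAGEMENT"], pilares))
print(indice_de_corte([("Risk", 0.60), ("Program", 0.58), ("Architecture", 0.40)]))  # 2
print(indice_de_corte([("Program", 0.60), ("Risk", 0.45), ("Architecture", 0.44)]))  # 3
```

In the last call, the 0.15 gap after "Program" does not trigger a cut because the next domain, "Risk", is a nucleus domain above the minimum.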
logic/venn.py ADDED

	@@ -0,0 +1,117 @@

+"""
+Venn diagram generation for the AIM Triad.
+Produces base64-encoded images using matplotlib + matplotlib_venn.
+"""
+import io
+import base64
+import matplotlib
+matplotlib.use("Agg")
+import matplotlib.pyplot as plt
+from matplotlib_venn import venn3
+from data.definitions import (
+    COLORES_BASE,
+    VENN_SECTIONS,
+    VENN_ID_TO_SUBCATEGORIA,
+)
+VENN_IDS = ["100", "010", "001", "110", "101", "011", "111"]
+def generar_venn_base(foco=None, seleccionados_usuario=None) -> str:
+    """
+    Generate the Venn diagram and return the image encoded in base64.
+    Args:
+        foco: Display mode. Options:
+              "Reporte"  → shows strengths/weaknesses per profile.
+              "Completo" → shows every domain, in color.
+              "Simple"   → color only, no text.
+              str        → highlights only the zone with that name (Core, Bridge, ...).
+        seleccionados_usuario: List of active domains (only relevant in "Reporte" mode).
+    Returns:
+        Base64 string of the generated PNG image.
+    """
+    if seleccionados_usuario is None:
+        seleccionados_usuario = []
+    plt.figure(figsize=(11, 11))
+    venn = venn3(
+        subsets=(3, 2, 3, 1, 1, 1, 3),
+        set_labels=("Concientización", "Infraestructura", "Gestión"),
+    )
+    # Clear the default (subset-size) labels
+    for vid in VENN_IDS:
+        label = venn.get_label_by_id(vid)
+        if label:
+            label.set_text("")
+    for vid in VENN_IDS:
+        patch = venn.get_patch_by_id(vid)
+        if not patch:
+            continue
+        section_name = VENN_SECTIONS[vid]
+        dominios_zona = VENN_ID_TO_SUBCATEGORIA[vid]
+        if foco == "Reporte":
+            hay_seleccion = any(d in seleccionados_usuario for d in dominios_zona)
+            color = COLORES_BASE[section_name] if hay_seleccion else COLORES_BASE["Desactivado"]
+            alpha = 0.8 if hay_seleccion else 0.3
+            texto_etiqueta = [
+                f"✔ {d}" if d in seleccionados_usuario else f"✗ {d}"
+                for d in dominios_zona
+            ]
+            label = venn.get_label_by_id(vid)
+            if label:
+                label.set_text("\n".join(texto_etiqueta))
+                label.set_fontsize(8)
+        elif foco == "Completo":
+            color = COLORES_BASE[section_name]
+            alpha = 0.8
+            label = venn.get_label_by_id(vid)
+            if label:
+                label.set_text("\n".join(dominios_zona))
+        elif foco == "Simple":
+            color = COLORES_BASE[section_name]
+            alpha = 0.8
+        else:
+            # Specific focus: highlight only the named zone
+            es_foco = section_name == foco
+            color = COLORES_BASE[section_name] if es_foco else COLORES_BASE["Desactivado"]
+            alpha = 0.9 if es_foco else 0.3
+            if es_foco:
+                label = venn.get_label_by_id(vid)
+                if label:
+                    label.set_text("\n".join(dominios_zona))
+        patch.set_facecolor(color)
+        patch.set_edgecolor("black")
+        patch.set_linewidth(1.5)
+        patch.set_alpha(alpha)
+    if foco == "Reporte":
+        titulo = "Tríada AIM — Perfil de cobertura"
+    elif foco in ("Completo", "Simple"):
+        titulo = "Tríada AIM"
+    else:
+        titulo = f"Zona: {foco.upper()}" if foco else "Tríada AIM"
+    plt.title(titulo, fontsize=13, fontweight="bold")
+    buf = io.BytesIO()
+    plt.savefig(buf, format="png", bbox_inches="tight", dpi=150)
+    plt.close()
+    buf.seek(0)
+    return base64.b64encode(buf.getvalue()).decode("utf-8")
+# Pre-render the full Venn for the home screen (generated once, at import time)
+VENN_IMG_COMPLETO = f"data:image/png;base64,{generar_venn_base('Completo')}"
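In matplotlib_venn, a three-set region ID is a string of three bits, one per set in `set_labels` order. With the labels used here (awareness, infrastructure, management), `"101"` is the region inside awareness and management but outside infrastructure.

`data/definitions.py` is not shown in this section. Assuming its `VENN_ID_TO_SUBCATEGORIA` zones follow the pillar inheritance in `HERENCIA_AIM`, a domain's region ID can be derived from its parent pillars. The sketch below is hypothetical, and the names `id_region` and `pilares_de_region` are illustrative.

```python
# Pillar order must match the set_labels order passed to venn3().
ORDEN_PILARES = ("AWARENESS", "INFRASTRUCTURE", "MANAGEMENT")


def id_region(padres):
    """Build the 3-bit matplotlib_venn region ID for a non-empty set of parent pillars."""
    desconocidos = set(padres) - set(ORDEN_PILARES)
    if desconocidos or not padres:
        raise ValueError(f"invalid pillars: {sorted(desconocidos) or 'none'}")
    return "".join("1" if p in padres else "0" for p in ORDEN_PILARES)


def pilares_de_region(vid):
    """Inverse mapping: the pillars whose bit is set in a region ID."""
    return [p for p, bit in zip(ORDEN_PILARES, vid) if bit == "1"]


print(id_region(["AWARENESS", "MANAGEMENT"]))                      # 101
print(id_region(["AWARENESS", "INFRASTRUCTURE", "MANAGEMENT"]))    # 111
print(pilares_de_region("011"))  # ['INFRASTRUCTURE', 'MANAGEMENT']
```

Deriving zones this way would keep the diagram and the scoring inheritance from drifting apart, since both would read from a single mapping.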

render.yaml ADDED

	@@ -0,0 +1,9 @@

+services:
+  - type: web
+    name: aim-dashboard
+    runtime: python
+    buildCommand: pip install -r requirements.txt
+    startCommand: gunicorn app:server
+    envVars:
+      - key: PYTHON_VERSION
+        value: 3.13.2

requirements.txt ADDED

File without changes