Kesheratmex
committed on
Commit
·
199293f
1 Parent(s):
5d8f144
**Add GPT‑4 Vision support with detailed prompt and fallback**
Implement GPT‑4 Vision (or Qwen2‑VL) image analysis for turbine blade inspection, using a Spanish prompt that compares YOLO detections and evaluates visual defects. Add a fallback technical analysis when vision models are unavailable. Include a new `setup_qwen_vision.md` file for Qwen configuration.
- app.py +125 -53
- gptoss_wrapper.py +206 -0
- setup_qwen_vision.md +111 -0
app.py
CHANGED

```diff
@@ -242,7 +242,7 @@ def _extract_path(d):
 
 def analyze_image_with_gpt(image_path, detections_summary=""):
     """
-    Analiza una imagen directamente con GPT para obtener observaciones
+    Analiza una imagen directamente con GPT-4 Vision para obtener observaciones visuales
     que el modelo YOLO podría haber perdido.
     """
     try:
@@ -250,84 +250,156 @@ def analyze_image_with_gpt(image_path, detections_summary=""):
         if not GPTClass:
             return "Análisis de IA no disponible (GPT wrapper no configurado)"
 
+        # Construir prompt en español para análisis visual directo con GPT-4 Vision
+        prompt = f"""Eres un experto en inspección de palas de aerogeneradores. Analiza visualmente esta imagen de una pala de aerogenerador y proporciona un análisis detallado en español.
+
+DETECCIONES AUTOMÁTICAS DEL MODELO YOLO:
+{detections_summary if detections_summary else "No se detectaron defectos automáticamente"}
+
+INSTRUCCIONES PARA TU ANÁLISIS VISUAL:
+Observa cuidadosamente la imagen y describe:
+
+1. **Condición general de la superficie**: Color, textura, acabado, limpieza
+2. **Borde de ataque (leading edge)**: Estado, erosión, daños, desgaste
+3. **Borde de salida (trailing edge)**: Integridad, grietas, deformaciones
+4. **Superficie principal**: Grietas, decoloración, impactos, reparaciones previas
+5. **Elementos estructurales**: Uniones, tornillos, conexiones visibles
+6. **Contaminación**: Suciedad, hielo, vegetación, residuos
+7. **Daños específicos**: Impactos de rayos, aves, granizo, desgaste UV
+
+COMPARACIÓN CON DETECCIONES AUTOMÁTICAS:
+- Confirma o refuta las detecciones del modelo YOLO
+- Identifica defectos que YOLO pudo haber perdido
+- Evalúa la severidad de los defectos detectados
+
+CONTEXTO DE DEFECTOS COMUNES:
+- **Dirt/Suciedad**: Acumulación que reduce eficiencia aerodinámica
+- **Erosion**: Desgaste del borde de ataque por partículas
+- **Cracks/Grietas**: Fisuras estructurales críticas
+- **Lightning damage**: Daños por descargas eléctricas
+- **Ice**: Formación de hielo estacional
+- **Bird strikes**: Impactos de aves
+- **UV degradation**: Decoloración por radiación solar
+
+IMPORTANTE:
+- Responde SOLO en español
+- Describe específicamente lo que VES en la imagen
+- Sé preciso sobre ubicaciones (izquierda, derecha, centro, bordes)
+- Menciona colores, texturas, patrones específicos
+- Evalúa la severidad de cada problema observado
+
+Formato de respuesta:
+## 🔍 Análisis Visual Directo de la Pala
+
+**Estado General:** [tu evaluación visual del estado]
+
+**Observaciones Específicas:**
+[describe detalladamente lo que ves en cada área]
+
+**Defectos Identificados Visualmente:**
+[lista específica de problemas que observas]
+
+**Comparación con Detección Automática:**
+[confirma/refuta/complementa las detecciones YOLO]
+
+**Severidad y Prioridades:**
+[evalúa qué problemas son más críticos]
+
+**Recomendaciones de Mantenimiento:**
+[acciones específicas basadas en lo observado]
+"""
+
+        # Configurar modelo de visión
+        vision_model_id = os.getenv("VISION_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct")
+        model_id = os.getenv("MODEL_ID", vision_model_id)
+        wrapper = GPTClass(model=model_id)
+
+        # Intentar usar análisis de imágenes (GPT-4 Vision o Qwen2-VL)
+        try:
+            print(f"DEBUG: Intentando análisis de imagen con modelo: {model_id}")
+            analysis = wrapper.analyze_image(image_path, prompt, max_tokens=1200, temperature=0.2)
+            return analysis
+        except RuntimeError as vision_error:
+            # Si el análisis de visión no está disponible, usar análisis basado en características
+            print(f"DEBUG: Análisis de visión no disponible: {vision_error}")
+            return _fallback_technical_analysis(image_path, detections_summary, wrapper)
+
+    except Exception as e:
+        return f"Error en el análisis de IA: {str(e)}"
+
+def _fallback_technical_analysis(image_path, detections_summary, wrapper):
+    """
+    Análisis de respaldo basado en características técnicas cuando GPT-4 Vision no está disponible.
+    """
+    try:
         # Obtener características visuales básicas de la imagen
         visual_features = compute_visual_features(image_path, [])
 
-        # Construir descripción
+        # Construir descripción técnica detallada
+        technical_desc = "Análisis basado en características técnicas de la imagen:\n"
         if visual_features:
             brightness = visual_features.get("brightness", 0)
             contrast = visual_features.get("contrast", 0)
             blur = visual_features.get("blur", 0)
             dominant_rgb = visual_features.get("dominant_rgb", [])
+            width = visual_features.get("width", 0)
+            height = visual_features.get("height", 0)
+
+            technical_desc += f"- Resolución: {width}x{height} píxeles\n"
+            technical_desc += f"- Brillo promedio: {brightness:.1f}/255 "
+            technical_desc += ("(imagen brillante)" if brightness > 130 else "(imagen tenue)" if brightness < 80 else "(iluminación normal)")
+            technical_desc += f"\n- Contraste: {contrast:.1f} "
+            technical_desc += ("(alto contraste)" if contrast > 60 else "(bajo contraste)" if contrast < 30 else "(contraste normal)")
+            technical_desc += f"\n- Nitidez: {blur:.1f} "
+            technical_desc += ("(imagen nítida)" if blur > 100 else "(imagen borrosa)")
             if dominant_rgb:
+                technical_desc += f"\n- Color dominante: RGB{dominant_rgb}"
+
+                # Interpretar colores dominantes
+                r, g, b = dominant_rgb
+                if r > 150 and g > 150 and b > 150:
+                    technical_desc += " (tonos claros/blancos - superficie limpia)"
+                elif r < 100 and g < 100 and b < 100:
+                    technical_desc += " (tonos oscuros - posible suciedad o sombras)"
+                elif r > g and r > b:
+                    technical_desc += " (tonos rojizos - posible oxidación)"
+                elif g > r and g > b:
+                    technical_desc += " (tonos verdosos - posible vegetación/algas)"
+                elif b > r and b > g:
+                    technical_desc += " (tonos azulados - superficie normal)"
+
+        # Prompt modificado para análisis técnico
+        fallback_prompt = f"""Eres un experto en inspección de palas de aerogeneradores. Basándote en los datos técnicos de la imagen y las detecciones automáticas, proporciona un análisis detallado en español.
+
+{technical_desc}
 
 DETECCIONES AUTOMÁTICAS DEL MODELO YOLO:
 {detections_summary if detections_summary else "No se detectaron defectos automáticamente"}
 
-1. Describe lo que observas en la superficie de la pala (color, textura, condiciones generales)
-2. Identifica cualquier anomalía, defecto o área de preocupación que puedas ver visualmente
-3. Menciona específicamente si observas algo que el modelo automático YOLO podría haber perdido
-4. Evalúa el estado general de la pala (excelente, bueno, regular, malo, crítico)
-5. Proporciona recomendaciones específicas de mantenimiento
-
-ÁREAS ESPECÍFICAS A REVISAR:
-- Borde de ataque (leading edge)
-- Borde de salida (trailing edge)
-- Superficie de la pala
-- Uniones y conexiones
-- Grietas, erosión, decoloración
-- Daños por rayos, impactos de aves
-- Acumulación de suciedad o hielo
-
-- Responde SOLO en español
-- Sé específico sobre ubicaciones y tipos de defectos
-- Si no ves defectos obvios, menciona las características positivas
-- Compara tus observaciones con las detecciones automáticas
+NOTA: Este análisis se basa en características técnicas extraídas de la imagen ya que el análisis visual directo no está disponible.
+
+Proporciona un análisis experto interpretando estos datos técnicos en el contexto de inspección de palas de aerogeneradores.
 
 Formato de respuesta:
-## 🔍 Análisis
+## 🔍 Análisis Técnico de la Pala
 
-**Estado General:** [tu evaluación del estado]
+**Estado General:** [evaluación basada en datos técnicos]
 
-[describe lo que ves en la superficie, colores, texturas]
+**Interpretación de Características:**
+[qué indican los valores técnicos sobre la condición]
 
+**Análisis de Detecciones:**
+[interpretación de cada defecto detectado por YOLO]
 
 **Recomendaciones:**
 [acciones específicas recomendadas]
 """
 
-        # Generar análisis
-        analysis = wrapper.generate(prompt, max_tokens=1000, temperature=0.2)
-
-        return analysis
+        analysis = wrapper.generate(fallback_prompt, max_tokens=800, temperature=0.3)
+        return f"⚠️ **Análisis técnico** (análisis visual directo no disponible)\n\n{analysis}"
 
     except Exception as e:
-        return f"Error en
+        return f"Error en análisis de respaldo: {str(e)}"
```
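The model selection added to `analyze_image_with_gpt` layers two environment variables over a Qwen2-VL default. A minimal sketch of just that resolution logic; `resolve_model_id` is a hypothetical helper for illustration, not part of the commit:

```python
import os

def resolve_model_id() -> str:
    # Mirrors the selection in analyze_image_with_gpt: MODEL_ID wins if set,
    # otherwise VISION_MODEL_ID, otherwise the Qwen2-VL default.
    vision_model_id = os.getenv("VISION_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct")
    return os.getenv("MODEL_ID", vision_model_id)

# With neither variable set, the Qwen2-VL default applies
os.environ.pop("MODEL_ID", None)
os.environ.pop("VISION_MODEL_ID", None)
print(resolve_model_id())  # Qwen/Qwen2-VL-7B-Instruct

# MODEL_ID, when present, overrides the vision default
os.environ["MODEL_ID"] = "gpt-4o"
print(resolve_model_id())  # gpt-4o
```

Note this means a text-only `MODEL_ID` silently takes precedence over `VISION_MODEL_ID`, which is why the vision call is wrapped in a `try`/`except RuntimeError` that falls back to the technical analysis.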
gptoss_wrapper.py
CHANGED

```diff
@@ -22,6 +22,7 @@ This file intentionally uses only the requests stdlib-friendly HTTP approach to
 import os
 import time
 import requests
+import base64
 from typing import Optional
 
 
@@ -89,6 +90,29 @@ class GPTOSSWrapper:
                 "No API key configured for GPT wrapper. Set OPENAI_API_KEY or HUGGINGFACE_API_TOKEN in the environment."
             )
 
+    def analyze_image(self, image_path: str, prompt: str, max_tokens: int = 512, temperature: float = 0.2) -> str:
+        """
+        Analyze an image using vision models (OpenAI GPT-4 Vision or Hugging Face Qwen2-VL).
+
+        Args:
+            image_path: Path to the image file
+            prompt: Text prompt for analysis
+            max_tokens: Maximum tokens in response
+            temperature: Temperature for generation
+
+        Returns:
+            Analysis text from vision model
+
+        Raises:
+            RuntimeError if no vision model is available or if the call fails
+        """
+        if self.provider == "openai":
+            return self._analyze_image_openai(image_path, prompt, max_tokens, temperature)
+        elif self.provider == "hf":
+            return self._analyze_image_hf(image_path, prompt, max_tokens, temperature)
+        else:
+            raise RuntimeError("Image analysis requires either OpenAI API key or Hugging Face token. Set OPENAI_API_KEY or HUGGINGFACE_API_TOKEN.")
+
     def _generate_openai(self, prompt: str, max_tokens: int, temperature: float) -> str:
         if not self.openai_key:
             raise RuntimeError("OPENAI_API_KEY not set in environment.")
@@ -209,6 +233,188 @@ class GPTOSSWrapper:
         except Exception as e:
             raise RuntimeError(f"Hugging Face API call failed: {e}")
 
+    def _analyze_image_openai(self, image_path: str, prompt: str, max_tokens: int, temperature: float) -> str:
+        """
+        Analyze an image using OpenAI GPT-4 Vision API.
+        """
+        if not self.openai_key:
+            raise RuntimeError("OPENAI_API_KEY not set in environment.")
+
+        # Encode image to base64
+        try:
+            with open(image_path, "rb") as image_file:
+                base64_image = base64.b64encode(image_file.read()).decode('utf-8')
+        except Exception as e:
+            raise RuntimeError(f"Failed to read image file {image_path}: {e}")
+
+        url = "https://api.openai.com/v1/chat/completions"
+        headers = {
+            "Authorization": f"Bearer {self.openai_key}",
+            "Content-Type": "application/json",
+        }
+
+        # Use GPT-4 Vision model
+        vision_model = "gpt-4-vision-preview"
+
+        # Build payload for vision API
+        payload = {
+            "model": vision_model,
+            "messages": [
+                {
+                    "role": "system",
+                    "content": "You are an expert inspection assistant for wind turbine blade images/videos. Analyze images in detail and provide comprehensive assessments in Spanish."
+                },
+                {
+                    "role": "user",
+                    "content": [
+                        {
+                            "type": "text",
+                            "text": prompt
+                        },
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": f"data:image/jpeg;base64,{base64_image}",
+                                "detail": "high"
+                            }
+                        }
+                    ]
+                }
+            ],
+            "max_tokens": max_tokens,
+            "temperature": float(temperature),
+        }
+
+        try:
+            r = requests.post(url, headers=headers, json=payload, timeout=60)  # Longer timeout for vision
+            r.raise_for_status()
+            data = r.json()
+
+            choices = data.get("choices", [])
+            if not choices:
+                raise RuntimeError(f"OpenAI Vision returned empty choices: {data}")
+
+            msg = choices[0].get("message", {}).get("content")
+            if msg is None:
+                return str(data)
+            return msg.strip()
+
+        except Exception as e:
+            raise RuntimeError(f"OpenAI Vision API call failed: {e}")
+
+    def _analyze_image_hf(self, image_path: str, prompt: str, max_tokens: int, temperature: float) -> str:
+        """
+        Analyze an image using Hugging Face vision models (like Qwen2-VL).
+        """
+        if not self.hf_token:
+            raise RuntimeError("HUGGINGFACE_API_TOKEN not set in environment.")
+
+        # Encode image to base64
+        try:
+            with open(image_path, "rb") as image_file:
+                base64_image = base64.b64encode(image_file.read()).decode('utf-8')
+        except Exception as e:
+            raise RuntimeError(f"Failed to read image file {image_path}: {e}")
+
+        # Use Qwen2-VL model for vision analysis
+        vision_model = os.getenv("VISION_MODEL_ID", "Qwen/Qwen2-VL-7B-Instruct")
+
+        # Check if we should use the router
+        use_router = False
+        if self.hf_token:
+            hf_use_router_val = os.getenv("HF_USE_ROUTER", "").lower()
+            if hf_use_router_val not in ("0", "false", "no"):
+                use_router = True
+
+        try:
+            if use_router:
+                # Router endpoint for vision models
+                url = "https://router.huggingface.co/v1/chat/completions"
+                headers = {"Authorization": f"Bearer {self.hf_token}", "Content-Type": "application/json"}
+
+                payload = {
+                    "model": vision_model,
+                    "messages": [
+                        {
+                            "role": "system",
+                            "content": "You are an expert inspection assistant for wind turbine blade images/videos. Analyze images in detail and provide comprehensive assessments in Spanish."
+                        },
+                        {
+                            "role": "user",
+                            "content": [
+                                {
+                                    "type": "text",
+                                    "text": prompt
+                                },
+                                {
+                                    "type": "image_url",
+                                    "image_url": {
+                                        "url": f"data:image/jpeg;base64,{base64_image}"
+                                    }
+                                }
+                            ]
+                        }
+                    ],
+                    "max_tokens": max_tokens,
+                    "temperature": float(temperature),
+                }
+
+                r = requests.post(url, headers=headers, json=payload, timeout=120)
+                r.raise_for_status()
+                data = r.json()
+
+                choices = data.get("choices", [])
+                if choices and isinstance(choices, list):
+                    first = choices[0]
+                    msg = first.get("message", {}).get("content") if isinstance(first, dict) else None
+                    if not msg:
+                        msg = first.get("text") or first.get("content")
+                    if msg:
+                        return msg.strip()
+                return str(data)
+
+            else:
+                # Direct Hugging Face Inference API for vision models
+                url = f"https://api-inference.huggingface.co/models/{vision_model}"
+                headers = {"Authorization": f"Bearer {self.hf_token}"}
+
+                # For vision models, we need to send both text and image
+                payload = {
+                    "inputs": {
+                        "text": prompt,
+                        "image": base64_image
+                    },
+                    "parameters": {
+                        "max_new_tokens": max_tokens,
+                        "temperature": float(temperature),
+                    },
+                    "options": {"wait_for_model": True},
+                }
+
+                r = requests.post(url, headers=headers, json=payload, timeout=120)
+                r.raise_for_status()
+                data = r.json()
+
+                # Handle different response formats
+                if isinstance(data, list) and len(data) > 0:
+                    if isinstance(data[0], dict):
+                        if "generated_text" in data[0]:
+                            return data[0]["generated_text"].strip()
+                        elif "text" in data[0]:
+                            return data[0]["text"].strip()
+                elif isinstance(data, dict):
+                    if "generated_text" in data:
+                        return data["generated_text"].strip()
+                    elif "text" in data:
+                        return data["text"].strip()
+                    elif "error" in data:
+                        raise RuntimeError(f"Hugging Face error: {data['error']}")
+
+                return str(data)
+
+        except Exception as e:
+            raise RuntimeError(f"Hugging Face Vision API call failed: {e}")
+
 
 # Backwards-compatible factory in case caller expects a function or attribute
 def GPTOSSWrapperFactory(model: Optional[str] = None, provider: Optional[str] = None):
```
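Both new vision paths embed the image inline as a base64 data URL rather than uploading a file. The encoding step can be sketched in isolation; `to_data_url` is a hypothetical helper for illustration, and the sample bytes are just the JPEG magic number standing in for a real image file:

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    # Same transformation the wrapper performs before building the chat
    # payload: raw bytes -> base64 text -> data URL string.
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# JPEG magic bytes as a stand-in for a real image file
sample = bytes([0xFF, 0xD8, 0xFF, 0xE0])
print(to_data_url(sample))  # data:image/jpeg;base64,/9j/4A==
```

Base64 inflates the payload by roughly a third, which is one reason the vision requests use longer timeouts (60 s and 120 s) than the text-only calls.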
setup_qwen_vision.md
ADDED

# Configuración de Qwen2-VL para Análisis de Imágenes

## 🎯 Qwen2-VL: Modelo de Visión Gratuito

Qwen2-VL es un modelo de visión gratuito y potente que puede analizar imágenes directamente. Es una excelente alternativa a GPT-4 Vision.

## 📋 Configuración Rápida

### 1. Obtener Token de Hugging Face (GRATIS)

1. Ve a [huggingface.co](https://huggingface.co)
2. Crea una cuenta gratuita
3. Ve a Settings → Access Tokens
4. Crea un nuevo token con permisos de lectura
5. Copia el token

### 2. Configurar Variables de Entorno

```bash
# Windows (PowerShell)
$env:HUGGINGFACE_API_TOKEN = "hf_tu_token_aqui"
$env:VISION_MODEL_ID = "Qwen/Qwen2-VL-7B-Instruct"
$env:HF_USE_ROUTER = "true"

# Linux/Mac
export HUGGINGFACE_API_TOKEN="hf_tu_token_aqui"
export VISION_MODEL_ID="Qwen/Qwen2-VL-7B-Instruct"
export HF_USE_ROUTER="true"
```
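For reference, the wrapper treats `HF_USE_ROUTER` as enabled unless it is explicitly set to a falsy value. A minimal sketch of that rule, mirroring the check in `_analyze_image_hf`; `router_enabled` is a hypothetical helper, not part of the commit:

```python
import os

def router_enabled() -> bool:
    # Mirrors _analyze_image_hf: anything except "0", "false" or "no"
    # (including an unset or empty variable) selects the router endpoint.
    val = os.getenv("HF_USE_ROUTER", "").lower()
    return val not in ("0", "false", "no")

os.environ["HF_USE_ROUTER"] = "true"
print(router_enabled())  # True
os.environ["HF_USE_ROUTER"] = "no"
print(router_enabled())  # False
```

So leaving `HF_USE_ROUTER` unset still routes through `router.huggingface.co`; only an explicit `0`, `false` or `no` falls back to the direct Inference API endpoint.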
### 3. Modelos Disponibles

**Qwen2-VL (Recomendado):**
- `Qwen/Qwen2-VL-7B-Instruct` - Modelo principal
- `Qwen/Qwen2-VL-2B-Instruct` - Versión más ligera

**Otros modelos de visión gratuitos:**
- `microsoft/kosmos-2-patch14-224`
- `Salesforce/blip2-opt-2.7b`
- `llava-hf/llava-1.5-7b-hf`

## 🚀 Uso

Una vez configurado, la aplicación automáticamente:

1. **Detectará** que tienes Hugging Face configurado
2. **Usará Qwen2-VL** para análisis visual directo
3. **Proporcionará** análisis detallado en español

## 🔍 Capacidades de Qwen2-VL

- ✅ Análisis visual directo de imágenes
- ✅ Detección de defectos y anomalías
- ✅ Descripción detallada de superficies
- ✅ Comparación con detecciones YOLO
- ✅ Recomendaciones de mantenimiento
- ✅ Respuestas en español

## 🛠️ Solución de Problemas

### Error: "Model loading"
```bash
# Espera unos minutos, el modelo se está cargando por primera vez
# Los modelos de HF pueden tardar en "despertar"
```

### Error: "Token inválido"
```bash
# Verifica que el token sea correcto
echo $HUGGINGFACE_API_TOKEN
```

### Usar modelo alternativo
```bash
# Si Qwen2-VL no funciona, prueba:
$env:VISION_MODEL_ID = "llava-hf/llava-1.5-7b-hf"
```

## 📊 Comparación

| Modelo | Costo | Calidad | Velocidad | Configuración |
|--------|-------|---------|-----------|---------------|
| GPT-4 Vision | 💰 Pago | 🌟🌟🌟🌟🌟 | 🚀🚀🚀 | Fácil |
| Qwen2-VL | 🆓 Gratis | 🌟🌟🌟🌟 | 🚀🚀 | Fácil |
| Análisis técnico | 🆓 Gratis | 🌟🌟 | 🚀🚀🚀 | Automático |

## 🎯 Resultado Esperado

Con Qwen2-VL configurado, obtendrás análisis como:

```markdown
## 🔍 Análisis Visual Directo de la Pala

**Estado General:** Bueno con mantenimiento menor requerido

**Observaciones Específicas:**
- Superficie: Color gris uniforme, acabado mate normal
- Borde de ataque: Erosión leve visible en zona superior
- Suciedad: Dos áreas de acumulación claramente visibles

**Defectos Identificados Visualmente:**
- Dirt/suciedad: Confirmado en 2 ubicaciones
- Erosión menor en borde de ataque
- Decoloración UV leve

**Recomendaciones:**
- Limpieza programada (prioridad media)
- Inspección de erosión (seguimiento)
```

¡Qwen2-VL te dará análisis visual real y gratuito! 🎉