UAP Analysis App: Improvement and Debugging Notes

Scope Reviewed

  • Backend API: backend/main.py
  • Frontend app/store/pages: frontend/src/**
  • Project positioning/features: README.md

High-Priority Debugging and Reliability Issues

1) Global in-memory backend state is shared across all users/sessions

  • File: backend/main.py
  • Current behavior: state is a single process-level dictionary storing dataset, filtered data, analysis results, and query context.
  • Risk:
    • Cross-user data leakage.
    • Race conditions (one user can overwrite another user’s data mid-session).
    • Non-deterministic behavior in multi-worker deployment.
  • Improvement:
    • Introduce per-session/project IDs and isolate state in Redis or a DB cache keyed by session.
    • Add dataset ownership + TTL cleanup.
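
The isolation idea above can be sketched as follows. This is a minimal illustration, not the project's code: `SessionStore`, `ttl_seconds`, and the injectable `clock` are hypothetical names, and the in-process dictionary is only a stand-in for Redis or a DB cache.

```python
import time
import uuid
from typing import Any, Callable, Dict, Optional, Tuple


class SessionStore:
    """Per-session state with TTL expiry (in-memory stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # session_id -> (last_access_time, state dict)
        self._sessions: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def create(self) -> str:
        """Create an empty session and return its random ID."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = (self._clock(), {})
        return session_id

    def get(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Return the session's state, or None if unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_access, state = entry
        now = self._clock()
        if now - last_access > self._ttl:
            del self._sessions[session_id]
            return None
        self._sessions[session_id] = (now, state)  # refresh TTL on access
        return state

    def cleanup(self) -> int:
        """Drop all expired sessions; return how many were removed."""
        now = self._clock()
        expired = [sid for sid, (ts, _) in self._sessions.items()
                   if now - ts > self._ttl]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

Each request would look up its own dictionary via `store.get(session_id)` instead of touching a module-level `state`. An in-process store like this one still does not share data across workers, which is why the notes above point to Redis or a database for multi-worker deployments.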

2) Dashboard “Analysis Runs” counter is effectively broken

  • File: backend/main.py
  • Current behavior: /api/dashboard/summary computes analyzed_columns from state["col_names"], but run_analysis never populates col_names.
  • Impact: the dashboard can show incorrect or always-zero run statistics.
  • Improvement:
    • Set state["col_names"] = req.columns (validated columns only) during analysis.
    • Consider storing analysis timestamp and run count.
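
A sketch of what the analysis step could record, assuming `state` is a plain dictionary as described above. The helper name `record_analysis_run` and the keys `last_analysis_at` and `analysis_runs` are hypothetical; only `col_names` comes from the notes.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def record_analysis_run(state: Dict[str, Any],
                        requested: Iterable[str],
                        available: Iterable[str]) -> List[str]:
    """Store validated columns, a UTC timestamp, and an incremented run count."""
    available_set = set(available)
    # Keep request order; drop unknown columns and duplicates.
    validated = list(dict.fromkeys(c for c in requested if c in available_set))
    state["col_names"] = validated
    state["last_analysis_at"] = datetime.now(timezone.utc).isoformat()
    state["analysis_runs"] = state.get("analysis_runs", 0) + 1
    return validated
```

The summary endpoint could then read `analysis_runs` directly rather than inferring a run count from the column list.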

3) Numeric filtering ignores partial ranges

  • Files: backend/main.py, frontend/src/components/data/FilterPanel.tsx
  • Current behavior: numeric filter only applies when both min_val and max_val are provided.
  • Impact: user-provided minimum-only or maximum-only constraints do nothing.
  • Improvement:
    • Support all combinations: min-only, max-only, and bounded range.
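
The three combinations can be handled by checking each bound independently. The sketch below works on a list of row dictionaries to stay self-contained; the function name `filter_numeric` is hypothetical, and the real backend may hold its data in a different structure.

```python
from typing import Any, Dict, Iterable, List, Optional


def filter_numeric(rows: Iterable[Dict[str, Any]], column: str,
                   min_val: Optional[float] = None,
                   max_val: Optional[float] = None) -> List[Dict[str, Any]]:
    """Keep rows whose value satisfies whichever bounds were provided."""
    kept = []
    for row in rows:
        if min_val is None and max_val is None:
            kept.append(row)  # no constraint: keep everything
            continue
        value = row.get(column)
        if value is None:
            continue  # a missing value cannot satisfy a bound
        if min_val is not None and value < min_val:
            continue
        if max_val is not None and value > max_val:
            continue
        kept.append(row)
    return kept
```

Comparing against `None` rather than relying on truthiness keeps a bound of `0` valid, which is an easy case to lose when both bounds are tested together.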

4) Filtering error handling is silent on frontend

  • File: frontend/src/components/data/DataExplorer.tsx
  • Current behavior: handleFilter catches errors and suppresses them.
  • Impact: user cannot tell whether filtering failed or returned no matches.
  • Improvement:
    • Surface backend error in UI (same pattern used by upload/load errors).

5) Backend query endpoint builds costly prompts that risk exceeding token limits

  • File: backend/main.py
  • Current behavior: concatenates up to 500 rows of full text into one prompt string.
  • Risks:
    • Large latency/cost spikes.
    • Prompt truncation or model failure on long text columns.
  • Improvement:
    • Add token-aware chunking/sampling and optional map-reduce summarization.
    • Enforce max characters/tokens per request with clear user feedback.
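
A minimal sketch of a per-request budget, using character counts as a rough proxy for tokens (an assumption; a real tokenizer would be more accurate). `build_prompt`, `max_chars`, and `max_row_chars` are hypothetical names and defaults.

```python
from typing import List, Sequence, Tuple


def build_prompt(question: str, rows: Sequence[str],
                 max_chars: int = 8000,
                 max_row_chars: int = 500) -> Tuple[str, int]:
    """Assemble a prompt within a character budget.

    Returns the prompt and the number of rows included, so the caller
    can tell the user how much of the data was actually sent.
    """
    header = f"Question: {question}\nRows:\n"
    parts: List[str] = [header]
    used = len(header)
    included = 0
    for text in rows:
        # Truncate long rows so one record cannot consume the budget.
        line = "- " + text[:max_row_chars] + "\n"
        if used + len(line) > max_chars:
            break
        parts.append(line)
        used += len(line)
        included += 1
    return "".join(parts), included
```

If `included` is smaller than `len(rows)`, the endpoint could report something like "analyzed 19 of 500 rows" instead of failing or truncating silently.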

6) CORS config is overly permissive and potentially invalid for credentials

  • File: backend/main.py
  • Current behavior: allow_origins=["*"] with allow_credentials=True.
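  • Risk: browsers reject a wildcard Access-Control-Allow-Origin on credentialed requests, so behavior depends on how the server resolves this combination; either way, the security posture is weak for production.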
  • Improvement:
    • Use explicit origin allowlist per environment.
    • Keep credentials disabled unless required.
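
One way to build a per-environment allowlist is to read it from configuration. The sketch below is illustrative: the environment variable names `ALLOWED_ORIGINS` and `ALLOW_CREDENTIALS` and the helper `cors_settings` are assumptions, not names taken from the project.

```python
import os
from typing import Any, Dict, Mapping


def cors_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build CORS keyword arguments from an explicit origin allowlist."""
    raw = env.get("ALLOWED_ORIGINS", "")
    origins = [o.strip().rstrip("/") for o in raw.split(",") if o.strip()]
    if "*" in origins:
        raise ValueError("wildcard origin is not allowed; list origins explicitly")
    wants_credentials = env.get("ALLOW_CREDENTIALS", "false").lower() == "true"
    return {
        "allow_origins": origins,
        # Credentials stay off unless explicitly enabled and origins are listed.
        "allow_credentials": wants_credentials and bool(origins),
    }
```

If the backend uses FastAPI's `CORSMiddleware` (the `allow_origins` and `allow_credentials` names suggest so, but that is an assumption), the result could be passed as `app.add_middleware(CORSMiddleware, **cors_settings())`.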

Data-Analysis Quality Gaps (Core Product)

7) Analysis pipeline is currently mock/simulated, not aligned with README claims

  • File: backend/main.py
  • Current behavior: analysis uses value counts + random 2D points + correlation proxy for “XGBoost-like” output.
  • Impact: users may interpret synthetic outputs as real model outputs.
  • Improvement:
    • Label this mode explicitly as demo/mock in API/UI.
    • Add a production pipeline path using real embeddings + UMAP/HDBSCAN + trained model artifacts.

8) Cluster assignment is too coarse for high-cardinality text fields

  • File: backend/main.py
  • Current behavior: top 32 frequent values become “clusters”; all others mapped to Other.
  • Impact: weak signal extraction for long-tail UAP narratives.
  • Improvement:
    • Use embedding-based similarity clustering with min-cluster-size tuning.
    • Keep top terms as labels only, not as cluster definitions.

9) “XGBoost” results are correlation-based placeholders

  • File: backend/main.py
  • Current behavior: feature importance derived from absolute correlation among category codes, with random “accuracy”.
  • Impact: misleading ML interpretation.
  • Improvement:
    • Either rename the section (e.g. “Association Importance”) or run a real train/validation split and report metrics with confidence intervals.

10) Cramér’s V stability safeguards are minimal

  • File: backend/main.py
  • Current behavior: exceptions are caught and silently replaced with 0.0.
  • Impact: a 0.0 in the matrix is indistinguishable from a genuine lack of association, so data-quality problems stay hidden.
  • Improvement:
    • Return diagnostics (insufficient contingency size, sparse table warning, low sample counts).
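
A sketch of a Cramér's V helper that returns diagnostics alongside the value. It uses the uncorrected chi-squared statistic, V = sqrt(chi² / (n · (min(r, c) − 1))). The thresholds `min_expected=5.0` and `min_samples=30` are conventional rules of thumb chosen here as assumptions, and the function signature is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cramers_v(x: Sequence[str], y: Sequence[str],
              min_expected: float = 5.0,
              min_samples: int = 30) -> Tuple[float, List[str]]:
    """Return (Cramér's V, diagnostics) for two categorical columns.

    When V is undefined, NaN plus a diagnostic is returned instead of 0.0.
    """
    if len(x) != len(y):
        raise ValueError("columns must have the same length")
    notes: List[str] = []
    n = len(x)
    if n < min_samples:
        notes.append(f"low sample count: n={n} < {min_samples}")
    row_totals, col_totals = Counter(x), Counter(y)
    r, c = len(row_totals), len(col_totals)
    if min(r, c) < 2:
        notes.append("insufficient contingency size: need >= 2 levels per column")
        return math.nan, notes
    observed = Counter(zip(x, y))
    chi2 = 0.0
    sparse_cells = 0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n  # always > 0: both totals are >= 1
            if expected < min_expected:
                sparse_cells += 1
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    if sparse_cells:
        notes.append(f"sparse table: {sparse_cells} of {r * c} cells "
                     f"have expected count < {min_expected}")
    return math.sqrt(chi2 / (n * (min(r, c) - 1))), notes
```

Returning NaN for undefined cases lets the UI render "not computable" differently from a true zero, and the diagnostics list can be attached to each matrix cell.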

UX and Feature Improvements for Analysis Workflows

11) Add reproducibility controls

  • Current gap: random projections are generated with no user-visible seed control, and pipeline details are not exposed.
  • Improvement:
    • UI inputs for random seed and analysis config profile.
    • Persist configuration alongside results.
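
A small sketch of seeded generation with the configuration stored next to the results. `run_projection` and its return shape are hypothetical; the uniform 2D points merely stand in for whatever the current mock pipeline produces.

```python
import random
from typing import Any, Dict, List, Tuple


def run_projection(n_points: int, seed: int) -> Dict[str, Any]:
    """Generate placeholder 2D points from a seeded, local RNG."""
    rng = random.Random(seed)  # local generator: no hidden global state
    points: List[Tuple[float, float]] = [
        (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for _ in range(n_points)
    ]
    # Persist the configuration with the results so the run can be repeated.
    return {"config": {"seed": seed, "n_points": n_points}, "points": points}
```

Calling `run_projection(100, seed=42)` twice yields identical points, so a seed entered in the UI is enough to reproduce a plot.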

12) Add time/location-first analysis modules

  • Context: UAP datasets are usually spatiotemporal.
  • Improvement ideas:
    • Temporal anomaly detection (daily/weekly trend breaks).
    • Geo heatmaps + hotspot evolution over time.
    • Co-occurrence matrices for shape/light/motion features.

13) Add model/result provenance panel

  • Improvement:
    • Track dataset hash, row count after filters, analysis timestamp, selected columns, pipeline version.
    • Show this metadata in Analysis and export payloads.
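
The metadata listed above could be assembled roughly as follows. This is a sketch under assumptions: `build_provenance` and its key names are hypothetical, rows are assumed to be JSON-serializable dictionaries, and the hash covers whichever rows are passed in (for example, the filtered set).

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Sequence


def build_provenance(rows: Sequence[Dict[str, Any]], columns: List[str],
                     pipeline_version: str) -> Dict[str, Any]:
    """Collect provenance metadata for one analysis run."""
    # Canonical JSON (sorted keys) so key order does not change the hash.
    canonical = json.dumps(list(rows), sort_keys=True, default=str).encode("utf-8")
    return {
        "dataset_sha256": hashlib.sha256(canonical).hexdigest(),
        "row_count": len(rows),
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
        "columns": list(columns),
        "pipeline_version": pipeline_version,
    }
```

The same dictionary can be embedded in the analysis response and in exported files, so a chart can always be traced back to the exact data and configuration that produced it.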

14) Improve filter capabilities for real EDA

  • Current gap: categorical filter relies on top-values list and cannot easily search rare categories.
  • Improvement:
    • Add searchable categorical picker and “include nulls/exclude nulls”.
    • Add reusable saved filter presets.

15) Add export/reporting features

  • Improvement:
    • Export filtered dataset, correlation matrix, and feature-importance JSON/CSV.
    • One-click markdown/PDF report with charts and configuration metadata.

Engineering Quality and Maintainability

16) Add automated tests for core API behaviors

  • Suggested minimal suite:
    • /api/data/load, /api/data/filter, /api/analyze/run, /api/dashboard/summary, /api/query/gemini failure paths.
    • Numeric filter edge cases (min-only/max-only).
    • State isolation once sessionized.

17) Add request/analysis observability

  • Improvement:
    • Structured logging + request IDs.
    • Timing metrics per stage (load/filter/analyze/query).
    • Distinguish user errors (4xx) from pipeline errors (5xx).
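
A minimal sketch of per-stage timing with request IDs, using only the standard library. The logger name `uap.api`, the helper names, and the choice to treat `ValueError` as a user error are all assumptions made for illustration.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("uap.api")  # hypothetical logger name


def new_request_id() -> str:
    """Return a random ID to correlate all log lines of one request."""
    return uuid.uuid4().hex


@contextmanager
def timed_stage(request_id: str, stage: str) -> Iterator[None]:
    """Log one JSON line per stage with its duration and outcome."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        yield
    except ValueError:
        outcome = "user_error"      # would map to a 4xx response
        raise
    except Exception:
        outcome = "pipeline_error"  # would map to a 5xx response
        raise
    finally:
        logger.info(json.dumps({
            "request_id": request_id,
            "stage": stage,
            "outcome": outcome,
            "duration_ms": round((time.perf_counter() - start) * 1000, 3),
        }))
```

A handler would wrap each step, for example `with timed_stage(rid, "filter"): ...`, and the exception still propagates so the API layer can choose the status code.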

18) Clarify mode separation: demo vs production

  • Improvement:
    • Add a feature flag or environment variable that selects the mock or the full analysis backend.
    • UI badges and warning copy to prevent scientific misinterpretation.
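
The mode switch could look roughly like this. The variable name `ANALYSIS_MODE`, the mode values, and the response fields `mode` and `is_simulated` are hypothetical choices for the sketch.

```python
import os
from typing import Any, Dict, Mapping

VALID_MODES = ("mock", "full")


def analysis_mode(env: Mapping[str, str] = os.environ) -> str:
    """Read the analysis mode from the environment, defaulting to mock."""
    mode = env.get("ANALYSIS_MODE", "mock").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"ANALYSIS_MODE must be one of {VALID_MODES}, got {mode!r}")
    return mode


def tag_response(payload: Dict[str, Any], mode: str) -> Dict[str, Any]:
    """Attach mode metadata so the UI can render a badge or warning."""
    tagged = dict(payload)  # copy: leave the caller's payload untouched
    tagged["mode"] = mode
    tagged["is_simulated"] = mode == "mock"
    return tagged
```

Failing fast on an unknown mode avoids silently falling back to simulated output, and the `is_simulated` flag gives the frontend one field to drive its warning copy.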

Suggested Implementation Order

  1. Fix state isolation and dashboard run-count correctness.
  2. Fix filtering behavior + frontend error surfacing.
  3. Mark current analysis mode as mock and rename misleading outputs.
  4. Add reproducibility/provenance metadata and exports.
  5. Introduce real embedding + clustering + model pipeline behind a feature flag.