# 3GPP-Innovation-Extractor

## Run Locally

**Prerequisites:** Node.js

  1. Install dependencies: `npm install`
  2. Set `GEMINI_API_KEY` in `.env.local` to your Gemini API key
  3. Run the app: `npm run dev`
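
For reference, a minimal `.env.local` might look like this (the key value is a placeholder, not a real key):

```
GEMINI_API_KEY=your-gemini-api-key
```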

## Workflow Overview

```mermaid
flowchart TD
    subgraph S1 [Phase 1: Data Ingestion]
        A[User Selects Working Group] -->|SA1-6, RAN1-2| B[Fetch Meetings via POST]
        B --> C[User Selects Meeting]
        C --> D[Filter Docs by Metadata]
        D --> E[Extract Raw Text]
    end

    subgraph S2 [Phase 2: Refinement & Caching]
        E --> F{Text in Cache?}
        F -- Yes --> G[Retrieve Cached Refinement]
        F -- No --> H[LLM Processing]
        H --> I[Task: Dense Chunking & 'What's New']
        I --> J[Store in Dataset]
        J --> G
    end

    subgraph S3 [Phase 3: Pattern Analysis]
        G --> K[User Selects Pattern/Prompt]
        K --> L{Result in Cache?}
        L -- Yes --> M[Retrieve Analysis]
        L -- No --> N[Execute Pattern]
        N --> O[Multi-Model Verification]
        O --> P[Store Result]
    end

    S1 --> S2 --> S3
```

## Detailed Process Specification

### Phase 1: Data Ingestion & Extraction

The user navigates a strict hierarchy to isolate relevant source text.

  1. **Working Group Selection:** User selects one group from the allowlist: `['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2']`.
  2. **Meeting Retrieval:** System sends a POST request with the selected Working Group to the meetings endpoint to retrieve the meeting list.
  3. **Document Filtering:** User selects a meeting, then filters the resulting file list using the available metadata.
  4. **Text Extraction:** System extracts raw content from the filtered files into a text list.
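
The retrieval step above can be sketched as follows. Only the allowlist comes from this README; the endpoint URL, request body shape, and response type are placeholders for illustration:

```typescript
// Sketch of Phase 1 meeting retrieval. The allowlist is from this README;
// the endpoint URL and request/response shapes are hypothetical.
const WORKING_GROUPS = ['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2'] as const;
type WorkingGroup = (typeof WORKING_GROUPS)[number];

// Reject anything outside the allowlist before touching the network.
function isAllowedGroup(group: string): group is WorkingGroup {
  return (WORKING_GROUPS as readonly string[]).includes(group);
}

// Hypothetical POST body for the meetings endpoint.
function buildMeetingsRequest(group: string): { workingGroup: WorkingGroup } {
  if (!isAllowedGroup(group)) {
    throw new Error(`Working group not in allowlist: ${group}`);
  }
  return { workingGroup: group };
}

async function fetchMeetings(group: string): Promise<unknown> {
  const body = buildMeetingsRequest(group);
  // Placeholder URL: the real endpoint is not named in this README.
  const res = await fetch('https://example.org/api/meetings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Meeting fetch failed: ${res.status}`);
  return res.json();
}
```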

### Phase 2: Content Refinement (with Caching)

Raw text is processed into high-value summaries to reduce noise.

- **Cache Check:** Before processing, check the dataset for an existing `(text_hash, refined_output)` pair to avoid reprocessing identical text.
- **LLM Processing:** If the text is not cached, pass it to the selected LLM (a default is provided and can be changed by the user).
- **Prompt Objective:**
  1. Create information-dense chunks (minimizing near-duplicates).
  2. Generate a "What's New" paragraph wrapped in `SUGGESTION START` and `SUGGESTION END` tags.
- **Storage:** Save the input text and the LLM output to the dataset.
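
The caching and tag-extraction steps can be sketched as below. This models the dataset as an in-memory map and stands in a placeholder for the real LLM call; only the `(text_hash, refined_output)` keying and the `SUGGESTION START`/`SUGGESTION END` convention come from this README:

```typescript
// Sketch of the Phase 2 cache check, with the dataset modelled as an
// in-memory map (the real project presumably persists it elsewhere).
import { createHash } from 'node:crypto';

const refinementCache = new Map<string, string>();

function textHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// refine() stands in for the real LLM call, which is not shown here.
async function refineWithCache(
  text: string,
  refine: (t: string) => Promise<string>,
): Promise<string> {
  const key = textHash(text);
  const cached = refinementCache.get(key);   // cache hit: skip the LLM
  if (cached !== undefined) return cached;
  const refined = await refine(text);        // cache miss: call the LLM
  refinementCache.set(key, refined);         // store (text_hash, refined_output)
  return refined;
}

// Pull the "What's New" paragraph out of the model output, per the
// SUGGESTION START / SUGGESTION END convention described above.
function extractWhatsNew(output: string): string | null {
  const match = output.match(/SUGGESTION START([\s\S]*?)SUGGESTION END/);
  return match ? match[1].trim() : null;
}
```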

### Phase 3: Pattern Analysis & Verification

Refined text is analyzed using specific user-defined patterns.

- **Pattern Selection:** User applies a specific prompt/pattern to the refined documents.
- **Cache Check:** Check the results database for an existing `(document_id, pattern_id)` result.
- **Execution & Verification:**
  - Run the selected pattern against the documents.
  - **Verifier Mode:** Optionally run the same input through multiple models in parallel and compare their outputs to check accuracy.
- **Storage:** Save the final analysis in the database so it is not recomputed later.
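
Verifier mode can be sketched as a fan-out over model runners followed by an agreement check. The model interfaces here are placeholders; only the parallel multi-model comparison and the `(document_id, pattern_id)` cache key come from this README:

```typescript
// Sketch of Phase 3 verifier mode: run the same input through several
// models in parallel and check whether they agree.
type ModelRunner = (input: string) => Promise<string>;

interface VerifiedResult {
  outputs: string[];
  agreed: boolean;   // true when every model returned the same text
}

async function runWithVerification(
  input: string,
  models: ModelRunner[],
): Promise<VerifiedResult> {
  const outputs = await Promise.all(models.map((run) => run(input)));
  const normalized = outputs.map((o) => o.trim());
  const agreed = normalized.every((o) => o === normalized[0]);
  return { outputs, agreed };
}

// Cache key matching the (document_id, pattern_id) pair described above.
function resultKey(documentId: string, patternId: string): string {
  return `${documentId}:${patternId}`;
}
```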