<div align="center">
<img width="1200" height="475" alt="GHBanner" src="https://github.com/user-attachments/assets/0aa67016-6eaf-458a-adb2-6e31a0763ed6" />
</div>

## Run Locally

**Prerequisites:** Node.js

1. Install dependencies:
   `npm install`
2. Set the `GEMINI_API_KEY` in [.env.local](.env.local) to your Gemini API key
3. Run the app:
   `npm run dev`


## Workflow Overview

```mermaid
flowchart TD
    subgraph S1 [Phase 1: Data Ingestion]
        A[User Selects Working Group] -->|SA1-6, RAN1-2| B[Fetch Meetings via POST]
        B --> C[User Selects Meeting]
        C --> D[Filter Docs by Metadata]
        D --> E[Extract Raw Text]
    end

    subgraph S2 [Phase 2: Refinement & Caching]
        E --> F{Text in Cache?}
        F -- Yes --> G[Retrieve Cached Refinement]
        F -- No --> H[LLM Processing]
        H --> I[Task: Dense Chunking & 'What's New']
        I --> J[Store in Dataset]
        J --> G
    end

    subgraph S3 [Phase 3: Pattern Analysis]
        G --> K[User Selects Pattern/Prompt]
        K --> L{Result in Cache?}
        L -- Yes --> M[Retrieve Analysis]
        L -- No --> N[Execute Pattern]
        N --> O[Multi-Model Verification]
        O --> P[Store Result]
    end

    S1 --> S2 --> S3
```

### Detailed Process Specification

#### Phase 1: Data Ingestion & Extraction
The user navigates a strict hierarchy to isolate relevant source text.
1.  **Working Group Selection:** User selects one group from the allowlist: `['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2']`.
2.  **Meeting Retrieval:** System sends a `POST` request containing the selected Working Group to the meeting-list endpoint and retrieves the available meetings.
3.  **Document Filtering:** User selects a meeting, then filters the resulting file list using available metadata.
4.  **Text Extraction:** System extracts raw content from the filtered files into a text list.
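The filtering step above can be sketched as a small helper. The metadata fields (`type`, `agendaItem`) are illustrative assumptions, not the app's actual schema:

```typescript
// Hypothetical document metadata shape; the real fields may differ.
interface DocMeta {
  filename: string;
  type: string;       // e.g. "pCR", "CR", "discussion"
  agendaItem: string; // e.g. "6.1"
}

// The Working Group allowlist from step 1.
const WORKING_GROUPS = ['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2'] as const;

// Keep only documents whose metadata matches every provided filter value.
function filterDocs(docs: DocMeta[], filters: Partial<DocMeta>): DocMeta[] {
  return docs.filter(doc =>
    Object.entries(filters).every(
      ([key, value]) => doc[key as keyof DocMeta] === value
    )
  );
}
```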

#### Phase 2: Content Refinement (with Caching)
Raw text is processed into high-value summaries to reduce noise.
*   **Cache Check:** Before processing, check the dataset for existing `(text_hash, refined_output)` pairs to prevent duplicate processing.
*   **LLM Processing:** If not cached, pass text to the selected LLM (default provided, user-changeable).
*   **Prompt Objective:**
    1.  Create information-dense chunks (minimize near-duplicates).
    2.  Generate a "What's New" paragraph wrapped in `SUGGESTION START` and `SUGGESTION END` tags.
*   **Storage:** Save the input text and the LLM output to the dataset.
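The cache-then-refine flow can be sketched as below. A SHA-256 of the raw text serves as `text_hash`, and an in-memory `Map` stands in for the dataset; the actual storage backend and LLM call are assumptions:

```typescript
import { createHash } from 'crypto';

// In-memory stand-in for the dataset of (text_hash, refined_output) pairs.
const refinementCache = new Map<string, string>();

function textHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Return the cached refinement if present; otherwise call the (injected)
// LLM, store its output keyed by the text hash, and return it.
async function refine(
  text: string,
  llm: (t: string) => Promise<string>
): Promise<string> {
  const key = textHash(text);
  const cached = refinementCache.get(key);
  if (cached !== undefined) return cached; // cache hit: skip the LLM call
  const refined = await llm(text);
  refinementCache.set(key, refined);
  return refined;
}
```

Hashing the input (rather than storing full text as the key) keeps lookups cheap and makes duplicate submissions of the same document a guaranteed cache hit.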

#### Phase 3: Pattern Analysis & Verification
Refined text is analyzed using specific user-defined patterns.
*   **Pattern Selection:** User applies a specific prompt/pattern to the refined documents.
*   **Cache Check:** Check the results database for existing `(document_id, pattern_id)` results.
*   **Execution & Verification:**
    *   Run the selected pattern against the documents.
    *   **Verifier Mode:** Optionally execute the same input across multiple models simultaneously to compare results and ensure accuracy.
*   **Storage:** Save the final analysis in the database to prevent future re-computation.
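Verifier mode can be sketched as running the same prompt across several models in parallel and comparing outputs. The `Model` type and the strict-equality agreement check are simplifying assumptions; a real comparison would likely normalize or score similarity:

```typescript
// A model is anything that maps a prompt to an output.
type Model = (prompt: string) => Promise<string>;

// Execute the same input across multiple models simultaneously and report
// whether all outputs agree (strict equality, for illustration only).
async function verify(
  prompt: string,
  models: Model[]
): Promise<{ outputs: string[]; agree: boolean }> {
  const outputs = await Promise.all(models.map(m => m(prompt)));
  const agree = outputs.every(o => o === outputs[0]);
  return { outputs, agree };
}
```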