# 3GPP-Innovation-Extractor

## Run Locally

**Prerequisites:** Node.js

  1. Install dependencies: `npm install`
  2. Set `GEMINI_API_KEY` in `.env.local` to your Gemini API key
  3. Run the app: `npm run dev`
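
For reference, a minimal `.env.local` might look like this (the key value is a placeholder, not a real key):

```
GEMINI_API_KEY=your-gemini-api-key
```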

## Workflow Overview

```mermaid
flowchart TD
    subgraph S1 [Phase 1: Data Ingestion]
        A[User Selects Working Group] -->|SA1-6, RAN1-2| B[Fetch Meetings via POST]
        B --> C[User Selects Meeting]
        C --> D[Filter Docs by Metadata]
        D --> E[Extract Raw Text]
    end

    subgraph S2 [Phase 2: Refinement & Caching]
        E --> F{Text in Cache?}
        F -- Yes --> G[Retrieve Cached Refinement]
        F -- No --> H[LLM Processing]
        H --> I[Task: Dense Chunking & 'What's New']
        I --> J[Store in Dataset]
        J --> G
    end

    subgraph S3 [Phase 3: Pattern Analysis]
        G --> K[User Selects Pattern/Prompt]
        K --> L{Result in Cache?}
        L -- Yes --> M[Retrieve Analysis]
        L -- No --> N[Execute Pattern]
        N --> O[Multi-Model Verification]
        O --> P[Store Result]
    end

    S1 --> S2 --> S3
```

## Detailed Process Specification

### Phase 1: Data Ingestion & Extraction

The user navigates a strict hierarchy to isolate relevant source text.

  1. **Working Group Selection:** User selects one group from the allowlist: `['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2']`.
  2. **Meeting Retrieval:** System sends a POST request with the selected Working Group to the meetings endpoint to retrieve the meeting list.
  3. **Document Filtering:** User selects a meeting, then filters the resulting file list using the available metadata.
  4. **Text Extraction:** System extracts raw content from the filtered files into a text list.
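
The retrieval step above can be sketched as follows. Only the allowlist comes from this README; the endpoint URL, request body shape, and response type are placeholders for illustration:

```typescript
// Sketch of Phase 1 meeting retrieval. The allowlist is from this README;
// the endpoint URL and request/response shapes are hypothetical.
const WORKING_GROUPS = ['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2'] as const;
type WorkingGroup = (typeof WORKING_GROUPS)[number];

// Reject anything outside the allowlist before touching the network.
function isAllowedGroup(group: string): group is WorkingGroup {
  return (WORKING_GROUPS as readonly string[]).includes(group);
}

// Hypothetical POST body for the meetings endpoint.
function buildMeetingsRequest(group: string): { workingGroup: WorkingGroup } {
  if (!isAllowedGroup(group)) {
    throw new Error(`Working group not in allowlist: ${group}`);
  }
  return { workingGroup: group };
}

async function fetchMeetings(group: string): Promise<unknown> {
  const body = buildMeetingsRequest(group);
  // Placeholder URL: the real endpoint is not named in this README.
  const res = await fetch('https://example.org/api/meetings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Meeting fetch failed: ${res.status}`);
  return res.json();
}
```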

### Phase 2: Content Refinement (with Caching)

Raw text is processed into high-value summaries to reduce noise.

- **Cache Check:** Before processing, check the dataset for an existing `(text_hash, refined_output)` pair to avoid reprocessing identical text.
- **LLM Processing:** If the text is not cached, pass it to the selected LLM (a default is provided and can be changed by the user).
- **Prompt Objective:**
  1. Create information-dense chunks (minimizing near-duplicates).
  2. Generate a "What's New" paragraph wrapped in `SUGGESTION START` and `SUGGESTION END` tags.
- **Storage:** Save the input text and the LLM output to the dataset.
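
The caching and tag-extraction steps can be sketched as below. This models the dataset as an in-memory map and stands in a placeholder for the real LLM call; only the `(text_hash, refined_output)` keying and the `SUGGESTION START`/`SUGGESTION END` convention come from this README:

```typescript
// Sketch of the Phase 2 cache check, with the dataset modelled as an
// in-memory map (the real project presumably persists it elsewhere).
import { createHash } from 'node:crypto';

const refinementCache = new Map<string, string>();

function textHash(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// refine() stands in for the real LLM call, which is not shown here.
async function refineWithCache(
  text: string,
  refine: (t: string) => Promise<string>,
): Promise<string> {
  const key = textHash(text);
  const cached = refinementCache.get(key);   // cache hit: skip the LLM
  if (cached !== undefined) return cached;
  const refined = await refine(text);        // cache miss: call the LLM
  refinementCache.set(key, refined);         // store (text_hash, refined_output)
  return refined;
}

// Pull the "What's New" paragraph out of the model output, per the
// SUGGESTION START / SUGGESTION END convention described above.
function extractWhatsNew(output: string): string | null {
  const match = output.match(/SUGGESTION START([\s\S]*?)SUGGESTION END/);
  return match ? match[1].trim() : null;
}
```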

### Phase 3: Pattern Analysis & Verification

Refined text is analyzed using specific user-defined patterns.

- **Pattern Selection:** User applies a specific prompt/pattern to the refined documents.
- **Cache Check:** Check the results database for an existing `(document_id, pattern_id)` result.
- **Execution & Verification:**
  - Run the selected pattern against the documents.
  - **Verifier Mode:** Optionally run the same input through multiple models in parallel and compare their outputs to check accuracy.
- **Storage:** Save the final analysis in the database so it is not recomputed later.
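
Verifier mode can be sketched as a fan-out over model runners followed by an agreement check. The model interfaces here are placeholders; only the parallel multi-model comparison and the `(document_id, pattern_id)` cache key come from this README:

```typescript
// Sketch of Phase 3 verifier mode: run the same input through several
// models in parallel and check whether they agree.
type ModelRunner = (input: string) => Promise<string>;

interface VerifiedResult {
  outputs: string[];
  agreed: boolean;   // true when every model returned the same text
}

async function runWithVerification(
  input: string,
  models: ModelRunner[],
): Promise<VerifiedResult> {
  const outputs = await Promise.all(models.map((run) => run(input)));
  const normalized = outputs.map((o) => o.trim());
  const agreed = normalized.every((o) => o === normalized[0]);
  return { outputs, agreed };
}

// Cache key matching the (document_id, pattern_id) pair described above.
function resultKey(documentId: string, patternId: string): string {
  return `${documentId}:${patternId}`;
}
```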