# 3GPP-Innovation-Extractor
<div align="center">
<img width="1200" height="475" alt="GHBanner" src="https://github.com/user-attachments/assets/0aa67016-6eaf-458a-adb2-6e31a0763ed6" />
</div>
## Run Locally
**Prerequisites:** Node.js
1. Install dependencies:
   `npm install`
2. Set the `GEMINI_API_KEY` in [.env.local](.env.local) to your Gemini API key.
3. Run the app:
   `npm run dev`
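The `.env.local` file referenced in step 2 is a plain key-value file read at startup; a minimal example (the key value shown is a placeholder):

```shell
# .env.local — placeholder value, replace with your own Gemini API key
GEMINI_API_KEY=your_api_key_here
```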
## Workflow Overview
```mermaid
flowchart TD
subgraph S1 [Phase 1: Data Ingestion]
A[User Selects Working Group] -->|SA1-6, RAN1-2| B[Fetch Meetings via POST]
B --> C[User Selects Meeting]
C --> D[Filter Docs by Metadata]
D --> E[Extract Raw Text]
end
subgraph S2 [Phase 2: Refinement & Caching]
E --> F{Text in Cache?}
F -- Yes --> G[Retrieve Cached Refinement]
F -- No --> H[LLM Processing]
H --> I[Task: Dense Chunking & 'What's New']
I --> J[Store in Dataset]
J --> G
end
subgraph S3 [Phase 3: Pattern Analysis]
G --> K[User Selects Pattern/Prompt]
K --> L{Result in Cache?}
L -- Yes --> M[Retrieve Analysis]
L -- No --> N[Execute Pattern]
N --> O[Multi-Model Verification]
O --> P[Store Result]
end
S1 --> S2 --> S3
```
### Detailed Process Specification
#### Phase 1: Data Ingestion & Extraction
The user navigates a strict hierarchy to isolate relevant source text.
1. **Working Group Selection:** User selects one group from the allowlist: `['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2']`.
2. **Meeting Retrieval:** System sends a `POST` request, with the selected Working Group in the request body, to the meeting-list endpoint to retrieve the available meetings.
3. **Document Filtering:** User selects a meeting, then filters the resulting file list using available metadata.
4. **Text Extraction:** System extracts raw content from the filtered files into a text list.
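The selection and filtering steps above can be sketched as follows. The endpoint URL, the `DocMeta` shape, and the function names are illustrative assumptions; the real API and metadata fields may differ.

```typescript
// Phase 1 sketch: allowlist check, meeting retrieval, metadata filtering.
const WORKING_GROUPS = ['SA1', 'SA2', 'SA3', 'SA4', 'SA5', 'SA6', 'RAN1', 'RAN2'] as const;
type WorkingGroup = typeof WORKING_GROUPS[number];

// Assumed document metadata shape — the real file list may expose other fields.
interface DocMeta {
  id: string;
  title: string;
  type: string;    // e.g. 'pCR', 'CR', 'discussion'
  source: string;  // submitting company
}

// Reject anything outside the allowlist before issuing the POST.
function isAllowedGroup(group: string): group is WorkingGroup {
  return (WORKING_GROUPS as readonly string[]).includes(group);
}

// Hypothetical meeting-list fetch; the URL is a placeholder.
async function fetchMeetings(group: WorkingGroup): Promise<string[]> {
  const res = await fetch('https://example.org/api/meetings', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ workingGroup: group }),
  });
  return res.json();
}

// Keep only documents whose metadata matches every requested field.
function filterDocs(docs: DocMeta[], wanted: Partial<DocMeta>): DocMeta[] {
  return docs.filter(d =>
    Object.entries(wanted).every(([k, v]) => d[k as keyof DocMeta] === v)
  );
}
```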
#### Phase 2: Content Refinement (with Caching)
Raw text is processed into high-value summaries to reduce noise.
* **Cache Check:** Before processing, check the dataset for existing `(text_hash, refined_output)` pairs to prevent duplicate processing.
* **LLM Processing:** If not cached, pass the text to the selected LLM (a default is provided; the user can change it).
* **Prompt Objective:**
1. Create information-dense chunks (minimize near-duplicates).
2. Generate a "What's New" paragraph wrapped in `SUGGESTION START` and `SUGGESTION END` tags.
* **Storage:** Save the input text and the LLM output to the dataset.
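The cache-then-refine flow above can be sketched as below. The LLM call is stubbed out as a `refineFn` parameter, and the `Map`-backed cache is illustrative — the real project persists `(text_hash, refined_output)` pairs to a dataset.

```typescript
// Phase 2 sketch: hash the raw text, reuse a cached refinement if present,
// otherwise call the (stubbed) LLM and store the result.
import { createHash } from 'node:crypto';

function hashText(text: string): string {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// In-memory stand-in for the project's dataset: text_hash -> refined_output.
const refinementCache = new Map<string, string>();

async function refineWithCache(
  text: string,
  refineFn: (raw: string) => Promise<string>, // e.g. a Gemini call
): Promise<string> {
  const key = hashText(text);
  const hit = refinementCache.get(key);
  if (hit !== undefined) return hit;        // cached refinement, skip the LLM
  const refined = await refineFn(text);     // dense chunks + "What's New" block
  refinementCache.set(key, refined);        // store (text_hash, refined_output)
  return refined;
}
```

Hashing the input rather than comparing full texts keeps the cache lookup cheap even for long contributions.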
#### Phase 3: Pattern Analysis & Verification
Refined text is analyzed using specific user-defined patterns.
* **Pattern Selection:** User applies a specific prompt/pattern to the refined documents.
* **Cache Check:** Check the results database for existing `(document_id, pattern_id)` results.
* **Execution & Verification:**
* Run the selected pattern against the documents.
* **Verifier Mode:** Optionally execute the same input across multiple models simultaneously to compare results and ensure accuracy.
* **Storage:** Save the final analysis in the database to prevent future re-computation.
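Verifier Mode can be sketched as running the same prompt against several models in parallel and checking agreement. The `ModelRunner` signature and the normalization rule are assumptions for illustration; the real project wires this to its own model providers.

```typescript
// Phase 3 sketch: execute the same input across multiple models
// simultaneously and flag whether their answers agree.
type ModelRunner = (prompt: string, doc: string) => Promise<string>;

interface VerifiedResult {
  answers: string[]; // one answer per model, same order as runners
  agreed: boolean;   // true if all models returned the same normalized text
}

// Naive normalization so trivial whitespace/case differences don't count
// as disagreement; a real comparison might be semantic instead.
function normalize(s: string): string {
  return s.trim().toLowerCase();
}

async function runWithVerification(
  prompt: string,
  doc: string,
  runners: ModelRunner[],
): Promise<VerifiedResult> {
  // Fire all model calls at once rather than sequentially.
  const answers = await Promise.all(runners.map(run => run(prompt, doc)));
  const first = normalize(answers[0] ?? '');
  const agreed = answers.every(a => normalize(a) === first);
  return { answers, agreed };
}
```

A disagreement (`agreed: false`) is a signal to review the pattern output before storing it as a final result.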