# Evaluator-core System Description

## 1. Overview

Evaluator-core is a lightweight, AI-assisted evaluation MVP built with:

- Streamlit (user interface)
- Hugging Face Inference APIs (hosted models)
- Milvus (vector store)
- Whisper (speech transcription)

The system is designed to ingest multiple submission artefacts, store them in a shared evidence layer, and generate a structured evaluation output grounded in retrieved evidence.

## 2. Current Goal

The current MVP aims to:

> ingest multiple artefacts, build a unified submission context, and return evidence-backed evaluation JSON

## 3. Supported Inputs

The current system supports:

1. Documents:
   - `.txt`
   - `.md`
   - `.pdf`
   - `.pptx`
2. Code files
3. URLs
4. YouTube demo videos
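Text extraction for the document types above could be routed by file extension. The sketch below is an assumption, not the project's code: `extract_text` and the `EXTRACTORS` table are hypothetical, and PDF and PowerPoint support assumes the third-party `pypdf` and `python-pptx` packages (imported lazily, so `.txt`/`.md` work without them). Code files, URLs and videos are not covered here.

```python
from pathlib import Path
from typing import Callable, Dict


def _read_plain(path: Path) -> str:
    # .txt and .md files are read as UTF-8 text.
    return path.read_text(encoding="utf-8", errors="replace")


def _read_pdf(path: Path) -> str:
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(str(path))
    # extract_text() can return an empty result for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def _read_pptx(path: Path) -> str:
    # Assumption: python-pptx is installed (pip install python-pptx).
    from pptx import Presentation

    prs = Presentation(str(path))
    texts = []
    for slide in prs.slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                texts.append(shape.text_frame.text)
    return "\n".join(texts)


# Hypothetical routing table from file extension to extractor.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    ".txt": _read_plain,
    ".md": _read_plain,
    ".pdf": _read_pdf,
    ".pptx": _read_pptx,
}


def extract_text(path: str) -> str:
    """Return the plain text of a document, or raise for unsupported types."""
    p = Path(path)
    extractor = EXTRACTORS.get(p.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported document type: {p.suffix!r}")
    return extractor(p)
```

Failing loudly on an unknown extension keeps unsupported files from silently entering the evidence layer as empty text.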

All artefacts uploaded under one username are stored in a single Milvus collection and evaluated together.
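Because each user's artefacts share one Milvus collection, the username has to map to a valid collection name. A minimal sketch under stated assumptions: the `submission_` prefix and `collection_name_for` helper are hypothetical, and the naming rules encoded here (letters, digits and underscores only, no leading digit, at most 255 characters) reflect Milvus defaults as understood at the time of writing and should be checked against your deployment.

```python
import hashlib
import re

# Hypothetical prefix; it also guarantees the name starts with a letter.
PREFIX = "submission_"
MAX_LEN = 255  # Assumed Milvus default name-length limit; verify for your version.
DIGEST_LEN = 8


def collection_name_for(username: str) -> str:
    """Map a username to a Milvus-safe collection name.

    Disallowed characters become underscores, and a short hash of the raw
    username is appended so that e.g. "a.b" and "a_b" do not collide.
    """
    slug = re.sub(r"[^A-Za-z0-9_]", "_", username)
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()[:DIGEST_LEN]
    # Truncate only the slug so the collision-avoiding digest always survives.
    room = MAX_LEN - len(PREFIX) - 1 - DIGEST_LEN
    return f"{PREFIX}{slug[:room]}_{digest}"
```

The hash suffix trades readability for safety: two distinct usernames that sanitize to the same slug still get separate collections, so their evidence is never evaluated together.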

## 4. Core Flow

1. User logs in with a username.
2. Artefacts are uploaded or linked through the UI.
3. Text is extracted from each artefact.
4. Extracted text is chunked and embedded.
5. Chunks are stored in Milvus with source metadata.
6. Evaluation retrieves evidence from the unified collection.
7. A Hugging Face-hosted model returns structured JSON.
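Steps 4 to 6 can be illustrated end to end without external services. This is a sketch, not the project's implementation: the chunk size and overlap are arbitrary, `toy_embed` is a hashed bag-of-words stand-in for the real embedding model, `InMemoryStore` stands in for the user's Milvus collection, and the metadata keys (`source`, `source_type`, `chunk_index`) are hypothetical. The point it shows is that every chunk carries its source metadata, so retrieved evidence can be traced back to an artefact.

```python
import hashlib
import math
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

DIM = 256  # Toy embedding size.


def chunk_text(text: str, size: int = 500, overlap: int = 100) -> List[str]:
    """Split text into fixed-size character windows that overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words vector, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryStore:
    """Stand-in for one per-user Milvus collection."""
    rows: List[Dict] = field(default_factory=list)

    def add(self, text: str, source: str, source_type: str) -> None:
        for i, chunk in enumerate(chunk_text(text)):
            self.rows.append({
                "text": chunk,
                "vector": toy_embed(chunk),
                "source": source,            # e.g. file name or URL
                "source_type": source_type,  # e.g. "pdf", "code", "youtube"
                "chunk_index": i,
            })

    def search(self, query: str, k: int = 3) -> List[Tuple[float, Dict]]:
        """Return the k rows with the highest cosine similarity to the query."""
        q = toy_embed(query)
        # Vectors are normalised, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, r["vector"])), r) for r in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

Overlapping windows reduce the chance that a claim is split across two chunks and retrieved only partially; a real deployment would swap in the hosted embedding model and a Milvus search without changing the shape of the stored records.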

## 5. What the Evaluator Produces

The evaluation JSON currently contains the following fields:

- `project_summary`
- `sources_used`
- `claims_detected`
- `capabilities_detected`
- `evidence`
- `gaps_or_risks`
- `scores`
- `overall_assessment`

The scoring rubric currently covers seven criteria:

- Problem Understanding
- Technical Approach
- Implementation Quality
- Innovation / Originality
- Communication & Demo Clarity
- Claim vs Reality Alignment
- Prototype Functionality
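Because the model's reply is free text that is expected to contain this JSON, a defensive parse-and-check step is useful before results reach the UI. A minimal sketch: `REQUIRED_FIELDS` and `RUBRIC` mirror the lists above, while the assumption that `scores` maps each rubric criterion name to a number on a 0-10 scale is hypothetical and should be aligned with the actual prompt.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = [
    "project_summary", "sources_used", "claims_detected",
    "capabilities_detected", "evidence", "gaps_or_risks",
    "scores", "overall_assessment",
]
RUBRIC = [
    "Problem Understanding", "Technical Approach", "Implementation Quality",
    "Innovation / Originality", "Communication & Demo Clarity",
    "Claim vs Reality Alignment", "Prototype Functionality",
]
SCORE_RANGE = (0, 10)  # Hypothetical scale.


def parse_evaluation(reply: str) -> Dict[str, Any]:
    """Extract the JSON object from a model reply and check its shape."""
    # Models often wrap JSON in Markdown code fences or add prose around it,
    # so take the span from the first "{" to the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model reply")
    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(reply[start:end + 1])

    problems: List[str] = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in data]
    scores = data.get("scores", {})
    if not isinstance(scores, dict):
        problems.append("scores is not an object")
    else:
        lo, hi = SCORE_RANGE
        for criterion in RUBRIC:
            value = scores.get(criterion)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) \
                    or not lo <= value <= hi:
                problems.append(f"bad or missing score: {criterion}")
    if problems:
        raise ValueError("; ".join(problems))
    return data
```

Collecting every problem before raising gives one complete error message, which is easier to surface in a Streamlit warning than a failure on the first missing key.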

## 6. Current Strengths

- Unified evidence storage across source types
- Retrieval-backed evaluation
- Structured JSON output
- Basic claim extraction
- Rubric-based scoring
- Source inventory before evaluation
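A source inventory can be derived from stored chunk metadata before evaluation runs. In this sketch the `source` and `source_type` keys are hypothetical metadata fields, and `source_inventory` is an illustrative helper; with Milvus, the rows would come from a query on the user's collection rather than a Python list.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def source_inventory(rows: Iterable[Dict]) -> Dict[str, Dict[str, int]]:
    """Count stored chunks per source, grouped by source type."""
    inventory: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        inventory[row["source_type"]][row["source"]] += 1
    # Convert to plain dicts, sorted for stable display in the UI.
    return {t: dict(sorted(c.items())) for t, c in sorted(inventory.items())}
```

Showing this summary first makes a missing source type (for example, no demo video) visible before any scores are generated.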

## 7. Current Limitations

- Prototype URL validation is still text-based, not interaction-based
- Claim validation is prompt-driven, not handled by a dedicated cross-artefact engine
- Code ingestion relies on individual file uploads rather than full repository ingestion
- Code chunking is still text-based rather than syntax-aware
- Scores and confidence values are model-generated rather than calibrated
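To make the code-chunking limitation concrete, here is one possible syntax-aware alternative for Python files only. It uses the standard-library `ast` module to split a file at top-level function and class boundaries. It illustrates the direction and is not something the MVP does. `chunk_python_source` is a hypothetical helper, and other languages would need a different parser.

```python
import ast
from typing import List, Set, Tuple


def chunk_python_source(source: str) -> List[Tuple[str, str]]:
    """Return (symbol_name, source_segment) pairs for top-level definitions.

    Module-level code outside definitions (imports, constants) is
    collected into a single "<module>" chunk placed first.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    chunks: List[Tuple[str, str]] = []
    covered: Set[int] = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorators, which sit above the def/class line.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            end = node.end_lineno  # Available since Python 3.8.
            chunks.append((node.name, "".join(lines[start - 1:end])))
            covered.update(range(start, end + 1))
    rest = "".join(line for i, line in enumerate(lines, 1) if i not in covered)
    if rest.strip():
        chunks.insert(0, ("<module>", rest))
    return chunks
```

Unlike fixed-size text windows, each chunk here is a complete function or class, so retrieved code evidence never starts or ends mid-definition.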

## 8. Architecture Direction

This MVP is no longer a source-specific chatbot. It is now closer to an evidence-layer evaluator:

> multi-source ingestion -> shared vector store -> retrieved evidence -> structured evaluation

That makes it a practical early version of the assignment’s intended system, while still leaving prototype validation and stronger cross-checking as future work.
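The step from retrieved evidence to structured evaluation in this pipeline comes down to placing source-tagged chunks into the prompt sent to the hosted model. A hedged sketch: the prompt wording, the `[S1]`-style tags, the row keys and the `build_prompt` helper are all hypothetical, and the actual Hugging Face Inference API call is left out.

```python
from typing import Dict, List


def build_prompt(rubric: List[str], evidence: List[Dict], max_chars: int = 6000) -> str:
    """Assemble an evaluation prompt that labels each evidence chunk with a tag."""
    blocks: List[str] = []
    used = 0
    for i, row in enumerate(evidence, 1):
        block = f"[S{i}] ({row['source_type']}: {row['source']})\n{row['text']}\n"
        if used + len(block) > max_chars:
            break  # Rough character budget to keep the prompt bounded.
        blocks.append(block)
        used += len(block)
    criteria = "\n".join(f"- {c}" for c in rubric)
    return (
        "You are evaluating a project submission. Use ONLY the evidence below "
        "and cite tags like [S1] for every claim.\n\n"
        "Evidence:\n" + "\n".join(blocks) + "\n"
        "Score each criterion:\n" + criteria + "\n\n"
        "Reply with a single JSON object."
    )
```

Tagging each chunk with its source lets the model's `evidence` and `sources_used` fields point back to specific artefacts, which is what makes the output checkable rather than free-form.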