# Evaluator-core System Description
## 1. Overview
Evaluator-core is a lightweight AI-assisted evaluation MVP built with:
- Streamlit
- Hugging Face Inference APIs
- Milvus
- Whisper
The system is designed to ingest multiple submission artefacts, store them in a shared evidence layer, and generate a structured evaluation output grounded in retrieved evidence.
## 2. Current Goal
The current MVP aims to:
> ingest multiple artefacts, build a unified submission context, and return evidence-backed evaluation JSON
## 3. Supported Inputs
The current system supports:
1. Documents
   - `.txt`
   - `.md`
   - `.pdf`
   - `.pptx`
2. Code files
3. URLs
4. YouTube demo videos
All artefacts uploaded under one username are stored in a single Milvus collection and evaluated together.
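As a rough illustration of how the input categories above might be told apart before text extraction, here is a minimal sketch. The function name `classify_artefact`, the set of code extensions, and the list of YouTube hostnames are all assumptions made for this example; the source does not say which code file types are accepted or how the real system routes inputs.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Document extensions listed in this description.
DOCUMENT_EXTENSIONS = {".txt", ".md", ".pdf", ".pptx"}
# Assumption: the description does not list accepted code extensions.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".java", ".go"}
# Assumption: hostnames treated as YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_artefact(ref: str) -> str:
    """Return 'youtube', 'url', 'document', 'code', or 'unsupported'."""
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        return "youtube" if host in YOUTUBE_HOSTS else "url"
    # Anything that is not an http(s) link is treated as a file name.
    suffix = PurePosixPath(ref).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    if suffix in CODE_EXTENSIONS:
        return "code"
    return "unsupported"
```

Each category would then be handed to its own extractor (for example, a transcription step for videos), with the resulting text landing in the same per-user collection.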
## 4. Core Flow
1. User logs in with a username.
2. Artefacts are uploaded or linked through the UI.
3. Text is extracted from each artefact.
4. Extracted text is chunked and embedded.
5. Chunks are stored in Milvus with source metadata.
6. Evaluation retrieves evidence from the unified collection.
7. A Hugging Face-hosted model returns structured JSON.
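Steps 4 and 5 can be sketched as follows. This is a minimal illustration only: the `Chunk` dataclass, the `chunk_text` function, and the default chunk size and overlap are hypothetical and not taken from the actual codebase. Embedding and the Milvus insert are left out, since the description does not specify the embedding model or the collection schema.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str       # file name or URL the text came from
    source_type: str  # e.g. "document", "code", "url", "youtube"
    index: int        # position of the chunk within its source


def chunk_text(text, source, source_type, size=500, overlap=50):
    """Split text into fixed-size character windows that overlap slightly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(text[start:start + size], source, source_type, index))
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks
```

Keeping `source` and `source_type` on every chunk is what would let the evaluator later report which artefact a piece of evidence came from.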
## 5. What the Evaluator Produces
The evaluation JSON currently includes these fields:
- `project_summary`
- `sources_used`
- `claims_detected`
- `capabilities_detected`
- `evidence`
- `gaps_or_risks`
- `scores`
- `overall_assessment`
The scoring rubric currently includes:
- Problem Understanding
- Technical Approach
- Implementation Quality
- Innovation / Originality
- Communication & Demo Clarity
- Claim vs Reality Alignment
- Prototype Functionality
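Because the JSON is generated by a model, a completeness check on the response can be useful. The sketch below checks for the fields and rubric criteria listed above. It assumes that `scores` is an object keyed by criterion name; the description does not specify the shape of `scores`, and `validate_evaluation` is a hypothetical helper, not part of the current system.

```python
import json

REQUIRED_FIELDS = [
    "project_summary", "sources_used", "claims_detected", "capabilities_detected",
    "evidence", "gaps_or_risks", "scores", "overall_assessment",
]
RUBRIC = [
    "Problem Understanding", "Technical Approach", "Implementation Quality",
    "Innovation / Originality", "Communication & Demo Clarity",
    "Claim vs Reality Alignment", "Prototype Functionality",
]


def validate_evaluation(raw):
    """Return a list of problems; an empty list means the payload looks complete."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    scores = data.get("scores")
    if isinstance(scores, dict):
        # Assumption: scores are keyed by the rubric criterion names.
        problems += [f"missing score: {c}" for c in RUBRIC if c not in scores]
    elif "scores" in data:
        problems.append("scores is not an object")
    return problems
```

A non-empty result could be used to retry the model call or to flag the evaluation as incomplete.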
## 6. Current Strengths
- Unified evidence storage across source types
- Retrieval-backed evaluation
- Structured JSON output
- Basic claim extraction
- Rubric-based scoring
- Source inventory before evaluation
## 7. Current Limitations
- Prototype URL validation is still text-based: the evaluator reads text extracted from the page but does not interact with the running prototype
- Claim validation is prompt-driven, not a dedicated cross-artefact engine
- Code ingestion is file-upload based, not full repository ingestion
- Code chunking is still text-based rather than syntax-aware
- Scores and confidence are model-generated rather than calibrated
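To make the code-chunking limitation concrete, here is one possible shape a syntax-aware chunker could take. It is not part of the current system. The sketch covers Python source only and uses the standard-library `ast` module to emit one chunk per top-level function or class; the function name `chunk_python_source` and the chunk fields are assumptions for illustration.

```python
import ast


def chunk_python_source(source):
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                # Exact source text of the definition (decorators are not included).
                "text": ast.get_source_segment(source, node),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    return chunks
```

Compared with fixed-size text windows, chunks like these keep a whole definition together, which could make retrieved code evidence easier to attribute. Other languages would need their own parsers.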
## 8. Architecture Direction
This MVP is no longer a source-specific chatbot. It is now closer to an evidence-layer evaluator:
> multi-source ingestion -> shared vector store -> retrieved evidence -> structured evaluation
That makes it a practical early version of the assignment’s intended system, while still leaving prototype validation and stronger cross-checking as future work.
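The evidence-layer idea can be illustrated with a toy, in-memory model. In this sketch a plain dictionary stands in for Milvus and word counts stand in for real embeddings, so it shows only the shape of the flow (one collection per username, mixed source types, similarity-ranked retrieval). The class and function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class EvidenceStore:
    """In-memory stand-in for the shared vector store, one collection per username."""

    def __init__(self):
        self._collections = {}

    def add(self, username, text, source, source_type):
        record = {"text": text, "source": source,
                  "source_type": source_type, "vector": embed(text)}
        self._collections.setdefault(username, []).append(record)

    def search(self, username, query, top_k=3):
        query_vec = embed(query)
        records = self._collections.get(username, [])
        ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
        return ranked[:top_k]
```

A short usage example, with made-up artefact text:

```python
store = EvidenceStore()
store.add("alice", "The app uses Whisper to transcribe demo videos", "demo-video", "youtube")
store.add("alice", "def transcribe(audio): return model.transcribe(audio)", "transcribe.py", "code")
hits = store.search("alice", "transcribe audio", top_k=2)
print([(h["source"], h["source_type"]) for h in hits])
```

Because all source types share one collection, a single query can surface evidence from code and from a video transcript side by side, which is the property the structured evaluation relies on.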