| # Evaluator-core System Description | |
| ## 1. Overview | |
| Evaluator-core is a lightweight AI-assisted evaluation MVP built with: | |
| - Streamlit | |
| - Hugging Face Inference APIs | |
| - Milvus | |
| - Whisper | |
| The system is designed to ingest multiple submission artefacts, store them in a shared evidence layer, and generate a structured evaluation output grounded in retrieved evidence. | |
| ## 2. Current Goal | |
| The current MVP aims to: | |
| > ingest multiple artefacts, build a unified submission context, and return evidence-backed evaluation JSON | |
| ## 3. Supported Inputs | |
| The current system supports: | |
| 1. Documents | |
|    - `.txt` | |
|    - `.md` | |
|    - `.pdf` | |
|    - `.pptx` | |
| 2. Code files | |
| 3. URLs | |
| 4. YouTube demo videos | |
| All artefacts uploaded under one username are stored in a single Milvus collection and evaluated together. | |
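One way to derive the per-user collection name is sketched below. This is a minimal illustration, not the project's actual code: the helper `collection_name_for` and the `submission_` prefix are hypothetical, and it assumes collection names are limited to letters, digits, and underscores and must not start with a digit.

```python
import re


def collection_name_for(username: str, prefix: str = "submission_") -> str:
    """Map a username to a collection-safe name (hypothetical helper)."""
    # Replace every character that is not a letter, digit, or underscore.
    safe = re.sub(r"[^0-9A-Za-z_]", "_", username.strip().lower())
    if not safe:
        raise ValueError("username must contain at least one character")
    # The prefix starts with a letter, so the result never begins with a digit.
    return f"{prefix}{safe}"
```

Note that this mapping is lossy: `a-b` and `a_b` would share a collection, so a real deployment would probably need a collision check or a stable user ID instead.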
| ## 4. Core Flow | |
| 1. User logs in with a username. | |
| 2. Artefacts are uploaded or linked through the UI. | |
| 3. Text is extracted from each artefact. | |
| 4. Extracted text is chunked and embedded. | |
| 5. Chunks are stored in Milvus with source metadata. | |
| 6. Evaluation retrieves evidence from the unified collection. | |
| 7. A Hugging Face-hosted model returns structured JSON. | |
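Steps 4 and 5 can be sketched as fixed-size character chunking with overlap, where each chunk carries its source metadata. The `Chunk` fields, the `chunk_text` name, and the default sizes are assumptions made for illustration; the actual chunker, embedding call, and Milvus schema are not shown in this document.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_name: str   # e.g. file name or URL (assumed field)
    source_type: str   # e.g. "document", "code", "url", "video" (assumed field)
    chunk_index: int


def chunk_text(text, source_name, source_type, size=500, overlap=50):
    """Split text into overlapping character windows tagged with their source."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(text[start:start + size], source_name, source_type, index))
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + size >= len(text):
            break
    return chunks
```

Each `Chunk` would then be embedded and written to the user's collection, with the metadata fields stored alongside the vector so that retrieved evidence can be traced back to its artefact.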
| ## 5. What The Evaluator Produces | |
| The current output includes: | |
| - `project_summary` | |
| - `sources_used` | |
| - `claims_detected` | |
| - `capabilities_detected` | |
| - `evidence` | |
| - `gaps_or_risks` | |
| - `scores` | |
| - `overall_assessment` | |
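Because the JSON is model-generated, it is worth checking that every expected field is present before using it. The sketch below only checks the top-level keys listed above; the helper name `missing_keys` is hypothetical, and the types of the individual fields are not specified in this document, so they are not validated.

```python
import json

# Top-level keys listed in this section.
REQUIRED_KEYS = (
    "project_summary",
    "sources_used",
    "claims_detected",
    "capabilities_detected",
    "evidence",
    "gaps_or_risks",
    "scores",
    "overall_assessment",
)


def missing_keys(raw: str) -> list[str]:
    """Return the required top-level keys absent from a raw JSON response."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("evaluation output must be a JSON object")
    return [key for key in REQUIRED_KEYS if key not in data]
```

An empty list means the response has the expected shape; a non-empty list could trigger a retry or a fallback.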
| The scoring rubric currently includes: | |
| - Problem Understanding | |
| - Technical Approach | |
| - Implementation Quality | |
| - Innovation / Originality | |
| - Communication & Demo Clarity | |
| - Claim vs Reality Alignment | |
| - Prototype Functionality | |
| ## 6. Current Strengths | |
| - Unified evidence storage across source types | |
| - Retrieval-backed evaluation | |
| - Structured JSON output | |
| - Basic claim extraction | |
| - Rubric-based scoring | |
| - Source inventory before evaluation | |
| ## 7. Current Limitations | |
| - Prototype URLs are validated only from their fetched text; the system does not interact with the running prototype | |
| - Claim validation is prompt-driven, not a dedicated cross-artefact engine | |
| - Code ingestion is file-upload based, not full repository ingestion | |
| - Code chunking is still text-based rather than syntax-aware | |
| - Scores and confidence are model-generated rather than calibrated | |
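To illustrate what syntax-aware code chunking could look like, the sketch below splits Python source into one chunk per top-level function or class using the standard `ast` module. It is not part of the current system, it handles Python files only, and the name `chunk_python_source` is hypothetical.

```python
import ast


def chunk_python_source(source: str) -> list[tuple[str, str]]:
    """Return (name, source_text) pairs for top-level functions and classes."""
    tree = ast.parse(source)  # raises SyntaxError on invalid Python
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node;
            # decorators above the definition line are not included.
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks
```

Compared with fixed-size text windows, this keeps each definition intact, which should make retrieved code evidence easier for the model to interpret. Module-level statements such as imports are skipped here and would need separate handling.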
| ## 8. Architecture Direction | |
| This MVP is no longer a source-specific chatbot. It is now closer to an evidence-layer evaluator: | |
| > multi-source ingestion -> shared vector store -> retrieved evidence -> structured evaluation | |
| That makes it a practical early version of the assignment’s intended system, while still leaving prototype validation and stronger cross-checking as future work. | |
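The retrieval stage of that pipeline can be illustrated with a toy in-memory store. In this sketch a bag-of-words `embed` function stands in for the real embedding model and a plain list stands in for the Milvus collection; both are placeholders chosen so the example runs without external services, and the record fields are assumed.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, store, top_k=3):
    """Return up to top_k records with non-zero similarity, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(record["text"])), record) for record in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for score, record in scored[:top_k] if score > 0]
```

Because all source types live in one store, a single query can surface evidence from documents, code, and video transcripts together, which is what makes cross-checking claims against artefacts possible.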