	# Performance Analysis Report

	1. Retrieval Time (lower is better):

	- Milvus + LLaMA: 0.132s
	- Weaviate + Mistral: 0.157s
	- Milvus + Mistral: NaN (data unavailable)
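The NaN entry for Milvus + Mistral needs care in any automated comparison, because NaN compares false against every number and can silently distort `min` or sorting. The following is a minimal sketch using the values reported above; the dictionary name and the helper function are illustrative and not taken from the original evaluation code.

```python
import math

# Retrieval times in seconds, as reported above. NaN marks a missing measurement.
retrieval_time_s = {
    "Milvus + LLaMA": 0.132,
    "Weaviate + Mistral": 0.157,
    "Milvus + Mistral": float("nan"),
}


def fastest_measured(times):
    """Return (name, seconds) for the lowest non-NaN time, or None if none exist."""
    measured = {name: t for name, t in times.items() if not math.isnan(t)}
    if not measured:
        return None
    name = min(measured, key=measured.get)
    return name, measured[name]


print(fastest_measured(retrieval_time_s))
```

Filtering before comparing keeps the missing value out of the ranking instead of treating it as either the best or the worst result.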

	2. Context Relevance (higher is better):

	- Milvus + LLaMA: 0.640
	- Weaviate + Mistral: 0.591
	- Milvus + Mistral: 0.518

	3. Context Utilization (higher is better):

	- Milvus + LLaMA: 0.673
	- Weaviate + Mistral: 0.619
	- Milvus + Mistral: 0.614

	4. AUCROC (Area Under the ROC Curve, higher is better):

	- Milvus + LLaMA: 0.912
	- Weaviate + Mistral: 0.750
	- Milvus + Mistral: 0.844
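The report does not state which labels and scores the AUCROC values were computed from. As a general illustration of the metric, the sketch below computes AUCROC as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, counting ties as half. The function name and the toy data are hypothetical.

```python
def auc_roc(labels, scores):
    """Pairwise AUCROC: fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUCROC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy example (not data from this report): 3 of 4 pairs are ordered correctly.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc_roc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the figures above are read.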

	5. RMSE (Root Mean Square Error, lower is better):

	- Milvus + LLaMA:
	  - Context Relevance RMSE: 0.179
	  - Context Utilization RMSE: 0.302
	- Weaviate + Mistral:
	  - Context Relevance RMSE: 0.414
	  - Context Utilization RMSE: 0.482
	- Milvus + Mistral:
	  - Context Relevance RMSE: 0.167
	  - Context Utilization RMSE: 0.258
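The RMSE figures presumably compare each pipeline's predicted context scores against reference annotations, though the report does not give those details. A minimal sketch of the metric itself follows; the function name and the two-element toy inputs are assumptions for illustration.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between two equal-length sequences of scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example (not data from this report): both predictions are off by 0.5.
predicted = [0.5, 0.5]
reference = [0.0, 1.0]
print(rmse(predicted, reference))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.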

	## Analysis

	1. Best Overall Performance: Milvus + LLaMA

	- Highest AUCROC score (0.912)
	- Best context relevance (0.640) and utilization (0.673)
	- Fastest measured retrieval time (0.132s)
	- Mid-range RMSE (0.179 relevance, 0.302 utilization)

	2. Runner-up: Milvus + Mistral

	- Second-best AUCROC (0.844)
	- Lowest RMSE on both metrics (0.167 relevance, 0.258 utilization)
	- Lowest context relevance (0.518) and utilization (0.614)
	- Retrieval time data unavailable (NaN)

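	3. Third Place: Weaviate + Mistral

	- Lowest AUCROC (0.750)
	- Highest RMSE on both metrics (0.414 relevance, 0.482 utilization)
	- Slower of the two measured retrieval times (0.157s)
	- Middle-ranked context relevance (0.591) and utilization (0.619)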
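The per-metric comparison above can be reproduced mechanically. The sketch below copies the reported values into a dictionary and picks the best configuration for each metric, skipping the missing (NaN) retrieval time. The key names, the `LOWER_IS_BETTER` set, and the helper function are illustrative choices, not part of the original evaluation code.

```python
import math

# Metric values copied from the tables in this report.
results = {
    "Milvus + LLaMA": {
        "retrieval_time_s": 0.132, "context_relevance": 0.640,
        "context_utilization": 0.673, "aucroc": 0.912,
        "relevance_rmse": 0.179, "utilization_rmse": 0.302,
    },
    "Weaviate + Mistral": {
        "retrieval_time_s": 0.157, "context_relevance": 0.591,
        "context_utilization": 0.619, "aucroc": 0.750,
        "relevance_rmse": 0.414, "utilization_rmse": 0.482,
    },
    "Milvus + Mistral": {
        "retrieval_time_s": float("nan"), "context_relevance": 0.518,
        "context_utilization": 0.614, "aucroc": 0.844,
        "relevance_rmse": 0.167, "utilization_rmse": 0.258,
    },
}

# Metrics where a smaller value is the better result.
LOWER_IS_BETTER = {"retrieval_time_s", "relevance_rmse", "utilization_rmse"}


def best_per_metric(results):
    """Map each metric to the configuration with the best measured value."""
    metrics = next(iter(results.values())).keys()
    best = {}
    for metric in metrics:
        # Drop configurations with no measurement for this metric.
        measured = {
            name: values[metric]
            for name, values in results.items()
            if not math.isnan(values[metric])
        }
        pick = min if metric in LOWER_IS_BETTER else max
        best[metric] = pick(measured, key=measured.get)
    return best


for metric, name in best_per_metric(results).items():
    print(f"{metric}: {name}")
```

On the reported numbers this selects Milvus + LLaMA for retrieval time, context relevance, context utilization, and AUCROC, and Milvus + Mistral for both RMSE metrics.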

	## Recommendation

	Across the metrics above, Milvus + LLaMA is the strongest overall choice. It shows:

	- The highest AUCROC (0.912)
	- The best context relevance (0.640) and utilization (0.673)
	- The fastest measured retrieval time (0.132s)
	- Mid-range RMSE (0.179 relevance, 0.302 utilization)

	However, if minimizing RMSE is the primary objective, Milvus + Mistral is a viable alternative: it has the lowest RMSE for both context relevance (0.167) and context utilization (0.258).