\section{Experimental Results}
\label{sec:experimental_results}
In this section, we present the empirical evaluation of the IRIS system, focusing on two key dimensions: computational efficiency (latency and throughput) and retrieval accuracy.
\subsection{Computational Efficiency}
The efficiency of the entity extraction and embedding pipeline was evaluated on a dataset of 50 candidate profiles. The pipeline extracts four entities (Headline, Summary, Skills, and Experience) and generates their corresponding embeddings using the BGE-M3 model.
Table~\ref{tab:latency_results} summarizes the mean latency and standard deviation for each entity type.
\begin{table}[h]
\centering
\caption{Mean Latency and Standard Deviation per Entity Extraction ($N=50$ profiles)}
\label{tab:latency_results}
\begin{tabular}{lrr}
\hline
\textbf{Entity Type} & \textbf{Mean Latency (ms)} & \textbf{Std. Dev. (ms)} \\ \hline
Headline & 965.78 & 2969.16 \\
Summary & 785.70 & 141.60 \\
Skills (List) & 780.01 & 160.76 \\
Experience (List) & 1005.30 & 185.11 \\ \hline
\textbf{Total per Profile} & \textbf{3536.80} & -- \\ \hline
\end{tabular}
\end{table}
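The mean total processing time per profile is approximately 3.54 seconds, which corresponds to a throughput of \textbf{0.283 profiles per second}, assuming profiles are processed sequentially. Headline extraction is the exception to otherwise stable latencies: its standard deviation (2969.16 ms) is roughly three times its mean, possibly due to network latency or cold-start issues in the embedding service. The remaining entity types have standard deviations of about 18--21\% of their means, so outside such outliers the pipeline performs consistently enough for near-real-time recruitment tasks.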
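As a worked check, the throughput and the relative spread follow directly from Table~\ref{tab:latency_results}. The notation is introduced here for illustration ($\bar{t}$ for mean latency, $\sigma$ for standard deviation), and the throughput expression assumes sequential processing of one profile at a time:
\[
\mathrm{throughput} = \frac{1}{\bar{t}_{\mathrm{total}}} = \frac{1}{3.5368\ \mathrm{s}} \approx 0.283\ \mathrm{profiles/s},
\]
\[
\frac{\sigma_{\mathrm{Headline}}}{\bar{t}_{\mathrm{Headline}}} = \frac{2969.16}{965.78} \approx 3.07,
\qquad
\frac{\sigma_{\mathrm{Summary}}}{\bar{t}_{\mathrm{Summary}}} = \frac{141.60}{785.70} \approx 0.18.
\]
If entities or profiles were processed concurrently, the effective throughput could differ from this sequential estimate.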
\subsection{Retrieval Performance}
We compared the proposed IRIS matching method against standard baselines using Mean Reciprocal Rank (MRR) and Recall@$k$ ($R@k$). The evaluation included:
\begin{itemize}
\item \textbf{Jaccard Baseline}: A keyword-based overlap method.
\item \textbf{BERT Flattened}: Dense retrieval using BERT embeddings on concatenated profile text.
\item \textbf{BGE Flattened}: Dense retrieval using BGE-M3 embeddings on concatenated profile text.
\item \textbf{BGE Granular Weighted}: Our proposed method using weighted cosine similarity across specific entities.
\end{itemize}
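The exact aggregation used by the granular method is not detailed in this section. As an illustrative sketch only, a weighted cosine similarity across entities can be written as follows, where the entity set $E$, the non-negative weights $w_e$, and their normalization are assumptions made for exposition and may differ from the implemented logic:
\[
s(q, c) = \sum_{e \in E} w_e \, \frac{\mathbf{q}_e^{\top} \mathbf{c}_e}{\|\mathbf{q}_e\| \, \|\mathbf{c}_e\|},
\qquad w_e \ge 0, \quad \sum_{e \in E} w_e = 1.
\]
Here $\mathbf{q}_e$ and $\mathbf{c}_e$ denote the embeddings of entity $e$ for the query and the candidate profile, respectively. Under this reading, the flattened baselines correspond to a single cosine similarity between the embeddings of the concatenated text.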
Table~\ref{tab:retrieval_results} presents the results of this comparison.
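For reference, both metrics are taken in their standard form. The notation is introduced here for illustration ($Q$ for the set of evaluation queries, $\mathrm{rank}_i$ for the position of the first relevant candidate returned for query $i$), and the Recall@$k$ expression assumes a single relevant candidate per query:
\[
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i},
\qquad
R@k = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathbf{1}\left[\mathrm{rank}_i \le k\right].
\]
Under these definitions, $R@1 \le R@3$ and $R@1 \le \mathrm{MRR}$ hold for every method, which is consistent with the reported values.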
\begin{table}[h]
\centering
\caption{Comparison of Retrieval Accuracy Metrics}
\label{tab:retrieval_results}
\begin{tabular}{lccc}
\hline
\textbf{Method} & \textbf{MRR} & \textbf{R@1} & \textbf{R@3} \\ \hline
Jaccard Baseline & 0.0755 & 0.016 & 0.048 \\
BERT Flattened & 0.1708 & \textbf{0.048} & \textbf{0.144} \\
BGE Flattened & \textbf{0.1729} & \textbf{0.048} & \textbf{0.144} \\
BGE Granular Weighted & 0.0749 & 0.016 & 0.040 \\ \hline
\end{tabular}
\end{table}
The results indicate that \textbf{BGE Flattened} achieves the highest MRR (0.1729), narrowly ahead of BERT Flattened (0.1708); the two methods tie on Recall@1 (0.048) and Recall@3 (0.144). Notably, the granular weighted approach currently underperforms the flattened embedding methods and is on par with or slightly below the Jaccard baseline (MRR 0.0749 vs.\ 0.0755, Recall@3 0.040 vs.\ 0.048), suggesting that the aggregation logic or weight distribution for specific entities requires further optimization.