\section{Experimental Results}
\label{sec:experimental_results}

In this section, we present the empirical evaluation of the IRIS system, focusing on two key dimensions: computational efficiency (latency and throughput) and retrieval accuracy.

\subsection{Computational Efficiency}
The efficiency of the entity extraction and embedding pipeline was evaluated on a dataset of 50 candidate profiles. For each profile, the pipeline extracts four entities (Headline, Summary, Skills, and Experience) and generates an embedding for each of them with the BGE-M3 model.
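
Per-entity latencies such as those in Table~\ref{tab:latency_results} can be collected with a simple timing loop. The following Python sketch is illustrative only. The functions \texttt{extract} and \texttt{embed} are hypothetical placeholders for the IRIS extraction step and the BGE-M3 embedding call. Timing each entity as extraction plus embedding with \texttt{time.perf\_counter} is an assumption about how the measurements were taken.

\begin{verbatim}
import time
from typing import Callable, Dict, List

# Entity types evaluated in the latency study.
ENTITY_TYPES = ["headline", "summary", "skills", "experience"]


def time_profile(profile: Dict[str, object],
                 extract: Callable[[Dict[str, object], str], str],
                 embed: Callable[[str], List[float]]) -> Dict[str, float]:
    """Return the wall-clock latency (ms) of extract + embed for each entity.

    extract and embed are hypothetical stand-ins for the real extraction
    step and the BGE-M3 embedding service.
    """
    timings: Dict[str, float] = {}
    for entity in ENTITY_TYPES:
        start = time.perf_counter()
        text = extract(profile, entity)
        embed(text)
        timings[entity] = (time.perf_counter() - start) * 1000.0
    return timings


# Toy stubs so the sketch runs without a model or network access.
def fake_extract(profile: Dict[str, object], entity: str) -> str:
    value = profile.get(entity, "")
    # List-valued entities (Skills, Experience) are joined into one string here.
    return " ".join(value) if isinstance(value, list) else str(value)


def fake_embed(text: str) -> List[float]:
    return [float(len(text))]


timings = time_profile({"headline": "Data engineer", "skills": ["python", "sql"]},
                       fake_extract, fake_embed)
print(timings)
\end{verbatim}

Running this over all profiles yields one timing record per profile, from which the per-entity statistics are computed.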

Table~\ref{tab:latency_results} summarizes the mean latency and standard deviation for each entity type.

\begin{table}[h]
\centering
\caption{Mean Latency and Standard Deviation of Extraction and Embedding per Entity Type ($N=50$ profiles)}
\label{tab:latency_results}
\begin{tabular}{lrr}
\hline
\textbf{Entity Type} & \textbf{Mean Latency (ms)} & \textbf{Std. Dev. (ms)} \\ \hline
Headline             & 965.78                     & 2969.16                 \\
Summary              & 785.70                     & 141.60                  \\
Skills (List)        & 780.01                     & 160.76                  \\
Experience (List)    & 1005.30                    & 185.11                  \\ \hline
\textbf{Total per Profile} & \textbf{3536.80}           & --                      \\ \hline
\end{tabular}
\end{table}
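
The mean of a sum equals the sum of the means, so the \textbf{Total per Profile} row can be obtained by adding the per-entity means: $965.78 + 785.70 + 780.01 + 1005.30 = 3536.79$~ms, which matches the reported 3536.80~ms up to rounding. Standard deviations do not add this way unless the covariances are known, and this is consistent with the blank entry in that row. The Python sketch below computes both summaries from per-profile timing records. The record format and the use of the sample standard deviation (\texttt{statistics.stdev}) are assumptions, because the source does not state which estimator was used.

\begin{verbatim}
import statistics
from typing import Dict, List, Tuple

Record = Dict[str, float]  # entity name -> latency in ms for one profile


def summarize(records: List[Record]) -> Tuple[Dict[str, Tuple[float, float]], float]:
    """Per-entity (mean, sample std) in ms, plus the mean total per profile."""
    entities = list(records[0])
    per_entity = {
        e: (statistics.mean([r[e] for r in records]),
            statistics.stdev([r[e] for r in records]))
        for e in entities
    }
    mean_total = statistics.mean([sum(r.values()) for r in records])
    return per_entity, mean_total


# Two synthetic records (not measured data).
records = [{"headline": 900.0, "summary": 700.0},
           {"headline": 1100.0, "summary": 900.0}]
per_entity, mean_total = summarize(records)
# The mean total equals the sum of the per-entity means.
print(per_entity, mean_total)
\end{verbatim}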

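The mean total processing time per profile is approximately 3.54~s. When profiles are processed sequentially, this corresponds to a throughput of \textbf{0.283 profiles per second}. The Headline entity is an exception: its standard deviation (2969.16~ms) is roughly three times its mean. This suggests that a small number of requests took far longer than usual, possibly due to network latency or cold-start delays in the embedding service. The remaining entities are stable, with standard deviations between 141.60 and 185.11~ms. Apart from these sporadic Headline delays, the pipeline delivers consistent per-profile latency that is suitable for near-real-time recruitment tasks.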
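
The throughput figure and the effect of a single outlier on the Headline statistics can both be checked with a few lines of Python. The first part uses the means from Table~\ref{tab:latency_results}. It assumes sequential processing, so that throughput is the reciprocal of the mean per-profile latency. The second part uses synthetic latencies, not measured data. It shows that one slow request among 50 (for example, a hypothetical 20-second cold start) is enough to push the standard deviation well above the mean, while the median stays at the typical value.

\begin{verbatim}
import statistics

# Mean per-entity latencies (ms) from the latency table.
MEAN_LATENCY_MS = {"Headline": 965.78, "Summary": 785.70,
                   "Skills": 780.01, "Experience": 1005.30}

total_ms = sum(MEAN_LATENCY_MS.values())  # mean latency per profile
throughput = 1000.0 / total_ms            # profiles per second, sequential
print(f"{total_ms:.2f} ms/profile, {throughput:.3f} profiles/s")

# Synthetic latencies (NOT measured): 49 typical requests and one cold start.
synthetic = [800.0] * 49 + [20000.0]
mean = statistics.mean(synthetic)
std = statistics.stdev(synthetic)
median = statistics.median(synthetic)
print(f"mean={mean:.1f} std={std:.1f} median={median:.1f}")
\end{verbatim}

Reporting the median or a high percentile next to the mean would make this kind of skew visible directly.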

\subsection{Retrieval Performance}
We compared the proposed IRIS matching methods against standard baselines using Mean Reciprocal Rank (MRR) and Recall@$k$ (R@$k$), reported for $k \in \{1, 3\}$. The evaluated methods are:
\begin{itemize}
    \item \textbf{Jaccard Baseline}: Keyword-based matching that scores a profile by the overlap (Jaccard index) between its keyword set and that of the query.
    \item \textbf{BERT Flattened}: Dense retrieval using BERT embeddings on concatenated profile text.
    \item \textbf{BGE Flattened}: Dense retrieval using BGE-M3 embeddings on concatenated profile text.
    \item \textbf{BGE Granular Weighted}: Our proposed method, which embeds each entity separately and combines the per-entity cosine similarities using entity-specific weights.
\end{itemize}
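
The three scoring schemes listed above can be sketched in Python as follows. This is not the IRIS implementation. The keyword sets, the per-entity vectors, and especially the form of the granular score are assumptions. Here the granular score is taken to be a weighted average of per-entity cosine similarities, normalized by the total weight of the entities present in both profiles, and the weights shown are hypothetical. The flattened methods correspond to a single \texttt{cosine} call on the embeddings of the concatenated profile texts.

\begin{verbatim}
import math
from typing import Dict, Sequence, Set

Vector = Sequence[float]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Keyword-overlap baseline: len(a & b) / len(a | b); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; used directly for the flattened (single-vector) methods."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def granular_weighted(query: Dict[str, Vector], candidate: Dict[str, Vector],
                      weights: Dict[str, float]) -> float:
    """Weighted average of per-entity cosine similarities (assumed aggregation)."""
    shared = [e for e in weights if e in query and e in candidate]
    total_weight = sum(weights[e] for e in shared)
    if total_weight == 0:
        return 0.0
    return sum(weights[e] * cosine(query[e], candidate[e])
               for e in shared) / total_weight


# Toy example with hypothetical 2-d "embeddings" and weights.
query = {"skills": [1.0, 0.0], "summary": [0.0, 1.0]}
candidate = {"skills": [1.0, 0.0], "summary": [1.0, 0.0]}
print(jaccard({"python", "sql"}, {"sql", "java"}))
print(granular_weighted(query, candidate, {"skills": 3.0, "summary": 1.0}))
\end{verbatim}

In this toy example the Skills entity matches perfectly while the Summary entity is orthogonal, so the weight of 3 on Skills yields a granular score of $(3 \cdot 1 + 1 \cdot 0)/4 = 0.75$.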

Table~\ref{tab:retrieval_results} presents the results of this comparison.

\begin{table}[h]
\centering
\caption{Comparison of Retrieval Accuracy Metrics}
\label{tab:retrieval_results}
\begin{tabular}{lccc}
\hline
\textbf{Method}             & \textbf{MRR}    & \textbf{R@1}   & \textbf{R@3}   \\ \hline
Jaccard Baseline            & 0.0755          & 0.016          & 0.048          \\
BERT Flattened              & 0.1708          & \textbf{0.048} & \textbf{0.144} \\
BGE Flattened               & \textbf{0.1729} & \textbf{0.048} & \textbf{0.144} \\
BGE Granular Weighted       & 0.0749          & 0.016          & 0.040          \\ \hline
\end{tabular}
\end{table}
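
For reference, assume each query has a single relevant candidate (the source does not state how many relevant profiles each query has). Under that assumption, the two metrics over a query set $Q$ are
\[
\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\mathrm{rank}_q},
\qquad
\mathrm{R@}k = \frac{1}{|Q|} \sum_{q \in Q} \mathbf{1}\left[\mathrm{rank}_q \le k\right],
\]
where $\mathrm{rank}_q$ is the position of the relevant candidate in the ranking for query $q$. A query whose relevant candidate is not retrieved contributes zero to both sums. Because $1/\mathrm{rank}_q = 1$ whenever $\mathrm{rank}_q = 1$ and all terms are non-negative, MRR can never fall below R@1, and this holds for every row of Table~\ref{tab:retrieval_results}. The Python sketch below implements these definitions. The function names are hypothetical.

\begin{verbatim}
from typing import List, Optional, Sequence


def first_relevant_rank(ranked_ids: Sequence[str], relevant_id: str) -> Optional[int]:
    """1-based rank of the relevant item, or None if it was not retrieved."""
    for position, item in enumerate(ranked_ids, start=1):
        if item == relevant_id:
            return position
    return None


def mrr(ranks: List[Optional[int]]) -> float:
    """Mean reciprocal rank; unretrieved items (None) contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def recall_at_k(ranks: List[Optional[int]], k: int) -> float:
    """Fraction of queries whose relevant item appears in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


# Three toy queries: relevant item at rank 2, at rank 1, and not retrieved.
ranks = [first_relevant_rank(["c2", "c7", "c1"], "c7"),
         first_relevant_rank(["c1", "c3"], "c1"),
         first_relevant_rank(["c5", "c6"], "c9")]
print(ranks, mrr(ranks), recall_at_k(ranks, 1), recall_at_k(ranks, 3))
\end{verbatim}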

The results indicate that \textbf{BGE Flattened} achieves the highest MRR (0.1729), marginally above BERT Flattened (0.1708). The two flattened methods tie on R@1 (0.048) and R@3 (0.144). The granular weighted approach currently underperforms both flattened methods and performs no better than the Jaccard baseline (MRR 0.0749 vs.\ 0.0755; R@1 0.016 for both; R@3 0.040 vs.\ 0.048). This suggests that the aggregation logic or the weight distribution across entities requires further optimization.
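
One possible way to carry out such weight optimization is an exhaustive search over a small grid of per-entity weights that keeps the combination with the highest MRR. The Python sketch below is hypothetical. It assumes the granular score is a weighted average of per-entity cosine similarities and ranks all candidates by that score. To avoid overfitting the reported metrics, it should be run on a held-out validation split rather than on the evaluation queries.

\begin{verbatim}
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Profile = Dict[str, Vector]  # entity name -> embedding


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def score(query: Profile, cand: Profile, weights: Dict[str, float]) -> float:
    """Assumed granular score: weighted average of per-entity cosine similarities."""
    shared = [e for e in weights if e in query and e in cand]
    total = sum(weights[e] for e in shared)
    if total == 0:
        return 0.0
    return sum(weights[e] * cosine(query[e], cand[e]) for e in shared) / total


def mrr_for_weights(queries: List[Tuple[Profile, str]],
                    candidates: Dict[str, Profile],
                    weights: Dict[str, float]) -> float:
    """MRR when every candidate is ranked by score (one relevant id per query)."""
    total = 0.0
    for query, relevant_id in queries:
        ranked = sorted(candidates,
                        key=lambda cid: score(query, candidates[cid], weights),
                        reverse=True)
        total += 1.0 / (ranked.index(relevant_id) + 1)
    return total / len(queries)


def grid_search(queries: List[Tuple[Profile, str]],
                candidates: Dict[str, Profile],
                entities: List[str],
                grid: Sequence[float] = (0.0, 0.5, 1.0)
                ) -> Tuple[float, Optional[Dict[str, float]]]:
    """Try every weight combination on the grid and keep the best MRR."""
    best_mrr, best_weights = -1.0, None
    for combo in itertools.product(grid, repeat=len(entities)):
        if sum(combo) == 0:
            continue  # all-zero weights give no ranking signal
        weights = dict(zip(entities, combo))
        value = mrr_for_weights(queries, candidates, weights)
        if value > best_mrr:
            best_mrr, best_weights = value, weights
    return best_mrr, best_weights


# Toy validation data (hypothetical 2-d embeddings).
candidates = {"a": {"skills": [1.0, 0.0], "summary": [0.0, 1.0]},
              "b": {"skills": [0.0, 1.0], "summary": [1.0, 0.0]}}
queries = [({"skills": [1.0, 0.0], "summary": [1.0, 0.0]}, "a")]
best_mrr, best_weights = grid_search(queries, candidates, ["skills", "summary"])
print(best_mrr, best_weights)
\end{verbatim}

A grid search scales exponentially with the number of entities. With four entities and a few grid values it remains cheap, but finer grids would call for a coordinate-wise or learned weighting instead.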