\section{Experimental Results}
\label{sec:experimental_results}

This section presents the empirical evaluation of the IRIS system along two dimensions: computational efficiency (latency and throughput) and retrieval accuracy.

\subsection{Computational Efficiency}

We evaluated the efficiency of the entity extraction and embedding pipeline on a dataset of 50 candidate profiles. For each profile, the pipeline extracts four entities (Headline, Summary, Skills, and Experience) and generates an embedding for each with the BGE-M3 model. Table~\ref{tab:latency_results} reports the mean latency and standard deviation for each entity type.

\begin{table}[h]
\centering
\caption{Mean Latency and Standard Deviation per Entity Extraction ($N=50$)}
\label{tab:latency_results}
\begin{tabular}{lrr}
\hline
\textbf{Entity Type} & \textbf{Mean Latency (ms)} & \textbf{Std. Dev. (ms)} \\
\hline
Headline & 965.78 & 2969.16 \\
Summary & 785.70 & 141.60 \\
Skills (List) & 780.01 & 160.76 \\
Experience (List) & 1005.30 & 185.11 \\
\hline
\textbf{Total per Profile} & \textbf{3536.80} & -- \\
\hline
\end{tabular}
\end{table}

The average total processing time per profile is approximately 3.54 seconds. Assuming profiles are processed sequentially, this corresponds to a throughput of $1/3.5368 \approx$ \textbf{0.283 profiles per second}, or about 17 profiles per minute. Headline extraction is the exception in terms of stability: its standard deviation (2969.16 ms) is roughly three times its mean (965.78 ms), which suggests a small number of extreme outliers, possibly caused by network latency or cold-start issues in the embedding service. The other three steps have standard deviations between 18\% and 21\% of their means, so outside of these outliers the pipeline performs consistently enough for near-real-time recruitment tasks.

\subsection{Retrieval Performance}

We compared the proposed IRIS matching method against standard baselines using Mean Reciprocal Rank (MRR) and Recall@$k$ (R@$k$). The evaluation included:
\begin{itemize}
\item \textbf{Jaccard Baseline}: A keyword-based overlap method.
\item \textbf{BERT Flattened}: Dense retrieval using BERT embeddings on concatenated profile text.
\item \textbf{BGE Flattened}: Dense retrieval using BGE-M3 embeddings on concatenated profile text.
\item \textbf{BGE Granular Weighted}: Our proposed method using weighted cosine similarity across specific entities.
\end{itemize}

Table~\ref{tab:retrieval_results} presents the results of this comparison; the best value in each column is shown in bold, including ties.

\begin{table}[h]
\centering
\caption{Comparison of Retrieval Accuracy Metrics}
\label{tab:retrieval_results}
\begin{tabular}{lccc}
\hline
\textbf{Method} & \textbf{MRR} & \textbf{R@1} & \textbf{R@3} \\
\hline
Jaccard Baseline & 0.0755 & 0.016 & 0.048 \\
BERT Flattened & 0.1708 & \textbf{0.048} & \textbf{0.144} \\
BGE Flattened & \textbf{0.1729} & \textbf{0.048} & \textbf{0.144} \\
BGE Granular Weighted & 0.0749 & 0.016 & 0.040 \\
\hline
\end{tabular}
\end{table}

The \textbf{BGE Flattened} approach achieves the highest MRR (0.1729), narrowly ahead of BERT Flattened (0.1708); the two methods tie on R@1 (0.048) and R@3 (0.144). Notably, the granular weighted approach currently underperforms both flattened embedding methods and does not improve on the Jaccard baseline (MRR 0.0749 vs.\ 0.0755, equal R@1, and R@3 of 0.040 vs.\ 0.048). This suggests that the aggregation logic or the weight distribution across entities requires further optimization.
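
For reference, the metrics in Table~\ref{tab:retrieval_results} can be written out using their standard definitions. The sketch below assumes that each query $q$ in the query set $Q$ has exactly one relevant candidate, ranked at position $\mathrm{rank}_q$ by the method under evaluation; the symbols $Q$ and $\mathrm{rank}_q$ are notation introduced here for illustration only.
\begin{equation}
\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\mathrm{rank}_q},
\qquad
\mathrm{R@}k = \frac{1}{|Q|} \, \bigl|\{\, q \in Q : \mathrm{rank}_q \le k \,\}\bigr| .
\end{equation}
The exact aggregation used by BGE Granular Weighted is not specified in this section. One plausible form, stated purely as an assumption, is a convex combination of per-entity cosine similarities:
\begin{equation}
s(q, c) = \sum_{e \in E} w_e \, \cos\bigl(\mathbf{q}_e, \mathbf{c}_e\bigr),
\qquad
w_e \ge 0, \quad \sum_{e \in E} w_e = 1,
\end{equation}
where $E$ is the set of extracted entities (Headline, Summary, Skills, Experience), $\mathbf{q}_e$ and $\mathbf{c}_e$ are hypothetical query-side and candidate-side embeddings for entity $e$, and $w_e$ is a hypothetical entity weight. Under this form, setting $w_e = 1/|E|$ for all $e$ reduces the score to an unweighted mean of the per-entity similarities, which offers a simple reference point when tuning the weights.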