Spaces:

EQUES
/

Response-Quality-Assessment

Running

src/subclaim_processor/strategies/
├── aggregation/
│   ├── base.py        → AggregationStrategy (ABC)
│   ├── mean.py        → MeanAggregation
│   └── max.py         → MaxAggregation
└── scoring/
    ├── base.py        → ScoringStrategy (ABC)
    └── product.py     → ProductScoreStrategy

実際の動作（`subclaim_scorer.py:46-68`）

AGGREGATION_STRATEGIES: Dict[str, Callable] = {
    "max": MaxAggregation,
    "mean": MeanAggregation,
}
SCORING_STRATEGIES: Dict[str, Callable] = {
    "product": ProductScoreStrategy,
}

def score(self, claim, retrieved_docs, aggregation_strategy, scoring_strategy):
    agg_func = AGGREGATION_STRATEGIES[aggregation_strategy]()   # "mean" or "max"
    scoring_func = SCORING_STRATEGIES[scoring_strategy]()       # "product"

    for doc in retrieved_docs:
        score = scoring_func.compute_score(...)  # 差し替え可能

    return agg_func.aggregate(doc_scores)        # 差し替え可能

問題点

Strategy の選択が文字列キーの辞書で行われており、型安全性がない。"product" を typo しても実行時まで気づけない。

改善案

# Enum で型安全にする
from enum import Enum

class AggregationMethod(Enum):
    MEAN = "mean"
    MAX = "max"

# または Protocol を使う（Python 3.8+）
from typing import Protocol

class AggregationStrategy(Protocol):
    def aggregate(self, scores: list[float]) -> float: ...

2. Template Method パターン

目的

処理の骨格を親クラスで定め、詳細を子クラスに委ねる。

適用箇所

src/data_processor/raw_data_processor.py  → IRawDataProcessor, DatasetProcessor (ABC)
src/calibration/base_calibration.py       → ICalibration (ABC)
src/common/llm/llm_agent.py               → LLMAgent (ABC)

現在の実装（`raw_data_processor.py`）

class IRawDataProcessor(ABC):
    @abstractmethod
    def get_queries(self, input_file, output_file): pass

    @abstractmethod
    def get_documents(self, query_file, output_file): pass

class DatasetProcessor(ABC):
    @abstractmethod
    def process_queries(self, input_file, **kwargs) -> list: pass

    @abstractmethod
    def process_documents(self, query_file, db, **kwargs) -> dict: pass

問題点

抽象クラスが2つある（IRawDataProcessor と DatasetProcessor）。

クラス	役割
`IRawDataProcessor`	ファイルI/O込みのインターフェース
`DatasetProcessor`	純粋な処理ロジックのインターフェース

QueryProcessor は IRawDataProcessor を継承しつつ、内部で DatasetProcessor を利用するというダブル構造になっている。

真の Template Method パターンなら、親クラスに「骨格メソッド」が実装されているはずだが、現在の実装では QueryProcessor.get_queries() が骨格を担っている。

3. Factory / Dispatcher パターン

目的

入力に基づいて適切なオブジェクトを生成・委譲する。

実際のコード（`query_processor.py:26-31`）

self.processors = {
    "fact_score": FactScoreProcessor(),
    "hotpot_qa": HotpotQAProcessor(),
    "pop_qa": PopQAProcessor(),
    "medlf_qa": MedLFQAProcessor(),
}

# 使用時
processor = self.processors.get(dataset)

問題点

これは「Factory パターン」ではなく「Registry パターン」に近い。

パターン	特徴
Factory（正）	必要時にオブジェクトを生成する
Registry（現在）	起動時に全オブジェクトを一括生成して辞書に保持

QueryProcessor() を作った時点で4データセット全プロセッサがインスタンス化される。実害は少ないが、設計の意図とずれている。

4. Manager パターン

目的

リソースのライフサイクル（生成・使用・破棄）を一元管理する。

各 Manager の評価

Manager	管理対象	実際の責任数	評価
`ConfigManager`	YAMLファイル	設定読み込み + ログセットアップ	△ 2責任
`FAISSIndexManager`	FAISSインデックス	インデックス管理 + 検索 + 応答生成	✗ 3責任
`FileManager`	ドキュメント	ファイル読み込み + チャンキング	△ 2責任
`OpenAIManager`	Embeddings API	API呼び出しのみ	✓ 適切

最大の問題：`FAISSIndexManager`（`faiss_manager.py:245-286`）

class FAISSIndexManager:
    def upsert_file_to_faiss(...)              # インデックス管理 ✓
    def search_faiss_index(...)                # 検索 ✓
    def generate_response_from_context(...)    # ← LLM応答生成（無関係）
    def parse_result(...)                      # ← 文字列パース（雑多）

generate_response_from_context はインデックス管理とは無関係なのに FAISSIndexManager に含まれている。この責任は OpenAIRAGAgent に移すべき。

5. Pipeline アーキテクチャ

目的

処理を段階的に分けて順次実行する。

現在のパイプライン

main.py
  1. DataLoader         → 生データ取得
  2. QueryProcessor     → データ正規化
  3. FAISSIndexManager  → インデックス構築
  4. SubclaimProcessor  → 応答生成 → サブクレーム抽出 → スコアリング → アノテーション
  5. SplitConformalCalibration → 統計的キャリブレーション

問題点：ステージ間の文字列シリアライズ（`faiss_manager.py:196-216`）

# 検索結果を文字列として返す
results.append(
    f"{text} indice={idx} fileposition={relative_idx} score={dist:.4f}"
    # TODO reformat this  ← 本人コメントあり
)

その後 parse_result() で正規表現パースし直している：

pattern = re.compile(
    r"page_content='(.*?)'\smetadata=(\{.*?\})\sindice=(\d+)\sfileposition=(\d+)\sscore=([\d.]+)",
    re.DOTALL,
)

改善案

dataclass でステージ間のデータ型を定義すれば parse_result() が不要になる：

@dataclass
class SearchResult:
    text: str
    indice: int
    fileposition: int
    score: float

6. Dependency Injection（DI）

目的

依存オブジェクトを外から注入してテスト容易性を高める。

現在の実装（`subclaim_scorer.py:29-44`）

class SubclaimScorer(IDocumentScorer):
    def __init__(self, index_truncation_config, embedding_model, index_path, ...):
        self.faiss_manager = FAISSIndexManager(...)      # ← 内部で直接生成
        self.gen = OpenAIAtomicFactGenerator()            # ← 内部で直接生成
        self.openai_client = OpenAI()                     # ← 内部で直接生成

問題点

これは DI ではない。本当の DI は依存オブジェクトを外部から受け取る：

# DI（正）
class SubclaimScorer:
    def __init__(
        self,
        faiss_manager: FAISSIndexManager,
        fact_generator: OpenAIAtomicFactGenerator,
        openai_client: OpenAI,
    ):
        self.faiss_manager = faiss_manager

現在の設計では SubclaimScorer の単体テストに本物の OpenAI API と FAISS インデックスが必要になる。

設計全体の評価サマリー

パターン	意図の正確さ	実装の品質	改善優先度
Strategy	✓ 正しい	△ 文字列キーで型安全でない	低（機能はしている）
Template Method	△ 混乱あり	△ 2つの抽象クラスが混在	中
Factory/Dispatcher	△ Registry に近い	△ 起動時に全数生成	低
Manager	△ 責任過多	✗ FAISSがLLM応答生成を担当	高
Pipeline	✓ 正しい	✗ 文字列シリアライズ＋正規表現パース	高
Dependency Injection	✗ 名前だけ	✗ 内部生成で外部注入不可	中

優先度「高」の改善項目（構造変更不要）

FAISSIndexManager.generate_response_from_context の移動
- 移動先：OpenAIRAGAgent（すでに存在する）
- 効果：Manager の単一責任原則回復
パイプライン間データ型の構造化
- SearchResult dataclass の導入
- parse_result() の正規表現解析の廃止
- 効果：バグ発生箇所の削減、デバッグ容易性の向上

設計パターン詳細分析

概要

1. Strategy パターン

目的

適用箇所

実際の動作（subclaim_scorer.py:46-68）

問題点

改善案

2. Template Method パターン

目的

適用箇所

現在の実装（raw_data_processor.py）

問題点

3. Factory / Dispatcher パターン

目的

実際のコード（query_processor.py:26-31）

問題点

4. Manager パターン

目的

各 Manager の評価

最大の問題：FAISSIndexManager（faiss_manager.py:245-286）

5. Pipeline アーキテクチャ

目的

現在のパイプライン

問題点：ステージ間の文字列シリアライズ（faiss_manager.py:196-216）

改善案

6. Dependency Injection（DI）

目的

現在の実装（subclaim_scorer.py:29-44）

問題点

設計全体の評価サマリー

優先度「高」の改善項目（構造変更不要）

実際の動作（`subclaim_scorer.py:46-68`）

現在の実装（`raw_data_processor.py`）

実際のコード（`query_processor.py:26-31`）

最大の問題：`FAISSIndexManager`（`faiss_manager.py:245-286`）

問題点：ステージ間の文字列シリアライズ（`faiss_manager.py:196-216`）

現在の実装（`subclaim_scorer.py:29-44`）