Spaces:

EQUES
/

Response-Quality-Assessment

Running

App Files Files Community

Response-Quality-Assessment / docs /context /02_design_patterns_analysis.md

Ryoya Awano

deploy: fix MedLFQA Marginal mode sample matching

19fc84f 1 day ago

preview code

raw

history blame contribute delete

9.04 kB

	# 設計パターン詳細分析

	## 概要

	本ドキュメントでは、現在の `src/` に適用されている設計パターンをコードベースに基づいて詳細に解説し、問題点と改善案を記録する。

	---

	## 1. Strategy パターン

	### 目的
	アルゴリズムをオブジェクトとして差し替え可能にする。

	### 適用箇所
	```
	src/subclaim_processor/strategies/
	├── aggregation/
	│ ├── base.py → AggregationStrategy (ABC)
	│ ├── mean.py → MeanAggregation
	│ └── max.py → MaxAggregation
	└── scoring/
	├── base.py → ScoringStrategy (ABC)
	└── product.py → ProductScoreStrategy
	```

	### 実際の動作（`subclaim_scorer.py:46-68`）
	```python
	AGGREGATION_STRATEGIES: Dict[str, Callable] = {
	"max": MaxAggregation,
	"mean": MeanAggregation,
	}
	SCORING_STRATEGIES: Dict[str, Callable] = {
	"product": ProductScoreStrategy,
	}

	def score(self, claim, retrieved_docs, aggregation_strategy, scoring_strategy):
	agg_func = AGGREGATION_STRATEGIES[aggregation_strategy]() # "mean" or "max"
	scoring_func = SCORING_STRATEGIES[scoring_strategy]() # "product"

	for doc in retrieved_docs:
	score = scoring_func.compute_score(...) # 差し替え可能

	return agg_func.aggregate(doc_scores) # 差し替え可能
	```

	### 問題点
	Strategy の選択が文字列キーの辞書で行われており、型安全性がない。`"product"` を typo しても実行時まで気づけない。

	### 改善案
	```python
	# Enum で型安全にする
	from enum import Enum

	class AggregationMethod(Enum):
	MEAN = "mean"
	MAX = "max"

	# または Protocol を使う（Python 3.8+）
	from typing import Protocol

	class AggregationStrategy(Protocol):
	def aggregate(self, scores: list[float]) -> float: ...
	```

	---

	## 2. Template Method パターン

	### 目的
	処理の骨格を親クラスで定め、詳細を子クラスに委ねる。

	### 適用箇所
	```
	src/data_processor/raw_data_processor.py → IRawDataProcessor, DatasetProcessor (ABC)
	src/calibration/base_calibration.py → ICalibration (ABC)
	src/common/llm/llm_agent.py → LLMAgent (ABC)
	```

	### 現在の実装（`raw_data_processor.py`）
	```python
	class IRawDataProcessor(ABC):
	@abstractmethod
	def get_queries(self, input_file, output_file): pass

	@abstractmethod
	def get_documents(self, query_file, output_file): pass

	class DatasetProcessor(ABC):
	@abstractmethod
	def process_queries(self, input_file, **kwargs) -> list: pass

	@abstractmethod
	def process_documents(self, query_file, db, **kwargs) -> dict: pass
	```

	### 問題点
	抽象クラスが2つある（`IRawDataProcessor` と `DatasetProcessor`）。

	\| クラス \| 役割 \|
	\|--------\|------\|
	\| `IRawDataProcessor` \| ファイルI/O込みのインターフェース \|
	\| `DatasetProcessor` \| 純粋な処理ロジックのインターフェース \|

	`QueryProcessor` は `IRawDataProcessor` を継承しつつ、内部で `DatasetProcessor` を利用するというダブル構造になっている。

	真の Template Method パターンなら、親クラスに「骨格メソッド」が実装されているはずだが、現在の実装では `QueryProcessor.get_queries()` が骨格を担っている。

	---

	## 3. Factory / Dispatcher パターン

	### 目的
	入力に基づいて適切なオブジェクトを生成・委譲する。

	### 実際のコード（`query_processor.py:26-31`）
	```python
	self.processors = {
	"fact_score": FactScoreProcessor(),
	"hotpot_qa": HotpotQAProcessor(),
	"pop_qa": PopQAProcessor(),
	"medlf_qa": MedLFQAProcessor(),
	}

	# 使用時
	processor = self.processors.get(dataset)
	```

	### 問題点
	これは「Factory パターン」ではなく「Registry パターン」に近い。

	\| パターン \| 特徴 \|
	\|---------\|------\|
	\| Factory（正） \| 必要時にオブジェクトを生成する \|
	\| Registry（現在） \| 起動時に全オブジェクトを一括生成して辞書に保持 \|

	`QueryProcessor()` を作った時点で4データセット全プロセッサがインスタンス化される。実害は少ないが、設計の意図とずれている。

	---

	## 4. Manager パターン

	### 目的
	リソースのライフサイクル（生成・使用・破棄）を一元管理する。

	### 各 Manager の評価

	\| Manager \| 管理対象 \| 実際の責任数 \| 評価 \|
	\|---------\|---------\|------------\|------\|
	\| `ConfigManager` \| YAMLファイル \| 設定読み込み + ログセットアップ \| △ 2責任 \|
	\| `FAISSIndexManager` \| FAISSインデックス \| インデックス管理 + 検索 + 応答生成 \| ✗ 3責任 \|
	\| `FileManager` \| ドキュメント \| ファイル読み込み + チャンキング \| △ 2責任 \|
	\| `OpenAIManager` \| Embeddings API \| API呼び出しのみ \| ✓ 適切 \|

	### 最大の問題：`FAISSIndexManager`（`faiss_manager.py:245-286`）
	```python
	class FAISSIndexManager:
	def upsert_file_to_faiss(...) # インデックス管理 ✓
	def search_faiss_index(...) # 検索 ✓
	def generate_response_from_context(...) # ← LLM応答生成（無関係）
	def parse_result(...) # ← 文字列パース（雑多）
	```

	`generate_response_from_context` はインデックス管理とは無関係なのに `FAISSIndexManager` に含まれている。この責任は `OpenAIRAGAgent` に移すべき。

	---

	## 5. Pipeline アーキテクチャ

	### 目的
	処理を段階的に分けて順次実行する。

	### 現在のパイプライン
	```
	main.py
	1. DataLoader → 生データ取得
	2. QueryProcessor → データ正規化
	3. FAISSIndexManager → インデックス構築
	4. SubclaimProcessor → 応答生成 → サブクレーム抽出 → スコアリング → アノテーション
	5. SplitConformalCalibration → 統計的キャリブレーション
	```

	### 問題点：ステージ間の文字列シリアライズ（`faiss_manager.py:196-216`）
	```python
	# 検索結果を文字列として返す
	results.append(
	f"{text} indice={idx} fileposition={relative_idx} score={dist:.4f}"
	# TODO reformat this ← 本人コメントあり
	)
	```

	その後 `parse_result()` で正規表現パースし直している：
	```python
	pattern = re.compile(
	r"page_content='(.?)'\smetadata=(\{.?\})\sindice=(\d+)\sfileposition=(\d+)\sscore=([\d.]+)",
	re.DOTALL,
	)
	```

	### 改善案
	dataclass でステージ間のデータ型を定義すれば `parse_result()` が不要になる：
	```python
	@dataclass
	class SearchResult:
	text: str
	indice: int
	fileposition: int
	score: float
	```

	---

	## 6. Dependency Injection（DI）

	### 目的
	依存オブジェクトを外から注入してテスト容易性を高める。

	### 現在の実装（`subclaim_scorer.py:29-44`）
	```python
	class SubclaimScorer(IDocumentScorer):
	def __init__(self, index_truncation_config, embedding_model, index_path, ...):
	self.faiss_manager = FAISSIndexManager(...) # ← 内部で直接生成
	self.gen = OpenAIAtomicFactGenerator() # ← 内部で直接生成
	self.openai_client = OpenAI() # ← 内部で直接生成
	```

	### 問題点
	これは DI ではない。本当の DI は依存オブジェクトを外部から受け取る：
	```python
	# DI（正）
	class SubclaimScorer:
	def __init__(
	self,
	faiss_manager: FAISSIndexManager,
	fact_generator: OpenAIAtomicFactGenerator,
	openai_client: OpenAI,
	):
	self.faiss_manager = faiss_manager
	```

	現在の設計では `SubclaimScorer` の単体テストに本物の OpenAI API と FAISS インデックスが必要になる。

	---

	## 設計全体の評価サマリー

	\| パターン \| 意図の正確さ \| 実装の品質 \| 改善優先度 \|
	\|---------\|------------\|----------\|----------\|
	\| Strategy \| ✓ 正しい \| △ 文字列キーで型安全でない \| 低（機能はしている） \|
	\| Template Method \| △ 混乱あり \| △ 2つの抽象クラスが混在 \| 中 \|
	\| Factory/Dispatcher \| △ Registry に近い \| △ 起動時に全数生成 \| 低 \|
	\| Manager \| △ 責任過多 \| ✗ FAISSがLLM応答生成を担当 \| 高 \|
	\| Pipeline \| ✓ 正しい \| ✗ 文字列シリアライズ＋正規表現パース \| 高 \|
	\| Dependency Injection \| ✗ 名前だけ \| ✗ 内部生成で外部注入不可 \| 中 \|

	### 優先度「高」の改善項目（構造変更不要）

	1. `FAISSIndexManager.generate_response_from_context` の移動
	- 移動先：`OpenAIRAGAgent`（すでに存在する）
	- 効果：Manager の単一責任原則回復

	2. パイプライン間データ型の構造化
	- `SearchResult` dataclass の導入
	- `parse_result()` の正規表現解析の廃止
	- 効果：バグ発生箇所の削減、デバッグ容易性の向上