Spaces: fumiyaaa/dokoCame

Fumiya Imazato committed on Dec 6, 2025

Commit 57bc6ef (0 parents)

Initial commit: どこカメ

Files changed (21)

.claude/settings.local.json +10 -0
00_企画書.md +168 -0
README.md +77 -0
app.py +277 -0
config/__init__.py +3 -0
config/settings.py +48 -0
core/__init__.py +13 -0
core/frame_sampler.py +95 -0
core/location_matcher.py +244 -0
core/ocr_engine.py +145 -0
core/result_aggregator.py +207 -0
core/vlm_analyzer.py +187 -0
memo.txt +21 -0
requirements.txt +30 -0
services/__init__.py +4 -0
services/gemini_client.py +188 -0
services/overpass_client.py +251 -0
utils/__init__.py +13 -0
utils/geo_utils.py +84 -0
utils/image_utils.py +63 -0
utils/text_cleaner.py +92 -0

.claude/settings.local.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "permissions": {
+    "allow": [
+      "WebSearch",
+      "WebFetch(domain:ai.google.dev)"
+    ],
+    "deny": [],
+    "ask": []
+  }
+}

00_企画書.md ADDED

	@@ -0,0 +1,168 @@

+# Service Proposal: どこカメ / dokoCame (Semantic Geo-Locator, Real-time Video Edition)
+**A next-generation location engine combining visual information with open data**
+- **Version:** 2.0 (Video Streaming Model)
+- **Date:** 2025/12/06
+- **Infrastructure Strategy:** Zero-Cost Cloud Prototyping (an implementation that fits within free cloud tiers)
+---
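+## 1. Service Concept
+> **"Just point your camera, and what you see becomes an address."**
+In environments where GPS accuracy degrades (urban canyons, indoors, mountainous areas),
+the caller only has to point a smartphone camera at the surroundings.
+The AI analyzes the video in near real time and
+**pinpoints coordinates and an address within a few seconds to a few tens of seconds**.
+---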
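+## 2. Problems Addressed (Pain Points)
+1. **Limits of GPS**
+   - Among high-rise buildings, indoors, and in mountainous areas, errors of tens to hundreds of meters occur routinely.
+   - A "send my current location" button alone still leaves cases where the dispatcher cannot identify the scene.
+2. **Difficulty communicating under panic**
+   - When all the caller can say is "there's nothing but mountains and a vending machine" or "I'm next to a big road,"
+     describing a location verbally **without an address or landmark that can be put into words** is a heavy burden on both caller and dispatcher.
+   - Foreign tourists and people unfamiliar with the area, in particular, cannot explain where they are.
+3. **Psychological barrier of sending still photos**
+   - The "take a photo and send it" operation requires the caller, in an emergency, to
+     - stop moving,
+     - take the picture, and
+     - press the send button,
+     which is **a higher psychological and operational barrier than expected**.
+---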
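+## 3. Solution Overview: Real-time Semantic Analysis
+### 3.1 Continuous Scanning
+- The user starts the camera in "video mode" and
+  **simply looks around**.
+- The system continuously receives the video stream sent from the browser.
+### 3.2 Quasi-Real-time Analysis
+**Assuming operation on free infrastructure (Zero-Cost)**,
+the system samples the stream and analyzes only a subset of frames rather than every frame.
+- **OCR analysis (high frequency, about every 1.0 s)**
+  - Continuously extracts the **text** visible in the frame:
+    utility pole numbers, shop signs, building names, traffic signal names, road signs, and so on.
+  - On the user's screen, detected text pops up one item after another with a 1 to 2 second delay, for example:
+    - "Detecting: 田中歯科…"
+    - "Detecting: ローソン…"
+- **Spatial reasoning with a VLM (low frequency, about every 5.0 s, or on a conditional trigger)**
+  - A Vision-Language Model (VLM) extracts, as text, the **positional relationships between landmarks**, such as:
+    - "a coin parking lot across from a convenience store"
+    - "a gas station at the back right, a drugstore on the left"
+  - This textual "structure of the scene" is then used to
+    **trigger the location-matching logic (described below)**.
+### 3.3 Location Matching with Composite Queries
+The extracted text and spatial information are matched against OpenStreetMap (OSM).
+> **Example: sketch of the search logic**
+> - "within a 1 km radius of the current location," and
+> - "a POI with the text `田中歯科` exists," and
+> - "a `ローソン` exists within 30 m of it"
+> → Candidate locations satisfying this combination are scored, and
+> the coordinates with the highest likelihood are adopted as the estimated position.
+---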
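+## 4. System Architecture and Infrastructure (Tech Stack)
+**A GPU-free, zero-cost configuration centered on Hugging Face Spaces.**
+| Layer | Role | Technology / Specification |
+| :--- | :--- | :--- |
+| **Frontend** | Video input, UI | - Browser video streaming via **WebRTC (Gradio streaming)**<br>- No smartphone app required; the caller starts by opening an SMS link in the browser |
+| **Infrastructure** | Execution platform | - **Hugging Face Spaces (Free Tier)**<br>- CPU: 2 vCPU / RAM: about 16 GB<br>- Running cost: **0 yen (PoC / small-scale operation assumed)** |
+| **Edge Logic** | Frame control | - **Sampling Middleware** extracts a frame every 1 to 2 seconds instead of processing every video frame<br>- Keeps CPU load and API cost under control |
+| **OCR Engine** | Text recognition | - **PaddleOCR** run locally on Hugging Face<br>- Uses a model that handles Japanese and signage text in natural scenes well |
+| **AI Brain** | Spatial understanding | - **Gemini 2.5 Flash API (Free Tier)** (can be swapped for another VLM later)<br>- Takes sampled still frames as input and structures landmark information and positional relationships as text |
+| **Map DB** | Map data | - **OpenStreetMap (Overpass API)**<br>- Free POI lookup and tag search<br>- Narrows candidates with composite queries on "shop name + category + distance condition" |
+---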
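+## 5. User Experience Flow (UX)
+1. **Access**
+   - A URL is sent to the caller by SMS or similar.
+   - When the caller taps the URL, the browser opens and
+     a camera permission dialog is shown.
+2. **Scan**
+   - On-screen guidance: "Point the camera at your surroundings and slowly turn all the way around."
+   - The caller only has to rotate the phone on the spot;
+     no photo-taking or sending operation is needed.
+3. **Feedback**
+   - Text detected by the AI **pops up on screen one item after another with a lag of about 1 to 2 seconds**.
+     - Examples: "Detecting: 〇〇医院" (a clinic), "Detecting: 消火栓" (fire hydrant), "Detecting: LAWSON".
+   - This reassures the caller that
+     - "the system is actually looking," and
+     - "the video I am sending is useful."
+4. **Location fix**
+   - Once the match against OSM reaches a given score,
+     the screen shows a message such as:
+     - "**Location identified: near the 〇〇 intersection, 〇〇-cho 3-chome, 〇〇 City**"
+   - At the same time, the coordinates and text information are expected to be sent to the dispatch console system.
+---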
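+## 6. Competitive Advantages (Differentiators)
+1. **Near-zero deployment and operating cost**
+   - Does not depend on pay-as-you-go services such as the Google Maps API;
+     a minimal configuration is built from OSS plus free-tier cloud.
+   - From **PoC through small-scale production**, a municipality can start with almost no budget impact.
+2. **A lightweight design despite being "video"**
+   - Transport is WebRTC video streaming, but
+     server-side analysis uses intermittent sampling.
+   - Compared with real-time video analysis that processes every frame,
+     **the load is designed to work on CPU only and low-spec hardware**.
+3. **Semantic localization that copes with ambiguous situations**
+   - Even when no address plate is visible, the position is estimated from
+     **the composition of landmarks, that is, the "combination" of things in the scene**, such as
+     - convenience store + parking lot
+     - gas station + large intersection
+   - Rather than relying on text alone, the location is determined by
+     **combining the "structure of the scene" with open data**.
+---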
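+## 7. Future Extensions
+- **Integration with PLATEAU (3D city models)**
+  - Match the skyline of building clusters (roof shapes, height distribution) against the 3D city model.
+  - Even in areas with little text or few shops, extend the system to **estimate heading and position from geometric features** such as
+    - building outlines
+    - road patterns
+    - distant mountain ridgelines
+- **Multimodal integration**
+  - In the future, audio (ambient sound, call content) could also be taken into account,
+    - using cues such as "the sound of a railroad crossing," "the echo of an ambulance siren," or "the murmur of a river"
+      to further improve the accuracy of spatial reasoning.
+---
+**End of Document**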

README.md ADDED

	@@ -0,0 +1,77 @@

+---
+title: どこカメ (dokoCame)
+emoji: 📍
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 5.0.0
+app_file: app.py
+pinned: false
+license: mit
+---
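+# 📍 どこカメ (dokoCame)
+**Just point your camera, and what you see becomes an address**
+A service that analyzes smartphone camera video in real time and determines your location without relying on GPS.
+## Features
+- **No GPS required**: estimates location from signs, shop names, and road signs in the video
+- **Real-time analysis**: analysis runs automatically as you point the camera
+- **Runs in the browser**: no app installation needed
+## Tech Stack
+| Component | Technology |
+|---------------|------|
+| Frontend | Gradio |
+| OCR | PaddleOCR (Japanese supported) |
+| VLM | Gemini 2.5 Flash |
+| Map DB | OpenStreetMap (Overpass API) |
+| Hosting | Hugging Face Spaces |
+## Usage
+1. Allow camera access
+2. Point your smartphone or PC camera at your surroundings
+3. Signs, shop names, and road signs are detected
+4. The location is determined from multiple pieces of information
+## Setup (local development)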
+```bash
+# Clone the repository
+git clone https://huggingface.co/spaces/<username>/dokoCame
+cd dokoCame
+# Install dependencies
+pip install -r requirements.txt
+# Set environment variables
+export GEMINI_API_KEY="your-api-key-here"
+# Launch
+python app.py
+```
+## Environment Variables
+| Variable | Description | Required |
+|--------|------|------|
+| `GEMINI_API_KEY` | Gemini API key | Yes |
+## Deploying to Hugging Face Spaces
+1. Create a new Space on Hugging Face
+   - SDK: `Gradio`
+   - Hardware: `CPU basic` (Free)
+2. Add `GEMINI_API_KEY` under Settings → Repository secrets
+3. Push this repository to the Space
+## License
+MIT License

app.py ADDED

	@@ -0,0 +1,277 @@

+"""
+どこカメ (dokoCame) - real-time video location service
+Analyzes smartphone camera video in real time and
+determines the location without relying on GPS.
+"""
+from typing import Optional
+import numpy as np
+import gradio as gr
+from config.settings import settings
+from core.frame_sampler import FrameSampler
+from core.ocr_engine import OCREngine
+from core.vlm_analyzer import VLMAnalyzer, SpatialAnalysis
+from core.location_matcher import LocationMatcher
+from core.result_aggregator import ResultAggregator
+from utils.image_utils import resize_frame
+from utils.text_cleaner import clean_ocr_text
+class DokoCameApp:
+    """dokoCame application."""
+    def __init__(self):
+        self.frame_sampler = FrameSampler(
+            ocr_interval=settings.ocr_interval_sec,
+            vlm_interval=settings.vlm_interval_sec,
+        )
+        self.ocr_engine = OCREngine(lang=settings.ocr_lang)
+        self.vlm_analyzer = VLMAnalyzer()
+        self.location_matcher = LocationMatcher(search_radius=settings.search_radius_m)
+        self.result_aggregator = ResultAggregator(
+            buffer_size=settings.history_buffer_size,
+            confidence_threshold=settings.confidence_threshold,
+        )
+        # State
+        self._latest_ocr_texts: list = []
+        self._latest_analysis: Optional[SpatialAnalysis] = None
+        self._processing = False
+        # Hint coordinates (to be obtained from the browser Geolocation API in the future)
+        self._hint_lat: float = 35.6812  # Tokyo Station (default)
+        self._hint_lon: float = 139.7671
+    def process_frame(self, frame: np.ndarray) -> dict:
+        """
+        Process a single frame.
+        Returns:
+            {
+                "ocr_texts": [...],
+                "landmarks": [...],
+                "location_status": "...",
+                "result": AggregatedResult or None
+            }
+        """
+        if frame is None:
+            return self._empty_result()
+        # Resize the frame
+        frame = resize_frame(
+            frame, settings.frame_width, settings.frame_height
+        )
+        # Decide whether this frame should be sampled
+        sample = self.frame_sampler.sample(frame)
+        ocr_texts = []
+        vlm_keywords = []
+        # OCR
+        if sample.should_ocr:
+            raw_texts = self.ocr_engine.detect_text_only(frame)
+            ocr_texts = [clean_ocr_text(t) for t in raw_texts if t]
+            self._latest_ocr_texts = ocr_texts
+        # VLM
+        if sample.should_vlm and self.vlm_analyzer.is_available:
+            try:
+                analysis = self.vlm_analyzer.analyze(frame)
+                if analysis.success:
+                    self._latest_analysis = analysis
+                    vlm_keywords = self.vlm_analyzer.get_search_keywords(analysis)
+            except Exception as e:
+                print(f"VLM error: {e}")
+        # Location matching (only when this frame produced new OCR text or VLM keywords)
+        match_result = None
+        if ocr_texts or vlm_keywords:
+            match_result = self.location_matcher.match(
+                ocr_texts=self._latest_ocr_texts,
+                analysis=self._latest_analysis,
+                hint_lat=self._hint_lat,
+                hint_lon=self._hint_lon,
+            )
+            # Merge into the aggregated result
+            self.result_aggregator.add_detection(
+                ocr_texts=ocr_texts,
+                vlm_keywords=vlm_keywords,
+                match_result=match_result,
+            )
+        # Fetch the aggregated result
+        aggregated = self.result_aggregator.get_aggregated_result()
+        # Build the status message
+        if aggregated.is_location_found:
+            status = f"📍 場所を特定しました: {aggregated.address_hint}"
+        elif aggregated.match_count > 0:
+            status = f"🔍 検索中... ({aggregated.match_count}件のマッチ)"
+        else:
+            status = "📷 周囲を映してください..."
+        return {
+            "ocr_texts": self._latest_ocr_texts,
+            "landmarks": aggregated.detected_landmarks,
+            "location_status": status,
+            "result": aggregated,
+        }
+    def _empty_result(self) -> dict:
+        return {
+            "ocr_texts": [],
+            "landmarks": [],
+            "location_status": "カメラを起動してください",
+            "result": None,
+        }
+    def reset(self):
+        """Reset state."""
+        self.frame_sampler.reset()
+        self.result_aggregator.reset()
+        self._latest_ocr_texts = []
+        self._latest_analysis = None
+    def set_hint_location(self, lat: float, lon: float):
+        """Set the hint coordinates."""
+        self._hint_lat = lat
+        self._hint_lon = lon
+# Global application instance
+app = DokoCameApp()
+def process_webcam(frame):
+    """Process webcam input (for the Gradio Image input)."""
+    if frame is None:
+        return None, "カメラを起動してください", ""
+    result = app.process_frame(frame)
+    # Format the OCR text
+    ocr_display = ""
+    if result["ocr_texts"]:
+        ocr_display = "【検出テキスト】\n" + "\n".join(
+            f"• {text}" for text in result["ocr_texts"][:8]
+        )
+    # Format the landmarks
+    landmarks_display = ""
+    if result["landmarks"]:
+        landmarks_display = "\n\n【認識ランドマーク】\n" + "\n".join(
+            f"• {lm}" for lm in result["landmarks"][:5]
+        )
+    info_display = ocr_display + landmarks_display
+    # Location information
+    location_display = result["location_status"]
+    if result["result"] and result["result"].is_location_found:
+        r = result["result"]
+        location_display += f"\n\n座標: {r.estimated_lat:.6f}, {r.estimated_lon:.6f}"
+        location_display += f"\n信頼度: {r.confidence:.1%}"
+    return frame, location_display, info_display
+def reset_state():
+    """Reset state."""
+    app.reset()
+    return "リセットしました"
+def create_ui():
+    """Build the Gradio UI."""
+    with gr.Blocks(
+        title="どこカメ - リアルタイム位置特定",
+        theme=gr.themes.Soft(),
+    ) as demo:
+        gr.Markdown(
+            """
+            # 📍 どこカメ (dokoCame)
+            ### かざすだけで、視界がそのまま住所になる
+            カメラを周囲に向けて、ゆっくり一周してください。
+            AIが映像を解析し、位置を特定します。
+            """
+        )
+        with gr.Row():
+            with gr.Column(scale=2):
+                # Camera input
+                webcam = gr.Image(
+                    sources=["webcam"],
+                    streaming=True,
+                    label="カメラ映像",
+                    mirror_webcam=False,
+                )
+            with gr.Column(scale=1):
+                # Location display
+                location_output = gr.Textbox(
+                    label="📍 位置情報",
+                    lines=5,
+                    interactive=False,
+                )
+                # Detection display
+                info_output = gr.Textbox(
+                    label="🔍 検出情報",
+                    lines=10,
+                    interactive=False,
+                )
+                # Reset button
+                reset_btn = gr.Button("🔄 リセット", variant="secondary")
+        # Event handlers
+        webcam.stream(
+            fn=process_webcam,
+            inputs=[webcam],
+            outputs=[webcam, location_output, info_output],
+        )
+        reset_btn.click(
+            fn=reset_state,
+            outputs=[location_output],
+        )
+        gr.Markdown(
+            """
+            ---
+            ### 使い方
+            1. 「カメラを許可」をクリックしてカメラを起動
+            2. スマホまたはPCのカメラで周囲を映す
+            3. 看板、店舗名、標識などが検出されます
+            4. 複数の情報から位置を特定します
+            ### 注意事項
+            - GPSは使用していません（映像のみで位置を推定）
+            - 検出精度は周囲の環境に依存します
+            - コンビニ、飲食店、駅などのランドマークが見えると精度が上がります
+            """
+        )
+    return demo
+# Entry point
+if __name__ == "__main__":
+    # Validate settings
+    if not settings.validate():
+        print("Warning: Some settings are not configured properly")
+        print("GEMINI_API_KEY is required for VLM features")
+    demo = create_ui()
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        share=False,
+    )

config/__init__.py ADDED

	@@ -0,0 +1,3 @@


+from .settings import settings
+
+__all__ = ["settings"]

config/settings.py ADDED

	@@ -0,0 +1,48 @@

+"""Application settings"""
+import os
+from dataclasses import dataclass, field
+from dotenv import load_dotenv
+load_dotenv()
+@dataclass
+class Settings:
+    """Application settings."""
+    # Gemini API
+    # Read the key when Settings() is instantiated, not at class-definition time.
+    gemini_api_key: str = field(
+        default_factory=lambda: os.environ.get("GEMINI_API_KEY", "")
+    )
+    gemini_model: str = "gemini-2.5-flash"  # matches the model named in README.md
+    gemini_rpm_limit: int = 10  # requests per minute
+    # OCR settings
+    ocr_interval_sec: float = 1.0  # OCR interval (seconds)
+    ocr_lang: str = "japan"  # PaddleOCR language setting
+    # VLM settings
+    vlm_interval_sec: float = 5.0  # VLM interval (seconds)
+    # Image processing settings
+    frame_width: int = 640
+    frame_height: int = 480
+    jpeg_quality: int = 85
+    # Location matching settings
+    search_radius_m: int = 500  # search radius (meters)
+    overpass_timeout: int = 25  # Overpass API timeout (seconds)
+    cache_ttl_sec: int = 300  # cache TTL (seconds)
+    # Result aggregation settings
+    history_buffer_size: int = 10  # history buffer size
+    confidence_threshold: float = 0.6  # confidence threshold
+    def validate(self) -> bool:
+        """Validate the settings."""
+        if not self.gemini_api_key:
+            print("Warning: GEMINI_API_KEY is not set")
+            return False
+        return True
+settings = Settings()
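
With the defaults above, `vlm_interval_sec = 5.0` allows up to 12 VLM calls per minute, while `gemini_rpm_limit` is 10. Whether `services/gemini_client.py` throttles on its own is not visible here, so the following is only a sketch of how the two values could be reconciled. `effective_vlm_interval` is a hypothetical helper, not part of the repository.

```python
def effective_vlm_interval(vlm_interval_sec: float, rpm_limit: int) -> float:
    """Return the shortest sampling interval that stays within rpm_limit (hypothetical helper)."""
    if rpm_limit <= 0:
        raise ValueError("rpm_limit must be positive")
    min_interval = 60.0 / rpm_limit  # seconds between calls when running exactly at the limit
    return max(vlm_interval_sec, min_interval)


# 5 s sampling would exceed 10 requests per minute, so the interval widens to 6 s.
print(effective_vlm_interval(5.0, 10))
```

A caller could pass the result to `FrameSampler(vlm_interval=...)` instead of the raw setting, if the client does not already enforce the limit.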

core/__init__.py ADDED

	@@ -0,0 +1,13 @@

+from .frame_sampler import FrameSampler
+from .ocr_engine import OCREngine
+from .vlm_analyzer import VLMAnalyzer
+from .location_matcher import LocationMatcher
+from .result_aggregator import ResultAggregator
+__all__ = [
+    "FrameSampler",
+    "OCREngine",
+    "VLMAnalyzer",
+    "LocationMatcher",
+    "ResultAggregator",
+]

core/frame_sampler.py ADDED

	@@ -0,0 +1,95 @@

+"""Frame sampling control"""
+import time
+from dataclasses import dataclass
+from typing import Optional, Tuple
+import numpy as np
+@dataclass
+class SampleResult:
+    """Sampling result."""
+    should_ocr: bool
+    should_vlm: bool
+    frame: Optional[np.ndarray]
+class FrameSampler:
+    """
+    Sampling control for video frames
+    - OCR: every 1 second
+    - VLM: every 5 seconds
+    """
+    def __init__(
+        self,
+        ocr_interval: float = 1.0,
+        vlm_interval: float = 5.0,
+    ):
+        self.ocr_interval = ocr_interval
+        self.vlm_interval = vlm_interval
+        self._last_ocr_time: float = 0
+        self._last_vlm_time: float = 0
+        self._frame_count: int = 0
+    def sample(self, frame: np.ndarray) -> SampleResult:
+        """
+        Sample a frame and decide whether it should be processed.
+        Args:
+            frame: input frame
+        Returns:
+            SampleResult: OCR/VLM processing flags and the frame
+        """
+        current_time = time.time()
+        self._frame_count += 1
+        should_ocr = False
+        should_vlm = False
+        # OCR sampling decision
+        if current_time - self._last_ocr_time >= self.ocr_interval:
+            should_ocr = True
+            self._last_ocr_time = current_time
+        # VLM sampling decision
+        if current_time - self._last_vlm_time >= self.vlm_interval:
+            should_vlm = True
+            self._last_vlm_time = current_time
+        return SampleResult(
+            should_ocr=should_ocr,
+            should_vlm=should_vlm,
+            frame=frame if (should_ocr or should_vlm) else None,
+        )
+    def reset(self) -> None:
+        """Reset the sampler."""
+        self._last_ocr_time = 0
+        self._last_vlm_time = 0
+        self._frame_count = 0
+    def get_stats(self) -> dict:
+        """Return statistics."""
+        return {
+            "frame_count": self._frame_count,
+            "last_ocr_time": self._last_ocr_time,
+            "last_vlm_time": self._last_vlm_time,
+        }
+    def force_vlm(self) -> None:
+        """Force VLM processing on the next frame."""
+        self._last_vlm_time = 0
+    def time_until_next_ocr(self) -> float:
+        """Time until the next OCR run (seconds)."""
+        elapsed = time.time() - self._last_ocr_time
+        return max(0, self.ocr_interval - elapsed)
+    def time_until_next_vlm(self) -> float:
+        """Time until the next VLM run (seconds)."""
+        elapsed = time.time() - self._last_vlm_time
+        return max(0, self.vlm_interval - elapsed)
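
To see how the two intervals interact, the sketch below replays the same "fire when at least `interval` seconds have passed" rule against a simulated clock. `IntervalGate` and `simulate` are hypothetical names used for illustration only. Unlike `FrameSampler`, the gate takes an injectable clock (defaulting to `time.monotonic`), which is an assumption made here so the schedule can be checked without waiting.

```python
import time
from typing import Callable, Optional


class IntervalGate:
    """Fires at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval: float, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self._clock = clock
        self._last: Optional[float] = None  # None means "never fired"

    def ready(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


def simulate(duration_s: float, frame_period_s: float = 0.25) -> dict:
    """Count OCR/VLM triggers over a simulated stream of frames."""
    now = 0.0
    ocr_gate = IntervalGate(1.0, clock=lambda: now)
    vlm_gate = IntervalGate(5.0, clock=lambda: now)
    counts = {"frames": 0, "ocr": 0, "vlm": 0}
    for i in range(int(duration_s / frame_period_s)):
        now = i * frame_period_s  # the lambdas above read this updated value
        counts["frames"] += 1
        counts["ocr"] += ocr_gate.ready()
        counts["vlm"] += vlm_gate.ready()
    return counts


print(simulate(10.0))
```

Over 10 simulated seconds at 4 frames per second, only a quarter of the frames reach OCR and two reach the VLM, which is the load reduction the sampling design aims for.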

core/location_matcher.py ADDED

	@@ -0,0 +1,244 @@

+"""Location matching logic"""
+from typing import List, Optional, Dict, Tuple
+from dataclasses import dataclass, field
+import re
+from services.overpass_client import OverpassClient, POI
+from core.vlm_analyzer import SpatialAnalysis, Landmark
+from utils.geo_utils import haversine_distance
+from utils.text_cleaner import clean_ocr_text
+@dataclass
+class LocationCandidate:
+    """Location candidate."""
+    lat: float
+    lon: float
+    score: float
+    matched_pois: List[POI] = field(default_factory=list)
+    match_reasons: List[str] = field(default_factory=list)
+@dataclass
+class MatchResult:
+    """Matching result."""
+    candidates: List[LocationCandidate] = field(default_factory=list)
+    best_candidate: Optional[LocationCandidate] = None
+    total_matches: int = 0
+    search_keywords: List[str] = field(default_factory=list)
+class LocationMatcher:
+    """
+    Matches OCR and VLM analysis results against OSM data
+    to identify location candidates.
+    """
+    # Normalization map for major convenience store chains
+    CHAIN_NORMALIZATION = {
+        r"ローソン|LAWSON": "ローソン",
+        r"セブン.?イレブン|7.?ELEVEN|7.?11": "セブン-イレブン",
+        r"ファミリーマート|ファミマ|FamilyMart": "ファミリーマート",
+        r"ミニストップ|MINISTOP": "ミニストップ",
+    }
+    def __init__(
+        self,
+        overpass_client: Optional[OverpassClient] = None,
+        search_radius: int = 500,
+    ):
+        self.client = overpass_client or OverpassClient()
+        self.search_radius = search_radius
+    def _normalize_chain_name(self, name: str) -> str:
+        """Normalize a chain store name."""
+        for pattern, normalized in self.CHAIN_NORMALIZATION.items():
+            if re.search(pattern, name, re.IGNORECASE):
+                return normalized
+        return name
+    def _extract_search_terms(
+        self,
+        ocr_texts: List[str],
+        analysis: Optional[SpatialAnalysis],
+    ) -> Tuple[List[str], List[str]]:
+        """
+        Extract search terms.
+        Returns:
+            (list of names, list of types)
+        """
+        names = set()
+        types = set()
+        # From OCR text
+        for text in ocr_texts:
+            cleaned = clean_ocr_text(text)
+            normalized = self._normalize_chain_name(cleaned)
+            if len(normalized) >= 2:  # drop text that is too short
+                names.add(normalized)
+        # From the VLM analysis
+        if analysis and analysis.success:
+            for lm in analysis.landmarks:
+                if lm.name:
+                    names.add(lm.name)
+                if lm.type and lm.type != "unknown":
+                    types.add(lm.type)
+            for text in analysis.visible_text:
+                cleaned = clean_ocr_text(text)
+                if len(cleaned) >= 2:
+                    names.add(cleaned)
+        return list(names), list(types)
+    def match(
+        self,
+        ocr_texts: List[str],
+        analysis: Optional[SpatialAnalysis],
+        hint_lat: Optional[float] = None,
+        hint_lon: Optional[float] = None,
+    ) -> MatchResult:
+        """
+        Run location matching.
+        Args:
+            ocr_texts: list of text strings detected by OCR
+            analysis: VLM spatial analysis result
+            hint_lat: hint latitude (e.g. from GPS)
+            hint_lon: hint longitude
+        Returns:
+            MatchResult
+        """
+        # Without hint coordinates, fall back to the area around
+        # Tokyo Station (demo default)
+        if hint_lat is None or hint_lon is None:
+            hint_lat = 35.6812
+            hint_lon = 139.7671
+        names, types = self._extract_search_terms(ocr_texts, analysis)
+        if not names and not types:
+            return MatchResult(search_keywords=[])
+        # Run the OSM search
+        search_results = self.client.search_combined(
+            names=names,
+            types=types,
+            lat=hint_lat,
+            lon=hint_lon,
+            radius=self.search_radius,
+        )
+        # Collect and score the candidates
+        candidates = self._score_candidates(
+            search_results, names, types, hint_lat, hint_lon
+        )
+        # Select the best candidate
+        best = None
+        if candidates:
+            candidates.sort(key=lambda c: c.score, reverse=True)
+            best = candidates[0]
+        return MatchResult(
+            candidates=candidates[:10],  # top 10
+            best_candidate=best,
+            total_matches=sum(
+                len(pois) for pois in search_results.get("names", {}).values()
+            ),
+            search_keywords=names + types,
+        )
+    def _score_candidates(
+        self,
+        search_results: Dict,
+        names: List[str],
+        types: List[str],
+        hint_lat: float,
+        hint_lon: float,
+    ) -> List[LocationCandidate]:
+        """Score the candidates."""
+        # Compute a score per POI
+        poi_scores: Dict[int, LocationCandidate] = {}
+        # POIs matched by name
+        for name, pois in search_results.get("names", {}).items():
+            for poi in pois:
+                if poi.osm_id not in poi_scores:
+                    poi_scores[poi.osm_id] = LocationCandidate(
+                        lat=poi.lat,
+                        lon=poi.lon,
+                        score=0,
+                        matched_pois=[poi],
+                        match_reasons=[],
+                    )
+                candidate = poi_scores[poi.osm_id]
+                # A name match scores high
+                candidate.score += 10
+                candidate.match_reasons.append(f"名前マッチ: {name}")
+        # POIs matched by type
+        for poi_type, pois in search_results.get("types", {}).items():
+            for poi in pois:
+                if poi.osm_id not in poi_scores:
+                    poi_scores[poi.osm_id] = LocationCandidate(
+                        lat=poi.lat,
+                        lon=poi.lon,
+                        score=0,
+                        matched_pois=[poi],
+                        match_reasons=[],
+                    )
+                candidate = poi_scores[poi.osm_id]
+                # A type match scores medium
+                candidate.score += 5
+                candidate.match_reasons.append(f"タイプマッチ: {poi_type}")
+        # Adjust the score by distance from the hint
+        for candidate in poi_scores.values():
+            distance = haversine_distance(
+                hint_lat, hint_lon, candidate.lat, candidate.lon
+            )
+            # Closer is better (maximum bonus within 100 m)
+            if distance < 100:
+                candidate.score += 5
+            elif distance < 300:
+                candidate.score += 3
+            elif distance < 500:
+                candidate.score += 1
+        return list(poi_scores.values())
+    def match_with_spatial_context(
+        self,
+        ocr_texts: List[str],
+        analysis: SpatialAnalysis,
+        hint_lat: float,
+        hint_lon: float,
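+    ) -> MatchResult:
+        """
+        Matching that takes spatial context into account.
+        Uses the positional relationships between multiple landmarks
+        to produce a more accurate match.
+        """
+        base_result = self.match(ocr_texts, analysis, hint_lat, hint_lon)
+        if not analysis.success or not analysis.spatial_relations:
+            return base_result
+        # Extract additional constraints from the spatial-relation text
+        # Example: "a coin parking lot immediately to the right of ローソン"
+        # -> prefer candidates where a ローソン and a parking lot are close together
+        # TODO: parse spatial relations and add scoring
+        # For now only the basic matching is performed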
+        return base_result
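
The TODO in `match_with_spatial_context` could be approached by rewarding candidates whose related landmark lies close by. The sketch below shows one possible shape under stated assumptions: `haversine_m` is a local stand-in for `utils.geo_utils.haversine_distance` (whose implementation is not shown here), and `nearby_pairs`, the `(name, lat, lon)` tuples, and the sample coordinates are illustrative. The 30 m threshold is borrowed from the example in the planning document.

```python
import math
from typing import List, Tuple

Point = Tuple[str, float, float]  # (name, lat, lon)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearby_pairs(
    anchors: List[Point], neighbors: List[Point], max_dist_m: float = 30.0
) -> List[Tuple[str, str, float]]:
    """Return (anchor, neighbor, distance) for every pair within max_dist_m."""
    pairs = []
    for a_name, a_lat, a_lon in anchors:
        for n_name, n_lat, n_lon in neighbors:
            d = haversine_m(a_lat, a_lon, n_lat, n_lon)
            if d <= max_dist_m:
                pairs.append((a_name, n_name, d))
    return pairs


# Made-up coordinates: the parking lot is about 22 m north of store A, far from store B.
stores = [("ローソン A", 35.6812, 139.7671), ("ローソン B", 35.6900, 139.7671)]
parkings = [("コインパーキング", 35.6814, 139.7671)]
for anchor, neighbor, dist in nearby_pairs(stores, parkings):
    print(f"{anchor} <-> {neighbor}: {dist:.1f} m")
```

Each returned pair could then add a bonus to the anchor's `LocationCandidate.score`; how large that bonus should be relative to the existing name and type scores is left open.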

core/ocr_engine.py ADDED

	@@ -0,0 +1,145 @@

+"""PaddleOCR wrapper"""
+from typing import List, Tuple, Optional
+from dataclasses import dataclass
+import numpy as np
+@dataclass
+class OCRResult:
+    """OCR detection result."""
+    text: str
+    confidence: float
+    bbox: List[List[int]]  # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
+class OCREngine:
+    """
+    PaddleOCR wrapper class,
+    tuned for Japanese text extraction.
+    """
+    def __init__(self, lang: str = "japan", use_gpu: bool = False):
+        """
+        Args:
+            lang: language setting ("japan", "en", "ch", etc.)
+            use_gpu: whether to use a GPU (False on the Hugging Face Free Tier)
+        """
+        self.lang = lang
+        self.use_gpu = use_gpu
+        self._ocr = None
+        self._initialized = False
+    def _init_ocr(self) -> None:
+        """Lazily initialize the OCR engine."""
+        if self._initialized:
+            return
+        # Mark the attempt up front so a failed import is not retried
+        # (and the warning not re-printed) on every frame.
+        self._initialized = True
+        try:
+            from paddleocr import PaddleOCR
+            self._ocr = PaddleOCR(
+                use_angle_cls=True,
+                lang=self.lang,
+                use_gpu=self.use_gpu,
+                show_log=False,
+                # CPU optimization settings
+                enable_mkldnn=True,
+                cpu_threads=2,
+            )
+        except ImportError:
+            print("Warning: PaddleOCR not installed. OCR will not work.")
+            self._ocr = None
+    def detect(self, frame: np.ndarray) -> List[OCRResult]:
+        """
+        Detect text in a frame.
+        Args:
+            frame: input image (BGR format)
+        Returns:
+            list of OCRResult
+        """
+        self._init_ocr()
+        if self._ocr is None:
+            return []
+        try:
+            result = self._ocr.ocr(frame, cls=True)
+            if result is None or len(result) == 0:
+                return []
+            ocr_results = []
+            for line in result:
+                if line is None:
+                    continue
+                for item in line:
+                    if item is None or len(item) < 2:
+                        continue
+                    bbox = item[0]
+                    text_info = item[1]
+                    if text_info and len(text_info) >= 2:
+                        text = text_info[0]
+                        confidence = float(text_info[1])
+                        ocr_results.append(
+                            OCRResult(
+                                text=text,
+                                confidence=confidence,
+                                bbox=bbox,
+                            )
+                        )
+            return ocr_results
+        except Exception as e:
+            print(f"OCR error: {e}")
+            return []
+    def detect_text_only(self, frame: np.ndarray) -> List[str]:
+        """
+        Extract text only (filtered by confidence).
+        Args:
+            frame: input image
+        Returns:
+            list of detected text strings
+        """
+        results = self.detect(frame)
+        # Keep only text with confidence of 0.5 or higher
+        return [r.text for r in results if r.confidence >= 0.5]
+    def detect_with_positions(
+        self, frame: np.ndarray
+    ) -> List[Tuple[str, float, Tuple[int, int]]]:
+        """
+        Extract text together with position information.
+        Returns:
+            list of (text, confidence, center coordinates)
+        """
+        results = self.detect(frame)
+        output = []
+        for r in results:
+            if r.confidence < 0.5 or not r.bbox:
+                continue
+            # Compute the center of the bounding box as the mean of its vertices
+            xs = [p[0] for p in r.bbox]
+            ys = [p[1] for p in r.bbox]
+            center_x = int(sum(xs) / len(xs))
+            center_y = int(sum(ys) / len(ys))
+            output.append((r.text, r.confidence, (center_x, center_y)))
+        return output
+    @property
+    def is_available(self) -> bool:
+        """Whether the OCR engine is available."""
+        self._init_ocr()
+        return self._initialized and self._ocr is not None
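
`detect()` walks a nested result of the form `[[ [bbox, (text, confidence)], ... ]]`. As a sketch of that parsing in isolation, the function below flattens such a structure and applies the same 0.5 confidence cutoff. `flatten_ocr` is a hypothetical helper that does not call PaddleOCR, and the sample data is made up to mirror the shape the wrapper expects.

```python
from typing import Any, List, Optional, Tuple


def flatten_ocr(
    result: Optional[List[Any]], min_conf: float = 0.5
) -> List[Tuple[str, float, Tuple[int, int]]]:
    """Flatten a nested OCR result into (text, confidence, bbox center) tuples."""
    output = []
    for page in result or []:
        for item in page or []:
            if not item or len(item) < 2:
                continue
            bbox, text_info = item[0], item[1]
            if not bbox or not text_info or len(text_info) < 2:
                continue
            text, conf = text_info[0], float(text_info[1])
            if conf < min_conf:
                continue
            # Average the polygon's vertices; works for any number of points.
            cx = int(sum(p[0] for p in bbox) / len(bbox))
            cy = int(sum(p[1] for p in bbox) / len(bbox))
            output.append((text, conf, (cx, cy)))
    return output


sample = [
    [
        [[[10, 10], [110, 10], [110, 40], [10, 40]], ("ローソン", 0.93)],
        [[[0, 0], [20, 0], [20, 10], [0, 10]], ("??", 0.31)],  # dropped: low confidence
    ],
    None,  # a page with no detections
]
print(flatten_ocr(sample))
```

Keeping the center coordinates alongside the text would make it possible to relate on-screen positions to the VLM's left/right descriptions later on.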

core/result_aggregator.py ADDED

	@@ -0,0 +1,207 @@

+"""Aggregation and scoring of detection results"""
+import time
+from typing import List, Optional, Dict
+from dataclasses import dataclass, field
+from collections import deque
+from core.location_matcher import LocationCandidate, MatchResult
+from utils.geo_utils import haversine_distance
+@dataclass
+class DetectionEvent:
+    """Detection event."""
+    timestamp: float
+    ocr_texts: List[str]
+    vlm_keywords: List[str]
+    match_result: Optional[MatchResult]
+@dataclass
+class AggregatedResult:
+    """Aggregated result."""
+    estimated_lat: Optional[float] = None
+    estimated_lon: Optional[float] = None
+    confidence: float = 0.0
+    address_hint: str = ""
+    detected_texts: List[str] = field(default_factory=list)
+    detected_landmarks: List[str] = field(default_factory=list)
+    match_count: int = 0
+    is_location_found: bool = False
+class ResultAggregator:
+    """
+    Aggregates multiple detection results over time
+    to produce a high-confidence location estimate.
+    """
+    def __init__(
+        self,
+        buffer_size: int = 10,
+        confidence_threshold: float = 0.6,
+        consistency_window_sec: float = 10.0,
+    ):
+        self.buffer_size = buffer_size
+        self.confidence_threshold = confidence_threshold
+        self.consistency_window_sec = consistency_window_sec
+        self._events: deque = deque(maxlen=buffer_size)
+        self._detected_texts: Dict[str, int] = {}  # text -> detection count
+        self._candidate_history: List[LocationCandidate] = []
+    def add_detection(
+        self,
+        ocr_texts: List[str],
+        vlm_keywords: List[str],
+        match_result: Optional[MatchResult],
+    ) -> None:
+        """Add a detection event."""
+        event = DetectionEvent(
+            timestamp=time.time(),
+            ocr_texts=ocr_texts,
+            vlm_keywords=vlm_keywords,
+            match_result=match_result,
+        )
+        self._events.append(event)
+        # Update per-text detection counts
+        for text in ocr_texts:
+            self._detected_texts[text] = self._detected_texts.get(text, 0) + 1
+        # Update the candidate history
+        if match_result and match_result.best_candidate:
+            self._candidate_history.append(match_result.best_candidate)
+            # Drop old history entries
+            if len(self._candidate_history) > self.buffer_size:
+                self._candidate_history = self._candidate_history[-self.buffer_size:]
+    def get_aggregated_result(self) -> AggregatedResult:
+        """統合結果を取得"""
+        if not self._events:
+            return AggregatedResult()
+        # 頻出テキストを抽出
+        frequent_texts = [
+            text
+            for text, count in sorted(
+                self._detected_texts.items(), key=lambda x: x[1], reverse=True
+            )
+            if count >= 2
+        ][:10]
+        # Collect the VLM keywords
+        vlm_keywords = set()
+        for event in self._events:
+            vlm_keywords.update(event.vlm_keywords)
+        # Evaluate candidate consistency
+        if not self._candidate_history:
+            return AggregatedResult(
+                detected_texts=frequent_texts,
+                detected_landmarks=list(vlm_keywords),
+            )
+        # Evaluate consistency relative to the latest candidate
+        latest = self._candidate_history[-1]
+        consistent_candidates = []
+        for candidate in self._candidate_history:
+            distance = haversine_distance(
+                latest.lat, latest.lon, candidate.lat, candidate.lon
+            )
+            if distance < 100:  # within 100 m counts as consistent
+                consistent_candidates.append(candidate)
+        # Compute the confidence
+        consistency_ratio = len(consistent_candidates) / len(self._candidate_history)
+        avg_score = sum(c.score for c in consistent_candidates) / max(
+            len(consistent_candidates), 1
+        )
+        # Normalized score (0-1)
+        normalized_score = min(avg_score / 20, 1.0)  # assumes 20 points is the maximum
+        confidence = (consistency_ratio * 0.6 + normalized_score * 0.4)
+        is_found = (
+            confidence >= self.confidence_threshold
+            and len(consistent_candidates) >= 2
+        )
+        # Compute the centroid (mean of the consistent candidates)
+        if consistent_candidates:
+            avg_lat = sum(c.lat for c in consistent_candidates) / len(
+                consistent_candidates
+            )
+            avg_lon = sum(c.lon for c in consistent_candidates) / len(
+                consistent_candidates
+            )
+        else:
+            avg_lat = latest.lat
+            avg_lon = latest.lon
+        # Generate the address hint
+        address_hint = self._generate_address_hint(consistent_candidates)
+        return AggregatedResult(
+            estimated_lat=avg_lat,
+            estimated_lon=avg_lon,
+            confidence=confidence,
+            address_hint=address_hint,
+            detected_texts=frequent_texts,
+            detected_landmarks=list(vlm_keywords),
+            match_count=len(self._candidate_history),
+            is_location_found=is_found,
+        )
+    def _generate_address_hint(
+        self, candidates: List[LocationCandidate]
+    ) -> str:
+        """Generate an address hint from the candidates."""
+        if not candidates:
+            return ""
+        # Extract representative landmarks from the match reasons
+        landmarks = []
+        for candidate in candidates:
+            for reason in candidate.match_reasons:
+                if "名前マッチ" in reason:
+                    # "名前マッチ: ローソン" ("name match: Lawson") -> "ローソン"
+                    name = reason.replace("名前マッチ: ", "")
+                    if name not in landmarks:
+                        landmarks.append(name)
+        if landmarks:
+            return f"{landmarks[0]}付近"
+        return ""
+    def reset(self) -> None:
+        """Reset the state."""
+        self._events.clear()
+        self._detected_texts.clear()
+        self._candidate_history.clear()
+    def get_recent_texts(self, limit: int = 5) -> List[str]:
+        """Return the most recently detected texts."""
+        texts = []
+        for event in reversed(list(self._events)):
+            for text in event.ocr_texts:
+                if text not in texts:
+                    texts.append(text)
+                    if len(texts) >= limit:
+                        return texts
+        return texts
+    def get_detection_stats(self) -> Dict:
+        """Return detection statistics."""
+        return {
+            "event_count": len(self._events),
+            "unique_texts": len(self._detected_texts),
+            "candidate_count": len(self._candidate_history),
+            "top_texts": sorted(
+                self._detected_texts.items(), key=lambda x: x[1], reverse=True
+            )[:5],
+        }
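
Reviewer note: the confidence blend in `get_aggregated_result` can be checked by hand. The sketch below restates only the arithmetic from the diff (60% consistency ratio, 40% score normalized against an assumed maximum of 20). The function name `aggregate_confidence` and its parameters are hypothetical and do not appear in the commit.

```python
def aggregate_confidence(
    consistent: int,
    total: int,
    avg_score: float,
    max_score: float = 20.0,
) -> float:
    """Blend spatial consistency (60%) with the normalized match score (40%)."""
    if total <= 0:
        return 0.0
    consistency_ratio = consistent / total
    normalized_score = min(avg_score / max_score, 1.0)
    return consistency_ratio * 0.6 + normalized_score * 0.4


# 4 of 5 recent candidates lie within 100 m of the latest one, average score 12
confidence = aggregate_confidence(consistent=4, total=5, avg_score=12.0)
print(round(confidence, 2))  # 0.8 * 0.6 + 0.6 * 0.4 = 0.72
```

With the default `confidence_threshold` of 0.6, this case would count as a found location, provided at least two candidates are consistent.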

core/vlm_analyzer.py ADDED

	@@ -0,0 +1,187 @@

+"""VLM spatial reasoning engine"""
+import json
+import re
+from typing import List, Optional, Dict, Any
+from dataclasses import dataclass, field
+import numpy as np
+from services.gemini_client import GeminiClient
+from utils.image_utils import frame_to_pil
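+# The prompt is sent to Gemini verbatim, so it stays in Japanese. It asks for
+# JSON only, with four keys: landmarks, spatial_relations, environment, visible_text.
+SPATIAL_ANALYSIS_PROMPT = """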
+この画像は日本の街中で撮影されたものです。
+位置特定のため、以下の情報を可能な限り抽出してJSON形式で出力してください。
+1. landmarks: 認識できるランドマーク（店舗、施設、看板など）のリスト
+   - name: 名称
+   - type: 種類（convenience_store, restaurant, hospital, station, parking, gas_station, etc.）
+   - position: 画面内での位置（left, center, right, background）
+2. spatial_relations: ランドマーク間の位置関係を日本語で記述
+3. environment: 周辺環境の特徴
+   - road_type: 道路タイプ（大通り, 住宅街の道路, 国道, 県道 など）
+   - area_type: エリアタイプ（商業地域, 住宅街, 駅前, 郊外 など）
+   - notable_features: その他の特徴的な要素
+4. visible_text: 画像内で読み取れるテキスト（看板、標識など）
+必ず有効なJSONのみを出力してください。説明文は不要です。
+出力例:
+{
+  "landmarks": [
+    {"name": "ローソン", "type": "convenience_store", "position": "center"},
+    {"name": "コインパーキング", "type": "parking", "position": "right"}
+  ],
+  "spatial_relations": [
+    "ローソンの右隣にコインパーキングがある",
+    "奥に交差点が見える"
+  ],
+  "environment": {
+    "road_type": "片側1車線の道路",
+    "area_type": "郊外の商業地域",
+    "notable_features": ["信号機あり", "歩道あり"]
+  },
+  "visible_text": ["ローソン", "P 24時間", "一方通行"]
+}
+"""
+@dataclass
+class Landmark:
+    """Landmark information"""
+    name: str
+    type: str
+    position: str
+@dataclass
+class SpatialAnalysis:
+    """Spatial analysis result"""
+    landmarks: List[Landmark] = field(default_factory=list)
+    spatial_relations: List[str] = field(default_factory=list)
+    environment: Dict[str, Any] = field(default_factory=dict)
+    visible_text: List[str] = field(default_factory=list)
+    raw_response: str = ""
+    success: bool = True
+    error: Optional[str] = None
+class VLMAnalyzer:
+    """
+    Spatial reasoning with a Vision-Language Model.
+    Uses Gemini 2.5 Flash to extract landmark information and
+    spatial relationships from an image.
+    """
+    def __init__(self, gemini_client: Optional[GeminiClient] = None):
+        self.client = gemini_client or GeminiClient()
+        self.prompt = SPATIAL_ANALYSIS_PROMPT
+    def analyze(self, frame: np.ndarray) -> SpatialAnalysis:
+        """
+        Analyze a frame and extract spatial information.
+        Args:
+            frame: input image (BGR)
+        Returns:
+            SpatialAnalysis: the analysis result
+        """
+        image = frame_to_pil(frame)
+        response = self.client.analyze_image(image, self.prompt)
+        if not response.success:
+            return SpatialAnalysis(
+                success=False,
+                error=response.error,
+                raw_response="",
+            )
+        return self._parse_response(response.text)
+    async def analyze_async(self, frame: np.ndarray) -> SpatialAnalysis:
+        """Analyze a frame asynchronously."""
+        image = frame_to_pil(frame)
+        response = await self.client.analyze_image_async(image, self.prompt)
+        if not response.success:
+            return SpatialAnalysis(
+                success=False,
+                error=response.error,
+                raw_response="",
+            )
+        return self._parse_response(response.text)
+    def _parse_response(self, response_text: str) -> SpatialAnalysis:
+        """Parse the Gemini response."""
+        try:
+            # Extract the JSON block (greedy: first "{" through last "}")
+            json_match = re.search(r"\{[\s\S]*\}", response_text)
+            if not json_match:
+                return SpatialAnalysis(
+                    success=False,
+                    error="No JSON found in response",
+                    raw_response=response_text,
+                )
+            data = json.loads(json_match.group())
+            landmarks = []
+            for lm in data.get("landmarks", []):
+                landmarks.append(
+                    Landmark(
+                        name=lm.get("name", ""),
+                        type=lm.get("type", "unknown"),
+                        position=lm.get("position", "unknown"),
+                    )
+                )
+            return SpatialAnalysis(
+                landmarks=landmarks,
+                spatial_relations=data.get("spatial_relations", []),
+                environment=data.get("environment", {}),
+                visible_text=data.get("visible_text", []),
+                raw_response=response_text,
+                success=True,
+            )
+        except json.JSONDecodeError as e:
+            return SpatialAnalysis(
+                success=False,
+                error=f"JSON parse error: {e}",
+                raw_response=response_text,
+            )
+        except Exception as e:
+            return SpatialAnalysis(
+                success=False,
+                error=str(e),
+                raw_response=response_text,
+            )
+    def get_search_keywords(self, analysis: SpatialAnalysis) -> List[str]:
+        """Extract search keywords from the analysis result."""
+        keywords = []
+        # Landmark names
+        for lm in analysis.landmarks:
+            if lm.name:
+                keywords.append(lm.name)
+        # Visible text
+        keywords.extend(analysis.visible_text)
+        # Deduplicate while keeping first-seen order
+        return list(dict.fromkeys(keywords))
+    @property
+    def is_available(self) -> bool:
+        """Whether the VLM is available."""
+        return self.client.is_available
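
Reviewer note: `_parse_response` relies on a greedy `\{[\s\S]*\}` match, which tolerates prose around a single JSON object but not two separate objects in one reply. The sketch below reproduces that behavior in isolation. `extract_json_object` is a hypothetical helper, not part of the commit, and the sample strings are made up for illustration.

```python
import json
import re


def extract_json_object(response_text: str):
    """Parse the span from the first '{' to the last '}' as JSON, or return None."""
    match = re.search(r"\{[\s\S]*\}", response_text)
    if match is None:
        return None
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return None


wrapped = (
    'Here is the result:\n'
    '{"landmarks": [{"name": "Lawson"}], "visible_text": ["P 24h"]}\n'
    'Done.'
)
data = extract_json_object(wrapped)
print(data["landmarks"][0]["name"])                    # Lawson
print(extract_json_object("no braces here"))           # None
print(extract_json_object('{"a": 1} and {"b": 2}'))    # None (span is not one object)
```

The third case is the one to watch: the greedy span covers both objects and the text between them, so parsing fails and the analyzer would report a JSON parse error.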

memo.txt ADDED

	@@ -0,0 +1,21 @@

+ Overview
+ A service that analyzes smartphone camera video in real time and determines the location without relying on GPS.
+ Runs at no cost on Hugging Face Spaces (Free Tier).
+ ---
+ User tasks (manual setup required)
+ 1. Obtain a Gemini API key
+ 1. Go to https://aistudio.google.com/
+ 2. Sign in with a Google account
+ 3. In the left menu, select "Get API key" → "Create API key"
+ 4. Copy the API key and store it
+ 2. Deploy to Hugging Face Spaces (after implementation is complete)
+ 1. Create a "New Space" on Hugging Face
+   - SDK: Gradio
+   - Hardware: CPU basic (Free)
+ 2. Add GEMINI_API_KEY under Settings → Repository secrets

requirements.txt ADDED

	@@ -0,0 +1,30 @@

+# Gradio & WebRTC
+gradio>=5.0.0
+gradio-webrtc>=0.0.31
+# OCR - PaddlePaddle (CPU build)
+paddlepaddle==3.0.0
+paddleocr>=2.8.0
+# Gemini API
+google-generativeai>=0.8.0
+# OSM / mapping
+overpy>=0.7
+geopy>=2.4.0
+requests>=2.31.0
+# Image processing
+opencv-python-headless>=4.8.0
+Pillow>=10.0.0
+numpy>=1.24.0
+# Async processing
+aiohttp>=3.9.0
+# Environment variables
+python-dotenv>=1.0.0
+# Other utilities
+pydantic>=2.0.0
+cachetools>=5.3.0

services/__init__.py ADDED

	@@ -0,0 +1,4 @@

+from .gemini_client import GeminiClient
+from .overpass_client import OverpassClient
+__all__ = ["GeminiClient", "OverpassClient"]

services/gemini_client.py ADDED

	@@ -0,0 +1,188 @@

+"""Gemini API client"""
+import asyncio
+import time
+from typing import Optional
+from dataclasses import dataclass
+import numpy as np
+from PIL import Image
+from config.settings import settings
+from utils.image_utils import frame_to_pil
+@dataclass
+class GeminiResponse:
+    """Gemini API response"""
+    text: str
+    success: bool
+    error: Optional[str] = None
+class GeminiClient:
+    """
+    Gemini API client
+    - Stays within the 10 RPM limit
+    - Retries with exponential backoff
+    """
+    def __init__(self, api_key: Optional[str] = None):
+        self.api_key = api_key or settings.gemini_api_key
+        self._client = None
+        self._model = None
+        self._last_request_time: float = 0
+        self._min_interval: float = 6.0  # 10 RPM = one request every 6 seconds
+        self._initialized = False
+    def _init_client(self) -> bool:
+        """Lazily initialize the client."""
+        if self._initialized:
+            return self._client is not None
+        if not self.api_key:
+            print("Warning: GEMINI_API_KEY is not set")
+            self._initialized = True
+            return False
+        try:
+            import google.generativeai as genai
+            genai.configure(api_key=self.api_key)
+            self._client = genai
+            self._model = genai.GenerativeModel(settings.gemini_model)
+            self._initialized = True
+            return True
+        except ImportError:
+            print("Warning: google-generativeai not installed")
+            self._initialized = True
+            return False
+        except Exception as e:
+            print(f"Gemini initialization error: {e}")
+            self._initialized = True
+            return False
+    def _wait_for_rate_limit(self) -> None:
+        """Wait as needed to respect the rate limit."""
+        elapsed = time.time() - self._last_request_time
+        if elapsed < self._min_interval:
+            time.sleep(self._min_interval - elapsed)
+    async def _async_wait_for_rate_limit(self) -> None:
+        """Async variant of the rate-limit wait."""
+        elapsed = time.time() - self._last_request_time
+        if elapsed < self._min_interval:
+            await asyncio.sleep(self._min_interval - elapsed)
+    def analyze_image(
+        self, image: Image.Image, prompt: str, max_retries: int = 3
+    ) -> GeminiResponse:
+        """
+        Analyze an image.
+        Args:
+            image: PIL Image
+            prompt: analysis prompt
+            max_retries: maximum number of attempts
+        Returns:
+            GeminiResponse
+        """
+        if not self._init_client():
+            return GeminiResponse(
+                text="",
+                success=False,
+                error="Gemini client not initialized",
+            )
+        for attempt in range(max_retries):
+            try:
+                self._wait_for_rate_limit()
+                self._last_request_time = time.time()
+                response = self._model.generate_content([prompt, image])
+                return GeminiResponse(text=response.text, success=True)
+            except Exception as e:
+                error_msg = str(e)
+                if "429" in error_msg or "quota" in error_msg.lower():
+                    # Rate-limit error: exponential backoff
+                    wait_time = (2**attempt) * 10
+                    print(f"Rate limited, waiting {wait_time}s...")
+                    time.sleep(wait_time)
+                elif attempt < max_retries - 1:
+                    time.sleep(2**attempt)
+                else:
+                    return GeminiResponse(
+                        text="",
+                        success=False,
+                        error=error_msg,
+                    )
+        return GeminiResponse(
+            text="",
+            success=False,
+            error="Max retries exceeded",
+        )
+    def analyze_frame(
+        self, frame: np.ndarray, prompt: str
+    ) -> GeminiResponse:
+        """
+        Analyze a NumPy frame.
+        Args:
+            frame: NumPy array (BGR)
+            prompt: analysis prompt
+        Returns:
+            GeminiResponse
+        """
+        image = frame_to_pil(frame)
+        return self.analyze_image(image, prompt)
+    async def analyze_image_async(
+        self, image: Image.Image, prompt: str, max_retries: int = 3
+    ) -> GeminiResponse:
+        """Analyze an image asynchronously."""
+        if not self._init_client():
+            return GeminiResponse(
+                text="",
+                success=False,
+                error="Gemini client not initialized",
+            )
+        for attempt in range(max_retries):
+            try:
+                await self._async_wait_for_rate_limit()
+                self._last_request_time = time.time()
+                response = await asyncio.to_thread(
+                    self._model.generate_content, [prompt, image]
+                )
+                return GeminiResponse(text=response.text, success=True)
+            except Exception as e:
+                error_msg = str(e)
+                if "429" in error_msg or "quota" in error_msg.lower():
+                    wait_time = (2**attempt) * 10
+                    await asyncio.sleep(wait_time)
+                elif attempt < max_retries - 1:
+                    await asyncio.sleep(2**attempt)
+                else:
+                    return GeminiResponse(
+                        text="",
+                        success=False,
+                        error=error_msg,
+                    )
+        return GeminiResponse(
+            text="",
+            success=False,
+            error="Max retries exceeded",
+        )
+    @property
+    def is_available(self) -> bool:
+        """Whether the client is available."""
+        return self._init_client()
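
Reviewer note: the pacing numbers in `GeminiClient` follow from two small formulas, the 6-second minimum interval (60 s / 10 RPM) and the per-attempt backoff. The sketch below tabulates them under the defaults shown in the diff. `min_interval_seconds` and `backoff_waits` are hypothetical names introduced here for illustration only.

```python
from typing import List


def min_interval_seconds(requests_per_minute: int) -> float:
    """Spacing between requests that keeps the rate at or below the RPM limit."""
    return 60.0 / requests_per_minute


def backoff_waits(max_retries: int = 3, rate_limited: bool = True) -> List[int]:
    """Seconds slept after each failed attempt, following the formulas in the diff."""
    if rate_limited:
        # 429 / quota errors: (2 ** attempt) * 10, slept after every attempt
        return [(2 ** attempt) * 10 for attempt in range(max_retries)]
    # Other errors: 2 ** attempt, with no sleep after the final attempt
    return [2 ** attempt for attempt in range(max_retries - 1)]


print(min_interval_seconds(10))           # 6.0
print(backoff_waits())                    # [10, 20, 40]
print(backoff_waits(rate_limited=False))  # [1, 2]
```

Read this way, three consecutive rate-limit errors keep a caller blocked for about 70 seconds before "Max retries exceeded" is returned, which matters for a UI that is meant to answer within seconds.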

services/overpass_client.py ADDED

	@@ -0,0 +1,251 @@

+"""OpenStreetMap Overpass API client"""
+import time
+from typing import List, Optional, Dict, Any
+from dataclasses import dataclass
+from cachetools import TTLCache
+from config.settings import settings
+@dataclass
+class POI:
+    """Point of Interest"""
+    osm_id: int
+    name: str
+    lat: float
+    lon: float
+    poi_type: str
+    tags: Dict[str, str]
+class OverpassClient:
+    """
+    OpenStreetMap Overpass API client
+    Provides POI search with caching
+    """
+    OVERPASS_URL = "https://overpass-api.de/api/interpreter"
+    # Mapping from POI type to OSM tags
+    SHOP_TYPE_MAPPING = {
+        "convenience_store": ["shop=convenience", "amenity=convenience"],
+        "restaurant": ["amenity=restaurant", "amenity=fast_food"],
+        "hospital": ["amenity=hospital", "amenity=clinic"],
+        "pharmacy": ["amenity=pharmacy", "shop=chemist"],
+        "gas_station": ["amenity=fuel"],
+        "parking": ["amenity=parking"],
+        "station": ["railway=station", "public_transport=station"],
+        "bank": ["amenity=bank"],
+        "post_office": ["amenity=post_office"],
+        "supermarket": ["shop=supermarket"],
+    }
+    def __init__(self, timeout: int = 25, cache_ttl: int = 300):
+        self.timeout = timeout
+        self._cache = TTLCache(maxsize=100, ttl=cache_ttl)
+        self._last_request_time: float = 0
+        self._min_interval: float = 1.0  # at least 1 second between requests
+    def _wait_for_rate_limit(self) -> None:
+        """Wait as needed to respect the rate limit."""
+        elapsed = time.time() - self._last_request_time
+        if elapsed < self._min_interval:
+            time.sleep(self._min_interval - elapsed)
+    def _build_name_query(
+        self,
+        name: str,
+        lat: float,
+        lon: float,
+        radius: int,
+    ) -> str:
+        """Build a query that searches POIs by name.
+        Note: `name` is interpolated unescaped as a case-insensitive regex.
+        """
+        return f"""
+[out:json][timeout:{self.timeout}];
+(
+  node["name"~"{name}",i](around:{radius},{lat},{lon});
+  way["name"~"{name}",i](around:{radius},{lat},{lon});
+);
+out center;
+"""
+    def _build_type_query(
+        self,
+        poi_type: str,
+        lat: float,
+        lon: float,
+        radius: int,
+    ) -> str:
+        """Build a query that searches POIs by type."""
+        tags = self.SHOP_TYPE_MAPPING.get(poi_type, [])
+        if not tags:
+            return ""
+        conditions = []
+        for tag in tags:
+            key, value = tag.split("=")
+            conditions.append(f'node["{key}"="{value}"](around:{radius},{lat},{lon});')
+            conditions.append(f'way["{key}"="{value}"](around:{radius},{lat},{lon});')
+        return f"""
+[out:json][timeout:{self.timeout}];
+(
+  {chr(10).join(conditions)}
+);
+out center;
+"""
+    def _parse_response(self, data: Dict[str, Any]) -> List[POI]:
+        """Parse an Overpass API response."""
+        pois = []
+        elements = data.get("elements", [])
+        for elem in elements:
+            tags = elem.get("tags", {})
+            name = tags.get("name", "")
+            # Get the coordinates (ways use their center)
+            if elem.get("type") == "way":
+                center = elem.get("center", {})
+                lat = center.get("lat", 0)
+                lon = center.get("lon", 0)
+            else:
+                lat = elem.get("lat", 0)
+                lon = elem.get("lon", 0)
+            # Determine the POI type
+            poi_type = "unknown"
+            if tags.get("shop"):
+                poi_type = tags.get("shop")
+            elif tags.get("amenity"):
+                poi_type = tags.get("amenity")
+            elif tags.get("railway"):
+                poi_type = "station"
+            if lat and lon:
+                pois.append(
+                    POI(
+                        osm_id=elem.get("id", 0),
+                        name=name,
+                        lat=lat,
+                        lon=lon,
+                        poi_type=poi_type,
+                        tags=tags,
+                    )
+                )
+        return pois
+    def search_by_name(
+        self,
+        name: str,
+        lat: float,
+        lon: float,
+        radius: int = 500,
+    ) -> List[POI]:
+        """
+        Search POIs by name.
+        Args:
+            name: name to search for
+            lat: latitude
+            lon: longitude
+            radius: search radius (meters)
+        Returns:
+            List of POIs
+        """
+        cache_key = f"name:{name}:{lat:.4f}:{lon:.4f}:{radius}"
+        if cache_key in self._cache:
+            return self._cache[cache_key]
+        query = self._build_name_query(name, lat, lon, radius)
+        result = self._execute_query(query)
+        self._cache[cache_key] = result
+        return result
+    def search_by_type(
+        self,
+        poi_type: str,
+        lat: float,
+        lon: float,
+        radius: int = 500,
+    ) -> List[POI]:
+        """
+        Search POIs by type.
+        Args:
+            poi_type: POI type
+            lat: latitude
+            lon: longitude
+            radius: search radius (meters)
+        Returns:
+            List of POIs
+        """
+        cache_key = f"type:{poi_type}:{lat:.4f}:{lon:.4f}:{radius}"
+        if cache_key in self._cache:
+            return self._cache[cache_key]
+        query = self._build_type_query(poi_type, lat, lon, radius)
+        if not query:
+            return []
+        result = self._execute_query(query)
+        self._cache[cache_key] = result
+        return result
+    def _execute_query(self, query: str) -> List[POI]:
+        """Execute an Overpass API query."""
+        try:
+            import requests
+            self._wait_for_rate_limit()
+            self._last_request_time = time.time()
+            response = requests.post(
+                self.OVERPASS_URL,
+                data={"data": query},
+                timeout=self.timeout,
+            )
+            response.raise_for_status()
+            return self._parse_response(response.json())
+        except Exception as e:
+            print(f"Overpass query error: {e}")
+            return []
+    def search_combined(
+        self,
+        names: List[str],
+        types: List[str],
+        lat: float,
+        lon: float,
+        radius: int = 500,
+    ) -> Dict[str, Dict[str, List[POI]]]:
+        """
+        Combined search (by both name and type).
+        Returns:
+            A dict of the form {"names": {...}, "types": {...}}
+        """
+        result = {"names": {}, "types": {}}
+        for name in names:
+            pois = self.search_by_name(name, lat, lon, radius)
+            if pois:
+                result["names"][name] = pois
+        for poi_type in types:
+            pois = self.search_by_type(poi_type, lat, lon, radius)
+            if pois:
+                result["types"][poi_type] = pois
+        return result
+    def clear_cache(self) -> None:
+        """Clear the cache."""
+        self._cache.clear()
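
Reviewer note: the cache keys in `search_by_name` and `search_by_type` round coordinates to four decimal places, which is roughly 11 m of latitude, so nearby requests share one cached Overpass result. The sketch below shows the effect with the same key format. `name_cache_key` is a hypothetical helper and the coordinates are arbitrary sample values.

```python
def name_cache_key(name: str, lat: float, lon: float, radius: int) -> str:
    """Same key format as search_by_name in the diff."""
    return f"name:{name}:{lat:.4f}:{lon:.4f}:{radius}"


a = name_cache_key("ローソン", 35.681236, 139.767125, 500)
b = name_cache_key("ローソン", 35.681249, 139.767149, 500)  # a few meters away
c = name_cache_key("ローソン", 35.681400, 139.767125, 500)  # about 18 m north

print(a)       # name:ローソン:35.6812:139.7671:500
print(a == b)  # True: both positions hit the same cache entry
print(a == c)  # False: a new Overpass request is issued
```

Because `_execute_query` returns an empty list on any exception, a failed request is cached under the same key until the TTL (300 s by default) expires.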

utils/__init__.py ADDED

	@@ -0,0 +1,13 @@

+from .image_utils import resize_frame, frame_to_base64, frame_to_pil
+from .text_cleaner import clean_ocr_text, extract_shop_names
+from .geo_utils import haversine_distance, create_bounding_box
+__all__ = [
+    "resize_frame",
+    "frame_to_base64",
+    "frame_to_pil",
+    "clean_ocr_text",
+    "extract_shop_names",
+    "haversine_distance",
+    "create_bounding_box",
+]

utils/geo_utils.py ADDED

	@@ -0,0 +1,84 @@

+"""Geographic utilities"""
+import math
+from typing import Tuple, Optional
+from dataclasses import dataclass
+@dataclass
+class BoundingBox:
+    """Bounding box"""
+    min_lat: float
+    max_lat: float
+    min_lon: float
+    max_lon: float
+def haversine_distance(
+    lat1: float, lon1: float, lat2: float, lon2: float
+) -> float:
+    """
+    Compute the distance between two points (meters)
+    using the haversine formula.
+    """
+    R = 6371000  # Earth radius (meters)
+    phi1 = math.radians(lat1)
+    phi2 = math.radians(lat2)
+    delta_phi = math.radians(lat2 - lat1)
+    delta_lambda = math.radians(lon2 - lon1)
+    a = (
+        math.sin(delta_phi / 2) ** 2
+        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
+    )
+    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
+    return R * c
+def create_bounding_box(
+    lat: float, lon: float, radius_m: int
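+) -> BoundingBox:
+    """
+    Create a bounding box from a center coordinate and a radius.
+    """
+    # One degree of latitude is about 111 km
+    lat_delta = radius_m / 111000
+    # The length of one degree of longitude depends on the latitude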
+    lon_delta = radius_m / (111000 * math.cos(math.radians(lat)))
+    return BoundingBox(
+        min_lat=lat - lat_delta,
+        max_lat=lat + lat_delta,
+        min_lon=lon - lon_delta,
+        max_lon=lon + lon_delta,
+    )
+def format_coordinates(lat: float, lon: float, precision: int = 6) -> str:
+    """Format coordinates as a string."""
+    return f"{lat:.{precision}f}, {lon:.{precision}f}"
+def parse_coordinates(coord_str: str) -> Optional[Tuple[float, float]]:
+    """Parse a coordinate string."""
+    try:
+        parts = coord_str.replace(" ", "").split(",")
+        if len(parts) == 2:
+            return float(parts[0]), float(parts[1])
+    except ValueError:
+        pass
+    return None
+def meters_to_degrees_lat(meters: float) -> float:
+    """Convert meters to degrees of latitude."""
+    return meters / 111000
+def meters_to_degrees_lon(meters: float, lat: float) -> float:
+    """Convert meters to degrees of longitude (latitude-dependent)."""
+    return meters / (111000 * math.cos(math.radians(lat)))
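
Reviewer note: `haversine_distance` and the 111,000 m-per-degree shortcut used by the bounding-box helpers agree to within about 0.2%. The sketch below re-implements the same formula under the name `haversine_m` (a stand-in so the example runs on its own) and measures one degree of latitude and a 100 m offset.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371,000 m."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


one_degree_lat = haversine_m(0.0, 0.0, 1.0, 0.0)
print(round(one_degree_lat))  # 111195

# 100 m converted with the 111,000 m-per-degree shortcut, then measured back
delta = 100 / 111000
print(round(haversine_m(35.0, 139.0, 35.0 + delta, 139.0), 1))  # 100.2
```

The mismatch is far smaller than the 100 m consistency radius used by `ResultAggregator`, so the shortcut seems adequate for this purpose.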

utils/image_utils.py ADDED

	@@ -0,0 +1,63 @@

+"""Image processing utilities"""
+import base64
+import io
+from typing import Tuple
+import cv2
+import numpy as np
+from PIL import Image
+def resize_frame(
+    frame: np.ndarray, width: int = 640, height: int = 480
+) -> np.ndarray:
+    """Resize a frame."""
+    return cv2.resize(frame, (width, height), interpolation=cv2.INTER_AREA)
+def frame_to_base64(frame: np.ndarray, quality: int = 85) -> str:
+    """Encode a frame as a Base64 JPEG string."""
+    # BGR -> RGB
+    if len(frame.shape) == 3 and frame.shape[2] == 3:
+        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+    else:
+        frame_rgb = frame
+    img = Image.fromarray(frame_rgb)
+    buffer = io.BytesIO()
+    img.save(buffer, format="JPEG", quality=quality)
+    return base64.b64encode(buffer.getvalue()).decode("utf-8")
+def frame_to_pil(frame: np.ndarray) -> Image.Image:
+    """Convert a NumPy array (BGR) to a PIL Image."""
+    if len(frame.shape) == 3 and frame.shape[2] == 3:
+        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
+    else:
+        frame_rgb = frame
+    return Image.fromarray(frame_rgb)
+def pil_to_frame(img: Image.Image) -> np.ndarray:
+    """Convert a PIL Image to a NumPy array (BGR)."""
+    frame = np.array(img)
+    if len(frame.shape) == 3 and frame.shape[2] == 3:
+        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
+    return frame
+def rotate_frame(frame: np.ndarray, angle: int) -> np.ndarray:
+    """Rotate a frame (0, 90, 180, or 270 degrees)."""
+    if angle == 90:
+        return cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
+    elif angle == 180:
+        return cv2.rotate(frame, cv2.ROTATE_180)
+    elif angle == 270:
+        return cv2.rotate(frame, cv2.ROTATE_90_COUNTERCLOCKWISE)
+    return frame
+def get_frame_dimensions(frame: np.ndarray) -> Tuple[int, int]:
+    """Return the frame dimensions as (width, height)."""
+    return frame.shape[1], frame.shape[0]

utils/text_cleaner.py ADDED

	@@ -0,0 +1,92 @@

+"""OCR text normalization utilities"""
+import re
+import unicodedata
+from typing import List, Set
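+def normalize_text(text: str) -> str:
+    """Normalize text (NFKC, e.g. full-width alphanumerics to half-width)."""
+    # NFKC normalization (full-width alphanumerics -> half-width, etc.)
+    text = unicodedata.normalize("NFKC", text)
+    return text.strip()
+def remove_noise(text: str) -> str:
+    """Remove noise characters."""
+    # Remove control characters (all Unicode "C*" categories)
+    text = "".join(char for char in text if not unicodedata.category(char).startswith("C"))
+    # Collapse consecutive whitespace into one space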
+    text = re.sub(r"\s+", " ", text)
+    return text.strip()
+def clean_ocr_text(text: str) -> str:
+    """Clean up an OCR result."""
+    if not text:
+        return ""
+    text = normalize_text(text)
+    text = remove_noise(text)
+    return text
+def extract_shop_names(texts: List[str]) -> List[str]:
+    """Extract text that looks like a shop name."""
+    shop_patterns = [
+        r"([\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]+(?:店|屋|堂|館|院|薬局|医院|クリニック|歯科|整骨院))",
+        r"(ローソン|セブン.?イレブン|ファミリーマート|ミニストップ|デイリーヤマザキ)",
+        r"(マクドナルド|すき家|吉野家|松屋|ガスト|サイゼリヤ|CoCo壱番屋)",
+        r"(ドラッグストア|マツモトキヨシ|ウエルシア|ツルハ|スギ薬局|サンドラッグ)",
+        r"(イオン|イトーヨーカドー|西友|ダイエー|ライフ|マルエツ)",
+        r"(LAWSON|FamilyMart|7-ELEVEN|MINISTOP)",
+    ]
+    found: Set[str] = set()
+    for text in texts:
+        cleaned = clean_ocr_text(text)
+        for pattern in shop_patterns:
+            matches = re.findall(pattern, cleaned, re.IGNORECASE)
+            found.update(matches)
+    return list(found)
+def extract_address_parts(texts: List[str]) -> List[str]:
+    """Extract text that looks like part of an address."""
+    address_patterns = [
+        r"([\u4e00-\u9fff]+[都道府県])",
+        r"([\u4e00-\u9fff]+[市区町村])",
+        r"(\d+丁目)",
+        r"(\d+-\d+(?:-\d+)?)",
+    ]
+    found: Set[str] = set()
+    for text in texts:
+        cleaned = clean_ocr_text(text)
+        for pattern in address_patterns:
+            matches = re.findall(pattern, cleaned)
+            found.update(matches)
+    return list(found)
+def extract_landmarks(texts: List[str]) -> List[str]:
+    """Extract landmark names."""
+    landmark_patterns = [
+        r"([\u4e00-\u9fff]+駅)",
+        r"([\u4e00-\u9fff]+交差点)",
+        r"([\u4e00-\u9fff]+公園)",
+        r"([\u4e00-\u9fff]+橋)",
+        r"([\u4e00-\u9fff]+神社|[\u4e00-\u9fff]+寺)",
+        r"(国道\d+号)",
+        r"(県道\d+号)",
+    ]
+    found: Set[str] = set()
+    for text in texts:
+        cleaned = clean_ocr_text(text)
+        for pattern in landmark_patterns:
+            matches = re.findall(pattern, cleaned)
+            found.update(matches)
+    return list(found)
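
Reviewer note: the shop-name patterns only match after `clean_ocr_text` has folded width variants, so it helps to see what NFKC does to typical signage text. The sketch below copies the one-line `normalize_text` from the diff and applies it to made-up OCR strings.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """NFKC-normalize and trim, as in utils/text_cleaner.py."""
    return unicodedata.normalize("NFKC", text).strip()


print(normalize_text("ＬＡＷＳＯＮ"))        # LAWSON   (full-width Latin -> ASCII)
print(normalize_text("ﾛｰｿﾝ"))              # ローソン  (half-width katakana -> full-width)
print(normalize_text(" ７－ＥＬＥＶＥＮ "))  # 7-ELEVEN (full-width digit and hyphen folded)
```

After this step the chain-name patterns such as `LAWSON` and `7-ELEVEN` in `extract_shop_names` can match regardless of the width the OCR engine emitted.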