Fumiya Imazato committed
Commit · 57bc6ef
Parent(s): none (initial commit)
Initial commit: どこカメ
Browse files
- .claude/settings.local.json +10 -0
- 00_企画書.md +168 -0
- README.md +77 -0
- app.py +277 -0
- config/__init__.py +3 -0
- config/settings.py +48 -0
- core/__init__.py +13 -0
- core/frame_sampler.py +95 -0
- core/location_matcher.py +244 -0
- core/ocr_engine.py +145 -0
- core/result_aggregator.py +207 -0
- core/vlm_analyzer.py +187 -0
- memo.txt +21 -0
- requirements.txt +30 -0
- services/__init__.py +4 -0
- services/gemini_client.py +188 -0
- services/overpass_client.py +251 -0
- utils/__init__.py +13 -0
- utils/geo_utils.py +84 -0
- utils/image_utils.py +63 -0
- utils/text_cleaner.py +92 -0
.claude/settings.local.json
ADDED
@@ -0,0 +1,10 @@
{
  "permissions": {
    "allow": [
      "WebSearch",
      "WebFetch(domain:ai.google.dev)"
    ],
    "deny": [],
    "ask": []
  }
}
00_企画書.md
ADDED
@@ -0,0 +1,168 @@
# Service Proposal: どこカメ (Semantic Geo-Locator, Real-time Video Edition)
**A next-generation location-identification engine built on visual information × open data**

- **Version:** 2.0 (Video Streaming Model)
- **Date:** 2025/12/06
- **Infrastructure Strategy:** Zero-Cost Cloud Prototyping (an implementation viable within free cloud tiers)

---

## 1. Service Concept

> **"Point your camera, and what you see becomes an address."**

In environments where GPS accuracy degrades (dense high-rise districts, indoors, mountainous areas),
the caller simply points their smartphone camera at the surroundings;
the AI analyzes the video in (quasi-)real time and,
**within seconds to tens of seconds, pinpoints a coordinate and address that can be stated as "here"**.

---
## 2. Pain Points

1. **The limits of GPS**
   - In high-rise districts, indoors, and mountainous terrain, errors of tens to hundreds of meters occur routinely.
   - A "send my current location" button alone still leaves cases where the dispatcher cannot pinpoint the scene.

2. **Communication breaks down under panic**
   - Verbal explanations in **situations where no address or landmark can be put into words** ("there's only a mountain and a vending machine", "near a big road") are a heavy burden on both the caller and the dispatcher.
   - Foreign tourists and people with no local knowledge, in particular, cannot explain where they are.

3. **The psychological hurdle of sending a still photo**
   - For a caller in an emergency, "take a photo and send it" means having to
     - stop,
     - shoot, and
     - press send,
     a sequence whose **psychological and operational hurdle is higher than one would expect**.

---
## 3. Solution Overview: Real-time Semantic Analysis

### 3.1 Continuous Scanning

- The user starts the camera in "video mode" and
  **simply looks around**.
- The system continuously receives the video stream sent from the browser.

### 3.2 Quasi-Real-time Analysis

**Assuming zero-cost operation on free infrastructure**, the system samples
thinned-out frames for analysis rather than processing every frame.

- **OCR analysis (high frequency / roughly every 1.0 s)**
  - Keeps extracting the **textual information** visible on screen:
    utility-pole numbers, shop signboards, building names, traffic-signal name plates, road signs, and so on.
  - On the user's screen, the detected text pops up one item after another with a 1–2 second delay:
    - "Detected: 田中歯科…"
    - "Detected: ローソン…"

- **Spatial reasoning with a VLM (low frequency / roughly every 5.0 s, or on a condition trigger)**
  - A Vision-Language Model (VLM) extracts **the spatial relations between landmarks** as text, e.g.:
    - "a coin parking lot across from the convenience store"
    - "a gas station at the back right, a drugstore on the left"
  - This textualized "structure of the scene" then **fires the location-identification logic (described below)**.

### 3.3 Location Identification via Composite Queries

The extracted text and spatial information are matched against OpenStreetMap (OSM).

> **Example: the search logic, conceptually**
> - "within a 1 km radius of the current position", and
> - "a POI carrying the text `田中歯科` exists", and
> - "a `ローソン` exists within 30 m of it"
> → Candidate locations satisfying this combination of conditions are scored,
>   and the highest-likelihood coordinate is adopted as the "estimated position".
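To make the composite condition concrete, a minimal sketch of how such a query could be sent to the Overpass API with `requests` follows. This is illustrative only and not part of the commit: the hint coordinate is the Tokyo Station default used elsewhere in the code, and the landmark names are the ones from the example above.

```python
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

# "A POI named 田中歯科 within 1 km of the hint coordinate,
#  with a ローソン/LAWSON within 30 m of it."
query = """
[out:json][timeout:25];
node["name"~"田中歯科"](around:1000, 35.6812, 139.7671)->.dental;
node["name"~"ローソン|LAWSON"](around.dental:30);
out;
"""

resp = requests.post(OVERPASS_URL, data={"data": query}, timeout=25)
for el in resp.json()["elements"]:
    # Each element is a LAWSON node near a matching clinic;
    # its coordinates anchor one candidate location for scoring.
    print(el["lat"], el["lon"], el["tags"].get("name"))
```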

---
## 4. System Architecture and Infrastructure (Tech Stack)

**A configuration centered on Hugging Face Spaces, requiring no GPU and running at zero cost.**

| Layer | Role | Technology / Specs |
| :--- | :--- | :--- |
| **Frontend** | Video input & UI | - Browser video streaming via **WebRTC (Gradio streaming)**<br>- No smartphone app: the service starts from an SMS link opened in the browser |
| **Infrastructure** | Runtime platform | - **Hugging Face Spaces (Free Tier)**<br>- Approx. 2 vCPU / 16 GB RAM<br>- Running cost: **¥0 (for PoC / small-scale operation)** |
| **Edge Logic** | Frame control | - A **sampling middleware** extracts one frame every 1–2 s instead of processing the whole video<br>- Keeps CPU load and API cost under control |
| **OCR Engine** | Text recognition | - **PaddleOCR** run locally on Hugging Face<br>- A model strong on Japanese signboard text in natural scenes |
| **AI Brain** | Spatial understanding | - **Gemini 2.5 Flash API (Free Tier)** (other VLMs can be swapped in later)<br>- Takes the sampled still frames and structures landmark information and spatial relations as text |
| **Map DB** | Map data | - **OpenStreetMap (Overpass API)**<br>- Free POI and tag search<br>- Composite queries ("shop name + category + distance condition") narrow down candidate locations |

---
## 5. User Experience Flow (UX)

1. **Access**
   - A URL is sent to the caller, e.g. via SMS.
   - Tapping the URL opens the browser, which shows the camera-permission dialog.

2. **Scan**
   - On-screen guidance: "Point the camera at your surroundings and turn slowly in a full circle."
   - The caller only has to rotate the phone in place;
     no photo-taking or send operation is required at all.

3. **Feedback**
   - Text detected by the AI **pops up on screen in sequence with a lag of about 1–2 seconds**.
   - Examples: "Detected: 〇〇 Clinic", "Detected: fire hydrant", "Detected: LAWSON".
   - This reassures the caller that
     - "someone is really watching", and
     - "the video I'm sending right now is helping".

4. **Location fix**
   - Once the OSM match reaches a score threshold, the screen shows:
     - "**Location identified: near the 〇〇 intersection, 3-chōme, 〇〇-chō, 〇〇 City**"
   - At the same time, the coordinate and text information would be forwarded to the dispatch-console system.

---
## 6. Differentiators

1. **Near-zero adoption and running cost**
   - No dependence on metered services such as the Google Maps API;
     a minimal configuration built from OSS plus free cloud tiers.
   - From **PoC up to small-scale production**, deployment can start with almost no impact on a municipal budget.

2. **"Video" that is nevertheless lightweight**
   - Transport is WebRTC video streaming, but
     server-side analysis uses intermittent sampling.
   - Compared with real-time analysis of every frame, this is a
     **load profile that holds up on CPU-only, low-spec hardware**.

3. **Semantic localization that is robust to ambiguity**
   - Even when no address plate is visible, the position is estimated from
     **the *combination* of landmarks making up the scene**, e.g.:
     - convenience store + parking lot
     - gas station + large intersection
   - Rather than relying on text alone, localization is possible by
     **crossing the "structure of the scene" with "open data"**.

---
## 7. Future Extensions

- **Integration with PLATEAU (3D city models)**
  - Match the "skyline" of building clusters (rooftop shapes, height distribution) against a 3D city model.
  - Even in areas poor in text and shops, extend the system to estimate heading and position from **geometric features** such as:
    - building outlines
    - road patterns
    - distant mountain ridgelines

- **Multimodal integration**
  - Eventually, audio (ambient sound, call content) could be factored in as well;
    cues such as "the sound of a railroad crossing", "the echo of an ambulance siren", or "the babble of a stream"
    leave room to sharpen the spatial reasoning further.

---

**End of Document**
README.md
ADDED
@@ -0,0 +1,77 @@
---
title: どこカメ (dokoCame)
emoji: 📍
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: false
license: mit
---

# 📍 どこカメ (dokoCame)

**Point your camera, and what you see becomes an address.**

A service that analyzes smartphone camera video in real time and identifies your location without relying on GPS.

## Features

- **No GPS required**: estimates the location from signboards, shop names, road signs, etc. in the video
- **Real-time analysis**: just point the camera; analysis runs automatically
- **Runs in the browser**: no app installation needed

## Tech Stack

| Component | Technology |
|-----------|------------|
| Frontend | Gradio |
| OCR | PaddleOCR (Japanese support) |
| VLM | Gemini 2.5 Flash |
| Map DB | OpenStreetMap (Overpass API) |
| Hosting | Hugging Face Spaces |

## Usage

1. Grant camera permission
2. Point your phone or PC camera at the surroundings
3. Signboards, shop names, road signs, etc. are detected
4. The location is identified from the combined information

## Setup (local development)

```bash
# Clone the repository
git clone https://huggingface.co/spaces/<username>/dokoCame
cd dokoCame

# Install dependencies
pip install -r requirements.txt

# Set environment variables
export GEMINI_API_KEY="your-api-key-here"

# Run
python app.py
```

## Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `GEMINI_API_KEY` | Gemini API key | Yes |

## Deploying to Hugging Face Spaces

1. Create a new Space on Hugging Face
   - SDK: `Gradio`
   - Hardware: `CPU basic` (Free)

2. Add `GEMINI_API_KEY` under Settings → Repository secrets

3. Push this repository to the Space

## License

MIT License
app.py
ADDED
@@ -0,0 +1,277 @@
"""
どこカメ (dokoCame) - real-time video location-identification service

Analyzes smartphone camera video in real time and identifies
the user's location without relying on GPS.
"""

from typing import Optional
import numpy as np
import gradio as gr

from config.settings import settings
from core.frame_sampler import FrameSampler
from core.ocr_engine import OCREngine
from core.vlm_analyzer import VLMAnalyzer, SpatialAnalysis
from core.location_matcher import LocationMatcher
from core.result_aggregator import ResultAggregator
from utils.image_utils import resize_frame
from utils.text_cleaner import clean_ocr_text


class DokoCameApp:
    """The dokoCame application."""

    def __init__(self):
        self.frame_sampler = FrameSampler(
            ocr_interval=settings.ocr_interval_sec,
            vlm_interval=settings.vlm_interval_sec,
        )
        self.ocr_engine = OCREngine(lang=settings.ocr_lang)
        self.vlm_analyzer = VLMAnalyzer()
        self.location_matcher = LocationMatcher(search_radius=settings.search_radius_m)
        self.result_aggregator = ResultAggregator(
            buffer_size=settings.history_buffer_size,
            confidence_threshold=settings.confidence_threshold,
        )

        # State
        self._latest_ocr_texts: list = []
        self._latest_analysis: Optional[SpatialAnalysis] = None
        self._processing = False

        # Hint coordinates (eventually to be taken from the browser's Geolocation API)
        self._hint_lat: float = 35.6812  # Tokyo Station (default)
        self._hint_lon: float = 139.7671

    def process_frame(self, frame: np.ndarray) -> dict:
        """
        Process a single frame.

        Returns:
            {
                "ocr_texts": [...],
                "landmarks": [...],
                "location_status": "...",
                "result": AggregatedResult or None
            }
        """
        if frame is None:
            return self._empty_result()

        # Resize the frame
        frame = resize_frame(
            frame, settings.frame_width, settings.frame_height
        )

        # Sampling decision
        sample = self.frame_sampler.sample(frame)

        ocr_texts = []
        vlm_keywords = []

        # OCR processing
        if sample.should_ocr:
            raw_texts = self.ocr_engine.detect_text_only(frame)
            ocr_texts = [clean_ocr_text(t) for t in raw_texts if t]
            self._latest_ocr_texts = ocr_texts

        # VLM processing
        if sample.should_vlm and self.vlm_analyzer.is_available:
            try:
                analysis = self.vlm_analyzer.analyze(frame)
                if analysis.success:
                    self._latest_analysis = analysis
                    vlm_keywords = self.vlm_analyzer.get_search_keywords(analysis)
            except Exception as e:
                print(f"VLM error: {e}")

        # Location matching
        match_result = None
        if ocr_texts or vlm_keywords:
            match_result = self.location_matcher.match(
                ocr_texts=self._latest_ocr_texts,
                analysis=self._latest_analysis,
                hint_lat=self._hint_lat,
                hint_lon=self._hint_lon,
            )

        # Merge results
        self.result_aggregator.add_detection(
            ocr_texts=ocr_texts,
            vlm_keywords=vlm_keywords,
            match_result=match_result,
        )

        # Fetch the aggregated result
        aggregated = self.result_aggregator.get_aggregated_result()

        # Build the status message
        if aggregated.is_location_found:
            status = f"📍 場所を特定しました: {aggregated.address_hint}"
        elif aggregated.match_count > 0:
            status = f"🔍 検索中... ({aggregated.match_count}件のマッチ)"
        else:
            status = "📷 周囲を映してください..."

        return {
            "ocr_texts": self._latest_ocr_texts,
            "landmarks": aggregated.detected_landmarks,
            "location_status": status,
            "result": aggregated,
        }

    def _empty_result(self) -> dict:
        return {
            "ocr_texts": [],
            "landmarks": [],
            "location_status": "カメラを起動してください",
            "result": None,
        }

    def reset(self):
        """Reset all state."""
        self.frame_sampler.reset()
        self.result_aggregator.reset()
        self._latest_ocr_texts = []
        self._latest_analysis = None

    def set_hint_location(self, lat: float, lon: float):
        """Set the hint coordinates."""
        self._hint_lat = lat
        self._hint_lon = lon


# Global application instance
app = DokoCameApp()


def process_webcam(frame):
    """Handle webcam input (for the Gradio Image input)."""
    if frame is None:
        return None, "カメラを起動してください", ""

    result = app.process_frame(frame)

    # Format the OCR texts
    ocr_display = ""
    if result["ocr_texts"]:
        ocr_display = "【検出テキスト】\n" + "\n".join(
            f"• {text}" for text in result["ocr_texts"][:8]
        )

    # Format the landmarks
    landmarks_display = ""
    if result["landmarks"]:
        landmarks_display = "\n\n【認識ランドマーク】\n" + "\n".join(
            f"• {lm}" for lm in result["landmarks"][:5]
        )

    info_display = ocr_display + landmarks_display

    # Location info
    location_display = result["location_status"]
    if result["result"] and result["result"].is_location_found:
        r = result["result"]
        location_display += f"\n\n座標: {r.estimated_lat:.6f}, {r.estimated_lon:.6f}"
        location_display += f"\n信頼度: {r.confidence:.1%}"

    return frame, location_display, info_display


def reset_state():
    """Reset application state."""
    app.reset()
    return "リセットしました"


def create_ui():
    """Build the Gradio UI."""
    with gr.Blocks(
        title="どこカメ - リアルタイム位置特定",
        theme=gr.themes.Soft(),
    ) as demo:
        gr.Markdown(
            """
            # 📍 どこカメ (dokoCame)
            ### かざすだけで、視界がそのまま住所になる

            カメラを周囲に向けて、ゆっくり一周してください。
            AIが映像を解析し、位置を特定します。
            """
        )

        with gr.Row():
            with gr.Column(scale=2):
                # Camera input
                webcam = gr.Image(
                    sources=["webcam"],
                    streaming=True,
                    label="カメラ映像",
                    mirror_webcam=False,
                )

            with gr.Column(scale=1):
                # Location display
                location_output = gr.Textbox(
                    label="📍 位置情報",
                    lines=5,
                    interactive=False,
                )

                # Detection display
                info_output = gr.Textbox(
                    label="🔍 検出情報",
                    lines=10,
                    interactive=False,
                )

                # Reset button
                reset_btn = gr.Button("🔄 リセット", variant="secondary")

        # Event handlers
        webcam.stream(
            fn=process_webcam,
            inputs=[webcam],
            outputs=[webcam, location_output, info_output],
        )

        reset_btn.click(
            fn=reset_state,
            outputs=[location_output],
        )

        gr.Markdown(
            """
            ---
            ### 使い方
            1. 「カメラを許可」をクリックしてカメラを起動
            2. スマホまたはPCのカメラで周囲を映す
            3. 看板、店舗名、標識などが検出されます
            4. 複数の情報から位置を特定します

            ### 注意事項
            - GPSは使用していません(映像のみで位置を推定)
            - 検出精度は周囲の環境に依存します
            - コンビニ、飲食店、駅などのランドマークが見えると精度が上がります
            """
        )

    return demo


# Main entry point
if __name__ == "__main__":
    # Validate settings
    if not settings.validate():
        print("Warning: Some settings are not configured properly")
        print("GEMINI_API_KEY is required for VLM features")

    demo = create_ui()
    demo.launch(
        server_name="0.0.0.0",
        server_port=7860,
        share=False,
    )
config/__init__.py
ADDED
@@ -0,0 +1,3 @@
from .settings import settings

__all__ = ["settings"]
config/settings.py
ADDED
@@ -0,0 +1,48 @@
"""Application settings"""

import os
from dataclasses import dataclass
from dotenv import load_dotenv

load_dotenv()


@dataclass
class Settings:
    """Application settings class."""

    # Gemini API
    gemini_api_key: str = os.environ.get("GEMINI_API_KEY", "")
    gemini_model: str = "gemini-2.0-flash"
    gemini_rpm_limit: int = 10  # requests per minute

    # OCR settings
    ocr_interval_sec: float = 1.0  # OCR interval (seconds)
    ocr_lang: str = "japan"  # PaddleOCR language setting

    # VLM settings
    vlm_interval_sec: float = 5.0  # VLM interval (seconds)

    # Image-processing settings
    frame_width: int = 640
    frame_height: int = 480
    jpeg_quality: int = 85

    # Location-matching settings
    search_radius_m: int = 500  # search radius (meters)
    overpass_timeout: int = 25  # Overpass API timeout (seconds)
    cache_ttl_sec: int = 300  # cache TTL (seconds)

    # Result-aggregation settings
    history_buffer_size: int = 10  # history buffer size
    confidence_threshold: float = 0.6  # confidence threshold

    def validate(self) -> bool:
        """Validate the settings."""
        if not self.gemini_api_key:
            print("Warning: GEMINI_API_KEY is not set")
            return False
        return True


settings = Settings()
core/__init__.py
ADDED
@@ -0,0 +1,13 @@
from .frame_sampler import FrameSampler
from .ocr_engine import OCREngine
from .vlm_analyzer import VLMAnalyzer
from .location_matcher import LocationMatcher
from .result_aggregator import ResultAggregator

__all__ = [
    "FrameSampler",
    "OCREngine",
    "VLMAnalyzer",
    "LocationMatcher",
    "ResultAggregator",
]
core/frame_sampler.py
ADDED
@@ -0,0 +1,95 @@
"""Frame-sampling control"""

import time
from dataclasses import dataclass
from typing import Optional
import numpy as np


@dataclass
class SampleResult:
    """Sampling decision for one frame."""

    should_ocr: bool
    should_vlm: bool
    frame: Optional[np.ndarray]


class FrameSampler:
    """
    Sampling control for the video stream:
    - OCR: every 1 second
    - VLM: every 5 seconds
    """

    def __init__(
        self,
        ocr_interval: float = 1.0,
        vlm_interval: float = 5.0,
    ):
        self.ocr_interval = ocr_interval
        self.vlm_interval = vlm_interval

        self._last_ocr_time: float = 0
        self._last_vlm_time: float = 0
        self._frame_count: int = 0

    def sample(self, frame: np.ndarray) -> SampleResult:
        """
        Decide whether this frame should be processed.

        Args:
            frame: input frame

        Returns:
            SampleResult: OCR/VLM flags plus the frame
        """
        current_time = time.time()
        self._frame_count += 1

        should_ocr = False
        should_vlm = False

        # OCR sampling decision
        if current_time - self._last_ocr_time >= self.ocr_interval:
            should_ocr = True
            self._last_ocr_time = current_time

        # VLM sampling decision
        if current_time - self._last_vlm_time >= self.vlm_interval:
            should_vlm = True
            self._last_vlm_time = current_time

        return SampleResult(
            should_ocr=should_ocr,
            should_vlm=should_vlm,
            frame=frame if (should_ocr or should_vlm) else None,
        )

    def reset(self) -> None:
        """Reset the sampler."""
        self._last_ocr_time = 0
        self._last_vlm_time = 0
        self._frame_count = 0

    def get_stats(self) -> dict:
        """Return sampling statistics."""
        return {
            "frame_count": self._frame_count,
            "last_ocr_time": self._last_ocr_time,
            "last_vlm_time": self._last_vlm_time,
        }

    def force_vlm(self) -> None:
        """Force VLM processing on the next frame."""
        self._last_vlm_time = 0

    def time_until_next_ocr(self) -> float:
        """Seconds until the next OCR pass."""
        elapsed = time.time() - self._last_ocr_time
        return max(0, self.ocr_interval - elapsed)

    def time_until_next_vlm(self) -> float:
        """Seconds until the next VLM pass."""
        elapsed = time.time() - self._last_vlm_time
        return max(0, self.vlm_interval - elapsed)
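As a quick sanity check (not part of the commit), feeding the sampler a synthetic ~30 fps stream shows the two cadences. The dummy frame and the frame rate are assumptions made for the illustration:

```python
import time
import numpy as np
from core.frame_sampler import FrameSampler

sampler = FrameSampler(ocr_interval=1.0, vlm_interval=5.0)
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy frame

ocr_count = vlm_count = 0
for _ in range(180):  # about 6 seconds at ~30 fps
    result = sampler.sample(frame)
    ocr_count += result.should_ocr
    vlm_count += result.should_vlm
    time.sleep(1 / 30)

print(ocr_count, vlm_count)  # roughly 6 OCR passes and 2 VLM passes
```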
core/location_matcher.py
ADDED
@@ -0,0 +1,244 @@
"""Location-matching logic"""

from typing import List, Optional, Dict, Tuple
from dataclasses import dataclass, field
import re

from services.overpass_client import OverpassClient, POI
from core.vlm_analyzer import SpatialAnalysis
from utils.geo_utils import haversine_distance
from utils.text_cleaner import clean_ocr_text


@dataclass
class LocationCandidate:
    """A candidate location."""

    lat: float
    lon: float
    score: float
    matched_pois: List[POI] = field(default_factory=list)
    match_reasons: List[str] = field(default_factory=list)


@dataclass
class MatchResult:
    """Result of a matching pass."""

    candidates: List[LocationCandidate] = field(default_factory=list)
    best_candidate: Optional[LocationCandidate] = None
    total_matches: int = 0
    search_keywords: List[str] = field(default_factory=list)


class LocationMatcher:
    """
    Matches OCR results and VLM analysis against OSM data
    to identify candidate locations.
    """

    # Normalization map for major convenience-store chains
    CHAIN_NORMALIZATION = {
        r"ローソン|LAWSON": "ローソン",
        r"セブン.?イレブン|7.?ELEVEN|7.?11": "セブン-イレブン",
        r"ファミリーマート|ファミマ|FamilyMart": "ファミリーマート",
        r"ミニストップ|MINISTOP": "ミニストップ",
    }

    def __init__(
        self,
        overpass_client: Optional[OverpassClient] = None,
        search_radius: int = 500,
    ):
        self.client = overpass_client or OverpassClient()
        self.search_radius = search_radius

    def _normalize_chain_name(self, name: str) -> str:
        """Normalize chain-store names."""
        for pattern, normalized in self.CHAIN_NORMALIZATION.items():
            if re.search(pattern, name, re.IGNORECASE):
                return normalized
        return name

    def _extract_search_terms(
        self,
        ocr_texts: List[str],
        analysis: Optional[SpatialAnalysis],
    ) -> Tuple[List[str], List[str]]:
        """
        Extract search terms.

        Returns:
            (list of names, list of types)
        """
        names = set()
        types = set()

        # From OCR texts
        for text in ocr_texts:
            cleaned = clean_ocr_text(text)
            normalized = self._normalize_chain_name(cleaned)
            if len(normalized) >= 2:  # skip texts that are too short
                names.add(normalized)

        # From the VLM analysis
        if analysis and analysis.success:
            for lm in analysis.landmarks:
                if lm.name:
                    names.add(lm.name)
                if lm.type and lm.type != "unknown":
                    types.add(lm.type)

            for text in analysis.visible_text:
                cleaned = clean_ocr_text(text)
                if len(cleaned) >= 2:
                    names.add(cleaned)

        return list(names), list(types)

    def match(
        self,
        ocr_texts: List[str],
        analysis: Optional[SpatialAnalysis],
        hint_lat: Optional[float] = None,
        hint_lon: Optional[float] = None,
    ) -> MatchResult:
        """
        Run a matching pass.

        Args:
            ocr_texts: texts detected by OCR
            analysis: VLM spatial-analysis result
            hint_lat: hint latitude (e.g. from GPS)
            hint_lon: hint longitude

        Returns:
            MatchResult
        """
        # Without hint coordinates there is nothing to search around
        if hint_lat is None or hint_lon is None:
            # Default to the Tokyo Station area (for demos)
            hint_lat = 35.6812
            hint_lon = 139.7671

        names, types = self._extract_search_terms(ocr_texts, analysis)

        if not names and not types:
            return MatchResult(search_keywords=[])

        # Run the OSM search
        search_results = self.client.search_combined(
            names=names,
            types=types,
            lat=hint_lat,
            lon=hint_lon,
            radius=self.search_radius,
        )

        # Aggregate and score candidates
        candidates = self._score_candidates(
            search_results, names, types, hint_lat, hint_lon
        )

        # Pick the best candidate
        best = None
        if candidates:
            candidates.sort(key=lambda c: c.score, reverse=True)
            best = candidates[0]

        return MatchResult(
            candidates=candidates[:10],  # top 10
            best_candidate=best,
            total_matches=sum(
                len(pois) for pois in search_results.get("names", {}).values()
            ),
            search_keywords=names + types,
        )

    def _score_candidates(
        self,
        search_results: Dict,
        names: List[str],
        types: List[str],
        hint_lat: float,
        hint_lon: float,
    ) -> List[LocationCandidate]:
        """Score the candidates."""
        # One score per POI
        poi_scores: Dict[int, LocationCandidate] = {}

        # POIs matched by name
        for name, pois in search_results.get("names", {}).items():
            for poi in pois:
                if poi.osm_id not in poi_scores:
                    poi_scores[poi.osm_id] = LocationCandidate(
                        lat=poi.lat,
                        lon=poi.lon,
                        score=0,
                        matched_pois=[poi],
                        match_reasons=[],
                    )

                candidate = poi_scores[poi.osm_id]
                # A name match scores high
                candidate.score += 10
                candidate.match_reasons.append(f"名前マッチ: {name}")

        # POIs matched by type
        for poi_type, pois in search_results.get("types", {}).items():
            for poi in pois:
                if poi.osm_id not in poi_scores:
                    poi_scores[poi.osm_id] = LocationCandidate(
                        lat=poi.lat,
                        lon=poi.lon,
                        score=0,
                        matched_pois=[poi],
                        match_reasons=[],
                    )

                candidate = poi_scores[poi.osm_id]
                # A type match scores medium
                candidate.score += 5
                candidate.match_reasons.append(f"タイプマッチ: {poi_type}")

        # Distance-based score adjustment
        for candidate in poi_scores.values():
            distance = haversine_distance(
                hint_lat, hint_lon, candidate.lat, candidate.lon
            )
            # The closer, the higher the score (maximum bonus within 100 m)
            if distance < 100:
                candidate.score += 5
            elif distance < 300:
                candidate.score += 3
            elif distance < 500:
                candidate.score += 1

        return list(poi_scores.values())

    def match_with_spatial_context(
        self,
        ocr_texts: List[str],
        analysis: SpatialAnalysis,
        hint_lat: float,
        hint_lon: float,
    ) -> MatchResult:
        """
        Matching that takes spatial context into account.

        Uses the relative placement of multiple landmarks
        to produce a higher-precision match.
        """
        base_result = self.match(ocr_texts, analysis, hint_lat, hint_lon)

        if not analysis.success or not analysis.spatial_relations:
            return base_result

        # Derive extra constraints from the spatial-relation descriptions,
        # e.g. "ローソンの右隣にコインパーキング"
        # → prefer candidates where the LAWSON and the parking lot are adjacent

        # TODO: parse spatial relations and add extra scoring
        # For now only the base matching is applied

        return base_result
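`haversine_distance` is imported from `utils/geo_utils.py`, which is part of this commit (+84 lines) but not shown on this page. Judging from the call sites, it takes coordinates in degrees and returns meters; a standard implementation of that contract would look roughly like the following sketch (an assumption, not the actual file):

```python
# Illustrative sketch only; the real utils/geo_utils.py is not shown in this diff.
import math

def haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two WGS84 points, in meters."""
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```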
core/ocr_engine.py
ADDED
@@ -0,0 +1,145 @@
"""PaddleOCR wrapper"""

from typing import List, Tuple
from dataclasses import dataclass
import numpy as np


@dataclass
class OCRResult:
    """One OCR detection."""

    text: str
    confidence: float
    bbox: List[List[int]]  # [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]


class OCREngine:
    """
    PaddleOCR wrapper class,
    tuned for Japanese text extraction.
    """

    def __init__(self, lang: str = "japan", use_gpu: bool = False):
        """
        Args:
            lang: language setting ("japan", "en", "ch", ...)
            use_gpu: GPU flag (False on the Hugging Face Free Tier)
        """
        self.lang = lang
        self.use_gpu = use_gpu
        self._ocr = None
        self._initialized = False

    def _init_ocr(self) -> None:
        """Lazy initialization of the OCR engine."""
        if self._initialized:
            return

        try:
            from paddleocr import PaddleOCR

            self._ocr = PaddleOCR(
                use_angle_cls=True,
                lang=self.lang,
                use_gpu=self.use_gpu,
                show_log=False,
                # CPU optimization settings
                enable_mkldnn=True,
                cpu_threads=2,
            )
            self._initialized = True
        except ImportError:
            print("Warning: PaddleOCR not installed. OCR will not work.")
            self._initialized = False

    def detect(self, frame: np.ndarray) -> List[OCRResult]:
        """
        Detect text in a frame.

        Args:
            frame: input image (BGR)

        Returns:
            list of OCRResult
        """
        self._init_ocr()

        if self._ocr is None:
            return []

        try:
            result = self._ocr.ocr(frame, cls=True)

            if result is None or len(result) == 0:
                return []

            ocr_results = []
            for line in result:
                if line is None:
                    continue
                for item in line:
                    if item is None or len(item) < 2:
                        continue
                    bbox = item[0]
                    text_info = item[1]
                    if text_info and len(text_info) >= 2:
                        text = text_info[0]
                        confidence = float(text_info[1])
                        ocr_results.append(
                            OCRResult(
                                text=text,
                                confidence=confidence,
                                bbox=bbox,
                            )
                        )

            return ocr_results

        except Exception as e:
            print(f"OCR error: {e}")
            return []

    def detect_text_only(self, frame: np.ndarray) -> List[str]:
        """
        Extract text only, filtered by confidence.

        Args:
            frame: input image

        Returns:
            list of detected text strings
        """
        results = self.detect(frame)
        # Keep only texts with confidence >= 0.5
        return [r.text for r in results if r.confidence >= 0.5]

    def detect_with_positions(
        self, frame: np.ndarray
    ) -> List[Tuple[str, float, Tuple[int, int]]]:
        """
        Extract text together with its position.

        Returns:
            list of (text, confidence, center coordinate)
        """
        results = self.detect(frame)
        output = []

        for r in results:
            if r.confidence < 0.5:
                continue
            # Compute the center of the bounding box
            xs = [p[0] for p in r.bbox]
            ys = [p[1] for p in r.bbox]
            center_x = int(sum(xs) / 4)
            center_y = int(sum(ys) / 4)
            output.append((r.text, r.confidence, (center_x, center_y)))

        return output

    @property
    def is_available(self) -> bool:
        """Whether the OCR engine is usable."""
        self._init_ocr()
        return self._initialized and self._ocr is not None
core/result_aggregator.py
ADDED
@@ -0,0 +1,207 @@
"""Merging and scoring of detection results"""

import time
from typing import List, Optional, Dict
from dataclasses import dataclass, field
from collections import deque

from core.location_matcher import LocationCandidate, MatchResult
from utils.geo_utils import haversine_distance


@dataclass
class DetectionEvent:
    """One detection event."""

    timestamp: float
    ocr_texts: List[str]
    vlm_keywords: List[str]
    match_result: Optional[MatchResult]


@dataclass
class AggregatedResult:
    """Merged result."""

    estimated_lat: Optional[float] = None
    estimated_lon: Optional[float] = None
    confidence: float = 0.0
    address_hint: str = ""
    detected_texts: List[str] = field(default_factory=list)
    detected_landmarks: List[str] = field(default_factory=list)
    match_count: int = 0
    is_location_found: bool = False


class ResultAggregator:
    """
    Merges multiple detection results along the time axis
    to produce a high-confidence position estimate.
    """

    def __init__(
        self,
        buffer_size: int = 10,
        confidence_threshold: float = 0.6,
        consistency_window_sec: float = 10.0,
    ):
        self.buffer_size = buffer_size
        self.confidence_threshold = confidence_threshold
        self.consistency_window_sec = consistency_window_sec

        self._events: deque = deque(maxlen=buffer_size)
        self._detected_texts: Dict[str, int] = {}  # text -> detection count
        self._candidate_history: List[LocationCandidate] = []

    def add_detection(
        self,
        ocr_texts: List[str],
        vlm_keywords: List[str],
        match_result: Optional[MatchResult],
    ) -> None:
        """Record a detection event."""
        event = DetectionEvent(
            timestamp=time.time(),
            ocr_texts=ocr_texts,
            vlm_keywords=vlm_keywords,
            match_result=match_result,
        )
        self._events.append(event)

        # Update per-text detection counts
        for text in ocr_texts:
            self._detected_texts[text] = self._detected_texts.get(text, 0) + 1

        # Update the candidate history
        if match_result and match_result.best_candidate:
            self._candidate_history.append(match_result.best_candidate)
            # Drop old entries
            if len(self._candidate_history) > self.buffer_size:
                self._candidate_history = self._candidate_history[-self.buffer_size:]

    def get_aggregated_result(self) -> AggregatedResult:
        """Return the merged result."""
        if not self._events:
            return AggregatedResult()

        # Extract frequently seen texts
        frequent_texts = [
            text
            for text, count in sorted(
                self._detected_texts.items(), key=lambda x: x[1], reverse=True
            )
            if count >= 2
        ][:10]

        # Collect VLM keywords
        vlm_keywords = set()
        for event in self._events:
            vlm_keywords.update(event.vlm_keywords)

        # Evaluate candidate consistency
        if not self._candidate_history:
            return AggregatedResult(
                detected_texts=frequent_texts,
                detected_landmarks=list(vlm_keywords),
            )

        # Measure consistency against the latest candidate
        latest = self._candidate_history[-1]
        consistent_candidates = []

        for candidate in self._candidate_history:
            distance = haversine_distance(
                latest.lat, latest.lon, candidate.lat, candidate.lon
            )
            if distance < 100:  # within 100 m counts as consistent
                consistent_candidates.append(candidate)

        # Compute confidence
        consistency_ratio = len(consistent_candidates) / len(self._candidate_history)
        avg_score = sum(c.score for c in consistent_candidates) / max(
            len(consistent_candidates), 1
        )

        # Normalized score (0-1)
        normalized_score = min(avg_score / 20, 1.0)  # treat 20 points as the maximum
        confidence = (consistency_ratio * 0.6 + normalized_score * 0.4)

        is_found = (
            confidence >= self.confidence_threshold
            and len(consistent_candidates) >= 2
        )

        # Centroid (mean of the consistent candidates)
        if consistent_candidates:
            avg_lat = sum(c.lat for c in consistent_candidates) / len(
                consistent_candidates
            )
            avg_lon = sum(c.lon for c in consistent_candidates) / len(
                consistent_candidates
            )
        else:
            avg_lat = latest.lat
            avg_lon = latest.lon

        # Build the address hint
        address_hint = self._generate_address_hint(consistent_candidates)

        return AggregatedResult(
            estimated_lat=avg_lat,
            estimated_lon=avg_lon,
            confidence=confidence,
            address_hint=address_hint,
            detected_texts=frequent_texts,
            detected_landmarks=list(vlm_keywords),
            match_count=len(self._candidate_history),
            is_location_found=is_found,
        )

    def _generate_address_hint(
        self, candidates: List[LocationCandidate]
    ) -> str:
        """Build an address hint from the candidates."""
        if not candidates:
            return ""

        # Pull representative landmarks out of the match reasons
        landmarks = []
        for candidate in candidates:
            for reason in candidate.match_reasons:
                if "名前マッチ" in reason:
                    # "名前マッチ: ローソン" -> "ローソン"
                    name = reason.replace("名前マッチ: ", "")
                    if name not in landmarks:
                        landmarks.append(name)

        if landmarks:
            return f"{landmarks[0]}付近"
        return ""

    def reset(self) -> None:
        """Reset all state."""
        self._events.clear()
        self._detected_texts.clear()
        self._candidate_history.clear()

    def get_recent_texts(self, limit: int = 5) -> List[str]:
        """Return recently detected texts."""
        texts = []
        for event in reversed(list(self._events)):
            for text in event.ocr_texts:
                if text not in texts:
                    texts.append(text)
                    if len(texts) >= limit:
                        return texts
        return texts

    def get_detection_stats(self) -> Dict:
        """Return detection statistics."""
        return {
            "event_count": len(self._events),
            "unique_texts": len(self._detected_texts),
            "candidate_count": len(self._candidate_history),
            "top_texts": sorted(
                self._detected_texts.items(), key=lambda x: x[1], reverse=True
            )[:5],
        }
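A quick worked example of the confidence computation in `get_aggregated_result` above (the numbers are made up):

```python
# 10 candidates in the history, 8 of them within 100 m of the latest,
# with an average score of 14 among the consistent ones.
consistency_ratio = 8 / 10                # 0.8
normalized_score = min(14 / 20, 1.0)      # 0.7
confidence = consistency_ratio * 0.6 + normalized_score * 0.4
print(confidence)                         # 0.76 -> above the default 0.6 threshold
```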
core/vlm_analyzer.py
ADDED
@@ -0,0 +1,187 @@
"""VLM spatial-reasoning engine"""

import json
import re
from typing import List, Optional, Dict, Any
from dataclasses import dataclass, field
import numpy as np

from services.gemini_client import GeminiClient
from utils.image_utils import frame_to_pil


SPATIAL_ANALYSIS_PROMPT = """
この画像は日本の街中で撮影されたものです。
位置特定のため、以下の情報を可能な限り抽出してJSON形式で出力してください。

1. landmarks: 認識できるランドマーク(店舗、施設、看板など)のリスト
   - name: 名称
   - type: 種類(convenience_store, restaurant, hospital, station, parking, gas_station, etc.)
   - position: 画面内での位置(left, center, right, background)

2. spatial_relations: ランドマーク間の位置関係を日本語で記述

3. environment: 周辺環境の特徴
   - road_type: 道路タイプ(大通り, 住宅街の道路, 国道, 県道 など)
   - area_type: エリアタイプ(商業地域, 住宅街, 駅前, 郊外 など)
   - notable_features: その他の特徴的な要素

4. visible_text: 画像内で読み取れるテキスト(看板、標識など)

必ず有効なJSONのみを出力してください。説明文は不要です。

出力例:
{
  "landmarks": [
    {"name": "ローソン", "type": "convenience_store", "position": "center"},
    {"name": "コインパーキング", "type": "parking", "position": "right"}
  ],
  "spatial_relations": [
    "ローソンの右隣にコインパーキングがある",
    "奥に交差点が見える"
  ],
  "environment": {
    "road_type": "片側1車線の道路",
    "area_type": "郊外の商業地域",
    "notable_features": ["信号機あり", "歩道あり"]
  },
  "visible_text": ["ローソン", "P 24時間", "一方通行"]
}
"""


@dataclass
class Landmark:
    """Landmark information."""

    name: str
    type: str
    position: str


@dataclass
class SpatialAnalysis:
    """Spatial-analysis result."""

    landmarks: List[Landmark] = field(default_factory=list)
    spatial_relations: List[str] = field(default_factory=list)
    environment: Dict[str, Any] = field(default_factory=dict)
    visible_text: List[str] = field(default_factory=list)
    raw_response: str = ""
    success: bool = True
    error: Optional[str] = None


class VLMAnalyzer:
    """
    Spatial reasoning via a Vision-Language Model.

    Uses Gemini 2.5 Flash to extract landmark information and
    spatial relations from an image.
    """

    def __init__(self, gemini_client: Optional[GeminiClient] = None):
        self.client = gemini_client or GeminiClient()
        self.prompt = SPATIAL_ANALYSIS_PROMPT

    def analyze(self, frame: np.ndarray) -> SpatialAnalysis:
        """
        Analyze a frame and extract spatial information.

        Args:
            frame: input image (BGR)

        Returns:
            SpatialAnalysis: the analysis result
        """
        image = frame_to_pil(frame)
        response = self.client.analyze_image(image, self.prompt)

        if not response.success:
            return SpatialAnalysis(
                success=False,
                error=response.error,
                raw_response="",
            )

        return self._parse_response(response.text)

    async def analyze_async(self, frame: np.ndarray) -> SpatialAnalysis:
        """Asynchronous analysis."""
        image = frame_to_pil(frame)
        response = await self.client.analyze_image_async(image, self.prompt)

        if not response.success:
            return SpatialAnalysis(
                success=False,
                error=response.error,
                raw_response="",
            )

        return self._parse_response(response.text)

    def _parse_response(self, response_text: str) -> SpatialAnalysis:
        """Parse the Gemini response."""
        try:
            # Extract the JSON block
            json_match = re.search(r"\{[\s\S]*\}", response_text)
            if not json_match:
                return SpatialAnalysis(
                    success=False,
                    error="No JSON found in response",
                    raw_response=response_text,
                )

            data = json.loads(json_match.group())

            landmarks = []
            for lm in data.get("landmarks", []):
                landmarks.append(
                    Landmark(
                        name=lm.get("name", ""),
                        type=lm.get("type", "unknown"),
                        position=lm.get("position", "unknown"),
                    )
                )

            return SpatialAnalysis(
                landmarks=landmarks,
                spatial_relations=data.get("spatial_relations", []),
                environment=data.get("environment", {}),
                visible_text=data.get("visible_text", []),
                raw_response=response_text,
                success=True,
            )

        except json.JSONDecodeError as e:
            return SpatialAnalysis(
                success=False,
                error=f"JSON parse error: {e}",
                raw_response=response_text,
            )
        except Exception as e:
            return SpatialAnalysis(
                success=False,
                error=str(e),
                raw_response=response_text,
            )

    def get_search_keywords(self, analysis: SpatialAnalysis) -> List[str]:
        """Extract search keywords from an analysis result."""
        keywords = []

        # Landmark names
        for lm in analysis.landmarks:
            if lm.name:
                keywords.append(lm.name)

        # Visible texts
        keywords.extend(analysis.visible_text)

        # Deduplicate
        return list(set(keywords))

    @property
    def is_available(self) -> bool:
        """Whether the VLM is usable."""
        return self.client.is_available
memo.txt
ADDED
@@ -0,0 +1,21 @@
| 1 |
+
概要
|
| 2 |
+
|
| 3 |
+
スマホカメラ映像をリアルタイム解析し、GPS に頼らず位置を特定するサービス。
|
| 4 |
+
Hugging Face Spaces (Free Tier) で無料運用。
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
ユーザー作業 (手動設定が必要)
|
| 8 |
+
|
| 9 |
+
1. Gemini API キーの取得
|
| 10 |
+
|
| 11 |
+
1. https://aistudio.google.com/ にアクセス
|
| 12 |
+
2. Google アカウントでログイン
|
| 13 |
+
3. 左メニュー「Get API key」→「Create API key」
|
| 14 |
+
4. API キーをコピーして保存
|
| 15 |
+
|
| 16 |
+
2. Hugging Face Spaces へのデプロイ (実装完了後)
|
| 17 |
+
|
| 18 |
+
1. Hugging Face で「New Space」作成
|
| 19 |
+
- SDK: Gradio
|
| 20 |
+
- Hardware: CPU basic (Free)
|
| 21 |
+
2. Settings → Repository secrets で GEMINI_API_KEY を追加
|
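Once the secret is set, the app can read it as an ordinary environment variable. A minimal sketch of that pattern (this repo centralizes the lookup in config/settings.py, which may differ in detail):

import os

from dotenv import load_dotenv  # python-dotenv, listed in requirements.txt

load_dotenv()  # local dev: loads a .env file; on Spaces the Repository secret is injected as an env var
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GEMINI_API_KEY is not set")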
requirements.txt
ADDED
@@ -0,0 +1,30 @@
# Gradio & WebRTC
gradio>=5.0.0
gradio-webrtc>=0.0.31

# OCR - PaddlePaddle (CPU build)
paddlepaddle==3.0.0
paddleocr>=2.8.0

# Gemini API
google-generativeai>=0.8.0

# OSM / mapping
overpy>=0.7
geopy>=2.4.0
requests>=2.31.0

# Image processing
opencv-python-headless>=4.8.0
Pillow>=10.0.0
numpy>=1.24.0

# Async processing
aiohttp>=3.9.0

# Environment variables
python-dotenv>=1.0.0

# Misc utilities
pydantic>=2.0.0
cachetools>=5.3.0
services/__init__.py
ADDED
@@ -0,0 +1,4 @@
from .gemini_client import GeminiClient
from .overpass_client import OverpassClient

__all__ = ["GeminiClient", "OverpassClient"]
services/gemini_client.py
ADDED
@@ -0,0 +1,188 @@
"""Gemini API client"""

import asyncio
import time
from typing import Optional
from dataclasses import dataclass
import numpy as np
from PIL import Image

from config.settings import settings
from utils.image_utils import frame_to_pil


@dataclass
class GeminiResponse:
    """Gemini API response"""

    text: str
    success: bool
    error: Optional[str] = None


class GeminiClient:
    """
    Gemini API client
    - Honors the 10 RPM limit
    - Retries with exponential backoff
    """

    def __init__(self, api_key: Optional[str] = None):
        self.api_key = api_key or settings.gemini_api_key
        self._client = None
        self._model = None
        self._last_request_time: float = 0
        self._min_interval: float = 6.0  # 10 RPM = one request every 6 seconds
        self._initialized = False

    def _init_client(self) -> bool:
        """Lazily initialize the client"""
        if self._initialized:
            return self._client is not None

        if not self.api_key:
            print("Warning: GEMINI_API_KEY is not set")
            self._initialized = True
            return False

        try:
            import google.generativeai as genai

            genai.configure(api_key=self.api_key)
            self._client = genai
            self._model = genai.GenerativeModel(settings.gemini_model)
            self._initialized = True
            return True
        except ImportError:
            print("Warning: google-generativeai not installed")
            self._initialized = True
            return False
        except Exception as e:
            print(f"Gemini initialization error: {e}")
            self._initialized = True
            return False

    def _wait_for_rate_limit(self) -> None:
        """Sleep as needed to honor the rate limit"""
        elapsed = time.time() - self._last_request_time
        if elapsed < self._min_interval:
            time.sleep(self._min_interval - elapsed)

    async def _async_wait_for_rate_limit(self) -> None:
        """Async rate-limit wait"""
        elapsed = time.time() - self._last_request_time
        if elapsed < self._min_interval:
            await asyncio.sleep(self._min_interval - elapsed)

    def analyze_image(
        self, image: Image.Image, prompt: str, max_retries: int = 3
    ) -> GeminiResponse:
        """
        Analyze an image

        Args:
            image: PIL Image
            prompt: analysis prompt
            max_retries: maximum number of retries

        Returns:
            GeminiResponse
        """
        if not self._init_client():
            return GeminiResponse(
                text="",
                success=False,
                error="Gemini client not initialized",
            )

        for attempt in range(max_retries):
            try:
                self._wait_for_rate_limit()
                self._last_request_time = time.time()

                response = self._model.generate_content([prompt, image])
                return GeminiResponse(text=response.text, success=True)

            except Exception as e:
                error_msg = str(e)
                if "429" in error_msg or "quota" in error_msg.lower():
                    # Rate-limit error: exponential backoff
                    wait_time = (2**attempt) * 10
                    print(f"Rate limited, waiting {wait_time}s...")
                    time.sleep(wait_time)
                elif attempt < max_retries - 1:
                    time.sleep(2**attempt)
                else:
                    return GeminiResponse(
                        text="",
                        success=False,
                        error=error_msg,
                    )

        return GeminiResponse(
            text="",
            success=False,
            error="Max retries exceeded",
        )

    def analyze_frame(
        self, frame: np.ndarray, prompt: str
    ) -> GeminiResponse:
        """
        Analyze a NumPy frame

        Args:
            frame: NumPy array (BGR)
            prompt: analysis prompt

        Returns:
            GeminiResponse
        """
        image = frame_to_pil(frame)
        return self.analyze_image(image, prompt)

    async def analyze_image_async(
        self, image: Image.Image, prompt: str, max_retries: int = 3
    ) -> GeminiResponse:
        """Analyze an image asynchronously"""
        if not self._init_client():
            return GeminiResponse(
                text="",
                success=False,
                error="Gemini client not initialized",
            )

        for attempt in range(max_retries):
            try:
                await self._async_wait_for_rate_limit()
                self._last_request_time = time.time()

                response = await asyncio.to_thread(
                    self._model.generate_content, [prompt, image]
                )
                return GeminiResponse(text=response.text, success=True)

            except Exception as e:
                error_msg = str(e)
                if "429" in error_msg or "quota" in error_msg.lower():
                    wait_time = (2**attempt) * 10
                    await asyncio.sleep(wait_time)
                elif attempt < max_retries - 1:
                    await asyncio.sleep(2**attempt)
                else:
                    return GeminiResponse(
                        text="",
                        success=False,
                        error=error_msg,
                    )

        return GeminiResponse(
            text="",
            success=False,
            error="Max retries exceeded",
        )

    @property
    def is_available(self) -> bool:
        """Whether the client is usable"""
        return self._init_client()
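A minimal usage sketch for the client above (the frame is a synthetic placeholder; a real caller would pass a sampled video frame). Note the retry schedule on 429/quota errors is 10 s, 20 s, 40 s for attempts 0 through 2:

import numpy as np

from services.gemini_client import GeminiClient

client = GeminiClient()  # picks up GEMINI_API_KEY via config.settings
frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder BGR frame

resp = client.analyze_frame(frame, "List any visible signs and landmarks.")
if resp.success:
    print(resp.text)
else:
    print(f"Gemini error: {resp.error}")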
services/overpass_client.py
ADDED
@@ -0,0 +1,251 @@
"""OpenStreetMap Overpass API client"""

import time
from typing import List, Optional, Dict, Any
from dataclasses import dataclass
from cachetools import TTLCache

from config.settings import settings


@dataclass
class POI:
    """Point of Interest"""

    osm_id: int
    name: str
    lat: float
    lon: float
    poi_type: str
    tags: Dict[str, str]


class OverpassClient:
    """
    OpenStreetMap Overpass API client
    Provides POI search with caching
    """

    OVERPASS_URL = "https://overpass-api.de/api/interpreter"

    # Shop-type mapping
    SHOP_TYPE_MAPPING = {
        "convenience_store": ["shop=convenience", "amenity=convenience"],
        "restaurant": ["amenity=restaurant", "amenity=fast_food"],
        "hospital": ["amenity=hospital", "amenity=clinic"],
        "pharmacy": ["amenity=pharmacy", "shop=chemist"],
        "gas_station": ["amenity=fuel"],
        "parking": ["amenity=parking"],
        "station": ["railway=station", "public_transport=station"],
        "bank": ["amenity=bank"],
        "post_office": ["amenity=post_office"],
        "supermarket": ["shop=supermarket"],
    }

    def __init__(self, timeout: int = 25, cache_ttl: int = 300):
        self.timeout = timeout
        self._cache = TTLCache(maxsize=100, ttl=cache_ttl)
        self._last_request_time: float = 0
        self._min_interval: float = 1.0  # at least 1 second between requests

    def _wait_for_rate_limit(self) -> None:
        """Sleep as needed to honor the rate limit"""
        elapsed = time.time() - self._last_request_time
        if elapsed < self._min_interval:
            time.sleep(self._min_interval - elapsed)

    def _build_name_query(
        self,
        name: str,
        lat: float,
        lon: float,
        radius: int,
    ) -> str:
        """Build a query that searches POIs by name (case-insensitive regex; the name is interpolated verbatim)"""
        return f"""
        [out:json][timeout:{self.timeout}];
        (
          node["name"~"{name}",i](around:{radius},{lat},{lon});
          way["name"~"{name}",i](around:{radius},{lat},{lon});
        );
        out center;
        """

    def _build_type_query(
        self,
        poi_type: str,
        lat: float,
        lon: float,
        radius: int,
    ) -> str:
        """Build a query that searches POIs by type"""
        tags = self.SHOP_TYPE_MAPPING.get(poi_type, [])
        if not tags:
            return ""

        conditions = []
        for tag in tags:
            key, value = tag.split("=")
            conditions.append(f'node["{key}"="{value}"](around:{radius},{lat},{lon});')
            conditions.append(f'way["{key}"="{value}"](around:{radius},{lat},{lon});')

        return f"""
        [out:json][timeout:{self.timeout}];
        (
        {chr(10).join(conditions)}
        );
        out center;
        """

    def _parse_response(self, data: Dict[str, Any]) -> List[POI]:
        """Parse an Overpass API response"""
        pois = []
        elements = data.get("elements", [])

        for elem in elements:
            tags = elem.get("tags", {})
            name = tags.get("name", "")

            # Coordinates (for ways, use the center)
            if elem.get("type") == "way":
                center = elem.get("center", {})
                lat = center.get("lat", 0)
                lon = center.get("lon", 0)
            else:
                lat = elem.get("lat", 0)
                lon = elem.get("lon", 0)

            # Determine the POI type
            poi_type = "unknown"
            if tags.get("shop"):
                poi_type = tags.get("shop")
            elif tags.get("amenity"):
                poi_type = tags.get("amenity")
            elif tags.get("railway"):
                poi_type = "station"

            if lat and lon:
                pois.append(
                    POI(
                        osm_id=elem.get("id", 0),
                        name=name,
                        lat=lat,
                        lon=lon,
                        poi_type=poi_type,
                        tags=tags,
                    )
                )

        return pois

    def search_by_name(
        self,
        name: str,
        lat: float,
        lon: float,
        radius: int = 500,
    ) -> List[POI]:
        """
        Search POIs by name

        Args:
            name: search name
            lat: latitude
            lon: longitude
            radius: search radius in meters

        Returns:
            list of POIs
        """
        cache_key = f"name:{name}:{lat:.4f}:{lon:.4f}:{radius}"
        if cache_key in self._cache:
            return self._cache[cache_key]

        query = self._build_name_query(name, lat, lon, radius)
        result = self._execute_query(query)
        self._cache[cache_key] = result
        return result

    def search_by_type(
        self,
        poi_type: str,
        lat: float,
        lon: float,
        radius: int = 500,
    ) -> List[POI]:
        """
        Search POIs by type

        Args:
            poi_type: POI type
            lat: latitude
            lon: longitude
            radius: search radius in meters

        Returns:
            list of POIs
        """
        cache_key = f"type:{poi_type}:{lat:.4f}:{lon:.4f}:{radius}"
        if cache_key in self._cache:
            return self._cache[cache_key]

        query = self._build_type_query(poi_type, lat, lon, radius)
        if not query:
            return []

        result = self._execute_query(query)
        self._cache[cache_key] = result
        return result

    def _execute_query(self, query: str) -> List[POI]:
        """Execute an Overpass API query"""
        try:
            import requests

            self._wait_for_rate_limit()
            self._last_request_time = time.time()

            response = requests.post(
                self.OVERPASS_URL,
                data={"data": query},
                timeout=self.timeout,
            )
            response.raise_for_status()

            return self._parse_response(response.json())

        except Exception as e:
            print(f"Overpass query error: {e}")
            return []

    def search_combined(
        self,
        names: List[str],
        types: List[str],
        lat: float,
        lon: float,
        radius: int = 500,
    ) -> Dict[str, Dict[str, List[POI]]]:
        """
        Combined search (names and types together)

        Returns:
            a dict of the form {"names": {...}, "types": {...}}
        """
        result = {"names": {}, "types": {}}

        for name in names:
            pois = self.search_by_name(name, lat, lon, radius)
            if pois:
                result["names"][name] = pois

        for poi_type in types:
            pois = self.search_by_type(poi_type, lat, lon, radius)
            if pois:
                result["types"][poi_type] = pois

        return result

    def clear_cache(self) -> None:
        """Clear the cache"""
        self._cache.clear()
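A quick usage sketch against the public Overpass endpoint (the coordinates are illustrative, near Tokyo Station; results depend on live OSM data):

from services.overpass_client import OverpassClient

client = OverpassClient()
pois = client.search_by_type("convenience_store", lat=35.6812, lon=139.7671, radius=300)
for poi in pois[:5]:
    print(poi.name, poi.poi_type, poi.lat, poi.lon)

# An identical call within cache_ttl (300 s by default) is served from the TTL cache,
# so only the first request hits the Overpass server.
pois_again = client.search_by_type("convenience_store", lat=35.6812, lon=139.7671, radius=300)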
utils/__init__.py
ADDED
@@ -0,0 +1,13 @@
from .image_utils import resize_frame, frame_to_base64, frame_to_pil
from .text_cleaner import clean_ocr_text, extract_shop_names
from .geo_utils import haversine_distance, create_bounding_box

__all__ = [
    "resize_frame",
    "frame_to_base64",
    "frame_to_pil",
    "clean_ocr_text",
    "extract_shop_names",
    "haversine_distance",
    "create_bounding_box",
]
utils/geo_utils.py
ADDED
@@ -0,0 +1,84 @@
"""Geospatial utilities"""

import math
from typing import Tuple, Optional
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Bounding box"""

    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def haversine_distance(
    lat1: float, lon1: float, lat2: float, lon2: float
) -> float:
    """
    Distance between two points in meters
    (haversine formula)
    """
    R = 6371000  # Earth radius in meters

    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    return R * c


def create_bounding_box(
    lat: float, lon: float, radius_m: int
) -> BoundingBox:
    """
    Build a bounding box from a center coordinate and a radius
    """
    # One degree of latitude is roughly 111 km
    lat_delta = radius_m / 111000

    # Meters per degree of longitude shrinks with latitude
    lon_delta = radius_m / (111000 * math.cos(math.radians(lat)))

    return BoundingBox(
        min_lat=lat - lat_delta,
        max_lat=lat + lat_delta,
        min_lon=lon - lon_delta,
        max_lon=lon + lon_delta,
    )


def format_coordinates(lat: float, lon: float, precision: int = 6) -> str:
    """Format coordinates as a string"""
    return f"{lat:.{precision}f}, {lon:.{precision}f}"


def parse_coordinates(coord_str: str) -> Optional[Tuple[float, float]]:
    """Parse a coordinate string"""
    try:
        parts = coord_str.replace(" ", "").split(",")
        if len(parts) == 2:
            return float(parts[0]), float(parts[1])
    except ValueError:
        pass
    return None


def meters_to_degrees_lat(meters: float) -> float:
    """Convert meters to degrees of latitude"""
    return meters / 111000


def meters_to_degrees_lon(meters: float, lat: float) -> float:
    """Convert meters to degrees of longitude (latitude-dependent)"""
    return meters / (111000 * math.cos(math.radians(lat)))
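A quick sanity check for the helpers above (coordinates are illustrative approximations for Tokyo Station and Shinjuku Station, roughly 6 km apart):

from utils.geo_utils import haversine_distance, create_bounding_box

d = haversine_distance(35.6812, 139.7671, 35.6896, 139.7006)
print(f"{d / 1000:.1f} km")  # ~6.1 km

box = create_bounding_box(35.6812, 139.7671, radius_m=500)
print(box.min_lat, box.max_lat)  # roughly ±0.0045 degrees around the center latitude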
utils/image_utils.py
ADDED
@@ -0,0 +1,63 @@
"""Image processing utilities"""

import base64
import io
from typing import Tuple

import cv2
import numpy as np
from PIL import Image


def resize_frame(
    frame: np.ndarray, width: int = 640, height: int = 480
) -> np.ndarray:
    """Resize a frame"""
    return cv2.resize(frame, (width, height), interpolation=cv2.INTER_AREA)


def frame_to_base64(frame: np.ndarray, quality: int = 85) -> str:
    """Encode a frame as a Base64 JPEG"""
    # BGR -> RGB
    if len(frame.shape) == 3 and frame.shape[2] == 3:
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    else:
        frame_rgb = frame

    img = Image.fromarray(frame_rgb)
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG", quality=quality)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def frame_to_pil(frame: np.ndarray) -> Image.Image:
    """Convert a NumPy array to a PIL Image"""
    if len(frame.shape) == 3 and frame.shape[2] == 3:
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    else:
        frame_rgb = frame
    return Image.fromarray(frame_rgb)


def pil_to_frame(img: Image.Image) -> np.ndarray:
    """Convert a PIL Image to a NumPy array"""
    frame = np.array(img)
    if len(frame.shape) == 3 and frame.shape[2] == 3:
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    return frame


def rotate_frame(frame: np.ndarray, angle: int) -> np.ndarray:
    """Rotate a frame (0, 90, 180, or 270 degrees)"""
    if angle == 90:
        return cv2.rotate(frame, cv2.ROTATE_90_CLOCKWISE)
    elif angle == 180:
        return cv2.rotate(frame, cv2.ROTATE_180)
    elif angle == 270:
        return cv2.rotate(frame, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return frame


def get_frame_dimensions(frame: np.ndarray) -> Tuple[int, int]:
    """Get frame dimensions as (width, height)"""
    return frame.shape[1], frame.shape[0]
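A round-trip sketch for the helpers above (the frame is synthetic):

import numpy as np

from utils.image_utils import resize_frame, frame_to_pil, pil_to_frame, get_frame_dimensions

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # synthetic BGR frame
small = resize_frame(frame)                        # 640x480 by default
print(get_frame_dimensions(small))                 # (640, 480)

img = frame_to_pil(small)    # BGR NumPy -> RGB PIL image
back = pil_to_frame(img)     # RGB PIL image -> BGR NumPy
assert back.shape == small.shape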
utils/text_cleaner.py
ADDED
@@ -0,0 +1,92 @@
"""OCR text normalization utilities"""

import re
import unicodedata
from typing import List, Set


def normalize_text(text: str) -> str:
    """Normalize text (full-width -> half-width via NFKC)"""
    # NFKC normalization (e.g. full-width alphanumerics -> half-width)
    text = unicodedata.normalize("NFKC", text)
    return text.strip()


def remove_noise(text: str) -> str:
    """Remove noise characters"""
    # Strip control characters
    text = "".join(char for char in text if not unicodedata.category(char).startswith("C"))
    # Collapse runs of whitespace to a single space
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def clean_ocr_text(text: str) -> str:
    """Clean an OCR result"""
    if not text:
        return ""
    text = normalize_text(text)
    text = remove_noise(text)
    return text


def extract_shop_names(texts: List[str]) -> List[str]:
    """Extract text that looks like a shop name"""
    shop_patterns = [
        r"([\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff]+(?:店|屋|堂|館|院|薬局|医院|クリニック|歯科|整骨院))",
        r"(ローソン|セブン.?イレブン|ファミリーマート|ミニストップ|デイリーヤマザキ)",
        r"(マクドナルド|すき家|吉野家|松屋|ガスト|サイゼリヤ|CoCo壱番屋)",
        r"(ドラッグストア|マツモトキヨシ|ウエルシア|ツルハ|スギ薬局|サンドラッグ)",
        r"(イオン|イトーヨーカドー|西友|ダイエー|ライフ|マルエツ)",
        r"(LAWSON|FamilyMart|7-ELEVEN|MINISTOP)",
    ]

    found: Set[str] = set()
    for text in texts:
        cleaned = clean_ocr_text(text)
        for pattern in shop_patterns:
            matches = re.findall(pattern, cleaned, re.IGNORECASE)
            found.update(matches)

    return list(found)


def extract_address_parts(texts: List[str]) -> List[str]:
    """Extract text that looks like an address"""
    address_patterns = [
        r"([\u4e00-\u9fff]+[都道府県])",
        r"([\u4e00-\u9fff]+[市区町村])",
        r"(\d+丁目)",
        r"(\d+-\d+(?:-\d+)?)",
    ]

    found: Set[str] = set()
    for text in texts:
        cleaned = clean_ocr_text(text)
        for pattern in address_patterns:
            matches = re.findall(pattern, cleaned)
            found.update(matches)

    return list(found)


def extract_landmarks(texts: List[str]) -> List[str]:
    """Extract landmark names"""
    landmark_patterns = [
        r"([\u4e00-\u9fff]+駅)",
        r"([\u4e00-\u9fff]+交差点)",
        r"([\u4e00-\u9fff]+公園)",
        r"([\u4e00-\u9fff]+橋)",
        r"([\u4e00-\u9fff]+神社|[\u4e00-\u9fff]+寺)",
        r"(国道\d+号)",
        r"(県道\d+号)",
    ]

    found: Set[str] = set()
    for text in texts:
        cleaned = clean_ocr_text(text)
        for pattern in landmark_patterns:
            matches = re.findall(pattern, cleaned)
            found.update(matches)

    return list(found)
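A small sketch of the extractors on noisy OCR output (the input strings are made up; the extractors are set-based, so ordering may vary):

from utils.text_cleaner import extract_shop_names, extract_address_parts, extract_landmarks

ocr_lines = ["ファミリーマート  渋谷店", "東京都渋谷区", "2丁目", "渋谷駅"]
print(extract_shop_names(ocr_lines))     # e.g. ['ファミリーマート', '渋谷店']
print(extract_address_parts(ocr_lines))  # e.g. ['東京都', '東京都渋谷区', '2丁目'] (patterns run independently)
print(extract_landmarks(ocr_lines))      # e.g. ['渋谷駅']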