---
title: "Muse:Eye - On-device Multimodal XR Docent"
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: other
pipeline_tag: other
tags:
- multimodal
- XR
- on-device
- tflite
- mobileclip
- android
- unity
- RAG
- museum-tech
---

# Muse:Eye

**On-device Multimodal AI + XR Museum Docent for Android**

Muse:Eye is a hybrid AI docent system that combines **camera-based artwork recognition (MobileCLIP)**, an **on-device LLM (Gemma-3N)**, and **XR interaction (Unity MR)**.

Even in offline environments, it provides fast image search, metadata-based RAG, and explanations in adult, child, and expert modes.

---

# ✨ Features

### 1) ⚡ On-device AI (No Internet Required)

- MobileCLIP (TFLite) image embeddings
- Artwork similarity search (FAISS → custom `.bin` index)
- Gemma-3N on-device LLM

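To make the "FAISS → custom `.bin` index" idea concrete, here is a minimal sketch of a flat binary index with offline cosine search. The file layout (a `uint32` count, a `uint32` dimension, then little-endian `float32` rows) and the function names `pack_index`, `load_index`, and `search` are assumptions for illustration, not the project's actual format. The sketch uses Python for brevity and toy 4-d vectors instead of real embeddings.

```python
import math
import struct


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def pack_index(vectors):
    """Serialize vectors as: uint32 count, uint32 dim, then float32 rows.

    NOTE: this layout is a hypothetical example, not Muse:Eye's real format.
    """
    if not vectors:
        raise ValueError("index needs at least one vector")
    dim = len(vectors[0])
    parts = [struct.pack("<II", len(vectors), dim)]
    for vec in vectors:
        if len(vec) != dim:
            raise ValueError("all vectors must share one dimension")
        parts.append(struct.pack(f"<{dim}f", *normalize(vec)))
    return b"".join(parts)


def load_index(blob):
    """Read the rows back from the binary blob written by pack_index."""
    count, dim = struct.unpack_from("<II", blob, 0)
    offset = 8  # two uint32 header fields
    rows = []
    for _ in range(count):
        rows.append(struct.unpack_from(f"<{dim}f", blob, offset))
        offset += 4 * dim
    return rows


def search(rows, query, top_k=3):
    """Brute-force cosine search; returns (row_id, score) pairs, best first."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(row, q)), i) for i, row in enumerate(rows)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:top_k]]


# Toy usage: three "artworks"; the query points in the direction of row 1.
index_blob = pack_index([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]])
rows = load_index(index_blob)
hits = search(rows, [0.0, 3.0, 0.0, 0.0], top_k=2)
print(hits)  # row 1 ranks first
```

Storing rows already normalized means the device only needs a dot product per artwork at query time, which is a common reason to export a flat file rather than ship a full vector-search library.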
### 2) 🖼 Artwork Recognition

- Camera recognition → immediate embedding extraction
- RAG-based generation of artwork and artist explanations

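The retrieval step of a metadata-based RAG flow could look roughly like the sketch below: the best search hit is mapped to a metadata record, which becomes the context handed to the LLM. The `Artwork` fields, the `METADATA` store, the placeholder records, and the `min_score` threshold of 0.6 are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Artwork:
    title: str
    artist: str
    year: str
    notes: str


# Hypothetical metadata store keyed by the row id of the embedding index.
# The records are placeholders, not real collection data.
METADATA: Dict[int, Artwork] = {
    0: Artwork("Sample Artwork A", "Sample Artist", "n.d.", "Placeholder notes."),
    1: Artwork("Sample Artwork B", "Sample Artist", "n.d.", "Placeholder notes."),
}


def retrieve_context(
    hits: List[Tuple[int, float]],
    metadata: Dict[int, Artwork],
    min_score: float = 0.6,  # assumed cut-off for "no confident match"
) -> Optional[str]:
    """Turn the best (row_id, score) hit into a text context, or None."""
    if not hits:
        return None
    row_id, score = hits[0]
    if score < min_score or row_id not in metadata:
        return None
    art = metadata[row_id]
    return (
        f"Title: {art.title}\n"
        f"Artist: {art.artist}\n"
        f"Year: {art.year}\n"
        f"Notes: {art.notes}"
    )


context = retrieve_context([(1, 0.91), (0, 0.40)], METADATA)
print(context)
```

Returning `None` for low scores lets the app say it did not recognize the artwork instead of describing the wrong one.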
### 3) 👶🧑‍🦳 Three Docent Modes

- Child mode
- Adult mode
- Expert mode

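One simple way to support the three modes is to keep a style instruction per mode and prepend it to the retrieved artwork context. The sketch below assumes this approach. `DocentMode`, `STYLE`, `build_prompt`, and the instruction wording are hypothetical and not taken from the project.

```python
from enum import Enum


class DocentMode(Enum):
    CHILD = "child"
    ADULT = "adult"
    EXPERT = "expert"


# Assumed style instructions; the real prompts may differ.
STYLE = {
    DocentMode.CHILD: "Explain in short, simple sentences that a young child can follow.",
    DocentMode.ADULT: "Explain clearly for a general adult visitor.",
    DocentMode.EXPERT: "Explain in depth for an expert, using art-historical terminology.",
}


def build_prompt(mode: DocentMode, context: str) -> str:
    """Combine the mode's style instruction with the retrieved artwork context."""
    return (
        f"{STYLE[mode]}\n\n"
        f"Artwork information:\n{context}\n\n"
        "Docent explanation:"
    )


prompt = build_prompt(DocentMode.CHILD, "Title: Sample Artwork A")
print(prompt)
```

Because only the instruction changes, the same retrieval result can be reused when the visitor switches modes.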
### 4) 🥽 XR Integrated

- Unity + Android Studio integration
- 3D UI and voice guidance in MR mode

---

# System Architecture

Muse:Eye supports two execution modes:

- **Mobile Mode (On-device AI)**: fully offline
- **XR/MR Mode (Cloud Multimodal AI)**: Unity-based MR + Cloud API

---

## 📱 A. Mobile Mode (On-device AI)

```plaintext
Mobile App (Android)
│
├── Camera Input
│
├── MobileCLIP (TFLite)
│     └─ on-device embedding (512-d)
│
├── Embedding Index (.bin)
│     └─ cosine similarity search (offline)
│
├── Local Metadata RAG
│     └─ artwork metadata lookup
│
├── Gemma-3N On-device LLM
│     └─ adult / child / expert explanation
│
└── Android TTS
```
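
The order of stages in this diagram can be sketched as one function that chains them. In the sketch below every stage is passed in as a callable, so the stubs stand in for MobileCLIP, the index search, the metadata lookup, Gemma-3N, and TTS. `run_docent` and its parameter names are hypothetical, and the real app would run this on Android rather than in Python.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Hits = List[Tuple[int, float]]


def run_docent(
    frame: bytes,
    embed: Callable[[bytes], Sequence[float]],    # image -> embedding
    search: Callable[[Sequence[float]], Hits],    # embedding -> (row_id, score) hits
    lookup: Callable[[Hits], Optional[str]],      # hits -> metadata context or None
    generate: Callable[[str], str],               # context -> explanation text
    speak: Callable[[str], None],                 # explanation -> audio output
) -> Optional[str]:
    """Run one camera frame through the stages in diagram order."""
    embedding = embed(frame)
    hits = search(embedding)
    context = lookup(hits)
    if context is None:
        return None  # nothing recognized: skip the LLM and TTS stages
    explanation = generate(context)
    speak(explanation)
    return explanation


# Stub stages for illustration only.
spoken: List[str] = []
result = run_docent(
    b"fake-frame",
    embed=lambda frame: [0.0, 1.0],
    search=lambda emb: [(0, 0.92)],
    lookup=lambda hits: "Title: Sample Artwork A" if hits[0][1] >= 0.6 else None,
    generate=lambda ctx: f"Explanation for: {ctx}",
    speak=spoken.append,
)
print(result)
```

Keeping each stage behind a plain callable makes it straightforward to test the flow without a camera, a model file, or a speaker.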

---

## 🥽 B. XR/MR Mode (Cloud Multimodal AI)

```plaintext
Unity XR App (MR View)
│
├── MR Camera (RenderTexture)
│
├── JNI Bridge → Android
│     └─ passes frame bytes
│
├── Cloud Multimodal API (Gemini)
│     ├─ artwork analysis
│     ├─ style / meaning extraction
│     └─ multimodal reasoning
│
├── Cloud PromptManager
│     ├─ child mode
│     ├─ adult mode
│     └─ expert mode
│
└── Unity 3D UI + Android TTS
      ├─ floating info panel
      ├─ 3D guide elements
      └─ audio output
```