Spaces: Msk7000/Image_Clf_App_Implementation_Comparison

Msk7000 committed 5 days ago

Commit a60082f (verified) · 1 parent: 20f1e50

Upload 6 files

Files changed (6)

README.md +109 -10
__pycache__/model_2025.cpython-313.pyc +0 -0
app.py +238 -0
model_2015.py +165 -0
model_2025.py +27 -0
requirements.txt +22 -0

README.md CHANGED

@@ -1,13 +1,112 @@
 ---
-title: Image Clf App Implementation Comparison
-emoji: 🏃
-colorFrom: yellow
-colorTo: green
-sdk: gradio
-sdk_version: 6.15.2
-python_version: '3.13'
-app_file: app.py
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

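+# Image Classification Demo: 2015 vs 2025 Implementation Comparison
+A teaching demo app that shows **two generations of implementation** of the same feature (image → category prediction) side by side.
+Inference is handled by the 2025 implementation (HuggingFace ViT); the 2015 implementation (Theano CNN)
+is displayed as reference code only.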
 ---
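+## Implementation Comparison Summary
+| Item | 2015 (Theano + NumPy) | 2025 (HuggingFace Transformers) |
+|---|---|---|
+| **Lines of code** | ~130 | 5 |
+| **Model** | Hand-written CNN | ViT-Base (pre-trained) |
+| **Preprocessing** | Manual (normalization, HWC → CHW conversion) | Automatic |
+| **Training** | SGD, training loop, and gradient computation written by hand | Not required (fine-tuning is separate) |
+| **Accuracy (approx.)** | ~70 % (CIFAR-10) | ~81 % (ImageNet) |
+| **Compile step** | Theano graph optimization (tens of seconds) | Not required |
+| **Python support** | Python 3.8 or earlier | Python 3.10 to 3.12 |
+> The same inference feature now takes roughly **26× less code** (about 130 lines vs 5).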
 ---
+## File Structure
+```
+imgclf_app/
+├── app.py            # Gradio web app (entry point)
+├── model_2025.py     # 2025 implementation: HuggingFace pipeline (5 lines)
+├── model_2015.py     # 2015 implementation: Theano CNN (reference only)
+├── requirements.txt  # Dependencies
+└── README.md         # This file
+```
+---
+## Setup
+### 1. Get the repository
+```bash
+git clone <this-repo>
+cd imgclf_app
+```
+### 2. Create a virtual environment and install dependencies
+```bash
+python -m venv .venv
+source .venv/bin/activate        # Windows: .venv\Scripts\activate
+pip install -r requirements.txt
+```
+> **Using a GPU** (optional): replacing `torch` with a CUDA build speeds up inference.
+> ```bash
+> pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
+> ```
+### 3. Launch
+```bash
+python app.py
+```
+Open `http://localhost:7860` in a browser.
+On first launch, the ViT model (about 330 MB) is downloaded from the HuggingFace Hub.
+---
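+## Usage
+1. Drop (or select) an image in the upload area on the left.
+2. Click **▶ Run Classification** (classification also runs automatically when the image changes).
+3. The prediction results (the top 5 category names with their scores) are displayed.
+4. Compare the 2015 and 2025 code side by side on the right.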
+---
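+## About the 2015 Implementation (`model_2015.py`)
+`model_2015.py` runs only in a **Python 3.8 + Theano 1.0** environment.
+Theano development has been discontinued, and it does not run on Python 3.9 or later.
+The file is included as documentation for **comparing implementation cost and for teaching**.
+In 2015, all of the following had to be written by hand:
+- Weight initialization (`W` and `b` for each layer)
+- Symbolic graph (`conv2d` → `pool` → `flatten` → `softmax`)
+- Loss function, gradient computation, and SGD update rule
+- Compilation of the Theano functions
+- Image preprocessing (normalization, dimension reordering)
+- Training loop (batch splitting, epoch management)
+- Saving and loading the model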
+---
+## Tech Stack
+| Library | Version | Purpose |
+|---|---|---|
+| `transformers` | ≥ 4.40 | ViT model and pipeline |
+| `torch` | ≥ 2.2 | Inference backend |
+| `Pillow` | ≥ 10.0 | Image I/O |
+| `gradio` | ≥ 4.36 | Web UI |
+---
+## License
+MIT
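
The README's GPU note only covers installing a CUDA build of `torch`. As a minimal sketch, assuming you want to confirm which device is actually available before starting the app, the check below falls back to CPU when `torch` is missing or no GPU is visible. `pick_device` is a hypothetical helper and is not part of this commit.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable torch build sees a GPU, else "cpu".

    Hypothetical helper for illustration; it is not part of the repository.
    """
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Inference would run on: {device}")

# Possible use with the pipeline from model_2025.py (left commented out
# because it downloads the model):
# classifier = pipeline("image-classification",
#                       model="google/vit-base-patch16-224",
#                       device=device)
```

Passing `device` explicitly removes any dependence on the library's default device selection.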

__pycache__/model_2025.cpython-313.pyc ADDED

Binary file (943 Bytes). View file

app.py ADDED

	@@ -0,0 +1,238 @@

+"""
+Image Classification Demo — 2015 vs 2025 Implementation Comparison
+画像分類デモアプリ — 2015 vs 2025 実装比較
+====================================================
+Compares the same feature (image → category prediction) across two generations.
+同じ機能（画像 → カテゴリ予測）を 2 世代の実装で並べて表示する。
+Inference is handled by the 2025 implementation (HuggingFace ViT).
+推論は 2025 実装（HuggingFace ViT）が担い、
+The 2015 implementation (Theano CNN) is shown as reference code.
+2015 実装（Theano CNN）は実装コードを参照表示する。
+Usage / 起動方法:
+    python app.py
+"""
+import textwrap
+import gradio as gr
+from model_2025 import classify as classify_2025
+# ── Code snippets for display / 表示用コードスニペット ─────────────────────────
+CODE_2015 = textwrap.dedent("""\
+    # 2015 Implementation — Theano + NumPy  (excerpt, ~130 lines)
+    # 2015 実装 — Theano + NumPy（抜粋・約 130 行）
+    # ❶ Manually Initialize the Weights
+    #    重みを手動で初期化
+    W0 = theano.shared(np.random.normal(0, 0.01, (32,3,5,5)), 'W0')
+    W1 = theano.shared(np.random.normal(0, 0.01, (64,32,5,5)), 'W1')
+    W2 = theano.shared(np.random.normal(0, 0.01, (1600,512)), 'W2')
+    W3 = theano.shared(np.random.normal(0, 0.01, (512,10)),  'W3')
+    # ... b0, b1, b2, b3 defined in the same way / 同様に定義 ...
+    # ❷ Hand-write the Symbolic Computation Graph
+    #    シンボルグラフを手書き
+    x    = T.tensor4('x')
+    y    = T.ivector('y')
+    conv0 = T.tanh(pool.pool_2d(
+                conv2d(x, W0, filter_shape=(32,3,5,5))
+                + b0.dimshuffle('x',0,'x','x'),
+                ws=(2,2), ignore_border=True))
+    conv1 = T.tanh(pool.pool_2d(
+                conv2d(conv0, W1, filter_shape=(64,32,5,5))
+                + b1.dimshuffle('x',0,'x','x'),
+                ws=(2,2), ignore_border=True))
+    flat  = conv1.flatten(2)
+    fc    = T.tanh(T.dot(flat, W2) + b2)
+    out   = T.nnet.softmax(T.dot(fc, W3) + b3)
+    # ❸ Manually Define Loss, Gradients, and SGD Update Rules
+    #    損失・勾配・SGD 更新則を手動定義
+    params  = [W0,b0,W1,b1,W2,b2,W3,b3]
+    loss    = -T.mean(T.log(out)[T.arange(y.shape[0]), y])
+    grads   = T.grad(loss, params)
+    updates = [(p, p - 0.01*g) for p, g in zip(params, grads)]
+    # ❹ Compile Theano Functions  (takes tens of seconds)
+    #    Theano 関数をコンパイル（数十秒かかる）
+    train_fn = theano.function([x, y], loss, updates=updates)
+    pred_fn  = theano.function([x], T.argmax(out, axis=1))
+    # ❺ Manually Implement Preprocessing
+    #    前処理を手動実装
+    def preprocess(path):
+        img = Image.open(path).convert('RGB').resize((32,32))
+        arr = (np.array(img)/255.0 - MEAN) / STD
+        return arr.transpose(2,0,1)[np.newaxis]
+    # ❻ Manually Implement the Training Loop
+    #    学習ループを手動実装
+    for epoch in range(200):
+        for batch in range(n // 50):
+            s = slice(batch * 50, (batch + 1) * 50)
+            train_fn(X_train[s], y_train[s])
+    # ❼ Run Inference / 推論
+    idx = pred_fn(preprocess('cat.jpg'))[0]
+    print(LABELS[idx])
+""")
+CODE_2025 = textwrap.dedent("""\
+    # 2025 Implementation — HuggingFace Transformers  (just 5 lines)
+    # 2025 実装 — HuggingFace Transformers（実質 5 行）
+    from transformers import pipeline
+    # ❶ Load a Pre-trained Model
+    #    事前学習済みモデルをロード
+    classifier = pipeline(
+        "image-classification",
+        model="google/vit-base-patch16-224",
+    )
+    # ❷ Run Inference  (preprocessing & postprocessing are automatic)
+    #    推論（前処理・後処理すべて自動）
+    result = classifier("cat.jpg", top_k=5)
+    # → [{'label': 'tabby cat', 'score': 0.923}, ...]
+""")
+# ── Comparison table / 比較表 ────────────────────────────────────────────────
+COMPARISON_MD = """\
+| Item<br><small style="color:#999">項目</small> | 2015 (Theano) | 2025 (HuggingFace) |
+|---|---|---|
+| **Lines of code**<br><small style="color:#999">実装行数</small> | ~130 lines | 5 lines |
+| **Model**<br><small style="color:#999">モデル</small> | Hand-written CNN<br><small style="color:#999">手書き CNN</small> | ViT-Base (pre-trained)<br><small style="color:#999">ViT-Base（事前学習済）</small> |
+| **Preprocessing**<br><small style="color:#999">前処理</small> | Manual<br><small style="color:#999">手動実装</small> | Automatic<br><small style="color:#999">自動</small> |
+| **Training**<br><small style="color:#999">学習</small> | SGD written by hand<br><small style="color:#999">SGD 手動記述</small> | Not required (fine-tuning is separate)<br><small style="color:#999">不要（Fine-tuning は別途）</small> |
+| **Accuracy (approx.)**<br><small style="color:#999">精度目安</small> | ~70 % (CIFAR-10) | ~81 % (ImageNet) |
+| **Theano compile step**<br><small style="color:#999">コンパイル</small> | Tens of seconds<br><small style="color:#999">数十秒</small> | Not required<br><small style="color:#999">不要</small> |
+"""
+# ── Inference function / 推論関数 ────────────────────────────────────────────
+def run_inference(image):
+    """Classify the uploaded image with ViT and return top-5 scores.
+    アップロード画像を ViT で分類し、スコア上位 5 件を返す。"""
+    if image is None:
+        return {}, CODE_2015, CODE_2025
+    results = classify_2025(image)
+    label_scores = {r["label"]: float(r["score"]) for r in results}
+    return label_scores, CODE_2015, CODE_2025
+# ── UI / UI 定義 ─────────────────────────────────────────────────────────────
+CSS = """
+.code-2015 textarea { border-left: 3px solid #888780 !important; }
+.code-2025 textarea { border-left: 3px solid #1D9E75 !important; }
+.bilingual-label .label-wrap span {
+    display: block;
+}
+"""
+def _bi(en, ja):
+    """Return bilingual Markdown: English normal, Japanese small gray below."""
+    return f"{en}<br><small style='color:#999'>{ja}</small>"
+with gr.Blocks(
+    title="Image Classification: 2015 vs 2025",
+    css=CSS,
+    theme=gr.themes.Default(
+        font=["BIZ UDPGothic", "Noto Sans JP", "sans-serif"],
+        primary_hue=gr.themes.colors.emerald,
+    ),
+) as demo:
+    gr.Markdown(
+        """
+        # Image Classification Demo — 2015 vs 2025
+        <small style="color:#999">画像分類デモ — 2015 vs 2025 実装比較</small>
+        **The same feature (image → category prediction) compared across two generations of implementation.**
+        <br><small style="color:#999">同じ機能（画像 → カテゴリ予測）を 2 世代の実装で比較する。</small>
+        Inference is handled by the 2025 implementation (ViT).
+        <br><small style="color:#999">推論は 2025 実装（ViT）が担います。</small>
+        """
+    )
+    with gr.Row():
+        # ── Left column: upload + result ──────────────────────────────────
+        with gr.Column(scale=1):
+            img_input = gr.Image(
+                type="pil",
+                label="Upload an Image / 画像をアップロード",
+                height=280,
+            )
+            run_btn = gr.Button(
+                "▶  Run Classification / 分類を実行",
+                variant="primary",
+            )
+            results_output = gr.Label(
+                num_top_classes=5,
+                label="Prediction Results (2025 implementation) / 予測結果（2025 実装）",
+            )
+        # ── Right column: code comparison ─────────────────────────────────
+        with gr.Column(scale=2):
+            gr.Markdown(
+                """
+                ### Code Implementation Comparison
+                <small style="color:#999">実装コードの比較</small>
+                > Difference in lines of code required to implement the same inference feature.
+                > <small style="color:#999">同じ推論機能を実装するのに必要なコード量の差</small>
+                """
+            )
+            with gr.Row():
+                with gr.Column():
+                    gr.Markdown(
+                        "**🕰️ 2015 Implementation — Theano + NumPy (~130 lines)**"
+                        "<br><small style='color:#999'>2015 実装 — Theano + NumPy（約 130 行）</small>"
+                    )
+                    code_2015_box = gr.Code(
+                        value=CODE_2015,
+                        language="python",
+                        label="",
+                        lines=30,
+                        interactive=False,
+                        elem_classes=["code-2015"],
+                    )
+                with gr.Column():
+                    gr.Markdown(
+                        "**✅ 2025 Implementation — HuggingFace Transformers (5 lines)**"
+                        "<br><small style='color:#999'>2025 実装 — HuggingFace Transformers（5 行）</small>"
+                    )
+                    code_2025_box = gr.Code(
+                        value=CODE_2025,
+                        language="python",
+                        label="",
+                        lines=30,
+                        interactive=False,
+                        elem_classes=["code-2025"],
+                    )
+    gr.Markdown("---")
+    gr.Markdown(
+        "### Implementation Comparison Summary\n"
+        "<small style='color:#999'>実装比較サマリー</small>"
+    )
+    gr.Markdown(COMPARISON_MD)
+    # Event binding / イベントバインド
+    run_btn.click(
+        fn=run_inference,
+        inputs=[img_input],
+        outputs=[results_output, code_2015_box, code_2025_box],
+    )
+    img_input.change(
+        fn=run_inference,
+        inputs=[img_input],
+        outputs=[results_output, code_2015_box, code_2025_box],
+    )
+if __name__ == "__main__":
+    demo.launch()
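
`run_inference` turns the pipeline output (a list of `{"label", "score"}` dicts) into a `{label: score}` mapping for `gr.Label`. The sketch below reproduces only that conversion with a stub classifier, so it runs without Gradio or a model download. `to_label_scores`, `fake_classify`, and `run_inference_sketch` are hypothetical names, and the scores are made up for illustration.

```python
def to_label_scores(results):
    """Convert [{'label': ..., 'score': ...}, ...] into {label: score}."""
    return {r["label"]: float(r["score"]) for r in results}


def fake_classify(image):
    """Stand-in for model_2025.classify; the scores here are invented."""
    return [
        {"label": "tabby cat", "score": 0.9},
        {"label": "tiger cat", "score": 0.05},
    ]


def run_inference_sketch(image, classify=fake_classify):
    # Mirror the None guard in app.py: no image means an empty result.
    if image is None:
        return {}
    return to_label_scores(classify(image))


scores = run_inference_sketch("cat.jpg")
print(scores)
```

Injecting the classifier as a parameter is one way to exercise the conversion logic in isolation; the app itself calls `classify_2025` directly.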

model_2015.py ADDED

	@@ -0,0 +1,165 @@

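+"""
+2015 implementation (reference only): a hand-written CNN in Theano + NumPy.
+Theano is no longer developed and does not run on current Python versions
+(the README states Python 3.9 or later), so this file is kept as a record of
+the architecture and for comparison.
+Actual inference is done in model_2025.py.
+"""
+# ---------------------------------------------------------------------------
+# NOTE: the code below assumes a Python 3.8 / Theano 1.0 environment
+# ---------------------------------------------------------------------------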
+import numpy as np
+try:
+    import theano
+    import theano.tensor as T
+    from theano.tensor.nnet import conv2d
+    from theano.tensor.signal import pool
+    THEANO_AVAILABLE = True
+except ImportError:
+    THEANO_AVAILABLE = False
+from PIL import Image
+import pickle
+# Hyperparameters
+LEARNING_RATE = 0.01
+N_EPOCHS      = 200
+BATCH_SIZE    = 50
+N_CLASSES     = 10
+LABELS = [
+    "airplane", "automobile", "bird", "cat", "deer",
+    "dog", "frog", "horse", "ship", "truck",
+]
+MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
+STD  = np.array([0.229, 0.224, 0.225], dtype=np.float32)
+# ── Utilities ───────────────────────────────────────────────────────────────
+def _shared_w(shape, name):
+    return theano.shared(
+        np.random.normal(0, 0.01, shape).astype(np.float32),
+        name=name,
+    )
+def _shared_b(n, name):
+    return theano.shared(np.zeros(n, dtype=np.float32), name=name)
+# ── Model construction ──────────────────────────────────────────────────────
+def build_model():
+    if not THEANO_AVAILABLE:
+        raise RuntimeError(
+            "Theano is not installed. "
+            "A Python 3.8 + Theano 1.0 environment is required."
+        )
+    # Weight parameters
+    W0 = _shared_w((32, 3, 5, 5), "W0")    # Conv 1
+    b0 = _shared_b(32, "b0")
+    W1 = _shared_w((64, 32, 5, 5), "W1")   # Conv 2
+    b1 = _shared_b(64, "b1")
+    W2 = _shared_w((64 * 5 * 5, 512), "W2") # FC 1
+    b2 = _shared_b(512, "b2")
+    W3 = _shared_w((512, N_CLASSES), "W3")  # Output
+    b3 = _shared_b(N_CLASSES, "b3")
+    params = [W0, b0, W1, b1, W2, b2, W3, b3]
+    # Symbolic variables
+    x = T.tensor4("x")
+    y = T.ivector("y")
+    # Forward pass (written by hand)
+    conv0 = T.tanh(
+        pool.pool_2d(
+            conv2d(x, W0,
+                   input_shape=(None, 3, 32, 32),  # batch size left unknown so single-image inference also works
+                   filter_shape=(32, 3, 5, 5))
+            + b0.dimshuffle("x", 0, "x", "x"),
+            ws=(2, 2), ignore_border=True,
+        )
+    )
+    conv1 = T.tanh(
+        pool.pool_2d(
+            conv2d(conv0, W1, filter_shape=(64, 32, 5, 5))
+            + b1.dimshuffle("x", 0, "x", "x"),
+            ws=(2, 2), ignore_border=True,
+        )
+    )
+    flat = conv1.flatten(2)
+    fc   = T.tanh(T.dot(flat, W2) + b2)
+    out  = T.nnet.softmax(T.dot(fc, W3) + b3)
+    # Loss, gradients, and SGD update rule
+    loss    = -T.mean(T.log(out)[T.arange(y.shape[0]), y])
+    pred    =  T.argmax(out, axis=1)
+    err     =  T.mean(T.neq(pred, y))
+    grads   =  T.grad(loss, params)
+    updates = [(p, p - LEARNING_RATE * g) for p, g in zip(params, grads)]
+    # Compile the Theano functions (graph optimization runs here, so this takes tens of seconds)
+    train_fn = theano.function([x, y], [loss, err], updates=updates)
+    pred_fn  = theano.function([x], pred)
+    return params, train_fn, pred_fn
+# ── Preprocessing ───────────────────────────────────────────────────────────
+def preprocess(image) -> np.ndarray:
+    """Take a PIL.Image or a file path and return a float32 array of shape (1, 3, 32, 32)."""
+    if isinstance(image, str):
+        image = Image.open(image)
+    img = image.convert("RGB").resize((32, 32))
+    arr = np.array(img, dtype=np.float32) / 255.0
+    arr = (arr - MEAN) / STD     # per-channel normalization
+    arr = arr.transpose(2, 0, 1) # HWC → CHW
+    return arr[np.newaxis]       # add the batch dimension
+# ── Training loop ───────────────────────────────────────────────────────────
+def fit(train_fn, X_train: np.ndarray, y_train: np.ndarray) -> None:
+    n = len(X_train)
+    for epoch in range(N_EPOCHS):
+        idx = np.random.permutation(n)
+        losses, errs = [], []
+        for i in range(n // BATCH_SIZE):
+            b = idx[i * BATCH_SIZE : (i + 1) * BATCH_SIZE]
+            l, e = train_fn(X_train[b], y_train[b])
+            losses.append(l)
+            errs.append(e)
+        print(
+            f"Epoch {epoch + 1:3d} / {N_EPOCHS}  "
+            f"loss={np.mean(losses):.4f}  "
+            f"err={np.mean(errs):.4f}"
+        )
+# ── Inference ───────────────────────────────────────────────────────────────
+def classify(pred_fn, image) -> str:
+    arr = preprocess(image)
+    idx = pred_fn(arr)[0]
+    return LABELS[idx]
+# ── Saving / loading the model ──────────────────────────────────────────────
+def save_model(params, path: str) -> None:
+    weights = [p.get_value() for p in params]
+    with open(path, "wb") as f:
+        pickle.dump(weights, f, protocol=2)
+def load_model(params, path: str) -> None:
+    with open(path, "rb") as f:
+        weights = pickle.load(f)
+    for p, w in zip(params, weights):
+        p.set_value(w)
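
The first fully connected layer uses `W2` with `64 * 5 * 5` input features. The arithmetic below shows where that `5` comes from, assuming stride-1 convolutions without padding (Theano's default `valid` mode) and 2×2 pooling with `ignore_border=True`, as in `build_model`. `conv_valid` and `pool_floor` are hypothetical helper names.

```python
def conv_valid(size: int, kernel: int) -> int:
    """Output width of a stride-1 convolution without padding ('valid' mode)."""
    return size - kernel + 1


def pool_floor(size: int, window: int) -> int:
    """Output width of non-overlapping pooling that drops a ragged border."""
    return size // window


size = 32                                   # 32x32 input, as in preprocess()
size = pool_floor(conv_valid(size, 5), 2)   # conv 5x5 -> 28, pool 2x2 -> 14
size = pool_floor(conv_valid(size, 5), 2)   # conv 5x5 -> 10, pool 2x2 -> 5
flat_features = 64 * size * size            # 64 output channels from W1
print(size, flat_features)                  # 5 1600
```

The result matches both `W2`'s shape in `build_model` and the `(1600, 512)` literal in the excerpt shown by `app.py`.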

model_2025.py ADDED

	@@ -0,0 +1,27 @@

+"""
+2025 implementation: image classification with HuggingFace Transformers.
+Preprocessing, inference, and postprocessing all fit in effectively 5 lines.
+"""
+from transformers import pipeline
+# Load the model (downloaded only on first use)
+classifier = pipeline(
+    "image-classification",
+    model="google/vit-base-patch16-224",
+)
+def classify(image) -> list[dict]:
+    """
+    Parameters
+    ----------
+    image : PIL.Image | str
+        A PIL image object or a file path string.
+    Returns
+    -------
+    list[dict]
+        [{"label": str, "score": float}, ...], the top 5 predictions
+    """
+    return classifier(image, top_k=5)
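
`model_2025.py` builds the pipeline at import time, so `import model_2025` already triggers the model load and, on first run, the download. A possible alternative, sketched here under the assumption that deferring the load is acceptable, is to build the pipeline on first use and cache it. `get_classifier` is a hypothetical name and is not part of this commit.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_classifier():
    """Build the pipeline on the first call and reuse it afterwards."""
    # Imported here so that importing this module stays cheap.
    from transformers import pipeline

    return pipeline(
        "image-classification",
        model="google/vit-base-patch16-224",
    )


def classify(image, top_k: int = 5):
    """Return the top_k predictions as [{'label': str, 'score': float}, ...]."""
    return get_classifier()(image, top_k=top_k)
```

The trade-off is that the first classification request pays the loading cost instead of application startup.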

requirements.txt ADDED

	@@ -0,0 +1,22 @@

+# ─────────────────────────────────────────────────────────────────────────────
+# Image classification demo app: requirements.txt
+# Python 3.10 to 3.12 recommended
+# ─────────────────────────────────────────────────────────────────────────────
+# ── 2025 implementation (required to run) ────────────────────────────────────
+transformers>=4.40.0          # HuggingFace Transformers (ViT model)
+torch>=2.2.0                  # PyTorch backend (also runs on CPU)
+torchvision>=0.17.0           # Image transform utilities
+Pillow>=10.0.0                # Image I/O
+# ── UI ───────────────────────────────────────────────────────────────────────
+gradio>=4.36.0                # Web UI framework
+# ─────────────────────────────────────────────────────────────────────────────
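+# 2015 implementation (model_2015.py is included as reference documentation)
+# Running it requires Python 3.8 + Theano 1.0, so it does not work in the current environment.
+# It is kept as a code record for comparison and teaching purposes.
+#
+#   theano==1.0.5              # requires Python 3.8 or earlier
+#   numpy==1.19.5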
+# ─────────────────────────────────────────────────────────────────────────────
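
The requirements file recommends Python 3.10 to 3.12 but nothing enforces it. A minimal sketch of a startup check, assuming a warning is preferable to a hard failure; `is_recommended_python` and the two bound constants are hypothetical names, not part of this commit.

```python
import sys

SUPPORTED_MIN = (3, 10)   # lower bound recommended in requirements.txt
SUPPORTED_MAX = (3, 12)   # upper bound recommended in requirements.txt


def is_recommended_python(version=None) -> bool:
    """Return True if (major, minor) falls inside the recommended range."""
    major, minor = (version or sys.version_info)[:2]
    return SUPPORTED_MIN <= (major, minor) <= SUPPORTED_MAX


if not is_recommended_python():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the recommended 3.10 to 3.12 range",
        file=sys.stderr,
    )
```

Calling `is_recommended_python((3, 13))` returns `False`, which is the case for the `cpython-313` bytecode file included in this commit.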