	---
	title: Image Classification 2015 vs 2025
	emoji: 🔍
	colorFrom: green
	colorTo: gray
	sdk: gradio
	sdk_version: 6.15.2
	app_file: app.py
	pinned: false
	python_version: "3.10"
	suggested_hardware: cpu-basic
	---

	# Image Classification Demo: 2015 vs 2025 Implementation Comparison

	A Gradio app that shows how much the effort needed to implement machine
	learning has changed in a decade, using the same task (image → category
	prediction) as the point of comparison.

	---

	## What This App Does

	Upload any image to get the top-5 category predictions from a pre-trained
	Vision Transformer (ViT). Alongside the result, the app shows, side by side,
	the code needed to build a classifier for the same task in 2015
	(Theano + NumPy, ~130 lines) and in 2025 (Hugging Face Transformers, 5 lines).
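
As a rough illustration of the 2025 side, the sketch below wraps the Transformers `pipeline` API in two small helpers. It is not the contents of `model_2025.py`. The checkpoint name `google/vit-base-patch16-224` and the helper names `load_classifier` and `top5` are assumptions made for this example; the README does not name the exact model the app loads.

```python
from typing import Any, Callable

# Assumed checkpoint: a widely used ViT-Base model on the Hub.
# The README does not state which checkpoint the app actually loads.
MODEL_ID = "google/vit-base-patch16-224"


def load_classifier(model_id: str = MODEL_ID) -> Callable[..., list]:
    """Build an image-classification pipeline (downloads weights on first use)."""
    # Imported lazily so top5() can be exercised without transformers installed.
    from transformers import pipeline

    return pipeline("image-classification", model=model_id)


def top5(image: Any, classifier: Callable[..., list]) -> list[tuple[str, float]]:
    """Return up to five (label, score) pairs, highest score first."""
    predictions = classifier(image, top_k=5)
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["label"], float(p["score"])) for p in ranked]
```

With Transformers and PyTorch installed, `top5("cat.jpg", load_classifier())` would classify a local file; the pipeline also accepts a PIL image or an image URL.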

	---

	## Implementation Comparison

	| Item | 2015 (Theano + NumPy) | 2025 (Hugging Face) |
	|---|---|---|
	| Lines of code | ~130 lines | 5 lines |
	| Model | Hand-written CNN | ViT-Base (pre-trained) |
	| Preprocessing | Manual | Automatic |
	| Training | SGD written by hand | Not required |
	| Approx. accuracy | ~70% (CIFAR-10) | ~81% (ImageNet) |
	| Compile step | Tens of seconds | Not required |

	---

	## File Structure

	```
	.
	├── app.py            # Gradio app (entry point)
	├── model_2025.py     # 2025 implementation: Hugging Face pipeline (5 lines)
	├── model_2015.py     # 2015 implementation: Theano CNN (reference only)
	├── requirements.txt  # Dependencies
	└── README.md         # This file
	```
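
To make the role of `app.py` as the entry point more concrete, here is a minimal sketch of how a classifier callable could be wired into a Gradio interface. It is an assumption about the general shape, not the actual `app.py`: the names `to_label_scores` and `build_demo` are hypothetical, and the real app also displays the 2015 and 2025 code, which this sketch omits.

```python
from typing import Any, Callable


def to_label_scores(predictions: list[dict]) -> dict[str, float]:
    """Convert pipeline-style output into the {label: confidence} mapping gr.Label accepts."""
    return {p["label"]: float(p["score"]) for p in predictions}


def build_demo(classify: Callable[[Any], list[dict]]):
    """Wrap a classifier callable in a minimal Gradio interface (hypothetical helper)."""
    # Imported lazily so to_label_scores() can be used without Gradio installed.
    import gradio as gr

    def predict(image):
        return to_label_scores(classify(image))

    return gr.Interface(
        fn=predict,
        inputs=gr.Image(type="pil"),          # hand the upload to predict() as a PIL image
        outputs=gr.Label(num_top_classes=5),  # show the five highest-scoring labels
        title="Image Classification 2015 vs 2025",
    )
```

Calling `build_demo(classify).launch()` would then serve the interface, by default on port 7860.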

	---

	## Running Locally

	```bash
	# 1. Clone
	git clone https://huggingface.co/spaces/<your-username>/image-classification-2015-vs-2025
	cd image-classification-2015-vs-2025

	# 2. Install dependencies
	pip install -r requirements.txt

	# 3. Launch
	python app.py
	# → http://localhost:7860
	```

	> Note: On first launch, the ViT model (~330 MB) is downloaded automatically
	> from the Hugging Face Hub and cached under `~/.cache/huggingface/`.
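
To check where the download landed and how much disk it uses, a small standard-library sketch like the following may help. It assumes the default cache root described above and honors the `HF_HOME` environment variable, which relocates that root. It deliberately ignores other overrides such as `XDG_CACHE_HOME`, and the function names are hypothetical.

```python
import os
from pathlib import Path


def hf_cache_root() -> Path:
    """Resolve the Hugging Face cache root, honoring HF_HOME when it is set."""
    override = os.environ.get("HF_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "huggingface"


def dir_size_mb(root: Path) -> float:
    """Total size of regular files under root in megabytes (0.0 if root is missing)."""
    if not root.exists():
        return 0.0
    # Skip symlinks so files that the cache links to are not counted twice.
    total = sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )
    return total / 1_000_000
```

For example, `print(dir_size_mb(hf_cache_root()))` reports the current cache size.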

	---

	## Hardware

	This Space runs on CPU Basic (the free tier); no GPU is required.
	ViT-Base inference on CPU typically takes 2–5 seconds per image.

	| Resource | Spec |
	|---|---|
	| Hardware | CPU Basic (2 vCPU / 16 GB RAM) |
	| GPU | None |
	| Storage | Ephemeral (model cached via HF Hub) |
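
The per-image latency quoted above can be checked on your own hardware with a small timing helper. The sketch below is one possible approach using only the standard library; `time_inference` is a hypothetical name, and `classify` stands for whatever callable runs the model on a single image.

```python
import statistics
import time
from typing import Any, Callable


def time_inference(classify: Callable[[Any], Any], image: Any, runs: int = 5) -> dict[str, float]:
    """Time repeated calls to classify(image) and return latency stats in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    classify(image)  # warm-up call, so one-time setup cost is not measured
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        classify(image)
        samples.append(time.perf_counter() - start)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

The warm-up call matters here because the first inference usually includes model loading and other one-time costs that would skew the numbers.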

	---

	## About the 2015 Implementation

	`model_2015.py` is reference documentation only. It requires Python 3.8 and
	Theano 1.0, neither of which is still maintained, and it does not run on
	Python 3.9 or later. The file is included to illustrate the implementation
	burden of the era.

	What had to be written by hand in 2015:
	- Weight initialization for each layer
	- Symbolic computation graph (conv → pool → softmax)
	- Loss function, gradient computation, and SGD update rules
	- Theano function compilation
	- Image preprocessing (normalization, CHW transpose)
	- Training loop with manual batch splitting
	- Model save / load
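
A few of the items above can be sketched in plain NumPy to give a feel for the manual work involved. This is an illustration only, not an excerpt from `model_2015.py`: it leaves out Theano's symbolic graph and compilation step entirely, and the function names and the fan-in-scaled initialization are assumptions chosen for the example.

```python
import numpy as np


def preprocess(image_hwc: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1] and reorder axes from HWC to CHW."""
    scaled = image_hwc.astype(np.float32) / 255.0
    return scaled.transpose(2, 0, 1)


def init_conv_weights(rng: np.random.Generator, out_ch: int, in_ch: int, k: int) -> np.ndarray:
    """Uniform initialization scaled by fan-in (one common heuristic, assumed here)."""
    bound = np.sqrt(1.0 / (in_ch * k * k))
    weights = rng.uniform(-bound, bound, size=(out_ch, in_ch, k, k))
    return weights.astype(np.float32)


def iter_batches(x: np.ndarray, y: np.ndarray, batch_size: int):
    """Yield consecutive (x, y) mini-batches; the last one may be smaller."""
    for start in range(0, len(x), batch_size):
        yield x[start:start + batch_size], y[start:start + batch_size]


def sgd_step(params: list, grads: list, lr: float) -> list:
    """Vanilla SGD update: p <- p - lr * g for every parameter array."""
    return [p - lr * g for p, g in zip(params, grads)]
```

In the 2025 version, all of these steps are handled inside the pre-trained pipeline and its image processor, which is where most of the line-count difference comes from.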

	---

	## Tech Stack

	| Library | Version | Purpose |
	|---|---|---|
	| `transformers` | ≥ 4.40 | ViT model & pipeline |
	| `torch` | ≥ 2.2 | Inference backend |
	| `Pillow` | ≥ 10.0 | Image I/O |
	| `gradio` | ≥ 4.36 | Web UI |

	---

	## License

	MIT