dakesan commited on
Commit ·
e363d75
1
Parent(s): 4e3239c
initial commit
Browse files- README.md +298 -192
- prepare_data.py +134 -0
README.md
CHANGED
|
@@ -3,197 +3,303 @@ library_name: transformers
|
|
| 3 |
tags: []
|
| 4 |
---
|
| 5 |
|
| 6 |
-
#
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 7 |
|
| 8 |
-
|
| 9 |
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
|
| 147 |
-
|
| 148 |
-
|
| 149 |
-
|
| 150 |
-
|
| 151 |
-
|
| 152 |
-
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 3 |
tags: []
|
| 4 |
---
|
| 5 |
|
| 6 |
+
# Fine-tuned ViT image classifier
|
| 7 |
+
|
| 8 |
+
This repository provides a fine-tuned Vision Transformer model for classifying leukemia patient peripheral blood mononuclear cells.
|
| 9 |
+
|
| 10 |
+
## Model Overview
|
| 11 |
+
|
| 12 |
+
- **Base Model**: google/vit-large-patch16-224-in21k
|
| 13 |
+
- **Task**: 5-class classification of leukemia cells
|
| 14 |
+
- **Input**: 224x224 pixel dual-channel fluorescence microscopy images (R: ch1, G: ch6)
|
| 15 |
+
- **Output**: Probability distribution over 5 classes
|
| 16 |
|
| 17 |
+
## Model Details and Performance
|
| 18 |
|
| 19 |
+
- **Architecture**: ViT-Large/16 (patch size 16x16)
|
| 20 |
+
- **Parameters**: ~307M
|
| 21 |
+
- **Accuracy**: 94.67% (evaluation dataset)
|
| 22 |
+
|
| 23 |
+
## Data Preparation
|
| 24 |
+
|
| 25 |
+
### Prerequisites for Data Processing
|
| 26 |
+
|
| 27 |
+
```bash
|
| 28 |
+
# Required libraries for image processing
|
| 29 |
+
pip install numpy pillow tifffile
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
### Data Processing Tool
|
| 33 |
+
|
| 34 |
+
`prepare_data.py` (at the repository root) is a lightweight script for preprocessing dual-channel (ch1, ch6) cell images.
|
| 35 |
+
Implemented primarily using standard libraries, it performs the following operations:
|
| 36 |
+
|
| 37 |
+
1. Detects ch1 and ch6 image pairs
|
| 38 |
+
2. Normalizes each channel (0-255 scaling)
|
| 39 |
+
3. Converts to RGB format (R: ch1, G: ch6, B: empty channel)
|
| 40 |
+
4. Saves to specified output directory
|
| 41 |
+
|
| 42 |
+
```bash
|
| 43 |
+
# Basic usage
|
| 44 |
+
python prepare_data.py input_dir output_dir
|
| 45 |
+
|
| 46 |
+
# Example with options
|
| 47 |
+
python prepare_data.py \
|
| 48 |
+
/path/to/raw_images \
|
| 49 |
+
/path/to/processed_images \
|
| 50 |
+
--workers 8 \
|
| 51 |
+
--recursive
|
| 52 |
+
```
|
| 53 |
+
|
| 54 |
+
#### Options
|
| 55 |
+
- `--workers`: Number of parallel workers (default: 4)
|
| 56 |
+
- `--recursive`: Process subdirectories recursively
|
| 57 |
+
|
| 58 |
+
#### Input Directory Structure
|
| 59 |
+
```
|
| 60 |
+
input_dir/
|
| 61 |
+
├── class1/
|
| 62 |
+
│ ├── ch1_1.tif
|
| 63 |
+
│ ├── ch6_1.tif
|
| 64 |
+
│ ├── ch1_2.tif
|
| 65 |
+
│ └── ch6_2.tif
|
| 66 |
+
└── class2/
|
| 67 |
+
├── ch1_1.tif
|
| 68 |
+
├── ch6_1.tif
|
| 69 |
+
...
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
#### Output Directory Structure
|
| 73 |
+
```
|
| 74 |
+
output_dir/
|
| 75 |
+
├── class1/
|
| 76 |
+
│ ├── merged_1.tif
|
| 77 |
+
│ └── merged_2.tif
|
| 78 |
+
└── class2/
|
| 79 |
+
├── merged_1.tif
|
| 80 |
+
...
|
| 81 |
+
```
|
| 82 |
+
|
| 83 |
+
## Model Usage
|
| 84 |
+
|
| 85 |
+
### Prerequisites for Model
|
| 86 |
+
|
| 87 |
+
```bash
|
| 88 |
+
# Required libraries for model inference
|
| 89 |
+
pip install torch torchvision transformers
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
### Usage Example
|
| 93 |
+
|
| 94 |
+
#### Single Image Inference
|
| 95 |
+
|
| 96 |
+
```python
|
| 97 |
+
from transformers import ViTForImageClassification, ViTImageProcessor
|
| 98 |
+
import torch
|
| 99 |
+
from PIL import Image
|
| 100 |
+
|
| 101 |
+
# Load model and processor
|
| 102 |
+
model = ViTForImageClassification.from_pretrained("poprap/vit16L-FT-cellclassification")
|
| 103 |
+
processor = ViTImageProcessor.from_pretrained("poprap/vit16L-FT-cellclassification")
|
| 104 |
+
|
| 105 |
+
# Preprocess image
|
| 106 |
+
image = Image.open("cell_image.tif")
|
| 107 |
+
inputs = processor(images=image, return_tensors="pt")
|
| 108 |
+
|
| 109 |
+
# Inference
|
| 110 |
+
outputs = model(**inputs)
|
| 111 |
+
probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
|
| 112 |
+
predicted_class = torch.argmax(probabilities, dim=-1).item()
|
| 113 |
+
```
|
| 114 |
+
|
| 115 |
+
#### Batch Processing and Evaluation
|
| 116 |
+
|
| 117 |
+
For batch processing and comprehensive evaluation metrics calculation:
|
| 118 |
+
|
| 119 |
+
```python
|
| 120 |
+
import torch
|
| 121 |
+
import numpy as np
|
| 122 |
+
import time
|
| 123 |
+
from pathlib import Path
|
| 124 |
+
from tqdm import tqdm
|
| 125 |
+
from torchvision import transforms, datasets
|
| 126 |
+
from torch.utils.data import DataLoader
|
| 127 |
+
from transformers import ViTForImageClassification, ViTImageProcessor
|
| 128 |
+
import matplotlib.pyplot as plt
|
| 129 |
+
import seaborn as sns
|
| 130 |
+
from sklearn.metrics import (
|
| 131 |
+
confusion_matrix, accuracy_score, recall_score,
|
| 132 |
+
precision_score, f1_score, roc_auc_score,
|
| 133 |
+
classification_report
|
| 134 |
+
)
|
| 135 |
+
from sklearn.preprocessing import label_binarize
|
| 136 |
+
|
| 137 |
+
# --- 1. データセット準備用関数 ---
|
| 138 |
+
def transform_function(feature_extractor, img):
|
| 139 |
+
resized = transforms.Resize((224, 224))(img)
|
| 140 |
+
encoded = feature_extractor(images=resized, return_tensors="pt")
|
| 141 |
+
return encoded["pixel_values"][0]
|
| 142 |
+
|
| 143 |
+
def collate_fn(batch):
|
| 144 |
+
pixel_values = torch.stack([item[0] for item in batch])
|
| 145 |
+
labels = torch.tensor([item[1] for item in batch])
|
| 146 |
+
return {"pixel_values": pixel_values, "labels": labels}
|
| 147 |
+
|
| 148 |
+
# --- 2. モデルとデータセットの準備 ---
|
| 149 |
+
# モデルの準備
|
| 150 |
+
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
|
| 151 |
+
model = ViTForImageClassification.from_pretrained("poprap/vit16L-FT-cellclassification")
|
| 152 |
+
feature_extractor = ViTImageProcessor.from_pretrained("poprap/vit16L-FT-cellclassification")
|
| 153 |
+
model.to(device)
|
| 154 |
+
|
| 155 |
+
# データセットとデータローダーの準備
|
| 156 |
+
eval_dir = Path("path/to/eval/data") # 評価データのパス
|
| 157 |
+
dataset = datasets.ImageFolder(
|
| 158 |
+
root=str(eval_dir),
|
| 159 |
+
transform=lambda img: transform_function(feature_extractor, img)
|
| 160 |
+
)
|
| 161 |
+
dataloader = DataLoader(
|
| 162 |
+
dataset,
|
| 163 |
+
batch_size=32,
|
| 164 |
+
shuffle=False,
|
| 165 |
+
collate_fn=collate_fn
|
| 166 |
+
)
|
| 167 |
+
|
| 168 |
+
# --- 3. バッチ推論の実行 ---
|
| 169 |
+
model.eval()
|
| 170 |
+
all_preds = []
|
| 171 |
+
all_labels = []
|
| 172 |
+
all_probs = []
|
| 173 |
+
|
| 174 |
+
start_time = time.time()
|
| 175 |
+
|
| 176 |
+
with torch.no_grad():
|
| 177 |
+
for batch in tqdm(dataloader, desc="Evaluating"):
|
| 178 |
+
inputs = batch["pixel_values"].to(device)
|
| 179 |
+
labels = batch["labels"].to(device)
|
| 180 |
+
|
| 181 |
+
outputs = model(inputs)
|
| 182 |
+
logits = outputs.logits
|
| 183 |
+
probs = torch.softmax(logits, dim=1)
|
| 184 |
+
preds = torch.argmax(probs, dim=1)
|
| 185 |
+
|
| 186 |
+
all_preds.extend(preds.cpu().numpy())
|
| 187 |
+
all_labels.extend(labels.cpu().numpy())
|
| 188 |
+
all_probs.extend(probs.cpu().numpy())
|
| 189 |
+
|
| 190 |
+
end_time = time.time()
|
| 191 |
+
|
| 192 |
+
# --- 4. 性能指標の計算 ---
|
| 193 |
+
# 処理時間の計算
|
| 194 |
+
total_images = len(all_labels)
|
| 195 |
+
total_time = end_time - start_time
|
| 196 |
+
time_per_image = total_time / total_images
|
| 197 |
+
|
| 198 |
+
# 基本的な指標
|
| 199 |
+
cm = confusion_matrix(all_labels, all_preds)
|
| 200 |
+
accuracy = accuracy_score(all_labels, all_preds)
|
| 201 |
+
recall_weighted = recall_score(all_labels, all_preds, average="weighted")
|
| 202 |
+
precision_weighted = precision_score(all_labels, all_preds, average="weighted")
|
| 203 |
+
f1_weighted = f1_score(all_labels, all_preds, average="weighted")
|
| 204 |
+
|
| 205 |
+
# クラスごとのAUC計算
|
| 206 |
+
num_classes = len(dataset.classes)
|
| 207 |
+
all_labels_onehot = label_binarize(all_labels, classes=range(num_classes))
|
| 208 |
+
all_probs = np.array(all_probs)
|
| 209 |
+
|
| 210 |
+
auc_scores = {}
|
| 211 |
+
for class_idx in range(num_classes):
|
| 212 |
+
try:
|
| 213 |
+
auc = roc_auc_score(all_labels_onehot[:, class_idx], all_probs[:, class_idx])
|
| 214 |
+
auc_scores[dataset.classes[class_idx]] = auc
|
| 215 |
+
except ValueError:
|
| 216 |
+
auc_scores[dataset.classes[class_idx]] = None
|
| 217 |
+
|
| 218 |
+
# --- 5. 結果の可視化 ---
|
| 219 |
+
# Confusion Matrixの可視化
|
| 220 |
+
plt.figure(figsize=(10, 8))
|
| 221 |
+
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
|
| 222 |
+
xticklabels=dataset.classes,
|
| 223 |
+
yticklabels=dataset.classes)
|
| 224 |
+
plt.xlabel("Predicted Label")
|
| 225 |
+
plt.ylabel("True Label")
|
| 226 |
+
plt.title("Confusion Matrix")
|
| 227 |
+
plt.tight_layout()
|
| 228 |
+
plt.show()
|
| 229 |
+
|
| 230 |
+
# 結果の出力
|
| 231 |
+
print(f"\nEvaluation Results:")
|
| 232 |
+
print(f"Accuracy: {accuracy:.4f}")
|
| 233 |
+
print(f"Weighted Recall: {recall_weighted:.4f}")
|
| 234 |
+
print(f"Weighted Precision: {precision_weighted:.4f}")
|
| 235 |
+
print(f"Weighted F1: {f1_weighted:.4f}")
|
| 236 |
+
print(f"\nAUC Scores per Class:")
|
| 237 |
+
for class_name, auc in auc_scores.items():
|
| 238 |
+
print(f"{class_name}: {auc:.4f}" if auc is not None else f"{class_name}: N/A")
|
| 239 |
+
|
| 240 |
+
print(f"\nDetailed Classification Report:")
|
| 241 |
+
print(classification_report(all_labels, all_preds, target_names=dataset.classes))
|
| 242 |
+
|
| 243 |
+
print(f"\nPerformance Metrics:")
|
| 244 |
+
print(f"Total images evaluated: {total_images}")
|
| 245 |
+
print(f"Total time: {total_time:.2f} seconds")
|
| 246 |
+
print(f"Average time per image: {time_per_image:.4f} seconds")
|
| 247 |
+
```
|
| 248 |
+
|
| 249 |
+
This example demonstrates how to:
|
| 250 |
+
1. Process multiple images in batches
|
| 251 |
+
2. Calculate comprehensive evaluation metrics
|
| 252 |
+
3. Generate confusion matrix visualization
|
| 253 |
+
4. Measure inference time performance
|
| 254 |
+
|
| 255 |
+
Key metrics calculated:
|
| 256 |
+
- Accuracy, Precision, Recall, F1-score
|
| 257 |
+
- Class-wise AUC scores
|
| 258 |
+
- Confusion matrix
|
| 259 |
+
- Detailed classification report
|
| 260 |
+
- Processing time statistics
|
| 261 |
+
|
| 262 |
+
## Training Configuration
|
| 263 |
+
|
| 264 |
+
The model was fine-tuned with the following settings:
|
| 265 |
+
|
| 266 |
+
### Hyperparameters
|
| 267 |
+
- Batch size: 56
|
| 268 |
+
- Learning rate: 1e-5
|
| 269 |
+
- Number of epochs: 20
|
| 270 |
+
- Mixed precision training (FP16)
|
| 271 |
+
- Label smoothing: 0.1
|
| 272 |
+
- Cosine scheduling with warmup (warmup steps: 100)
|
| 273 |
+
|
| 274 |
+
### Data Augmentation
|
| 275 |
+
- RandomResizedCrop (224x224, scale=(0.8, 1.0))
|
| 276 |
+
- RandomHorizontalFlip
|
| 277 |
+
- RandomRotation (±10 degrees)
|
| 278 |
+
- ColorJitter (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1)
|
| 279 |
+
|
| 280 |
+
### Implementation Details
|
| 281 |
+
- Utilized HuggingFace Transformers' `Trainer` class
|
| 282 |
+
- Checkpoint saving: every 100 steps
|
| 283 |
+
- Evaluation: every 100 steps
|
| 284 |
+
- Logging: every 10 steps
|
| 285 |
+
|
| 286 |
+
## Data Source
|
| 287 |
+
|
| 288 |
+
This project uses data from the following research paper:
|
| 289 |
+
|
| 290 |
+
Phillip Eulenberg, Niklas Köhler, Thomas Blasi, Andrew Filby, Anne E. Carpenter, Paul Rees, Fabian J. Theis & F. Alexander Wolf. "Reconstructing cell cycle and disease progression using deep learning." Nature Communications volume 8, Article number: 463 (2017).
|
| 291 |
+
|
| 292 |
+
## License
|
| 293 |
+
|
| 294 |
+
This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0), inheriting the same license as the base Google Vision Transformer model.
|
| 295 |
+
|
| 296 |
+
## Citations
|
| 297 |
+
|
| 298 |
+
```bibtex
|
| 299 |
+
@misc{dosovitskiy2021vit,
|
| 300 |
+
title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
|
| 301 |
+
author={Alexey Dosovitskiy and others},
|
| 302 |
+
year={2021},
|
| 303 |
+
eprint={2010.11929},
|
| 304 |
+
archivePrefix={arXiv}
|
| 305 |
+
}
|
prepare_data.py
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
白血病細胞画像の前処理用スクリプト
|
| 4 |
+
ch1とch6の画像をマージし、正規化してRGB形式で保存します。
|
| 5 |
+
|
| 6 |
+
必要なライブラリ:
|
| 7 |
+
- numpy
|
| 8 |
+
- tifffile (TIFFファイルの読み込み用)
|
| 9 |
+
- PIL (画像処理用)
|
| 10 |
+
|
| 11 |
+
使用方法:
|
| 12 |
+
python prepare_data.py input_dir output_dir [--workers N] [--recursive]
|
| 13 |
+
"""
|
| 14 |
+
import argparse
|
| 15 |
+
from pathlib import Path
|
| 16 |
+
import numpy as np
|
| 17 |
+
from PIL import Image
|
| 18 |
+
import tifffile
|
| 19 |
+
from concurrent.futures import ProcessPoolExecutor, as_completed
|
| 20 |
+
import sys
|
| 21 |
+
from typing import Tuple, List
|
| 22 |
+
import logging
|
| 23 |
+
|
| 24 |
+
def setup_logger():
    """Configure root logging at INFO level and return this module's logger."""
    log_format = '%(asctime)s - %(levelname)s - %(message)s'
    logging.basicConfig(level=logging.INFO, format=log_format)
    return logging.getLogger(__name__)
|
| 31 |
+
|
| 32 |
+
def load_and_normalize(path: Path) -> np.ndarray:
    """Load a TIFF image and rescale its intensities to 8-bit (0-255).

    Args:
        path: Path to a single-channel TIFF file.

    Returns:
        The image as a ``uint8`` array, linearly scaled so the minimum
        pixel maps to 0 and the maximum to 255.

    Note:
        A constant image (``max == min``) is returned as all zeros; the
        original code divided by zero in that case.
    """
    img = tifffile.imread(str(path))
    lo = np.min(img)
    span = np.max(img) - lo
    if span == 0:
        # Constant image: avoid division by zero / NaN propagation.
        return np.zeros_like(img, dtype=np.uint8)
    img_norm = (img - lo) / span * 255
    return img_norm.astype(np.uint8)
|
| 39 |
+
|
| 40 |
+
def process_image_pair(paths: Tuple[Path, Path, Path]) -> bool:
    """
    Merge one (ch1, ch6) image pair into an RGB image and save it.

    Args:
        paths: (ch1_path, ch6_path, save_path) tuple; bundled into one
            argument so the function can be submitted to a process pool.

    Returns:
        True on success, False when any step raised (the error is logged).
    """
    ch1_path, ch6_path, save_path = paths
    try:
        # Load each channel and rescale it to 8-bit.
        arr1 = load_and_normalize(ch1_path)
        arr6 = load_and_normalize(ch6_path)

        # All-zero blue channel — only R and G carry data.
        empty_channel = np.zeros_like(arr1)

        # Merge into RGB (R: ch1, G: ch6, B: empty).
        merged_array = np.stack((arr1, arr6, empty_channel), axis=-1)
        merged_image = Image.fromarray(merged_array)

        # Save, creating the destination directory if necessary.
        save_path.parent.mkdir(parents=True, exist_ok=True)
        merged_image.save(save_path)
        return True

    except Exception as e:
        # Broad catch is deliberate: one bad pair must not kill the worker.
        logging.error(f"Error processing {ch1_path}: {e}")
        return False
|
| 65 |
+
|
| 66 |
+
def find_image_pairs(input_dir: Path) -> List[Tuple[Path, Path]]:
    """Collect (ch1, ch6) TIFF pairs sharing the same index in *input_dir*.

    A ``ch1_<idx>.tif`` file is included only when the matching
    ``ch6_<idx>.tif`` exists alongside it.
    """
    candidates = (
        (c1, c1.parent / f"ch6_{c1.stem.split('_')[1]}.tif")
        for c1 in input_dir.glob("ch1_*.tif")
    )
    return [(c1, c6) for c1, c6 in candidates if c6.exists()]
|
| 77 |
+
|
| 78 |
+
def main():
    """Command-line entry point: merge ch1/ch6 TIFF pairs into RGB images.

    Parses input/output directories plus ``--workers`` / ``--recursive``
    options, discovers ch1/ch6 pairs per directory, and processes them in
    a process pool while mirroring the input layout under the output
    directory.
    """
    parser = argparse.ArgumentParser(description='細胞画像の前処理スクリプト')
    parser.add_argument('input_dir', type=str, help='入力ディレクトリのパス')
    parser.add_argument('output_dir', type=str, help='出力ディレクトリのパス')
    parser.add_argument('--workers', type=int, default=4, help='並列処理のワーカー数')
    parser.add_argument('--recursive', action='store_true', help='サブディレクトリも処理する')
    args = parser.parse_args()

    logger = setup_logger()
    input_path = Path(args.input_dir)
    output_path = Path(args.output_dir)

    if not input_path.exists():
        logger.error(f"入力ディレクトリが存在しません: {args.input_dir}")
        sys.exit(1)

    # Directories to scan. BUG FIX: the original recursive mode used only
    # input_path.glob("**/*"), which yields descendants but not input_path
    # itself, silently skipping images placed directly in input_dir.
    if args.recursive:
        candidate_dirs = [input_path, *input_path.glob("**/*")]
    else:
        candidate_dirs = [input_path]
    target_dirs = [d for d in candidate_dirs if d.is_dir()]

    total_processed = 0
    total_failed = 0

    with ProcessPoolExecutor(max_workers=args.workers) as executor:
        for current_dir in target_dirs:
            # Find ch1/ch6 pairs in this directory; skip dirs without any.
            pairs = find_image_pairs(current_dir)
            if not pairs:
                continue

            # Mirror the input layout under output_path.
            rel_path = current_dir.relative_to(input_path)
            current_output_dir = output_path / rel_path

            # One (ch1, ch6, destination) task per detected pair.
            tasks = [
                (ch1_file, ch6_file, current_output_dir / f"merged_{ch1_file.stem.split('_')[1]}.tif")
                for ch1_file, ch6_file in pairs
            ]

            # Fan the tasks out to the worker pool.
            futures = [executor.submit(process_image_pair, task) for task in tasks]

            successful = sum(1 for future in futures if future.result())
            failed = len(futures) - successful

            total_processed += successful
            total_failed += failed

            logger.info(f"{current_dir.name}: {successful}/{len(pairs)} files processed successfully")

    logger.info(f"\n処理完了:")
    logger.info(f"成功: {total_processed}")
    logger.info(f"失敗: {total_failed}")


if __name__ == "__main__":
    main()
|