v0.3.0: multimodal vehicle damage MVP (commit e327f0d by erdoganpeker)

Data Preparation Guide: Vehicle Damage Detection MVP

This guide explains how to download and verify every dataset and pretrained model weight file that services/ml/ needs.

0. Quick Start

# 1) Dependencies (for the scripts only)
pip install -r scripts/requirements.txt

# 2) Plan/disk report (downloads nothing)
python scripts/download_data.py --all --dry-run
python scripts/download_pretrained.py --all --dry-run

# 3) Pretrained weights + CarDD HF mirror + CarParts-Seg (independent; can run in parallel)
python scripts/download_pretrained.py --yolo11
python scripts/download_data.py --cardd-hf
python scripts/download_data.py --carparts-ultra

# 4) Once you have submitted the form and the CarDD ZIP has arrived:
python scripts/download_data.py --cardd-manual "C:\Users\Erdogan\Downloads\CarDD_release.zip"

# 5) Verification
python scripts/verify_data.py
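Step 2's plan/disk report is the kind of check worth running before any multi-GB download. The stdlib-only sketch below compares a download plan against free disk space. The sizes are the approximate figures quoted in section 1; PLAN_GB, check_disk and the 2 GB safety margin are hypothetical and do not describe the internals of download_data.py.

```python
import shutil
from pathlib import Path

# Approximate sizes in GB, taken from section 1 of this guide (hypothetical plan,
# not read from the download scripts).
PLAN_GB = {
    "cardd_raw": 6.5,
    "cardd_yolo": 7.0,
    "carparts_seg": 1.2,
}


def check_disk(target: Path, plan_gb: dict[str, float], margin_gb: float = 2.0) -> tuple[float, float, bool]:
    """Return (needed_gb, free_gb, ok) for the given plan on the drive holding `target`."""
    needed = sum(plan_gb.values()) + margin_gb
    free = shutil.disk_usage(str(target)).free / 1024**3
    return needed, free, free >= needed


if __name__ == "__main__":
    needed, free, ok = check_disk(Path("."), PLAN_GB)
    status = "OK" if ok else "NOT ENOUGH SPACE"
    print(f"needed ~{needed:.1f} GB, free {free:.1f} GB -> {status}")
```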

1. Datasets

1.1. CarDD (Car Damage Detection)

  • Source: https://cardd-ustc.github.io
  • Publication: Wang et al., 2023. ~4000 images, 6-class segmentation: dent, scratch, crack, glass_shatter, lamp_broken, tire_flat.
  • License: academic, non-commercial. Commercial use requires written permission from the authors. Fine for the MVP demo/POC; obtain a proper license before any sale.
  • Access path A (official form, ~1-2 days): after you submit the form, the ZIP link arrives by e-mail.
    python scripts/download_data.py --cardd-manual "C:\path\to\CarDD_release.zip"
    
    This command extracts the ZIP into services/ml/data/CarDD_release/ and prints the correct path to pass to services/ml/prepare_data.py.
  • Access path B (HF mirror, no waiting for the form):
    python scripts/download_data.py --cardd-hf
    
    Target: services/ml/data/cardd_hf/. The license is the same as CarDD; only the access route differs.
  • Disk: ~6.5 GB raw + ~7 GB YOLO dump (reserve ~14 GB in total).
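Path A delivers COCO-format annotations (the CarDD_COCO/annotations/instances_*.json files listed in section 7), while the ~7 GB "YOLO dump" is in the YOLO segmentation format that Ultralytics trains on. The sketch below shows the core of such a conversion; it is an assumption about what prepare_data.py does, not its actual code. It handles only polygon-style segmentation entries (RLE masks are skipped) and maps COCO category ids to 0-based class indices in sorted id order.

```python
from collections import defaultdict


def coco_to_yolo_seg(coco: dict) -> dict[str, list[str]]:
    """Convert COCO polygon annotations into YOLO-seg label lines, keyed by image file name.

    Output line format: "<class_idx> x1 y1 x2 y2 ..." with coordinates normalized to [0, 1].
    """
    # COCO category ids are arbitrary integers; YOLO wants contiguous 0..N-1 indices.
    cat_ids = sorted(c["id"] for c in coco["categories"])
    cat_to_idx = {cid: i for i, cid in enumerate(cat_ids)}
    images = {img["id"]: img for img in coco["images"]}

    labels: dict[str, list[str]] = defaultdict(list)
    for ann in coco["annotations"]:
        seg = ann.get("segmentation")
        if not isinstance(seg, list):  # RLE masks (dicts) are skipped in this sketch
            continue
        img = images[ann["image_id"]]
        w, h = img["width"], img["height"]
        cls = cat_to_idx[ann["category_id"]]
        # Simplification: each polygon of a multi-part object becomes its own line.
        for poly in seg:
            if len(poly) < 6:  # fewer than 3 (x, y) points
                continue
            # Even indices are x (divided by width), odd indices are y (divided by height).
            coords = [min(max(v / (w if i % 2 == 0 else h), 0.0), 1.0) for i, v in enumerate(poly)]
            labels[img["file_name"]].append(" ".join([str(cls)] + [f"{c:.6f}" for c in coords]))
    return dict(labels)
```

For example, in a file with category ids 1 and 2, id 2 maps to class 1, and the polygon [0, 0, 100, 0, 100, 50] on a 200x100 image becomes "1 0.000000 0.000000 0.500000 0.000000 0.500000 0.500000".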

1.2. Ultralytics CarParts-Seg (part segmentation)

  • Source: https://docs.ultralytics.com/datasets/segment/carparts-seg/
  • Size: ~1.2 GB. 21 classes (doors, bumpers, lights, ...).
  • License: Roboflow community license, commercial use allowed.
  • Download: Ultralytics handles the download automatically; we only trigger it:
    python scripts/download_data.py --carparts-ultra
    python services/ml/prepare_parts_data.py --use_ultralytics ^
        --output_dir services/ml/data/parts_yolo
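After prepare_parts_data.py runs, each label file in parts_yolo/ should follow the YOLO segmentation format: one object per line, a class index followed by at least three normalized x y pairs. A small checker like the one below (a hypothetical helper, not part of scripts/) catches the most common conversion bugs. It defaults to the 21-class count quoted above; adjust num_classes to match the names list in your dataset YAML.

```python
from pathlib import Path
from typing import Optional


def check_seg_line(line: str, num_classes: int) -> Optional[str]:
    """Return an error message for one YOLO-seg label line, or None if it looks valid."""
    parts = line.split()
    if not parts:
        return "empty line"
    try:
        cls = int(parts[0])
        coords = [float(v) for v in parts[1:]]
    except ValueError:
        return "non-numeric value"
    if not 0 <= cls < num_classes:
        return f"class {cls} out of range 0..{num_classes - 1}"
    if len(coords) < 6 or len(coords) % 2:
        return f"expected >= 3 (x, y) pairs, got {len(coords)} values"
    if any(not 0.0 <= c <= 1.0 for c in coords):
        return "coordinate outside [0, 1]"
    return None


def check_label_dir(label_dir: Path, num_classes: int = 21) -> list[str]:
    """Scan every *.txt under label_dir and collect 'file:line: error' messages (blank lines skipped)."""
    errors = []
    for txt in sorted(label_dir.rglob("*.txt")):
        for n, line in enumerate(txt.read_text().splitlines(), start=1):
            if line.strip() and (err := check_seg_line(line, num_classes)):
                errors.append(f"{txt}:{n}: {err}")
    return errors
```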
    

1.3. Roboflow severity (minor/moderate/severe)

  • Source: https://universe.roboflow.com (the workspace/project is chosen by the user)
  • Access: requires ROBOFLOW_API_KEY (https://app.roboflow.com/settings/api). Add a .env file to the repo root:
    ROBOFLOW_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    
  • Download:
    python scripts/download_data.py --roboflow-severity ^
        --rf-workspace car-damage-detection-cardd ^
        --rf-project car-damage-severity ^
        --rf-version 1
    
  • NOTE: Search Roboflow Universe and VERIFY the workspace/project slugs; the default values are placeholders.
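ROBOFLOW_API_KEY can come either from the shell environment or from the .env file in the repo root (the troubleshooting table below mentions both). How download_data.py actually loads it is not shown here, and python-dotenv is a common choice; the stdlib-only sketch below just illustrates the lookup order, environment first and then .env, using a hypothetical load_api_key helper with a deliberately simple KEY=VALUE parser.

```python
import os
from pathlib import Path
from typing import Optional


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional single/double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_api_key(repo_root: Path, name: str = "ROBOFLOW_API_KEY") -> Optional[str]:
    """Environment variable wins; otherwise fall back to <repo_root>/.env."""
    return os.environ.get(name) or read_dotenv(repo_root / ".env").get(name)
```

With the key in hand, the roboflow package's usual chain Roboflow(api_key=key).workspace(ws).project(proj).version(n).download(fmt) fetches a dataset version; which export format download_data.py requests is not documented in this guide.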

2. Pretrained Weights

# YOLO11 family (n, s, m): recommended starting point
python scripts/download_pretrained.py --yolo11

# YOLO26 family (depends on the Ultralytics version; skipped if unavailable)
python scripts/download_pretrained.py --yolo26

# CarDD-finetuned checkpoint (if one is available on HF)
python scripts/download_pretrained.py --cardd-finetuned

Target: services/ml/weights/*.pt, each with a .sha256 sidecar file.
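The exact sidecar format is not documented here. The sketch below assumes the common sha256sum style (hex digest, optionally followed by the file name) and a <name>.pt.sha256 naming scheme, and shows how to write and re-check a sidecar with hashlib. It hashes in chunks so large checkpoints are never loaded into memory at once. All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sidecar_for(weights: Path) -> Path:
    # Assumed naming: yolo11n-seg.pt -> yolo11n-seg.pt.sha256
    return weights.with_name(weights.name + ".sha256")


def write_sidecar(weights: Path) -> Path:
    """Write '<digest>  <file name>' next to the weights file (sha256sum style)."""
    side = sidecar_for(weights)
    side.write_text(f"{sha256_of(weights)}  {weights.name}\n", encoding="utf-8")
    return side


def verify_sidecar(weights: Path) -> bool:
    """True if the sidecar exists and its first token matches the file's current digest."""
    side = sidecar_for(weights)
    if not side.is_file():
        return False
    tokens = side.read_text(encoding="utf-8").split()
    return bool(tokens) and tokens[0].lower() == sha256_of(weights)
```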

3. Step-by-Step Order

  1. pip install -r scripts/requirements.txt
  2. python scripts/download_pretrained.py --yolo11 (2-3 min)
  3. python scripts/download_data.py --cardd-hf (10-60 min, depending on your connection)
  4. python scripts/download_data.py --carparts-ultra (5-10 min)
  5. In parallel, apply through the official CarDD form (https://cardd-ustc.github.io).
  6. Run prepare_data.py on the CarDD HF mirror and train a baseline.
  7. When the official ZIP arrives, update with --cardd-manual and retrain the model.
  8. After every step, verify with python scripts/verify_data.py.
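Steps 2-4 and 8 are plain script invocations, so they can be chained with a small runner that stops at the first failing step; step 1 and the manual steps 5-7 stay manual. This is a hypothetical wrapper, not one of the repo's scripts. It uses only the flags documented in this guide, and sys.executable makes the steps run in the same Python environment as the runner. The dry_run option assumes --dry-run can be combined with the per-dataset flags, which this guide only shows together with --all.

```python
import subprocess
import sys

# Steps 2, 3, 4 and 8 from the list above; paths are relative to the repo root.
STEPS: list[list[str]] = [
    ["scripts/download_pretrained.py", "--yolo11"],
    ["scripts/download_data.py", "--cardd-hf"],
    ["scripts/download_data.py", "--carparts-ultra"],
    ["scripts/verify_data.py"],
]


def build_commands(dry_run: bool = False) -> list[list[str]]:
    """Prefix each step with the current interpreter; optionally add --dry-run to download steps."""
    cmds = []
    for script, *args in STEPS:
        cmd = [sys.executable, script, *args]
        if dry_run and script.startswith("scripts/download_"):
            cmd.append("--dry-run")
        cmds.append(cmd)
    return cmds


def run_all(dry_run: bool = False) -> None:
    """Run the steps in order; check=True raises CalledProcessError on the first non-zero exit."""
    for cmd in build_commands(dry_run):
        print(">>", " ".join(cmd), flush=True)
        subprocess.run(cmd, check=True)
```

Call run_all() from the repo root, or run_all(dry_run=True) first to see the plan without downloading.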

4. Hardware Recommendations: RTX 5050 (8 GB VRAM, Blackwell)

  • PyTorch CUDA 12.8 wheel (Blackwell GPUs need a CUDA 12.8 or newer build):
    pip install --index-url https://download.pytorch.org/whl/cu128 ^
        torch torchvision torchaudio

  • VRAM budget:
    • yolo11n-seg imgsz=640 batch=16 → ~3.5 GB
    • yolo11s-seg imgsz=640 batch=12 → ~5.5 GB
    • yolo11m-seg imgsz=640 batch=6 → ~7.5 GB (mixed precision)
  • Disk: at least 30 GB of free SSD space (raw data + YOLO dump + checkpoints).
  • RAM: 16 GB is enough; 32 GB gives data loader prefetching more headroom.
  • Workers: workers=4 is usually stable on Windows; if you get errors, fall back to workers=0.
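The VRAM budget above can be turned into a simple preset picker: read the GPU's total memory through torch.cuda and choose the largest model whose quoted footprint fits with a little headroom. The budget numbers are the ones listed above for an RTX 5050; the 0.25 GB headroom and the helper names are assumptions.

```python
from typing import Optional

# (weights, imgsz, batch, approx. VRAM in GB) from the budget above; m-seg assumes mixed precision.
PRESETS = [
    ("yolo11m-seg.pt", 640, 6, 7.5),
    ("yolo11s-seg.pt", 640, 12, 5.5),
    ("yolo11n-seg.pt", 640, 16, 3.5),
]


def pick_preset(total_gb: float, headroom_gb: float = 0.25) -> Optional[dict]:
    """Return train settings for the largest preset that fits in total_gb, or None."""
    for model, imgsz, batch, need_gb in PRESETS:  # ordered largest first
        if need_gb + headroom_gb <= total_gb:
            # amp=True is Ultralytics' default; kept explicit because the m-seg budget relies on it.
            return {"model": model, "imgsz": imgsz, "batch": batch, "amp": True}
    return None


def gpu_total_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in GiB, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3
```

With ultralytics installed, the result can be passed straight to training, e.g. preset = pick_preset(gpu_total_gb() or 0.0) followed by YOLO(preset.pop("model")).train(data=..., workers=4, **preset); the data YAML path is left elided because this guide does not specify it.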

5. Troubleshooting

| Problem | Fix |
| --- | --- |
| huggingface_hub.errors.HfHubHTTPError 401 | huggingface-cli login (the CarDD HF mirror is public, so this is normally not required) |
| ROBOFLOW_API_KEY is not defined | Add it to .env, or run set ROBOFLOW_API_KEY=... in the shell |
| Ultralytics does not download carparts-seg | Update the ultralytics package: pip install -U ultralytics |
| CarDD ZIP downloads very slowly | Fall back to the HF mirror (section 1.1, access path B) and switch to the official set later |
| torch.cuda.is_available() == False | Install the cu128 wheel and update the NVIDIA drivers |
| Disk full | Get a plan with --dry-run first; delete unneeded copies of cardd_yolo/ |

6. License Summary

| Dataset / weights | License | For the MVP | For commercial use |
| --- | --- | --- | --- |
| CarDD | Academic, non-commercial | OK (POC) | Written permission from the authors |
| CarParts-Seg | Roboflow community | OK | OK |
| Roboflow severity | Varies by project | Check | Check |
| YOLO11/26 weights | AGPL-3.0 | OK | Ultralytics Enterprise license |

Legal note: compliance with national legislation (KVKK, the Turkish personal data protection law) and the implications of the Ultralytics AGPL license must go through legal review at the sales stage.

7. Expected Directory Layout

services/ml/
  data/
    cardd_hf/                 # HF mirror snapshot
    CarDD_release/            # Official ZIP contents (after the form)
      CarDD_COCO/
        annotations/
          instances_train2017.json
          instances_val2017.json
          instances_test2017.json
        train2017/  val2017/  test2017/
    cardd_yolo/               # prepare_data.py output
      images/{train,val,test}/
      labels/{train,val,test}/
    parts_yolo/               # prepare_parts_data.py output
    severity_roboflow/        # Roboflow ZIP contents
  weights/
    yolo11n-seg.pt + .sha256
    yolo11s-seg.pt + .sha256
    ...
scripts/.logs/                # All download/verification logs
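verify_data.py is the authoritative check, but the layout above is also easy to test directly. The sketch below reports which parts are missing under services/ml/. The required paths are taken from the tree, and which ones count as "ready" is an assumption: the HF mirror and the official ZIP are treated as alternatives, severity_roboflow/ is treated as optional, and the sidecar naming <name>.pt.sha256 is assumed.

```python
from pathlib import Path

# Each check passes if ANY of its candidate paths exists (relative to services/ml/).
CHECKS: dict[str, list[str]] = {
    "CarDD source": [
        "data/cardd_hf",
        "data/CarDD_release/CarDD_COCO/annotations/instances_train2017.json",
    ],
    "CarDD YOLO train images": ["data/cardd_yolo/images/train"],
    "CarDD YOLO train labels": ["data/cardd_yolo/labels/train"],
    "CarParts YOLO": ["data/parts_yolo"],
}


def missing_items(ml_root: Path) -> list[str]:
    """Return the names of failed checks under ml_root (e.g. Path('services/ml'))."""
    missing = [name for name, cands in CHECKS.items()
               if not any((ml_root / c).exists() for c in cands)]
    # Every weights/*.pt should come with its .sha256 sidecar.
    weights = sorted((ml_root / "weights").glob("*.pt"))
    if not weights:
        missing.append("weights/*.pt")
    for w in weights:
        if not w.with_name(w.name + ".sha256").is_file():
            missing.append(f"sidecar for {w.name}")
    return missing
```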