
TRACKING_POLLING: Frontend Error Tracking + Backend Log Correlation

Scope: Design document. In this wave NO CODE WAS CHANGED and NO SDK WAS INSTALLED. This file proposes a schema, code examples, and an event taxonomy for making user-facing errors observable in the polling and multi-photo upload flows.

Problem statement. In polling timeout / 5xx / network-blip scenarios the user sees a "something is broken" message, but:

  • The frontend cannot be tied to a backend log line (there is no request_id on the frontend side).
  • Nothing on the client side collects which inspection_id was retried how many times, or why it failed (no Sentry, no GA4).
  • The backend access_log already emits request_id, user_id, and duration_ms (services/backend/middleware.py AccessLogMiddleware), but when the frontend hits console.error it does not print that request_id to the screen or the clipboard. In the support process it is therefore impossible to say "send us this id" instead of "send us a screenshot".

This doc describes what needs to be done to close all three gaps at once.


0. Quick Inventory of the Current State

| Item | Status |
| --- | --- |
| Backend X-Request-ID middleware | PRESENT: services/backend/middleware.py:121 RequestIDMiddleware accepts the inbound header; if it is absent, it mints uuid4().hex. |
| CORS expose_headers includes X-Request-ID | PRESENT: middleware.py:271 expose_headers=["X-Request-ID", ...], so browser JS can read response.headers.get("x-request-id"). |
| Backend access_log JSON line | PRESENT: event=http.access carries request_id, user_id, status, duration_ms, path. |
| Frontend axios response interceptor reading request_id | MISSING: the interceptor at apps/web/lib/api.ts:91 only handles the 401 refresh flow; the header is not read. |
| Frontend polling instrumentation | MISSING: apps/web/lib/use-inspection-polling.ts counts attempts but does not publish them. |
| Multi-photo upload instrumentation | MISSING: the inspections.create onUploadProgress callback is used only for the UI progress bar. |
| GA4 / Sentry / Posthog SDK | MISSING (there is no KVKK consent banner either). |
| inspection_unassigned_ratio backend output | PRESENT: services/backend/output_formatter.py:411 summary.unassigned_damage_count and the overall damages count can be turned into a ratio. The frontend does not read it yet. |

1. X-Request-ID Propagation (CODE EXAMPLE: not applied, proposal only)

1.1 Frontend: axios response interceptor

The following addition can be placed in the response interceptor inside buildClient() in apps/web/lib/api.ts. It only reads the header, logs it, and stashes it on the response or error object; existing behavior is not changed.

// apps/web/lib/api.ts, inside buildClient(),
// in the SUCCESS branch of the response interceptor (replacing the current `(r) => r`):

instance.interceptors.response.use(
  (r) => {
    const rid = r.headers?.['x-request-id'];
    if (rid) {
      // Successful response: log only at debug verbosity so the console
      // is not polluted in production builds. The tag prefix lets support
      // engineers grep the user-shared screenshots.
      if (process.env.NODE_ENV !== 'production') {
        console.debug(`[api] ${r.config.method?.toUpperCase()} ${r.config.url} rid=${rid}`);
      }
      // Stash on the response so call sites (polling, upload) can grab it
      // for event payloads.
      (r as { requestId?: string }).requestId = rid;
    }
    return r;
  },
  async (error: AxiosError) => {
    // The existing 401 / refresh flow is unchanged. In addition:
    const rid =
      (error.response?.headers as Record<string, string> | undefined)?.['x-request-id'];
    if (rid) {
      // ALWAYS log on failure: critical for matching support tickets.
      console.error(
        `[api] FAIL ${error.config?.method?.toUpperCase()} ${error.config?.url} ` +
        `status=${error.response?.status ?? 'network'} rid=${rid}`,
      );
      // Attach it so ApiErrorInfo can carry it and the UI can copy it to the clipboard.
      (error as { requestId?: string }).requestId = rid;
    }
    // ... the existing refresh / unauthorized logic stays exactly as it is.
    // Assumption: that logic ends by rethrowing. Without a rejection here,
    // axios would resolve the failed call with `undefined`.
    return Promise.reject(error);
  },
);

1.2 Adding requestId to the classifyApiError output

The ApiErrorInfo interface currently contains kind | status | detail | fieldErrors (apps/web/lib/api.ts:513). The following one-field extension is sufficient:

export interface ApiErrorInfo {
  // ... existing fields
  /** Backend X-Request-ID, for support diagnostics. */
  requestId?: string;
}

// At the end of the classifyApiError body:
const rid = (err as { requestId?: string })?.requestId
  ?? (axios.isAxiosError(err)
        ? (err.response?.headers as Record<string,string> | undefined)?.['x-request-id']
        : undefined);
return { ...base, requestId: rid };

1.3 Display example in the UI (copying into a support message)

Error card for a polling failure or an upload failure:

{info.requestId && (
  <p className="text-xs text-zinc-500 mt-2">
    Support reference: <code>{info.requestId}</code>
    <button onClick={() => navigator.clipboard.writeText(info.requestId!)}>
      copy
    </button>
  </p>
)}

Once Sentry is installed, this will link directly to the backend access log through Sentry.captureException(err, { tags: { request_id } }). Even without Sentry, as soon as the user reports "rid=ab12cd34...", the backend team can find that exact request in about a second with journalctl | grep ab12cd34.


2. Event Schema

2.1 General rules

  • event_name: snake_case with a verb ending (_started, _completed, _failed).
  • properties: flat and JSON-serializable (string / number / bool / null).
  • PII is forbidden: e-mail, JWT, file contents, license plate. inspection_id is safe because it is a UUID; user_id goes into the GA4 user-id slot, not into properties.
  • Timings: *_ms (number; prefer durations over epoch timestamps).
  • The error cause uses the reason enum: timeout | network | unauthorized | forbidden | not_found | server | rate_limited | aborted | unknown. (It must map one-to-one to classifyApiError.kind; that earlier rule already exists.)
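The rules above can be checked mechanically before an event leaves the client. The following is a minimal sketch, not existing code: the names isValidEventName, isFlatProps, and isReason are hypothetical, and the regular expression assumes that "snake_case" means at least two lowercase segments joined by underscores.

```typescript
// Hypothetical helpers; none of these names exist in the codebase yet.
type PropValue = string | number | boolean | null;
type EventProps = Record<string, PropValue>;

// Reason values copied from the rule list above.
const REASONS = [
  "timeout", "network", "unauthorized", "forbidden", "not_found",
  "server", "rate_limited", "aborted", "unknown",
] as const;
type Reason = (typeof REASONS)[number];

// Assumption: snake_case = lowercase segments joined by "_", at least two.
const EVENT_NAME_RE = /^[a-z][a-z0-9]*(_[a-z0-9]+)+$/;

function isValidEventName(name: string): boolean {
  return EVENT_NAME_RE.test(name);
}

// True only for a plain object whose values are all flat primitives.
function isFlatProps(value: unknown): value is EventProps {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return Object.keys(record).every((key) => {
    const v = record[key];
    return (
      v === null ||
      typeof v === "string" ||
      typeof v === "boolean" ||
      (typeof v === "number" && isFinite(v))
    );
  });
}

function isReason(value: string): value is Reason {
  return (REASONS as readonly string[]).indexOf(value) !== -1;
}
```

A wrapper such as the proposed track() could reject, or log in development builds, any event that fails these checks, so schema drift is caught before it reaches a dashboard.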

2.2 Polling events (to be emitted inside use-inspection-polling.ts)

| event_name | trigger | properties |
| --- | --- | --- |
| polling_started | when tick is scheduled for the first time | inspection_id (str), mode (async or sync), interval_ms (number), max_duration_ms (number) |
| polling_attempt | at the START of every tick call | inspection_id, attempt_n (1..N), elapsed_ms, current_interval_ms |
| polling_failed | when fatal/exhausted/timedOut/paused is set inside tick | inspection_id, reason (one of timeout, network, unauthorized, forbidden, not_found, server, rate_limited, exhausted, paused), attempt_n, consecutive_errors, elapsed_ms, request_id (if available) |
| polling_completed | status === 'completed' is returned | inspection_id, attempt_n, duration_ms (elapsed since polling_started), final_status (completed or failed), request_id |
| polling_retried | the user calls retry() | inspection_id, previous_attempts, was_timed_out (bool), was_paused (bool) |
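One way to keep attempt_n and elapsed_ms consistent across these events is a small tracker object owned by the polling hook. The sketch below is illustrative only: createPollingTracker and its emit callback are hypothetical names standing in for the not-yet-written track() wrapper, and the clock is injected so the logic can be exercised without timers.

```typescript
// Hypothetical tracker; `emit` stands in for the future track() wrapper.
type Props = Record<string, string | number | boolean | null>;
type Emit = (eventName: string, props: Props) => void;
type PollingFailReason =
  | "timeout" | "network" | "unauthorized" | "forbidden" | "not_found"
  | "server" | "rate_limited" | "exhausted" | "paused";

function createPollingTracker(
  inspectionId: string,
  emit: Emit,
  now: () => number = () => Date.now(),
) {
  let startedAt = 0;
  let attemptN = 0;

  return {
    started(mode: "async" | "sync", intervalMs: number, maxDurationMs: number): void {
      startedAt = now();
      attemptN = 0;
      emit("polling_started", {
        inspection_id: inspectionId,
        mode,
        interval_ms: intervalMs,
        max_duration_ms: maxDurationMs,
      });
    },
    attempt(currentIntervalMs: number): void {
      attemptN += 1; // attempt_n is 1-based, as in the table
      emit("polling_attempt", {
        inspection_id: inspectionId,
        attempt_n: attemptN,
        elapsed_ms: now() - startedAt,
        current_interval_ms: currentIntervalMs,
      });
    },
    failed(reason: PollingFailReason, consecutiveErrors: number, requestId?: string): void {
      const props: Props = {
        inspection_id: inspectionId,
        reason,
        attempt_n: attemptN,
        consecutive_errors: consecutiveErrors,
        elapsed_ms: now() - startedAt,
      };
      // request_id is attached only when the response header was readable.
      if (requestId) props["request_id"] = requestId;
      emit("polling_failed", props);
    },
  };
}
```

polling_completed and polling_retried would follow the same pattern; they are left out here to keep the sketch short.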

2.3 Multi-photo upload events (around the inspections.create call)

| event_name | trigger | properties |
| --- | --- | --- |
| upload_started | when inspections.create() is called | file_count (number), total_size_mb (number, 2 decimals), mode (sync or async) |
| upload_progress | when onUploadProgress is called, throttled to the 10/25/50/75/90/100 thresholds | percent (0..100), loaded_mb, total_mb |
| upload_completed | if inspections.create() returns 2xx | file_count, total_size_mb, mode, duration_ms, inspection_id, request_id |
| upload_failed | if inspections.create() rejects | file_count, total_size_mb, mode, reason (classifyApiError.kind), partial_count (number of files that failed in a multi-photo upload; because this is a single POST it is currently always file_count or 0, and it is included as preparation for future chunked upload), duration_ms, request_id |
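The threshold throttling for upload_progress can be sketched as a small stateful gate that fires at most once per threshold. The names createProgressGate and toMb are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; the source only specifies two decimals.

```typescript
// Thresholds taken from the upload_progress row above.
const UPLOAD_THRESHOLDS = [10, 25, 50, 75, 90, 100];

// Returns a function that reports the highest newly crossed threshold,
// or null when the progress tick should not produce an event.
// Assumes `thresholds` is sorted in ascending order.
function createProgressGate(thresholds: number[] = UPLOAD_THRESHOLDS) {
  let lastEmitted = 0;
  return function onProgress(loaded: number, total: number): number | null {
    if (total <= 0) return null; // total size unknown: nothing to report
    const percent = Math.min(100, (loaded / total) * 100);
    let crossed: number | null = null;
    for (const t of thresholds) {
      if (t > lastEmitted && percent >= t) crossed = t;
    }
    if (crossed !== null) lastEmitted = crossed;
    return crossed;
  };
}

// Bytes to megabytes with 2 decimals (assumption: 1 MB = 1024 * 1024 bytes).
function toMb(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 100) / 100;
}
```

When a single tick jumps over several thresholds, only the highest one is reported, so a fast upload produces fewer events rather than a burst.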

2.4 Detection quality event

| event_name | trigger | properties |
| --- | --- | --- |
| inspection_quality_observed | immediately after polling_completed, once the response has arrived | inspection_id, total_damage_count (number), unassigned_damage_count (number; the backend already returns summary.unassigned_damage_count: services/backend/output_formatter.py:411), unassigned_ratio (float, 0..1), total_part_count, processing_time_ms (from the backend), image_count |

unassigned_ratio = unassigned_damage_count / total_damage_count. Once this metric lands in GA4, an anomaly such as the 10/16 = 0.625 case given in the example brief makes it possible to set a threshold alarm on the dashboard (a signal for model calibration).
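The ratio is undefined when an inspection has no damages at all, so the client-side computation needs a guard. A minimal sketch follows; the function name unassignedRatio is hypothetical, and returning null for the zero-damage case is an assumption rather than a decision recorded in this document.

```typescript
// Computes unassigned_ratio, or null when there are no damages to divide by.
// Assumption: the event simply omits the ratio in the null case.
function unassignedRatio(
  unassignedDamageCount: number,
  totalDamageCount: number,
): number | null {
  if (totalDamageCount <= 0) return null;
  return unassignedDamageCount / totalDamageCount;
}
```

For the example from the brief, unassignedRatio(10, 16) evaluates to 0.625.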

2.5 Standard envelope for error contexts

All *_failed events must carry at least the following subset. This extension alone lets the question "why did this error happen" be answered in about 30 seconds:

{
  "event_name": "polling_failed",
  "properties": {
    "inspection_id": "5f6e...",
    "reason": "server",
    "http_status": 502,
    "request_id": "ab12cd34ef56...",   // JOIN key against the backend access_log
    "attempt_n": 4,
    "elapsed_ms": 9421,
    "consecutive_errors": 3,
    "user_agent_brand": "Chrome",       // navigator.userAgentData.brands[0]
    "connection_type": "4g",            // navigator.connection?.effectiveType
    "session_id": "...",                // GA4 / posthog auto
    "build_sha": "abc1234"              // NEXT_PUBLIC_BUILD_SHA, if available
  }
}
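Several envelope fields depend on browser APIs or build variables that may be absent, so a builder that drops missing values keeps the payload flat and free of undefined. The sketch below is hypothetical (buildFailureEnvelope and FailureContext are illustrative names); the caller is assumed to read navigator and environment values itself and pass them in, and session_id is left to the analytics SDK as the comment in the envelope suggests.

```typescript
type Props = Record<string, string | number | boolean | null>;

// Hypothetical input shape; optional fields may be unavailable at runtime.
interface FailureContext {
  inspectionId: string;
  reason: string;
  attemptN: number;
  elapsedMs: number;
  consecutiveErrors: number;
  httpStatus?: number;
  requestId?: string;
  userAgentBrand?: string;
  connectionType?: string;
  buildSha?: string;
}

function buildFailureEnvelope(
  eventName: string,
  ctx: FailureContext,
): { event_name: string; properties: Props } {
  const properties: Props = {
    inspection_id: ctx.inspectionId,
    reason: ctx.reason,
    attempt_n: ctx.attemptN,
    elapsed_ms: ctx.elapsedMs,
    consecutive_errors: ctx.consecutiveErrors,
  };
  // Optional fields are added only when a value is actually present.
  const optional: Array<[string, string | number | undefined]> = [
    ["http_status", ctx.httpStatus],
    ["request_id", ctx.requestId],
    ["user_agent_brand", ctx.userAgentBrand],
    ["connection_type", ctx.connectionType],
    ["build_sha", ctx.buildSha],
  ];
  for (const [key, value] of optional) {
    if (value !== undefined) properties[key] = value;
  }
  return { event_name: eventName, properties };
}
```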

2.6 KVKK consent gate

No analytics event may be sent until the user has set cookie_consent.analytics === true. Proposed architecture:

  1. apps/web/lib/analytics.ts (does not exist yet): the track(event, props) function reads the localStorage['cookie_consent'] JSON; if analytics is not true it writes the event to a buffer (max 200 events) and flushes the buffer once consent is given. Before consent it never touches the network.
  2. GA4 / Posthog SDKs are not initialized until the banner is accepted (stub only).
  3. The backend request_id continues to be written to the console; this is not PII, it is operational data.
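The buffering behavior in step 1 can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: createTracker, KeyValueStore, and the send callback are hypothetical, the store is injected so that localStorage can be passed in the browser, and dropping the oldest event when the buffer is full is an assumption (the text above only fixes the 200-event cap).

```typescript
type Props = Record<string, string | number | boolean | null>;
type Send = (eventName: string, props: Props) => void;

// Minimal subset of the Storage interface; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
}

function createTracker(store: KeyValueStore, send: Send, maxBuffer = 200) {
  const buffer: Array<{ eventName: string; props: Props }> = [];

  function hasConsent(): boolean {
    try {
      const raw = store.getItem("cookie_consent");
      if (!raw) return false;
      const parsed = JSON.parse(raw);
      return typeof parsed === "object" && parsed !== null && parsed.analytics === true;
    } catch {
      return false; // unreadable or malformed consent counts as "no consent"
    }
  }

  // Sends everything buffered so far, but only once consent is present.
  function flush(): void {
    if (!hasConsent()) return;
    while (buffer.length > 0) {
      const item = buffer.shift();
      if (item) send(item.eventName, item.props);
    }
  }

  function track(eventName: string, props: Props): void {
    if (hasConsent()) {
      flush();
      send(eventName, props);
      return;
    }
    // Assumption: when the cap is reached, the oldest event is dropped.
    if (buffer.length >= maxBuffer) buffer.shift();
    buffer.push({ eventName, props });
  }

  return { track, flush, pending: () => buffer.length };
}
```

In the browser this might be wired as createTracker(window.localStorage, sendToSdk), where sendToSdk is whatever function forwards events to the SDK chosen later, with flush() called from the banner's accept handler.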

No SDK or banner was installed in this wave; this schema is ready to run on the day the banner ships.


3. Top 5 Must-Have Events

Priority order (from the highest impact / effort ratio to the lowest):

| # | Event | Why it is first-tier |
| --- | --- | --- |
| 1 | polling_failed | The source of all user complaints. With reason + request_id, 80% of support tickets reduce to a single-line log lookup. Even writing it to console.error before any SDK arrives is useful. |
| 2 | upload_failed | In a multi-photo upload the user does not know at which step it failed (network, 413, or 422). Distinguishing reason=tooLarge from reason=network changes the tone of the UI message. |
| 3 | inspection_quality_observed | unassigned_ratio is the business metric for model calibration. The 10/16 = 0.625 example in the brief points exactly here; without this measurement the question "when did the model degrade" stays unanswered. |
| 4 | polling_completed | Measures the success path: p50/p95 duration_ms is the main metric of the SLO dashboard. The error rate only becomes meaningful against the success rate. |
| 5 | upload_started | The top of the funnel. Without started, the failed rate cannot be interpreted (it is the denominator). The file_count distribution validates UI design decisions (1-photo vs 20-photo flows) against real data. |
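The two derived metrics named in the table, p50/p95 of duration_ms and the failure rate with upload_started as its denominator, can be sketched as below. Both function names are hypothetical, and the nearest-rank percentile method is an assumption; a dashboard tool may interpolate differently.

```typescript
// Nearest-rank percentile over a list of durations (assumed method).
// Returns null for an empty sample instead of inventing a value.
function percentile(values: number[], p: number): number | null {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] ?? null;
}

// failed / started; null when nothing was started (no denominator).
function failureRate(failedCount: number, startedCount: number): number | null {
  if (startedCount <= 0) return null;
  return failedCount / startedCount;
}
```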

4. Correlation Flow with the Backend (operations corner)

Suppose the frontend emits a polling_failed event with request_id=ab12cd34. On the backend side:

# /var/log/.../backend.access.log or journald
# Prefix match: no closing quote after the id, and optional whitespace after the colon.
grep '"request_id": *"ab12cd34' backend.access.log | jq .

Output:

{
  "event": "http.access",
  "method": "GET",
  "path": "/api/v1/inspect/5f6e...",
  "status": 502,
  "duration_ms": 12.4,
  "user_id": "user-uuid",
  "ip": "...",
  "request_id": "ab12cd34..."
}

This JOIN is not possible today because the frontend does not read request_id. As soon as the Section 1.1 patch is applied, even with no other code change, the MTTR (mean time to resolve) in the support process drops from minutes to seconds.
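The same lookup can be scripted when the log is available as text, for example inside a support tool. The sketch below is hypothetical (findByRequestId is an illustrative name) and assumes one JSON object per line with the fields shown in the sample entry above; lines that are not valid JSON are skipped.

```typescript
// Fields taken from the sample access log entry; all optional for safety.
interface AccessLogEntry {
  event?: string;
  method?: string;
  path?: string;
  status?: number;
  duration_ms?: number;
  request_id?: string;
}

// Returns every log entry whose request_id starts with the given prefix,
// so a user-reported short id such as "ab12cd34" still matches.
function findByRequestId(logText: string, ridPrefix: string): AccessLogEntry[] {
  const matches: AccessLogEntry[] = [];
  for (const line of logText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const entry = JSON.parse(trimmed) as AccessLogEntry;
      if (
        typeof entry.request_id === "string" &&
        entry.request_id.indexOf(ridPrefix) === 0
      ) {
        matches.push(entry);
      }
    } catch {
      // Not a JSON line (or not an object): ignore it.
    }
  }
  return matches;
}
```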


5. NOT DONE in This Wave (reminder)

  • The apps/web/lib/api.ts interceptor patch was not coded (example only).
  • The track(...) calls were not added to apps/web/lib/use-inspection-polling.ts.
  • The consent-gated wrapper apps/web/lib/analytics.ts was not created.
  • No GA4 / Posthog / Sentry SDK was installed.
  • No KVKK cookie consent banner UI was added.

This file exists so that later waves can find the implementation spec in one place with git grep TRACKING_POLLING.md.