mihailik commited on
Commit
dcf5774
·
1 Parent(s): cfef6b5

Core refactoring of list-chat-models, and a plan for the follow-up.

Browse files
plans/2025-08-01-model-filtering/4-worker-refactoring-async-iterator.md CHANGED
@@ -103,3 +103,62 @@ Design rule: yield final `{ status: 'done' }` explicitly; do not rely on generat
103
 
104
  This plan keeps the code minimal, uses language-native iterator cancellation, isolates side-effects to the generator, and leaves the worker wrapper trivial. Proceed to implement when ready.
105
 
106
+ ## Follow-up improvements (post-cleanup)
107
+
108
+ After the extraction and wrapper are in place, apply the following three improvements as follow-ups. Each item below is written as an exact, small TODO with acceptance criteria and a short testing checklist.
109
+
110
+ 1) Clean up `boot-worker.js` (remove duplicated helpers)
111
+ - Goal: remove the leftover duplicate helper implementations from `boot-worker.js` so there is a single source of truth for `fetchConfigForModel` and `classifyModel` (now owned by the new action module). This reduces the maintenance surface and avoids accidental drift.
112
+ - Changes (exact):
113
+ - Delete the `fetchConfigForModel` and `classifyModel` helper functions from `src/worker/boot-worker.js` (the ones that are no longer used after extraction).
114
+ - Ensure the only import of those helpers is via `import { listChatModelsIterator } from './actions/list-chat-models.js';` and do not re-export them from the action module unless tests need them. If other parts of `boot-worker.js` still need config/classify helpers, move the shared helpers into a small `src/worker/lib/model-utils.js` and import it from both places (see the sketch after this item).
115
+ - Run a quick grep for remaining references to unused symbols (`fetchConfigForModel`, `classifyModel`) and remove any stale code.
116
+ - Tests / acceptance:
117
+ - Build/lint passes with no unused-variable warnings for the removed symbols.
118
+ - The `listChatModels` worker flow behaves identically after cleanup (run smoke test: progress + final response + cancellation).
119
+ - Risk: trivial; mitigate by running the smoke test immediately after the change.
120
+
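+ A minimal sketch of the intended post-cleanup import surface (paths follow the plan above; `./actions/list-chat-models.js` and `lib/model-utils.js` are planned locations, not files that exist in this commit):
+
+ ```js
+ // src/worker/boot-worker.js after cleanup (sketch): the only model-listing import
+ // left is the async-iterator action; no local fetchConfigForModel / classifyModel remain.
+ import { listChatModelsIterator } from './actions/list-chat-models.js';
+
+ // Only if other boot-worker code still needs the helpers would they move to a
+ // shared module and be imported from there instead of being duplicated:
+ // import { fetchConfigForModel, classifyModel } from './lib/model-utils.js';
+ ```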
121
+ 2) Adaptive concurrency on repeated 429 responses
122
+ - Goal: make the config-fetch promise pool responsive to Hugging Face rate limits by reducing parallelism when many 429s occur and gradually increasing it again when the 429 rate subsides.
123
+ - Design (practical, minimal):
124
+ - Instrument a small rate-limit counter in the action module: keep a sliding-window count of 429 responses from config fetches (for example, track the timestamps of recent 429s in an array trimmed to the last 30s).
125
+ - Add two simple thresholds: if the 429 count in the window reaches 10, halve the concurrency (`effectiveConcurrency = Math.max(1, Math.floor(effectiveConcurrency / 2))`) and set `rateLimitedUntil = Date.now() + backoffWindowMs` (with `backoffWindowMs` around 30s). If no 429s are recorded for `backoffWindowMs`, gradually restore `effectiveConcurrency = Math.min(initialConcurrency, effectiveConcurrency + 1)` every `backoffWindowMs / 2`.
126
+ - Implementation notes: do not tear down and recreate the worker loops; instead implement a token/semaphore scheme where each worker must acquire a token before starting a config fetch. Maintain `tokenCount = effectiveConcurrency`; when the thresholds fire, adjust `tokenCount` (add or withhold available tokens). This lets workers respect the new concurrency without tearing down the pool (a sketch follows this item).
127
+ - Track metrics: increment `counters.configFetch429` and `counters.configFetch200` for telemetry (simple numeric fields inside the action module). Expose the counters in the final meta when `params.debug` is true.
128
+ - Changes (exact):
129
+ - Add `const counters = { configFetch429:0, configFetch200:0, configFetchError:0 }` to the top of `list-chat-models.js` action module.
130
+ - Replace the fixed `workerCount = Math.min(concurrency, survivors.length || 1)` with a token-based semaphore driven by `effectiveConcurrency`, which the rate-limit detector can adjust at runtime.
131
+ - On each 429 response, update the sliding window and, if the threshold is crossed, reduce `effectiveConcurrency` and the token count; when the token count changes, wake waiting workers so they re-evaluate.
132
+ - Tests / acceptance:
133
+ - Simulate many 429s using a mocked fetch; confirm effectiveConcurrency is reduced and fewer concurrent requests are outstanding (can use counters or a small instrumentation hook).
134
+ - After simulated quiet period, confirm effectiveConcurrency slowly ramps back up.
135
+ - Risk: medium; the subtlety is manageable with a simple token count plus counters. Keep the logic intentionally conservative and well tested.
136
+
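+ A minimal sketch of the adjustable semaphore and 429 window described above (the `AdjustableSemaphore`, `noteRateLimit` and `maybeRestore` names, the thresholds, and the wiring are illustrative, not existing code):
+
+ ```js
+ // Sketch: each worker calls acquire() before a config fetch and release() in a finally.
+ class AdjustableSemaphore {
+   constructor(capacity) {
+     this.capacity = capacity; // effectiveConcurrency
+     this.inUse = 0;           // tokens currently held by workers
+     this.waiters = [];        // resolvers of workers waiting for a token
+   }
+   async acquire() {
+     if (this.inUse < this.capacity) { this.inUse++; return; }
+     await new Promise(resolve => this.waiters.push(resolve));
+     // the slot was already reserved by _wake() before resolving
+   }
+   release() { this.inUse--; this._wake(); }
+   setCapacity(n) { this.capacity = Math.max(1, n); this._wake(); }
+   _wake() {
+     while (this.waiters.length && this.inUse < this.capacity) {
+       this.inUse++;            // reserve the slot, then wake one waiter
+       this.waiters.shift()();
+     }
+   }
+ }
+
+ const WINDOW_MS = 30_000;      // backoffWindowMs
+ const THRESHOLD_429 = 10;
+ const recent429s = [];
+ let rateLimitedUntil = 0;
+
+ function noteRateLimit(semaphore) {
+   const now = Date.now();
+   recent429s.push(now);
+   while (recent429s.length && now - recent429s[0] > WINDOW_MS) recent429s.shift();
+   if (recent429s.length >= THRESHOLD_429 && now >= rateLimitedUntil) {
+     semaphore.setCapacity(Math.floor(semaphore.capacity / 2));
+     rateLimitedUntil = now + WINDOW_MS;
+   }
+ }
+
+ // Gradual restore: call periodically (roughly every WINDOW_MS / 2).
+ function maybeRestore(semaphore, initialConcurrency) {
+   const last429 = recent429s[recent429s.length - 1] || 0;
+   if (Date.now() - last429 > WINDOW_MS && semaphore.capacity < initialConcurrency) {
+     semaphore.setCapacity(semaphore.capacity + 1);
+   }
+ }
+ ```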
137
+ 3) Batching of progress messages to avoid UI flood
138
+ - Goal: reduce main-thread overhead when the generator emits many small per-model progress objects in a short time window by coalescing them into small batches sent by the wrapper.
139
+ - Design (minimal):
140
+ - Implement a tiny buffer in `boot-worker.js`'s wrapper around `self.postMessage` for progress messages: collect deltas into an array `batchBuffer` and flush every `BATCH_MS` (recommended 50ms) or when buffer reaches `BATCH_MAX` (recommended 50 items).
141
+ - Message shape: send `{ id, type: 'progress', batch: true, items: [ ...deltas ] }` for batched messages. Single-item messages may keep the shape `{ id, type: 'progress', ...delta }` for backward compatibility, but consistent batching is preferable (the UI can accept either once updated). Document the change for the UI consumer and update the UI progress handler to accept `msg.batch ? msg.items : [msg]`.
142
+ - Implementation exact steps:
143
+ 1. In `boot-worker.js` create `let batchBuffer = []; let batchTimer = null; const BATCH_MS = 50; const BATCH_MAX = 50;`
144
+ 2. Replace the immediate `self.postMessage(Object.assign({ id, type: 'progress' }, delta))` with `enqueueProgress(delta)`, where `enqueueProgress` pushes the delta into `batchBuffer`, starts the timer if it is not already running, and flushes when the buffer length reaches `BATCH_MAX` (a sketch follows this item).
145
+ 3. `flush` sends one message `self.postMessage({ id, type: 'progress', batch: true, items: batchBuffer.splice(0) })` and clears timer.
146
+ 4. On `done` or `error` ensure `flush()` is called synchronously before sending `response`/`error` so UI receives final state.
147
+ - Tests / acceptance:
148
+ - Smoke test: when scanning many models, verify the UI receives fewer, larger progress messages (monitor message frequency in devtools) and still updates correctly.
149
+ - Ensure cancellation still works and that flush is invoked on iterator termination.
150
+ - Risk: low; the main caution is updating the UI to accept `batch` messages (a small consumer change). To avoid any UI change, the wrapper can detect whether `pendingEntry.onProgress` exists and deliver single items, though calling `onProgress({ batch: true, items })` is preferable.
151
+
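+ A minimal sketch of the batching wrapper described in the steps above (`enqueueProgress` and `flush` as named there; `id` is the request id already in scope inside the wrapper):
+
+ ```js
+ // Inside handleListChatModels in boot-worker.js (sketch).
+ let batchBuffer = [];
+ let batchTimer = null;
+ const BATCH_MS = 50;   // flush interval
+ const BATCH_MAX = 50;  // flush when this many deltas are buffered
+
+ function flush() {
+   if (batchTimer) { clearTimeout(batchTimer); batchTimer = null; }
+   if (batchBuffer.length === 0) return;
+   const items = batchBuffer.splice(0);
+   try { self.postMessage({ id, type: 'progress', batch: true, items }); } catch (e) {}
+ }
+
+ function enqueueProgress(delta) {
+   batchBuffer.push(delta);
+   if (batchBuffer.length >= BATCH_MAX) { flush(); return; }
+   if (!batchTimer) batchTimer = setTimeout(flush, BATCH_MS);
+ }
+
+ // In the for-await loop: call enqueueProgress(delta) instead of posting immediately,
+ // and call flush() before sending the final 'response' or 'error' message.
+ ```
+
+ On the UI side, the progress handler would then unwrap batches with `const items = msg.batch ? msg.items : [msg];` before applying each delta.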
152
+ Order of follow-ups and rollout suggestion
153
+ - Apply `boot-worker.js` cleanup first (trivial, low-risk). Run smoke tests.
154
+ - Add batching next (low-risk). Update the UI progress handler to accept batch messages; smoke test with a large run to ensure smooth UI.
155
+ - Add adaptive concurrency last (medium risk). Implement with conservative thresholds and unit tests that simulate 429 storms.
156
+
157
+ Verification checklist after all follow-ups
158
+ - Lint/build: PASS
159
+ - Worker: streams progress in batches, final `response` arrives unchanged
160
+ - Cancellation: `cancelListChatModels` aborts the iterator quickly and no further progress messages are sent after the final cancellation response (aside from any intentionally emitted during cleanup)
161
+ - Under repeated 429s: concurrency reduced and total in-flight requests observed to drop; counters exposed in `meta` when `params.debug` is enabled
162
+
163
+ When you want, I can implement these three follow-ups in that order; tell me to proceed and I'll make small, focused edits and run quick smoke tests after each one.
164
+
src/app/worker-connection.js CHANGED
@@ -99,8 +99,6 @@ export function workerConnection() {
99
  pending.delete(id);
100
  return reject(err);
101
  }
102
- // also send via send to allow worker to reply with final response via same flow
103
- send({ type: 'listChatModels', params }).then(resolve).catch(reject);
104
  });
105
  }
106
 
 
src/worker/boot-worker.js CHANGED
@@ -1,6 +1,7 @@
1
  // @ts-check
2
 
3
  import { ModelCache } from './model-cache';
 
4
 
5
  export function bootWorker() {
6
  const modelCache = new ModelCache();
@@ -71,199 +72,35 @@ export function bootWorker() {
71
  }
72
  }
73
 
74
- // Implementation of the listChatModels worker action
75
  async function handleListChatModels({ id, params = {} }) {
76
- const opts = Object.assign({ maxCandidates: 250, concurrency: 12, hfToken: null, timeoutMs: 10000, maxListing: 5000 }, params || {});
77
- const { maxCandidates, concurrency, hfToken, timeoutMs, maxListing } = opts;
78
- const MAX_TOTAL_TO_FETCH = Math.min(maxListing, 5000);
79
- const PAGE_SIZE = 100;
80
- const RETRIES = 3;
81
- const BACKOFF_BASE_MS = 200;
82
-
83
- let cancelled = false;
84
- const abortControllers = new Set();
85
- activeTasks.set(id, { abort: () => { cancelled = true; for (const c of abortControllers) c.abort(); } });
86
-
87
  try {
88
- // 1) Fetch listing pages up to MAX_TOTAL_TO_FETCH
89
- let listing = [];
90
- let offset = 0;
91
- while (listing.length < MAX_TOTAL_TO_FETCH && !cancelled) {
92
- const url = `https://huggingface.co/api/models?full=true&limit=${PAGE_SIZE}&offset=${offset}`;
93
- let ok = false;
94
- for (let attempt = 0; attempt <= RETRIES && !ok && !cancelled; attempt++) {
95
- try {
96
- const controller = new AbortController();
97
- abortControllers.add(controller);
98
- const resp = await fetch(url, { signal: controller.signal, headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} });
99
- abortControllers.delete(controller);
100
- if (resp.status === 429) {
101
- const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
102
- await new Promise(r => setTimeout(r, backoff));
103
- continue;
104
- }
105
- if (!resp.ok) throw new Error(`listing-fetch-failed:${resp.status}`);
106
- const page = await resp.json();
107
- if (!Array.isArray(page) || page.length === 0) { ok = true; break; }
108
- listing.push(...page);
109
- offset += PAGE_SIZE;
110
- ok = true;
111
- } catch (err) {
112
- if (attempt === RETRIES) throw err;
113
- await new Promise(r => setTimeout(r, BACKOFF_BASE_MS * Math.pow(2, attempt)));
114
- }
115
  }
116
- if (!ok) break;
117
- }
118
-
119
- // send listing_done progress
120
- self.postMessage({ id, type: 'progress', status: 'listing_done', data: { totalFound: listing.length } });
121
-
122
- if (cancelled) {
123
- activeTasks.delete(id);
124
- return self.postMessage({ id, type: 'response', result: { cancelled: true } });
125
  }
126
-
127
- // 2) Pre-filter
128
- const denyPipeline = new Set(['feature-extraction', 'fill-mask', 'sentence-similarity', 'masked-lm']);
129
- const survivors = [];
130
- for (const m of listing) {
131
- if (survivors.length >= maxCandidates) break;
132
- const pipeline = m.pipeline_tag;
133
- if (pipeline && denyPipeline.has(pipeline)) continue;
134
- if (typeof m.modelId === 'string' && m.modelId.includes('sentence-transformers')) continue;
135
- // siblings check: allow if tokenizer or vocab present
136
- const siblings = m.siblings || [];
137
- const hasTokenizer = siblings.some(s => /tokenizer|vocab|merges|sentencepiece/i.test(s));
138
- if (!hasTokenizer) continue;
139
- survivors.push(m);
140
  }
141
-
142
- self.postMessage({ id, type: 'progress', status: 'prefiltered', data: { survivors: survivors.length } });
143
-
144
- // 3) Config fetch & classification with concurrency
145
- const results = [];
146
- const errors = [];
147
- let idx = 0;
148
- const pool = new Array(Math.min(concurrency, survivors.length)).fill(0).map(async () => {
149
- while (!cancelled && idx < survivors.length) {
150
- const i = idx++;
151
- const model = survivors[i];
152
- const modelId = model.modelId || model.id || model.model || model.modelId;
153
- try {
154
- self.postMessage({ id, type: 'progress', modelId, status: 'config_fetching' });
155
- const fetchResult = await fetchConfigForModel(modelId, hfToken, timeoutMs, RETRIES, BACKOFF_BASE_MS);
156
- const entry = classifyModel(model, fetchResult);
157
- results.push(entry);
158
- self.postMessage({ id, type: 'progress', modelId, status: 'classified', data: entry });
159
- } catch (err) {
160
- errors.push({ modelId, message: String(err) });
161
- self.postMessage({ id, type: 'progress', modelId, status: 'error', data: { message: String(err) } });
162
- }
163
- }
164
- });
165
-
166
- await Promise.all(pool);
167
-
168
- if (cancelled) {
169
- activeTasks.delete(id);
170
- return self.postMessage({ id, type: 'response', result: { cancelled: true } });
171
- }
172
-
173
- // finalize
174
- const models = results.map(r => ({ id: r.id, model_type: r.model_type, architectures: r.architectures, classification: r.classification, confidence: r.confidence, fetchStatus: r.fetchStatus }));
175
- const meta = { fetched: listing.length, filtered: survivors.length, errors };
176
- activeTasks.delete(id);
177
- return self.postMessage({ id, type: 'response', result: { models, meta } });
178
  } catch (err) {
 
 
179
  activeTasks.delete(id);
180
- return self.postMessage({ id, type: 'error', error: String(err) });
181
  }
182
  }
183
 
184
  // helper: fetchConfigForModel
185
- async function fetchConfigForModel(modelId, hfToken, timeoutMs, RETRIES, BACKOFF_BASE_MS) {
186
- const urls = [
187
- `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config.json`,
188
- `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config/config.json`,
189
- `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/adapter_config.json`
190
- ];
191
- for (const url of urls) {
192
- for (let attempt = 0; attempt <= RETRIES; attempt++) {
193
- const controller = new AbortController();
194
- const timeout = setTimeout(() => controller.abort(), timeoutMs);
195
- try {
196
- const resp = await fetch(url, { signal: controller.signal, headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} });
197
- clearTimeout(timeout);
198
- if (resp.status === 200) {
199
- const json = await resp.json();
200
- return { status: 'ok', model_type: json.model_type || null, architectures: json.architectures || null };
201
- }
202
- if (resp.status === 401 || resp.status === 403) return { status: 'auth', code: resp.status };
203
- if (resp.status === 404) break; // try next fallback
204
- if (resp.status === 429) {
205
- const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
206
- await new Promise(r => setTimeout(r, backoff));
207
- continue;
208
- }
209
- // other non-200 -> treat as error
210
- return { status: 'error', code: resp.status, message: `fetch failed ${resp.status}` };
211
- } catch (err) {
212
- clearTimeout(timeout);
213
- if (attempt === RETRIES) return { status: 'error', message: String(err) };
214
- const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
215
- await new Promise(r => setTimeout(r, backoff));
216
- }
217
- }
218
- }
219
- return { status: 'no-config' };
220
- }
221
-
222
- // helper: classifyModel
223
- function classifyModel(rawModel, fetchResult) {
224
- const id = rawModel.modelId || rawModel.id || rawModel.modelId || rawModel.modelId || rawModel.id;
225
- const entry = { id, model_type: null, architectures: null, classification: 'unknown', confidence: 'low', fetchStatus: 'error' };
226
- if (!fetchResult) return entry;
227
- if (fetchResult.status === 'auth') {
228
- entry.classification = 'auth-protected'; entry.confidence = 'high'; entry.fetchStatus = '401';
229
- return entry;
230
- }
231
- if (fetchResult.status === 'ok') {
232
- entry.model_type = fetchResult.model_type || null;
233
- entry.architectures = Array.isArray(fetchResult.architectures) ? fetchResult.architectures : null;
234
- entry.fetchStatus = 'ok';
235
- const deny = ['bert','roberta','distilbert','electra','albert','deberta','mobilebert','convbert','sentence-transformers'];
236
- const allow = ['gpt2','gptj','gpt_neox','llama','qwen','mistral','phi','gpt','t5','bart','pegasus'];
237
- if (entry.model_type && deny.includes(entry.model_type)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
238
- if (entry.model_type && allow.includes(entry.model_type)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
239
- const arch = entry.architectures;
240
- if (arch && Array.isArray(arch)) {
241
- /** @type {any[]} */
242
- const archArr = /** @type {any[]} */ (arch);
243
- for (let i = 0; i < archArr.length; i++) {
244
- const a = archArr[i];
245
- const al = String(a).toLowerCase();
246
- if (allow.includes(al)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
247
- if (deny.includes(al)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
248
- }
249
- }
250
- entry.classification = 'unknown'; entry.confidence = 'low'; return entry;
251
- }
252
- if (fetchResult.status === 'no-config') {
253
- // fallback heuristics
254
- const pipeline = rawModel.pipeline_tag || '';
255
- if (pipeline && pipeline.startsWith('text-generation')) { entry.classification = 'gen'; entry.confidence = 'medium'; }
256
- else entry.classification = 'unknown'; entry.confidence = 'low';
257
- entry.fetchStatus = '404';
258
- return entry;
259
- }
260
- if (fetchResult.status === 'error') {
261
- entry.classification = 'unknown'; entry.confidence = 'low'; entry.fetchStatus = 'error';
262
- entry.fetchError = { message: fetchResult.message, code: fetchResult.code };
263
- return entry;
264
- }
265
- return entry;
266
- }
267
  }
268
 
269
  // helper to extract generated text from various runtime outputs
 
1
  // @ts-check
2
 
3
  import { ModelCache } from './model-cache';
4
+ import { listChatModelsIterator } from './list-chat-models.js';
5
 
6
  export function bootWorker() {
7
  const modelCache = new ModelCache();
 
72
  }
73
  }
74
 
75
+ // Implementation of the listChatModels worker action using the async-iterator action.
76
  async function handleListChatModels({ id, params = {} }) {
77
+ const iterator = listChatModelsIterator(params);
78
+ let sawDone = false;
79
+ activeTasks.set(id, { abort: () => { try { iterator.return(); } catch (e) {} } });
80
  try {
81
+ for await (const delta of iterator) {
82
+ try { self.postMessage(Object.assign({ id, type: 'progress' }, delta)); } catch (e) {}
83
+ if (delta && delta.status === 'done') {
84
+ sawDone = true;
85
+ try { self.postMessage({ id, type: 'response', result: { models: delta.models, meta: delta.meta } }); } catch (e) {}
86
+ break;
87
  }
88
  }
89
+ if (!sawDone) {
90
+ // iterator exited early (likely cancelled)
91
+ try { self.postMessage({ id, type: 'response', result: { cancelled: true } }); } catch (e) {}
92
  }
93
  } catch (err) {
94
+ try { self.postMessage({ id, type: 'error', error: String(err), code: err.code || null }); } catch (e) {}
95
+ } finally {
96
  activeTasks.delete(id);
 
97
  }
98
  }
99
 
100
  // helper: fetchConfigForModel
101
+ // Note: fetchConfigForModel and classifyModel were moved to the
102
+ // `src/worker/list-chat-models.js` async-iterator action. Keep this file
103
+ // minimal and delegate to the iterator for listing/classification logic.
104
  }
105
 
106
  // helper to extract generated text from various runtime outputs
src/worker/list-chat-models.js ADDED
@@ -0,0 +1,223 @@
1
+ // Minimal async-iterator implementation of the listChatModels pipeline.
2
+ // Yields plain JSON-serializable progress objects. Uses per-request AbortControllers
3
+ // and a finally block so iterator.return() causes cleanup.
4
+
5
+ export async function* listChatModelsIterator(params = {}) {
6
+ const opts = Object.assign({ maxCandidates: 250, concurrency: 12, hfToken: null, timeoutMs: 10000, maxListing: 5000 }, params || {});
7
+ const { maxCandidates, concurrency, hfToken, timeoutMs, maxListing } = opts;
8
+ const MAX_TOTAL_TO_FETCH = Math.min(maxListing, 5000);
9
+ const PAGE_SIZE = 100;
10
+ const RETRIES = 3;
11
+ const BACKOFF_BASE_MS = 200;
12
+
13
+ const inFlight = new Set();
14
+
15
+ async function fetchWithController(url, init = {}) {
16
+ const c = new AbortController();
17
+ inFlight.add(c);
18
+ try {
19
+ const merged = Object.assign({}, init, { signal: c.signal });
20
+ const resp = await fetch(url, merged);
21
+ return resp;
22
+ } finally {
23
+ inFlight.delete(c);
24
+ }
25
+ }
26
+
27
+ // helper: fetchConfigForModel (tries multiple paths, per-request timeouts & retries)
28
+ async function fetchConfigForModel(modelId) {
29
+ const urls = [
30
+ `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config.json`,
31
+ `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config/config.json`,
32
+ `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/adapter_config.json`
33
+ ];
34
+ for (const url of urls) {
35
+ for (let attempt = 0; attempt <= RETRIES; attempt++) {
36
+ // per-request timeout via AbortController + setTimeout
37
+ const controller = new AbortController();
38
+ inFlight.add(controller);
39
+ const timeout = setTimeout(() => controller.abort(), timeoutMs);
40
+ try {
41
+ const resp = await fetch(url, { signal: controller.signal, headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} });
42
+ clearTimeout(timeout);
43
+ inFlight.delete(controller);
44
+ if (resp.status === 200) {
45
+ const json = await resp.json();
46
+ return { status: 'ok', model_type: json.model_type || null, architectures: json.architectures || null };
47
+ }
48
+ if (resp.status === 401 || resp.status === 403) return { status: 'auth', code: resp.status };
49
+ if (resp.status === 404) break; // try next fallback
50
+ if (resp.status === 429) {
51
+ const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
52
+ await new Promise(r => setTimeout(r, backoff));
53
+ continue;
54
+ }
55
+ return { status: 'error', code: resp.status, message: `fetch failed ${resp.status}` };
56
+ } catch (err) {
57
+ clearTimeout(timeout);
58
+ inFlight.delete(controller);
59
+ if (attempt === RETRIES) return { status: 'error', message: String(err) };
60
+ const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
61
+ await new Promise(r => setTimeout(r, backoff));
62
+ }
63
+ }
64
+ }
65
+ return { status: 'no-config' };
66
+ }
67
+
68
+ function classifyModel(rawModel, fetchResult) {
69
+ const id = rawModel.modelId || rawModel.id || rawModel.model;
70
+ const entry = { id, model_type: null, architectures: null, classification: 'unknown', confidence: 'low', fetchStatus: 'error' };
71
+ if (!fetchResult) return entry;
72
+ if (fetchResult.status === 'auth') {
73
+ entry.classification = 'auth-protected'; entry.confidence = 'high'; entry.fetchStatus = String(fetchResult.code || 401);
74
+ return entry;
75
+ }
76
+ if (fetchResult.status === 'ok') {
77
+ entry.model_type = fetchResult.model_type || null;
78
+ entry.architectures = Array.isArray(fetchResult.architectures) ? fetchResult.architectures : null;
79
+ entry.fetchStatus = 'ok';
80
+ const deny = ['bert','roberta','distilbert','electra','albert','deberta','mobilebert','convbert','sentence-transformers'];
81
+ const allow = ['gpt2','gptj','gpt_neox','llama','qwen','mistral','phi','gpt','t5','bart','pegasus'];
82
+ if (entry.model_type && deny.includes(entry.model_type)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
83
+ if (entry.model_type && allow.includes(entry.model_type)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
84
+ const arch = entry.architectures;
85
+ if (arch && Array.isArray(arch)) {
86
+ for (let i = 0; i < arch.length; i++) {
87
+ const a = String(arch[i]).toLowerCase();
88
+ if (allow.includes(a)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
89
+ if (deny.includes(a)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
90
+ }
91
+ }
92
+ entry.classification = 'unknown'; entry.confidence = 'low'; return entry;
93
+ }
94
+ if (fetchResult.status === 'no-config') {
95
+ const pipeline = rawModel.pipeline_tag || '';
96
+ if (pipeline && pipeline.startsWith('text-generation')) { entry.classification = 'gen'; entry.confidence = 'medium'; }
97
+ else { entry.classification = 'unknown'; entry.confidence = 'low'; } // braces so the 'medium' confidence set above is not overwritten
98
+ entry.fetchStatus = '404';
99
+ return entry;
100
+ }
101
+ if (fetchResult.status === 'error') {
102
+ entry.classification = 'unknown'; entry.confidence = 'low'; entry.fetchStatus = 'error';
103
+ entry.fetchError = { message: fetchResult.message, code: fetchResult.code };
104
+ return entry;
105
+ }
106
+ return entry;
107
+ }
108
+
109
+ // Main pipeline
110
+ let listing = [];
111
+ try {
112
+ // 1) listing
113
+ let offset = 0;
114
+ while (listing.length < MAX_TOTAL_TO_FETCH) {
115
+ const url = `https://huggingface.co/api/models?full=true&limit=${PAGE_SIZE}&offset=${offset}`;
116
+ let ok = false;
117
+ for (let attempt = 0; attempt <= RETRIES && !ok; attempt++) {
118
+ try {
119
+ const resp = await fetchWithController(url, { headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} }); // use the shared controller helper so listing requests are tracked in inFlight
120
+ if (resp.status === 429) {
121
+ const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
122
+ await new Promise(r => setTimeout(r, backoff));
123
+ continue;
124
+ }
125
+ if (!resp.ok) throw Object.assign(new Error(`listing-fetch-failed:${resp.status}`), { code: 'listing_fetch_failed', status: resp.status });
126
+ const page = await resp.json();
127
+ if (!Array.isArray(page) || page.length === 0) { ok = true; break; }
128
+ listing.push(...page);
129
+ offset += PAGE_SIZE;
130
+ ok = true;
131
+ } catch (err) {
132
+ if (attempt === RETRIES) throw err;
133
+ await new Promise(r => setTimeout(r, BACKOFF_BASE_MS * Math.pow(2, attempt)));
134
+ }
135
+ }
136
+ if (!ok) break;
137
+ }
138
+
139
+ // emit listing_done
140
+ yield { status: 'listing_done', totalFound: listing.length };
141
+
142
+ // 2) prefilter
143
+ const denyPipeline = new Set(['feature-extraction', 'fill-mask', 'sentence-similarity', 'masked-lm']);
144
+ const survivors = [];
145
+ for (const m of listing) {
146
+ if (survivors.length >= maxCandidates) break;
147
+ const pipeline = m.pipeline_tag;
148
+ if (pipeline && denyPipeline.has(pipeline)) continue;
149
+ if (typeof m.modelId === 'string' && m.modelId.includes('sentence-transformers')) continue;
150
+ const siblings = m.siblings || [];
151
+ const hasTokenizer = siblings.some(s => /tokenizer|vocab|merges|sentencepiece/i.test(s));
152
+ if (!hasTokenizer) continue;
153
+ survivors.push(m);
154
+ }
155
+
156
+ yield { status: 'prefiltered', survivors: survivors.length };
157
+
158
+ // 3) concurrent config fetch & classify using an event queue so workers can emit
159
+ // progress while the generator yields them.
160
+ const results = [];
161
+ const errors = [];
162
+ let idx = 0;
163
+ let processed = 0;
164
+ const events = [];
165
+ let resolveNext = null;
166
+ function emit(ev) {
167
+ events.push(ev);
168
+ if (resolveNext) {
169
+ resolveNext();
170
+ resolveNext = null;
171
+ }
172
+ }
173
+ async function nextEvent() {
174
+ while (events.length === 0) {
175
+ await new Promise(r => { resolveNext = r; });
176
+ }
177
+ return events.shift();
178
+ }
179
+
180
+ const workerCount = Math.min(concurrency, survivors.length || 1);
181
+ const pool = new Array(workerCount).fill(0).map(async () => {
182
+ while (true) {
183
+ const i = idx++;
184
+ if (i >= survivors.length) break;
185
+ const model = survivors[i];
186
+ const modelId = model.modelId || model.id || model.model;
187
+ try {
188
+ emit({ modelId, status: 'config_fetching' });
189
+ const fetchResult = await fetchConfigForModel(modelId);
190
+ const entry = classifyModel(model, fetchResult);
191
+ results.push(entry);
192
+ emit({ modelId, status: 'classified', data: entry });
193
+ } catch (err) {
194
+ errors.push({ modelId, message: String(err) });
195
+ emit({ modelId, status: 'error', data: { message: String(err) } });
196
+ } finally {
197
+ processed++;
198
+ }
199
+ }
200
+ });
201
+
202
+ // consume events as workers produce them
203
+ while (processed < survivors.length) {
204
+ const ev = await nextEvent();
205
+ yield ev;
206
+ }
207
+
208
+ // make sure any remaining events are yielded
209
+ while (events.length > 0) {
210
+ yield events.shift();
211
+ }
212
+
213
+ await Promise.all(pool);
214
+
215
+ // final
216
+ const models = results.map(r => ({ id: r.id, model_type: r.model_type, architectures: r.architectures, classification: r.classification, confidence: r.confidence, fetchStatus: r.fetchStatus }));
217
+ const meta = { fetched: listing.length, filtered: survivors.length, errors };
218
+ yield { status: 'done', models, meta };
219
+ } finally {
220
+ // abort any in-flight fetches if iteration stopped early
221
+ for (const c of Array.from(inFlight)) try { c.abort(); } catch (e) {}
222
+ }
223
+ }