mihailik commited on
Commit
dcf5774
·
1 Parent(s): cfef6b5

Core refactoring of list-chat-models, and a plan for the follow-up.

Browse files
plans/2025-08-01-model-filtering/4-worker-refactoring-async-iterator.md CHANGED
@@ -103,3 +103,62 @@ Design rule: yield final `{ status: 'done' }` explicitly; do not rely on generat
103
 
104
  This plan keeps the code minimal, uses language-native iterator cancellation, isolates side-effects to the generator, and leaves the worker wrapper trivial. Proceed to implement when ready.
105
 
106
+ ## Follow-up improvements (post-cleanup)
107
+
108
+ After the extraction and wrapper are in place, apply the following three improvements as follow-ups. Each item below is written as an exact, small TODO with acceptance criteria and a short testing checklist.
109
+
110
+ 1) Clean up `boot-worker.js` (remove duplicated helpers)
111
+ - Goal: remove the leftover duplicate helper implementations from `boot-worker.js` so there is a single source of truth for `fetchConfigForModel` and `classifyModel` (now owned by the new action module). This reduces the maintenance surface and avoids accidental drift.
112
+ - Changes (exact):
113
+ - Delete the `fetchConfigForModel` and `classifyModel` helper functions from `src/worker/boot-worker.js` (the ones that are no longer used after extraction).
114
+ - Ensure the only import of those helpers is via `import { listChatModelsIterator } from './actions/list-chat-models.js';` and do not re-export them from the action module unless tests need them. If other parts of `boot-worker.js` still need config/classify helpers, move the shared helpers into a small `src/worker/lib/model-utils.js` and import it from both places (see the sketch after this item).
115
+ - Run a quick grep for remaining references to unused symbols (`fetchConfigForModel`, `classifyModel`) and remove any stale code.
116
+ - Tests / acceptance:
117
+ - Build/lint passes with no unused-variable warnings for the removed symbols.
118
+ - The `listChatModels` worker flow behaves identically after cleanup (run smoke test: progress + final response + cancellation).
119
+ - Risk: trivial; mitigate by running the smoke test immediately after the change.
120
+
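+ A minimal sketch of the intended post-cleanup import surface (paths follow the plan above; `./actions/list-chat-models.js` and `lib/model-utils.js` are planned locations, not files that exist in this commit):
+
+ ```js
+ // src/worker/boot-worker.js after cleanup (sketch): the only model-listing import
+ // left is the async-iterator action; no local fetchConfigForModel / classifyModel remain.
+ import { listChatModelsIterator } from './actions/list-chat-models.js';
+
+ // Only if other boot-worker code still needs the helpers would they move to a
+ // shared module and be imported from there instead of being duplicated:
+ // import { fetchConfigForModel, classifyModel } from './lib/model-utils.js';
+ ```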
121
+ 2) Adaptive concurrency on repeated 429 responses
122
+ - Goal: make the config-fetch promise pool responsive to Hugging Face rate limits by reducing parallelism when many 429s occur and gradually increasing it again when the 429 rate subsides.
123
+ - Design (practical, minimal):
124
+ - Instrument a small rate-limit counter in the action module: keep a sliding-window count of 429 responses from config fetches (for example, track the timestamps of recent 429s in an array trimmed to the last 30s).
125
+ - Add two simple thresholds: if the 429 count in the window reaches 10, halve the concurrency (`effectiveConcurrency = Math.max(1, Math.floor(effectiveConcurrency / 2))`) and set `rateLimitedUntil = Date.now() + backoffWindowMs` (with `backoffWindowMs` around 30s). If no 429s are recorded for `backoffWindowMs`, gradually restore `effectiveConcurrency = Math.min(initialConcurrency, effectiveConcurrency + 1)` every `backoffWindowMs / 2`.
126
+ - Implementation notes: do not tear down and recreate the worker loops; instead implement a token/semaphore scheme where each worker must acquire a token before starting a config fetch. Maintain `tokenCount = effectiveConcurrency`; when the thresholds fire, adjust `tokenCount` (add or withhold available tokens). This lets workers respect the new concurrency without tearing down the pool (a sketch follows this item).
127
+ - Track metrics: increment `counters.configFetch429` and `counters.configFetch200` for telemetry (simple numeric fields inside the action module). Expose the counters in the final meta when `params.debug` is true.
128
+ - Changes (exact):
129
+ - Add `const counters = { configFetch429:0, configFetch200:0, configFetchError:0 }` to the top of `list-chat-models.js` action module.
130
+ - Replace the fixed `workerCount = Math.min(concurrency, survivors.length || 1)` with a token-based semaphore driven by `effectiveConcurrency`, which the rate-limit detector can adjust at runtime.
131
+ - On each 429 response, update the sliding window and, if the threshold is crossed, reduce `effectiveConcurrency` and the token count; when the token count changes, wake waiting workers so they re-evaluate.
132
+ - Tests / acceptance:
133
+ - Simulate many 429s using a mocked fetch; confirm effectiveConcurrency is reduced and fewer concurrent requests are outstanding (can use counters or a small instrumentation hook).
134
+ - After simulated quiet period, confirm effectiveConcurrency slowly ramps back up.
135
+ - Risk: medium; the subtlety is manageable with a simple token count plus counters. Keep the logic intentionally conservative and well tested.
136
+
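+ A minimal sketch of the adjustable semaphore and 429 window described above (the `AdjustableSemaphore`, `noteRateLimit` and `maybeRestore` names, the thresholds, and the wiring are illustrative, not existing code):
+
+ ```js
+ // Sketch: each worker calls acquire() before a config fetch and release() in a finally.
+ class AdjustableSemaphore {
+   constructor(capacity) {
+     this.capacity = capacity; // effectiveConcurrency
+     this.inUse = 0;           // tokens currently held by workers
+     this.waiters = [];        // resolvers of workers waiting for a token
+   }
+   async acquire() {
+     if (this.inUse < this.capacity) { this.inUse++; return; }
+     await new Promise(resolve => this.waiters.push(resolve));
+     // the slot was already reserved by _wake() before resolving
+   }
+   release() { this.inUse--; this._wake(); }
+   setCapacity(n) { this.capacity = Math.max(1, n); this._wake(); }
+   _wake() {
+     while (this.waiters.length && this.inUse < this.capacity) {
+       this.inUse++;            // reserve the slot, then wake one waiter
+       this.waiters.shift()();
+     }
+   }
+ }
+
+ const WINDOW_MS = 30_000;      // backoffWindowMs
+ const THRESHOLD_429 = 10;
+ const recent429s = [];
+ let rateLimitedUntil = 0;
+
+ function noteRateLimit(semaphore) {
+   const now = Date.now();
+   recent429s.push(now);
+   while (recent429s.length && now - recent429s[0] > WINDOW_MS) recent429s.shift();
+   if (recent429s.length >= THRESHOLD_429 && now >= rateLimitedUntil) {
+     semaphore.setCapacity(Math.floor(semaphore.capacity / 2));
+     rateLimitedUntil = now + WINDOW_MS;
+   }
+ }
+
+ // Gradual restore: call periodically (roughly every WINDOW_MS / 2).
+ function maybeRestore(semaphore, initialConcurrency) {
+   const last429 = recent429s[recent429s.length - 1] || 0;
+   if (Date.now() - last429 > WINDOW_MS && semaphore.capacity < initialConcurrency) {
+     semaphore.setCapacity(semaphore.capacity + 1);
+   }
+ }
+ ```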
137
+ 3) Batching of progress messages to avoid UI flood
138
+ - Goal: reduce main-thread overhead when the generator emits many small per-model progress objects in a short time window by coalescing them into small batches sent by the wrapper.
139
+ - Design (minimal):
140
+ - Implement a tiny buffer in `boot-worker.js`'s wrapper around `self.postMessage` for progress messages: collect deltas into an array `batchBuffer` and flush every `BATCH_MS` (recommended 50ms) or when buffer reaches `BATCH_MAX` (recommended 50 items).
141
+ - Message shape: send `{ id, type: 'progress', batch: true, items: [ ...deltas ] }` for batched messages. Single-item messages may keep the shape `{ id, type: 'progress', ...delta }` for backward compatibility, but consistent batching is preferable (the UI can accept either once updated). Document the change for the UI consumer and update the UI progress handler to accept `msg.batch ? msg.items : [msg]`.
142
+ - Implementation exact steps:
143
+ 1. In `boot-worker.js` create `let batchBuffer = []; let batchTimer = null; const BATCH_MS = 50; const BATCH_MAX = 50;`
144
+ 2. Replace the immediate `self.postMessage(Object.assign({ id, type: 'progress' }, delta))` with `enqueueProgress(delta)`, where `enqueueProgress` pushes the delta into `batchBuffer`, starts the timer if it is not already running, and flushes when the buffer length reaches `BATCH_MAX` (a sketch follows this item).
145
+ 3. `flush` sends one message `self.postMessage({ id, type: 'progress', batch: true, items: batchBuffer.splice(0) })` and clears timer.
146
+ 4. On `done` or `error` ensure `flush()` is called synchronously before sending `response`/`error` so UI receives final state.
147
+ - Tests / acceptance:
148
+ - Smoke test: when scanning many models, verify the UI receives fewer, larger progress messages (monitor message frequency in devtools) and still updates correctly.
149
+ - Ensure cancellation still works and that flush is invoked on iterator termination.
150
+ - Risk: low; the main caution is updating the UI to accept `batch` messages (a small consumer change). To avoid any UI change, the wrapper can detect whether `pendingEntry.onProgress` exists and deliver single items, though calling `onProgress({ batch: true, items })` is preferable.
151
+
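+ A minimal sketch of the batching wrapper described in the steps above (`enqueueProgress` and `flush` as named there; `id` is the request id already in scope inside the wrapper):
+
+ ```js
+ // Inside handleListChatModels in boot-worker.js (sketch).
+ let batchBuffer = [];
+ let batchTimer = null;
+ const BATCH_MS = 50;   // flush interval
+ const BATCH_MAX = 50;  // flush when this many deltas are buffered
+
+ function flush() {
+   if (batchTimer) { clearTimeout(batchTimer); batchTimer = null; }
+   if (batchBuffer.length === 0) return;
+   const items = batchBuffer.splice(0);
+   try { self.postMessage({ id, type: 'progress', batch: true, items }); } catch (e) {}
+ }
+
+ function enqueueProgress(delta) {
+   batchBuffer.push(delta);
+   if (batchBuffer.length >= BATCH_MAX) { flush(); return; }
+   if (!batchTimer) batchTimer = setTimeout(flush, BATCH_MS);
+ }
+
+ // In the for-await loop: call enqueueProgress(delta) instead of posting immediately,
+ // and call flush() before sending the final 'response' or 'error' message.
+ ```
+
+ On the UI side, the progress handler would then unwrap batches with `const items = msg.batch ? msg.items : [msg];` before applying each delta.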
152
+ Order of follow-ups and rollout suggestion
153
+ - Apply `boot-worker.js` cleanup first (trivial, low-risk). Run smoke tests.
154
+ - Add batching next (low-risk). Update the UI progress handler to accept batch messages; smoke test with a large run to ensure smooth UI.
155
+ - Add adaptive concurrency last (medium risk). Implement with conservative thresholds and unit tests that simulate 429 storms.
156
+
157
+ Verification checklist after all follow-ups
158
+ - Lint/build: PASS
159
+ - Worker: streams progress in batches, final `response` arrives unchanged
160
+ - Cancellation: `cancelListChatModels` aborts the iterator quickly and no further progress messages are sent after the final cancellation response (aside from any intentionally emitted during cleanup)
161
+ - Under repeated 429s: concurrency reduced and total in-flight requests observed to drop; counters exposed in `meta` when `params.debug` is enabled
162
+
163
+ When you want, I can implement these three follow-ups in that order; tell me to proceed and I'll make small, focused edits and run quick smoke tests after each one.
164
+
src/app/worker-connection.js CHANGED
@@ -99,8 +99,6 @@ export function workerConnection() {
99
  pending.delete(id);
100
  return reject(err);
101
  }
102
- // also send via send to allow worker to reply with final response via same flow
103
- send({ type: 'listChatModels', params }).then(resolve).catch(reject);
104
  });
105
  }
106
 
 
src/worker/boot-worker.js CHANGED
@@ -1,6 +1,7 @@
1
  // @ts-check
2
 
3
  import { ModelCache } from './model-cache';
 
4
 
5
  export function bootWorker() {
6
  const modelCache = new ModelCache();
@@ -71,199 +72,35 @@ export function bootWorker() {
71
  }
72
  }
73
 
74
- // Implementation of the listChatModels worker action
75
  async function handleListChatModels({ id, params = {} }) {
76
- const opts = Object.assign({ maxCandidates: 250, concurrency: 12, hfToken: null, timeoutMs: 10000, maxListing: 5000 }, params || {});
77
- const { maxCandidates, concurrency, hfToken, timeoutMs, maxListing } = opts;
78
- const MAX_TOTAL_TO_FETCH = Math.min(maxListing, 5000);
79
- const PAGE_SIZE = 100;
80
- const RETRIES = 3;
81
- const BACKOFF_BASE_MS = 200;
82
-
83
- let cancelled = false;
84
- const abortControllers = new Set();
85
- activeTasks.set(id, { abort: () => { cancelled = true; for (const c of abortControllers) c.abort(); } });
86
-
87
  try {
88
- // 1) Fetch listing pages up to MAX_TOTAL_TO_FETCH
89
- let listing = [];
90
- let offset = 0;
91
- while (listing.length < MAX_TOTAL_TO_FETCH && !cancelled) {
92
- const url = `https://huggingface.co/api/models?full=true&limit=${PAGE_SIZE}&offset=${offset}`;
93
- let ok = false;
94
- for (let attempt = 0; attempt <= RETRIES && !ok && !cancelled; attempt++) {
95
- try {
96
- const controller = new AbortController();
97
- abortControllers.add(controller);
98
- const resp = await fetch(url, { signal: controller.signal, headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} });
99
- abortControllers.delete(controller);
100
- if (resp.status === 429) {
101
- const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
102
- await new Promise(r => setTimeout(r, backoff));
103
- continue;
104
- }
105
- if (!resp.ok) throw new Error(`listing-fetch-failed:${resp.status}`);
106
- const page = await resp.json();
107
- if (!Array.isArray(page) || page.length === 0) { ok = true; break; }
108
- listing.push(...page);
109
- offset += PAGE_SIZE;
110
- ok = true;
111
- } catch (err) {
112
- if (attempt === RETRIES) throw err;
113
- await new Promise(r => setTimeout(r, BACKOFF_BASE_MS * Math.pow(2, attempt)));
114
- }
115
  }
116
- if (!ok) break;
117
- }
118
-
119
- // send listing_done progress
120
- self.postMessage({ id, type: 'progress', status: 'listing_done', data: { totalFound: listing.length } });
121
-
122
- if (cancelled) {
123
- activeTasks.delete(id);
124
- return self.postMessage({ id, type: 'response', result: { cancelled: true } });
125
  }
126
-
127
- // 2) Pre-filter
128
- const denyPipeline = new Set(['feature-extraction', 'fill-mask', 'sentence-similarity', 'masked-lm']);
129
- const survivors = [];
130
- for (const m of listing) {
131
- if (survivors.length >= maxCandidates) break;
132
- const pipeline = m.pipeline_tag;
133
- if (pipeline && denyPipeline.has(pipeline)) continue;
134
- if (typeof m.modelId === 'string' && m.modelId.includes('sentence-transformers')) continue;
135
- // siblings check: allow if tokenizer or vocab present
136
- const siblings = m.siblings || [];
137
- const hasTokenizer = siblings.some(s => /tokenizer|vocab|merges|sentencepiece/i.test(s));
138
- if (!hasTokenizer) continue;
139
- survivors.push(m);
140
  }
141
-
142
- self.postMessage({ id, type: 'progress', status: 'prefiltered', data: { survivors: survivors.length } });
143
-
144
- // 3) Config fetch & classification with concurrency
145
- const results = [];
146
- const errors = [];
147
- let idx = 0;
148
- const pool = new Array(Math.min(concurrency, survivors.length)).fill(0).map(async () => {
149
- while (!cancelled && idx < survivors.length) {
150
- const i = idx++;
151
- const model = survivors[i];
152
- const modelId = model.modelId || model.id || model.model || model.modelId;
153
- try {
154
- self.postMessage({ id, type: 'progress', modelId, status: 'config_fetching' });
155
- const fetchResult = await fetchConfigForModel(modelId, hfToken, timeoutMs, RETRIES, BACKOFF_BASE_MS);
156
- const entry = classifyModel(model, fetchResult);
157
- results.push(entry);
158
- self.postMessage({ id, type: 'progress', modelId, status: 'classified', data: entry });
159
- } catch (err) {
160
- errors.push({ modelId, message: String(err) });
161
- self.postMessage({ id, type: 'progress', modelId, status: 'error', data: { message: String(err) } });
162
- }
163
- }
164
- });
165
-
166
- await Promise.all(pool);
167
-
168
- if (cancelled) {
169
- activeTasks.delete(id);
170
- return self.postMessage({ id, type: 'response', result: { cancelled: true } });
171
- }
172
-
173
- // finalize
174
- const models = results.map(r => ({ id: r.id, model_type: r.model_type, architectures: r.architectures, classification: r.classification, confidence: r.confidence, fetchStatus: r.fetchStatus }));
175
- const meta = { fetched: listing.length, filtered: survivors.length, errors };
176
- activeTasks.delete(id);
177
- return self.postMessage({ id, type: 'response', result: { models, meta } });
178
  } catch (err) {
 
 
179
  activeTasks.delete(id);
180
- return self.postMessage({ id, type: 'error', error: String(err) });
181
  }
182
  }
183
 
184
  // helper: fetchConfigForModel
185
- async function fetchConfigForModel(modelId, hfToken, timeoutMs, RETRIES, BACKOFF_BASE_MS) {
186
- const urls = [
187
- `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config.json`,
188
- `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config/config.json`,
189
- `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/adapter_config.json`
190
- ];
191
- for (const url of urls) {
192
- for (let attempt = 0; attempt <= RETRIES; attempt++) {
193
- const controller = new AbortController();
194
- const timeout = setTimeout(() => controller.abort(), timeoutMs);
195
- try {
196
- const resp = await fetch(url, { signal: controller.signal, headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} });
197
- clearTimeout(timeout);
198
- if (resp.status === 200) {
199
- const json = await resp.json();
200
- return { status: 'ok', model_type: json.model_type || null, architectures: json.architectures || null };
201
- }
202
- if (resp.status === 401 || resp.status === 403) return { status: 'auth', code: resp.status };
203
- if (resp.status === 404) break; // try next fallback
204
- if (resp.status === 429) {
205
- const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
206
- await new Promise(r => setTimeout(r, backoff));
207
- continue;
208
- }
209
- // other non-200 -> treat as error
210
- return { status: 'error', code: resp.status, message: `fetch failed ${resp.status}` };
211
- } catch (err) {
212
- clearTimeout(timeout);
213
- if (attempt === RETRIES) return { status: 'error', message: String(err) };
214
- const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
215
- await new Promise(r => setTimeout(r, backoff));
216
- }
217
- }
218
- }
219
- return { status: 'no-config' };
220
- }
221
-
222
- // helper: classifyModel
223
- function classifyModel(rawModel, fetchResult) {
224
- const id = rawModel.modelId || rawModel.id || rawModel.modelId || rawModel.modelId || rawModel.id;
225
- const entry = { id, model_type: null, architectures: null, classification: 'unknown', confidence: 'low', fetchStatus: 'error' };
226
- if (!fetchResult) return entry;
227
- if (fetchResult.status === 'auth') {
228
- entry.classification = 'auth-protected'; entry.confidence = 'high'; entry.fetchStatus = '401';
229
- return entry;
230
- }
231
- if (fetchResult.status === 'ok') {
232
- entry.model_type = fetchResult.model_type || null;
233
- entry.architectures = Array.isArray(fetchResult.architectures) ? fetchResult.architectures : null;
234
- entry.fetchStatus = 'ok';
235
- const deny = ['bert','roberta','distilbert','electra','albert','deberta','mobilebert','convbert','sentence-transformers'];
236
- const allow = ['gpt2','gptj','gpt_neox','llama','qwen','mistral','phi','gpt','t5','bart','pegasus'];
237
- if (entry.model_type && deny.includes(entry.model_type)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
238
- if (entry.model_type && allow.includes(entry.model_type)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
239
- const arch = entry.architectures;
240
- if (arch && Array.isArray(arch)) {
241
- /** @type {any[]} */
242
- const archArr = /** @type {any[]} */ (arch);
243
- for (let i = 0; i < archArr.length; i++) {
244
- const a = archArr[i];
245
- const al = String(a).toLowerCase();
246
- if (allow.includes(al)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
247
- if (deny.includes(al)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
248
- }
249
- }
250
- entry.classification = 'unknown'; entry.confidence = 'low'; return entry;
251
- }
252
- if (fetchResult.status === 'no-config') {
253
- // fallback heuristics
254
- const pipeline = rawModel.pipeline_tag || '';
255
- if (pipeline && pipeline.startsWith('text-generation')) { entry.classification = 'gen'; entry.confidence = 'medium'; }
256
- else entry.classification = 'unknown'; entry.confidence = 'low';
257
- entry.fetchStatus = '404';
258
- return entry;
259
- }
260
- if (fetchResult.status === 'error') {
261
- entry.classification = 'unknown'; entry.confidence = 'low'; entry.fetchStatus = 'error';
262
- entry.fetchError = { message: fetchResult.message, code: fetchResult.code };
263
- return entry;
264
- }
265
- return entry;
266
- }
267
  }
268
 
269
  // helper to extract generated text from various runtime outputs
 
1
  // @ts-check
2
 
3
  import { ModelCache } from './model-cache';
4
+ import { listChatModelsIterator } from './list-chat-models.js';
5
 
6
  export function bootWorker() {
7
  const modelCache = new ModelCache();
 
72
  }
73
  }
74
 
75
+ // Implementation of the listChatModels worker action using the async-iterator action.
76
  async function handleListChatModels({ id, params = {} }) {
77
+ const iterator = listChatModelsIterator(params);
78
+ let sawDone = false;
79
+ activeTasks.set(id, { abort: () => { try { iterator.return(); } catch (e) {} } });
80
  try {
81
+ for await (const delta of iterator) {
82
+ try { self.postMessage(Object.assign({ id, type: 'progress' }, delta)); } catch (e) {}
83
+ if (delta && delta.status === 'done') {
84
+ sawDone = true;
85
+ try { self.postMessage({ id, type: 'response', result: { models: delta.models, meta: delta.meta } }); } catch (e) {}
86
+ break;
87
  }
88
  }
89
+ if (!sawDone) {
90
+ // iterator exited early (likely cancelled)
91
+ try { self.postMessage({ id, type: 'response', result: { cancelled: true } }); } catch (e) {}
92
  }
93
  } catch (err) {
94
+ try { self.postMessage({ id, type: 'error', error: String(err), code: err.code || null }); } catch (e) {}
95
+ } finally {
96
  activeTasks.delete(id);
 
97
  }
98
  }
99
 
100
  // helper: fetchConfigForModel
101
+ // Note: fetchConfigForModel and classifyModel were moved to the
102
+ // `src/worker/list-chat-models.js` async-iterator action. Keep this file
103
+ // minimal and delegate to the iterator for listing/classification logic.
104
  }
105
 
106
  // helper to extract generated text from various runtime outputs
src/worker/list-chat-models.js ADDED
@@ -0,0 +1,223 @@
1
+ // Minimal async-iterator implementation of the listChatModels pipeline.
2
+ // Yields plain JSON-serializable progress objects. Uses per-request AbortControllers
3
+ // and a finally block so iterator.return() causes cleanup.
4
+
5
+ export async function* listChatModelsIterator(params = {}) {
6
+ const opts = Object.assign({ maxCandidates: 250, concurrency: 12, hfToken: null, timeoutMs: 10000, maxListing: 5000 }, params || {});
7
+ const { maxCandidates, concurrency, hfToken, timeoutMs, maxListing } = opts;
8
+ const MAX_TOTAL_TO_FETCH = Math.min(maxListing, 5000);
9
+ const PAGE_SIZE = 100;
10
+ const RETRIES = 3;
11
+ const BACKOFF_BASE_MS = 200;
12
+
13
+ const inFlight = new Set();
14
+
15
+ async function fetchWithController(url, init = {}) {
16
+ const c = new AbortController();
17
+ inFlight.add(c);
18
+ try {
19
+ const merged = Object.assign({}, init, { signal: c.signal });
20
+ const resp = await fetch(url, merged);
21
+ return resp;
22
+ } finally {
23
+ inFlight.delete(c);
24
+ }
25
+ }
26
+
27
+ // helper: fetchConfigForModel (tries multiple paths, per-request timeouts & retries)
28
+ async function fetchConfigForModel(modelId) {
29
+ const urls = [
30
+ `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config.json`,
31
+ `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/config/config.json`,
32
+ `https://huggingface.co/${encodeURIComponent(modelId)}/resolve/main/adapter_config.json`
33
+ ];
34
+ for (const url of urls) {
35
+ for (let attempt = 0; attempt <= RETRIES; attempt++) {
36
+ // per-request timeout via AbortController + setTimeout
37
+ const controller = new AbortController();
38
+ inFlight.add(controller);
39
+ const timeout = setTimeout(() => controller.abort(), timeoutMs);
40
+ try {
41
+ const resp = await fetch(url, { signal: controller.signal, headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} });
42
+ clearTimeout(timeout);
43
+ inFlight.delete(controller);
44
+ if (resp.status === 200) {
45
+ const json = await resp.json();
46
+ return { status: 'ok', model_type: json.model_type || null, architectures: json.architectures || null };
47
+ }
48
+ if (resp.status === 401 || resp.status === 403) return { status: 'auth', code: resp.status };
49
+ if (resp.status === 404) break; // try next fallback
50
+ if (resp.status === 429) {
51
+ const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
52
+ await new Promise(r => setTimeout(r, backoff));
53
+ continue;
54
+ }
55
+ return { status: 'error', code: resp.status, message: `fetch failed ${resp.status}` };
56
+ } catch (err) {
57
+ clearTimeout(timeout);
58
+ inFlight.delete(controller);
59
+ if (attempt === RETRIES) return { status: 'error', message: String(err) };
60
+ const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
61
+ await new Promise(r => setTimeout(r, backoff));
62
+ }
63
+ }
64
+ }
65
+ return { status: 'no-config' };
66
+ }
67
+
68
+ function classifyModel(rawModel, fetchResult) {
69
+ const id = rawModel.modelId || rawModel.id || rawModel.model;
70
+ const entry = { id, model_type: null, architectures: null, classification: 'unknown', confidence: 'low', fetchStatus: 'error' };
71
+ if (!fetchResult) return entry;
72
+ if (fetchResult.status === 'auth') {
73
+ entry.classification = 'auth-protected'; entry.confidence = 'high'; entry.fetchStatus = String(fetchResult.code || 401);
74
+ return entry;
75
+ }
76
+ if (fetchResult.status === 'ok') {
77
+ entry.model_type = fetchResult.model_type || null;
78
+ entry.architectures = Array.isArray(fetchResult.architectures) ? fetchResult.architectures : null;
79
+ entry.fetchStatus = 'ok';
80
+ const deny = ['bert','roberta','distilbert','electra','albert','deberta','mobilebert','convbert','sentence-transformers'];
81
+ const allow = ['gpt2','gptj','gpt_neox','llama','qwen','mistral','phi','gpt','t5','bart','pegasus'];
82
+ if (entry.model_type && deny.includes(entry.model_type)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
83
+ if (entry.model_type && allow.includes(entry.model_type)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
84
+ const arch = entry.architectures;
85
+ if (arch && Array.isArray(arch)) {
86
+ for (let i = 0; i < arch.length; i++) {
87
+ const a = String(arch[i]).toLowerCase();
88
+ if (allow.includes(a)) { entry.classification = 'gen'; entry.confidence = 'high'; return entry; }
89
+ if (deny.includes(a)) { entry.classification = 'encoder'; entry.confidence = 'high'; return entry; }
90
+ }
91
+ }
92
+ entry.classification = 'unknown'; entry.confidence = 'low'; return entry;
93
+ }
94
+ if (fetchResult.status === 'no-config') {
95
+ const pipeline = rawModel.pipeline_tag || '';
96
+ if (pipeline && pipeline.startsWith('text-generation')) { entry.classification = 'gen'; entry.confidence = 'medium'; }
97
+ else { entry.classification = 'unknown'; entry.confidence = 'low'; } // braces so the 'medium' confidence set above is not overwritten
98
+ entry.fetchStatus = '404';
99
+ return entry;
100
+ }
101
+ if (fetchResult.status === 'error') {
102
+ entry.classification = 'unknown'; entry.confidence = 'low'; entry.fetchStatus = 'error';
103
+ entry.fetchError = { message: fetchResult.message, code: fetchResult.code };
104
+ return entry;
105
+ }
106
+ return entry;
107
+ }
108
+
109
+ // Main pipeline
110
+ let listing = [];
111
+ try {
112
+ // 1) listing
113
+ let offset = 0;
114
+ while (listing.length < MAX_TOTAL_TO_FETCH) {
115
+ const url = `https://huggingface.co/api/models?full=true&limit=${PAGE_SIZE}&offset=${offset}`;
116
+ let ok = false;
117
+ for (let attempt = 0; attempt <= RETRIES && !ok; attempt++) {
118
+ try {
119
+ const resp = await fetchWithController(url, { headers: hfToken ? { Authorization: `Bearer ${hfToken}` } : {} }); // use the shared controller helper so listing requests are tracked in inFlight
120
+ if (resp.status === 429) {
121
+ const backoff = BACKOFF_BASE_MS * Math.pow(2, attempt);
122
+ await new Promise(r => setTimeout(r, backoff));
123
+ continue;
124
+ }
125
+ if (!resp.ok) throw Object.assign(new Error(`listing-fetch-failed:${resp.status}`), { code: 'listing_fetch_failed', status: resp.status });
126
+ const page = await resp.json();
127
+ if (!Array.isArray(page) || page.length === 0) { ok = true; break; }
128
+ listing.push(...page);
129
+ offset += PAGE_SIZE;
130
+ ok = true;
131
+ } catch (err) {
132
+ if (attempt === RETRIES) throw err;
133
+ await new Promise(r => setTimeout(r, BACKOFF_BASE_MS * Math.pow(2, attempt)));
134
+ }
135
+ }
136
+ if (!ok) break;
137
+ }
138
+
139
+ // emit listing_done
140
+ yield { status: 'listing_done', totalFound: listing.length };
141
+
142
+ // 2) prefilter
143
+ const denyPipeline = new Set(['feature-extraction', 'fill-mask', 'sentence-similarity', 'masked-lm']);
144
+ const survivors = [];
145
+ for (const m of listing) {
146
+ if (survivors.length >= maxCandidates) break;
147
+ const pipeline = m.pipeline_tag;
148
+ if (pipeline && denyPipeline.has(pipeline)) continue;
149
+ if (typeof m.modelId === 'string' && m.modelId.includes('sentence-transformers')) continue;
150
+ const siblings = m.siblings || [];
151
+ const hasTokenizer = siblings.some(s => /tokenizer|vocab|merges|sentencepiece/i.test(s));
152
+ if (!hasTokenizer) continue;
153
+ survivors.push(m);
154
+ }
155
+
156
+ yield { status: 'prefiltered', survivors: survivors.length };
157
+
158
+ // 3) concurrent config fetch & classify using an event queue so workers can emit
159
+ // progress while the generator yields them.
160
+ const results = [];
161
+ const errors = [];
162
+ let idx = 0;
163
+ let processed = 0;
164
+ const events = [];
165
+ let resolveNext = null;
166
+ function emit(ev) {
167
+ events.push(ev);
168
+ if (resolveNext) {
169
+ resolveNext();
170
+ resolveNext = null;
171
+ }
172
+ }
173
+ async function nextEvent() {
174
+ while (events.length === 0) {
175
+ await new Promise(r => { resolveNext = r; });
176
+ }
177
+ return events.shift();
178
+ }
179
+
180
+ const workerCount = Math.min(concurrency, survivors.length || 1);
181
+ const pool = new Array(workerCount).fill(0).map(async () => {
182
+ while (true) {
183
+ const i = idx++;
184
+ if (i >= survivors.length) break;
185
+ const model = survivors[i];
186
+ const modelId = model.modelId || model.id || model.model;
187
+ try {
188
+ emit({ modelId, status: 'config_fetching' });
189
+ const fetchResult = await fetchConfigForModel(modelId);
190
+ const entry = classifyModel(model, fetchResult);
191
+ results.push(entry);
192
+ emit({ modelId, status: 'classified', data: entry });
193
+ } catch (err) {
194
+ errors.push({ modelId, message: String(err) });
195
+ emit({ modelId, status: 'error', data: { message: String(err) } });
196
+ } finally {
197
+ processed++;
198
+ }
199
+ }
200
+ });
201
+
202
+ // consume events as workers produce them
203
+ while (processed < survivors.length) {
204
+ const ev = await nextEvent();
205
+ yield ev;
206
+ }
207
+
208
+ // make sure any remaining events are yielded
209
+ while (events.length > 0) {
210
+ yield events.shift();
211
+ }
212
+
213
+ await Promise.all(pool);
214
+
215
+ // final
216
+ const models = results.map(r => ({ id: r.id, model_type: r.model_type, architectures: r.architectures, classification: r.classification, confidence: r.confidence, fetchStatus: r.fetchStatus }));
217
+ const meta = { fetched: listing.length, filtered: survivors.length, errors };
218
+ yield { status: 'done', models, meta };
219
+ } finally {
220
+ // abort any in-flight fetches if iteration stopped early
221
+ for (const c of Array.from(inFlight)) try { c.abort(); } catch (e) {}
222
+ }
223
+ }