Use generate_compiled (CUDA graph) for ~2.75x faster inference ca9ab6d verified McClain committed 18 days ago
Filter threshold 90%, wrap annotation description text 76c68b9 McClain Claude Opus 4.6 committed on Mar 10
Fix BaseStreamer import — use duck-typed class instead ee47082 McClain Claude Opus 4.6 committed on Mar 10
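The fix above relies on the fact that `transformers` only requires a streamer object to expose `put()` and `end()` methods, so a plain class avoids importing `BaseStreamer` (whose import path has moved between versions). A minimal sketch of such a duck-typed streamer; the class name and token-collecting behavior are illustrative assumptions, not taken from the repo:

```python
# Duck-typed stand-in for transformers.BaseStreamer: generate() only
# calls put() with each new batch of token IDs and end() when done,
# so no import from transformers is needed.
class TokenStreamer:
    def __init__(self):
        self.chunks = []

    def put(self, value):
        # Called by generate() with each newly produced token batch.
        self.chunks.append(value)

    def end(self):
        # Called once when generation finishes; nothing to clean up here.
        pass
```

Any object with this shape can be passed as `streamer=` to `model.generate()`.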
Streaming progress bar, rename to Top N, fix scoring 74ad07d McClain Claude Opus 4.6 committed on Mar 10
Pin transformers<5: model built for 4.x, 5.3.0 breaks sampling 49d962d McClain Claude Opus 4.6 committed on Mar 10
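The `transformers<5` upper bound comes from the commit above; as a `requirements.txt` fragment it might look like the following, where the lower bound is an assumption (any 4.x release the model was built against), not taken from the repo:

```
transformers>=4.40,<5
```

The upper bound keeps pip from resolving to 5.3.0, which the commit reports breaks sampling for this 4.x-era model.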
Add startup diagnostic: test forward pass + greedy gen 985f40a McClain Claude Opus 4.6 committed on Mar 10
Match model card loading exactly: no dtype, no device_map b9574ff McClain Claude Opus 4.6 committed on Mar 10
Fix: load without device_map to avoid accelerate dispatch hooks bd5a9ee McClain Claude Opus 4.6 committed on Mar 10
Fix: use dtype= and device_map= instead of torch_dtype + .to() 4d923df McClain Claude Opus 4.6 committed on Mar 10
Add detailed logging to generate_dna for debugging 32d3452 McClain Claude Opus 4.6 committed on Mar 10
Simplify: remove ZeroGPU, use dedicated GPU directly de345f0 McClain Claude Opus 4.6 committed on Mar 10
Use KV cache in manual generation + stop cleanly on NaN 3919227 McClain Claude Opus 4.6 committed on Mar 10
Fix: force eos/pad token IDs from vocab when tokenizer doesn't set them 7bb8be1 McClain Claude Opus 4.6 committed on Mar 10
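Forcing eos/pad IDs from the vocab, as in the commit above, can be sketched as follows. The helper name and the special-token strings (`"</s>"`, `"<pad>"`) are assumptions for illustration; the actual tokens depend on the model's tokenizer:

```python
# Hedged sketch: when a tokenizer ships without eos/pad IDs set, look the
# special tokens up in its vocab and patch the IDs in by hand.
def force_special_token_ids(tokenizer, eos_token="</s>", pad_token="<pad>"):
    vocab = tokenizer.get_vocab()
    if tokenizer.eos_token_id is None and eos_token in vocab:
        tokenizer.eos_token_id = vocab[eos_token]
    if tokenizer.pad_token_id is None:
        # Fall back to eos as pad — a common workaround when the
        # vocab has no dedicated pad token.
        tokenizer.pad_token_id = vocab.get(pad_token, tokenizer.eos_token_id)
    return tokenizer
```

Without valid eos/pad IDs, both `model.generate()` and a manual loop have no way to detect end-of-sequence or pad batched inputs.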
Fix: handle None pad_token_id in manual generate loop c8a599a McClain Claude Opus 4.6 committed on Mar 10
Replace model.generate() with manual loop + Gumbel-max sampling 9aebdcd McClain Claude Opus 4.6 committed on Mar 10
Fix numpy.bool8 compat: shim for bokeh 2.4.3 + numpy 2.x 199a96f McClain Claude Opus 4.6 committed on Mar 10
Fix ZeroGPU NaN: re-init non-persistent RoPE buffers on GPU c7935cf McClain Claude Opus 4.6 committed on Mar 10
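A shim like the one in the commit above is short: NumPy 2.x removed the long-deprecated `np.bool8` alias, which bokeh 2.4.3 still references at import time, so restoring the alias before bokeh is imported keeps the old release working. A minimal sketch:

```python
import numpy as np

# numpy 2.x dropped the deprecated np.bool8 alias; bokeh 2.4.3 still
# uses it. Restore the alias (pointing at np.bool_) before importing
# bokeh. On numpy 1.x this is a no-op because np.bool8 already exists.
if not hasattr(np, "bool8"):
    np.bool8 = np.bool_
```

The shim must run before the first `import bokeh`, e.g. at the top of the app's entry point.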