Distilling the 13B SpaceLLaVA VLM-as-a-Judge into a Florence-2 model to efficiently quality filter spatialVQA datasets like OpenSpaces
Salma Mayorquin PRO
salma-remyx
Recent Activity
reacted to Banaxi-Tech's post with 🔥 about 4 hours ago
Today we are releasing BananaMind-KV1-8M-2Bit-Experimental, a KV-cache-aware trained model that stores its generation KV cache in 2-bit precision instead of the usual 16-bit precision.
Result: 5.33x smaller KV cache vs FP16, with 0.0916 mean KLD against a 16-bit KV cache reference on WikiText-2.
Model: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental
The important part: this is not just post-training KV cache quantization.
Instead, we take a BitNet-style approach.
KV1 is trained with a 2-bit-aware K/V path. Instead of training a normal model and quantizing the cache afterwards, the model learns during training to operate under the low-bit KV constraint, closer in spirit to the BitNet idea of training for the low-bit regime.
During generation, each K/V vector is quantized into 4 affine levels and packed into uint8 tensors, with four 2-bit values stored per byte.
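To make the packing step concrete, here is a minimal NumPy sketch of 4-level affine quantization with four 2-bit codes per byte. It is not taken from the KV1 repository: the function names, the per-vector min/max scaling, and the low-bits-first packing order are all assumptions made for illustration.

```python
import numpy as np

def quantize_2bit(x):
    """Quantize a 1-D float vector to 4 affine levels and pack 4 codes per byte.

    Assumption: scale and offset come from the vector's own min and max.
    """
    x = np.asarray(x, dtype=np.float32)
    assert x.size % 4 == 0, "length must be a multiple of 4 to pack evenly"
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 3.0 if hi > lo else 1.0  # 4 levels -> 3 steps
    codes = np.clip(np.rint((x - lo) / scale), 0, 3).astype(np.uint8)
    c = codes.reshape(-1, 4)
    # Assumed layout: first code in the lowest two bits of each byte.
    packed = c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)
    return packed.astype(np.uint8), scale, lo

def dequantize_2bit(packed, scale, lo):
    """Unpack the 2-bit codes and map them back to floats."""
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.reshape(-1).astype(np.float32) * scale + lo

x = np.array([0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0], dtype=np.float32)
packed, scale, lo = quantize_2bit(x)
restored = dequantize_2bit(packed, scale, lo)
```

In this toy case eight values fit in two bytes and round-trip exactly because they already sit on the four levels. For general inputs the reconstruction error is bounded by half a quantization step.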
WikiText-2 eval vs 16-bit KV cache reference:
Mean KLD: 0.0916 nats/token
Mean KLD: 0.1322 bits/token
Average KV cache shrink vs FP16: 5.33x
Evaluated positions: 372,675
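The two KLD figures are the same measurement in different units, and the shrink factor implies an effective storage cost per value. The short check below reproduces both from the numbers above. The split into 2-bit codes plus metadata is an assumption, since the post does not break it down; the group size of 32 and the fp16 scale and offset in `effective_bits_per_value` are hypothetical values chosen only to show one layout that would give the same ratio.

```python
import math

kld_nats = 0.0916
kld_bits = kld_nats / math.log(2)   # nats -> bits: divide by ln 2

shrink = 5.33
effective_bits = 16 / shrink        # bits actually stored per cached value

def effective_bits_per_value(group_size, code_bits=2, metadata_bits=32):
    """Hypothetical layout: 2-bit codes plus one fp16 scale and one fp16 offset per group."""
    return (group_size * code_bits + metadata_bits) / group_size

print(round(kld_bits, 4))           # matches the bits/token figure
print(round(effective_bits, 2))     # roughly 3 bits per value
print(16 / effective_bits_per_value(32))
```

Under that hypothetical layout, the extra bit per value on top of the 2-bit code is quantization metadata, which is why the shrink is about 5.33x rather than 8x.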
If this actually gets used in models like Qwen or Gemma, it may become possible to run 128K or even 256K context on a normal machine!
Try it here: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental
Code: https://github.com/Banaxi-Tech/kv1
posted an update 2 days ago
What's holding your code back?
Outrider finds, implements, and validates methods for your repo.
While testing Outrider on a fork of huggingface/peft, I discovered "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" (arXiv: 2402.02347).
The work offers improved stability and faster convergence in LoRA finetuning by adjusting updates for curvature that LoRA optimizers typically ignore.
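For a feel of what the curvature adjustment looks like, here is a toy NumPy sketch based on my reading of the paper's scaled gradient step: each LoRA factor's gradient is multiplied by the inverse of a small r x r matrix built from the other factor. This is not the peft API or the code in the PR. The least-squares loss, the function names, and the `delta` damping term are assumptions for illustration.

```python
import numpy as np

def lora_grads(B, A, target):
    """Gradients of 0.5 * ||B @ A - target||^2 with respect to B and A."""
    G = B @ A - target
    return G @ A.T, B.T @ G

def preconditioned_step(B, A, target, lr=0.1, delta=1e-8):
    """One step with r x r preconditioners (A A^T)^-1 for B and (B^T B)^-1 for A."""
    grad_B, grad_A = lora_grads(B, A, target)
    eye = np.eye(A.shape[0])
    new_B = B - lr * grad_B @ np.linalg.inv(A @ A.T + delta * eye)
    new_A = A - lr * np.linalg.inv(B.T @ B + delta * eye) @ grad_A
    return new_B, new_A

def plain_step(B, A, target, lr=0.1):
    """Ordinary gradient descent on both factors, for comparison."""
    grad_B, grad_A = lora_grads(B, A, target)
    return B - lr * grad_B, A - lr * grad_A

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 2))       # m x r
A = rng.normal(size=(2, 5))       # r x n
target = rng.normal(size=(6, 5))

# Rescaling the factors (B -> 10 B, A -> A / 10) leaves the product B @ A unchanged.
B1, A1 = preconditioned_step(B, A, target, delta=0.0)
B2, A2 = preconditioned_step(10 * B, A / 10, target, delta=0.0)
same_update = np.allclose(B1 @ A1, B2 @ A2)
```

The point of the comparison is that the preconditioned update to the product `B @ A` does not depend on how the scale is split between the two factors, whereas a plain gradient step does.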
Not the most recent paper, so I was pleasantly surprised my action surfaced this method as a candidate before implementing a PR. Even more surprised this method had not already been merged upstream.
Turns out, the author did try contributing to peft a couple years ago, but people get busy and the PR was closed after going stale.
So I decided to revive it! I opened an issue, and soon after, the author joined in to help land the feature. Now huggingface/peft #3382 is open, a joint effort with the paper's author.
This whole episode has me thinking about the future of OSS maintenance with AI coding. The software projects that endure will be the ones shaped to land and test new ideas quickly.
Across 30 forks, I've seen several papers land as clean PRs in multiple repos, which offers a perspective on how methods impact applications. Recent methods matching multiple frameworks: STARE, Entity Binding, BINEVAL.
Get Outrider: https://github.com/remyxai/outrider