Salma Mayorquin PRO

salma-remyx

·

https://remyx.ai

smellslikeml

AI & ML interests

None yet

Recent Activity

reacted to Banaxi-Tech's post with 🔥 about 9 hours ago

Today we are releasing BananaMind-KV1-8M-2Bit-Experimental, a KV-cache-aware trained model that stores its generation KV cache in 2-bit precision instead of the usual 16-bit precision. Result: 5.33x smaller KV cache vs FP16, with 0.0916 mean KLD against a 16-bit KV cache reference on WikiText-2. Model: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental The important part: this is not just post-training KV cache quantization. Instead we take the BitNet approach. KV1 is trained with a 2-bit-aware K/V path. Instead of training a normal model and quantizing the cache afterwards, the model learns during training to operate under the low-bit KV constraint, closer in spirit to the BitNet idea of training for the low-bit regime. During generation, each K/V vector is quantized into 4 affine levels and packed into uint8 tensors, with four 2-bit values stored per byte. WikiText-2 eval vs 16-bit KV cache reference: Mean KLD: 0.0916 nats/token Mean KLD: 0.1322 bits/token Average KV cache shrink vs FP16: 5.33x Evaluated positions: 372,675 If this actually gets used in models like Qwen or Gemma, then it may be possible to run 128K or even 256K Context on a Normal Machine! Try it here: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental Code: https://github.com/Banaxi-Tech/kv1

reacted to breitburg's post with 🔥 2 days ago

I've been experimenting with "pure" model alignment. The core idea is to only train a verifiable version of a capacity until the model generalizes it to the non-verifiable version. For example, training the model on factual self-knowledge, like the model's scale, architecture, runtime situation, and being able to predict its own behavior, betting this generalizes to real introspection about states that do not. The same principle applies to general instruction following -- no training on subjective judgement, only verifiable claims and inferences, betting the skill generalizes to instructions where correctness is a matter of judgment. The primary alignment claim is that an identity and taste that will emerge this way will be much more robust and honest than hand-scripted ones (e.g. "As an AI language model..."). During the training, we should never teach it to make any subjective claims or invent experiences that we assume it has, like "I don't have taste" or "I'm not self-aware in the way you think", as well as no narration of internal states like "I'm curious now". The main threat, of course, is that we'll simply inherit the training distribution of all the things like "taste", and we'll get an average. However, with the recent research about the models' introspection abilities, it might be as well the case that we'll get something that's more honest than something that tries to adhere to a specific spec file. I'm posting new experimental models trained that way in this collection: https://huggingface.co/collections/breitburg/neue

posted an update 2 days ago

What's holding your code back? Outrider finds, implements, and validates methods for your repo. While testing Outrider on a fork of huggingface/peft, I discovered "Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models" (arxiv: 2402.02347) The work offers improved stability and faster convergence in LoRA finetuning by adjusting updates for curvature that LoRA optimizers typically ignore. Not the most recent paper, so I was pleasantly surprised my action surfaced this method as a candidate before implementing a PR. Even more surprised this method had not already been merged upstream. Turns out, the author did try contributing to peft a couple years ago, but people get busy and the PR was closed after going stale. So I decided to revive it! I opened an issue and soon after the author engaged to help land the feature. Now huggingface/peft #3382 is open, a joint effort with the paper's author. This whole episode has me thinking about the future of OSS maintenance with AI coding. The software projects which endure will be well-shaped to quickly land and help test new ideas. Across 30 forks, I've seen several papers land as clean PRs for multiple repos, which offers a perspective on how methods impact applications. Recent methods matching multiple frameworks: STARE, Entity Binding, BINEVAL Get Outrider: https://github.com/remyxai/outrider

View all activity

Organizations

salma-remyx 's collections 2