open/ acc

community

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

Ritvik19 authored a paper 34 minutes ago

Aryabhata: An exam-focused language model for JEE Math

MElHuseyni authored a paper 17 days ago

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

MElHuseyni submitted a paper 18 days ago

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

View all activity

Sri-Vigneshwar-DJ

posted an update 8 days ago

Post

101

![Feather DB LongMemEval Results]( Hawky-ai/longmemeval-results)

We ran Feather DB v0.8.0 on LongMemEval (ICLR 2025) — 500 questions across real multi-session conversations, up to 115K tokens each.

**Score: 0.693** · GPT-4o full-context baseline: 0.640
Full 500-question run with Gemini-Flash: **$2.40**

Per-axis breakdown:
→ Info-extraction: **0.942**
→ Knowledge-update: **0.714**
→ Multi-session: **0.606**
→ Temporal: **0.477** ← the hard one, Phase 9 addresses this

Architecture: Hybrid BM25+dense · adaptive temporal decay · embedded (no server) · p50 = 0.19ms · MIT

pip install feather-db

Raw results + audit JSONs: Hawky-ai/longmemeval-results

MElHuseyni

authored a paper 17 days ago

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

Paper • 2604.19321 • Published 20 days ago • 7

MElHuseyni

submitted a paper to Daily Papers 18 days ago

RDP LoRA: Geometry-Driven Identification for Parameter-Efficient Adaptation in Large Language Models

Paper • 2604.19321 • Published 20 days ago • 7

sdiazlor

posted an update 25 days ago

Post

138

As First Prune, the one-year Pruna OSS anniversary, is halfway.

We’re sharing a recap blog post about our OSS journey — how we started, what we’ve built so far, and what’s next.

Read it here: https://dev.to/pruna-ai/first-prune-celebrate-one-year-of-pruna-oss-50gp

sdiazlor

posted an update about 1 month ago

Post

Pruna OSS is turning 1! To mark this milestone, we're launching the First Prune initiative.

What's First Prune:
If you contribute to open issues at our GitHub repo, you earn Pruna Inference API credits.

How you can participate:
• Pick an open issue labelled "first-prune" and assign it to you
• Submit your PR and mark it ready for review by April 30
• Find out more in the PR template when you open a PR

Each merged PR scores 30 credits.

Let’s build something great together! Find your issue: https://github.com/PrunaAI/pruna/issues

1aurent

authored a paper about 1 month ago

Voxtral TTS

Paper • 2603.25551 • Published Mar 26 • 61

sdiazlor

posted an update 2 months ago

Post

2608

More OSS than ever with the latest pruna 0.3.2 release. It extends existing algorithm families, such as compilers, kernels, and pruners, and adds new ones, including decoders, distillers, enhancers, and recoverers. But it's not only a collection of algorithms; instead, you can easily combine them to get the biggest efficiency win.

Read the full blog here: https://huggingface.co/blog/PrunaAI/pruna-0-3-2-open-source-optimization-algorithms

mitkox

posted an update 3 months ago

Post

5605

My USB charger has a Blackwell GPU and 128GB RAM.
What. A. Time. To. Be. Alive.
People in Sofia: “It’s freezing.”
Me: sitting next to 3kW of space AI heaters on my desk 👀
1x GLM-5, 2x MiniMax-M2.5, 1x Qwen3 Coder Next; all on single Aibrix/K8s cluster

6 replies

mitkox

posted an update 3 months ago

Post

493

134,614 tok/sec input prefil max
1031 tokens/sec out gen max

At these local AI speeds, there is no User Interface for humans. My human UI is the Radicle distributed Git issues queue

On my GPU workstation:
- Z8 Fury G5 4x A6000
- MiniMax-M2.5
- Claude Code to localhost:8000

1 reply

1aurent

authored a paper 3 months ago

Ministral 3

Paper • 2601.08584 • Published Jan 13 • 61

mitkox

posted an update 3 months ago

Post

4819

I just pushed Claude Code Agent Swarm with 20 coding agents on my desktop GPU workstation.

With local AI, I don’t have /fast CC switch, but I have /absurdlyfast:
- 100’499 tokens/second read, yeah 100k, not a typo | 811 tok/sec generation
- KV cache: 707’200 tokens
- Hardware: 5+ year old GPUs 4xA6K gen1; It’s not the car. It’s the driver.

Qwen3 Coder Next AWQ with cache at BF16. Scores 82.1% in C# on 29-years-in-dev codebase vs Opus 4.5 at only 57.5%. When your codebase predates Stack Overflow, you don't need the biggest model; you need the one that actually remembers Windows 95.

My current bottleneck is my 27" monitor. Can't fit all 20 Theos on screen without squinting.

3 replies

Sri-Vigneshwar-DJ

posted an update 3 months ago

Post

1458

Just released a new dataset designed for training reasoning models on Meta (Facebook/Instagram) advertising fatigue detection!

What is it? A GRPO (Group Relative Policy Optimization) training dataset with 200+ carefully crafted scenarios covering:

🔍 Fatigue Signal Detection: CTR drops, CPM spikes, frequency analysis
🩺 Performance Diagnosis: Root cause analysis frameworks
📋 Strategy: Creative refresh cadence, testing frameworks
📊 Analysis: ROI calculations, metric interpretation
Why GRPO? GRPO training helps models learn structured reasoning. Each response follows the <thinking> and <answer> format.

Check it out here: Sri-Vigneshwar-DJ/meta-fatigue-grpo-dataset

julien-c

submitted a paper to Daily Papers 3 months ago

Shaping capabilities with token-level data filtering

Paper • 2601.21571 • Published Jan 29 • 29

cfahlgren1

submitted a paper to Daily Papers 3 months ago

How AI Impacts Skill Formation

Paper • 2601.20245 • Published Jan 28 • 10

mitkox

posted an update 3 months ago

Post

354

▐▛██▜▌ Claude Code v2.1.23
▝████▘ Kimi-K2.5 · API Usage Billing
▘▘ ▝▝ ~/dev/vllm
/model to try Opus 4.5
❯ hey
● Hello! How can I help you today?
❯ what model are you?
● I'm Claude Kimi-K2.5, running in a local environment on Linux.

Took some time to download and vLLM hybrid inferencing magic to get it running on my desktop workstation.

Sri-Vigneshwar-DJ

posted an update 3 months ago

Post

240

🏙️ Hugging Face Community Post
Title: 🧬 Experimenting with "Dynamic Chaos" in Tamil SLMs

Hi everyone! I just published a new experimental study on Small Language Model (SLM) resilience.

I took the Qwen2.5-0.5B model and put it through a "Chaos Phase" to see how much weight data a tiny model can lose before its understanding of classical Tamil grammar breaks.

Key highlights of the study:

Target Data: Fine-tuned on the Thirukkural (1,330 couplets + modern explanations).
The Chaos Step: Applied 20% random weight pruning but implemented "Layer Protection" for the Token Embeddings and LM Head to keep the characters readable.
Compression: 4-bit (Q4_K_M) quantization for extreme efficiency.
Result: A surrealist classical Tamil model that is ultra-light (~300MB) and ultra-fast!

Check out the model and the experiment logic here: Sri-Vigneshwar-DJ/qwen-tamil-chaos-v1

mitkox

posted an update 4 months ago

Post

1582

GLM-4.7-Flash is fast, good and cheap.
3,074 tokens/sec peak at 200k tokens context window on my desktop PC.
Works with Claude Code and opencode for hours. No errors, drop-in replacement of the Anthropic cloud AI.
MIT licensed, open weights, free for commercial use and modifications.
Supports speculative decoding using MTP, which is highly effective in mitigating latency.
Great for on device AI coding as AWQ 4bit at 18.5 GB. Hybrid inference on a single consumer GPU + CPU RAM.