Pavan Kumar Balijepalli
pavankumarbalijepalli
5 followers · 6 following
pavankumarbalijepalli
pavan-kumar-balijepalli
AI & ML interests
Learn. Build. Teach.
Recent Activity
posted an update 3 days ago
Long-context prefill in LLMs just got a lot faster. Processing long sequences is computationally expensive because self-attention scales quadratically with sequence length. Existing sparse attention methods often rely on sorting or cumulative summation (Top-k/Top-p), which are slow and struggle to prune the "long tail" of irrelevant tokens.

- FlashPrefill achieves a 27.78× speedup on 256K sequences by replacing heavy sorting with a Max-based Dynamic Thresholding mechanism.
- It introduces "Instantaneous Pattern Discovery" using block-level approximations, bypassing the need for expensive full-attention score calculations.
- Unlike previous methods that struggle with shorter contexts, it maintains a 1.71× speedup even at 4K, which suggests it holds up across context lengths.
- The framework is compatible with existing LLM/VLM architectures and integrates into vLLM for real-world deployment.

This significantly reduces Time-to-First-Token (TTFT) for long-context applications, making large-scale document analysis and long-video understanding more practical and cost-effective. It turns a major performance bottleneck into a streamlined, hardware-efficient process.

How much compute are we wasting on "long-tail" tokens that don't actually matter? FlashPrefill suggests the answer is: a lot.

#AI #LLMs #MachineLearning #DeepLearning #TechInnovation #GPUComputing

Source: https://arxiv.org/pdf/2603.06199
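To make the contrast between sorting-based selection and max-based thresholding concrete, here is a minimal sketch. It assumes each key block already has a non-negative approximate relevance score for a given query block. The function names, the `alpha` parameter, and the example scores are hypothetical illustrations and are not taken from the FlashPrefill paper or its code; the only point is that a threshold relative to the maximum needs a linear pass, while Top-k needs a sort.

```python
from typing import List


def select_blocks_by_max_threshold(scores: List[float], alpha: float = 0.1) -> List[int]:
    """Keep blocks whose score is at least alpha * max(scores).

    Hypothetical helper: assumes non-negative block scores and 0 <= alpha <= 1.
    """
    if not scores:
        return []
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    threshold = alpha * max(scores)  # one linear pass to find the max, no sorting
    return [i for i, s in enumerate(scores) if s >= threshold]


def select_blocks_top_k(scores: List[float], k: int) -> List[int]:
    """Sorting-based baseline: keep the k highest-scoring blocks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


# Made-up block scores with a long tail of near-zero entries.
block_scores = [0.50, 0.02, 0.30, 0.01, 0.01, 0.08, 0.005, 0.075]

print(select_blocks_by_max_threshold(block_scores, alpha=0.1))  # [0, 2, 5, 7]
print(select_blocks_top_k(block_scores, k=2))                   # [0, 2]
```

In this sketch the number of kept blocks adapts to the score distribution: a sharply peaked distribution keeps few blocks, a flat one keeps many, whereas Top-k always keeps exactly `k`.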
updated a Space 10 days ago
pavankumarbalijepalli/portfolio
published a Space 10 days ago
pavankumarbalijepalli/portfolio
pavankumarbalijepalli's activity
liked 3 models about 2 months ago
Disty0/LTX-2-SDNQ-4bit-dynamic • Updated Jan 8 • 337 • 12
tencent/HunyuanVideo-1.5 • Text-to-Video • Updated Dec 25, 2025 • 637 • 582
meituan-longcat/LongCat-Video • Text-to-Video • Updated Oct 29, 2025 • 821 • 445
liked a model 4 months ago
neuphonic/neutts-air • Text-to-Speech • 0.7B • Updated 29 days ago • 11.9k • 855
liked a model 7 months ago
rednote-hilab/dots.ocr • Image-Text-to-Text • Updated Oct 31, 2025 • 247k • 1.27k
liked a dataset 11 months ago
NousResearch/hermes-function-calling-v1 • Viewer • Updated Jan 3 • 11.6k • 4.79k • 383
liked 2 models 11 months ago
OpenHands/openhands-lm-32b-v0.1 • Text Generation • 33B • Updated Apr 16, 2025 • 340 • 392
Qwen/Qwen2.5-Omni-7B • Any-to-Any • Updated Apr 30, 2025 • 435k • 1.87k
liked a model 12 months ago
rasbt/llama-3.2-from-scratch • Updated Jun 12, 2025 • 284
liked 4 models about 1 year ago
google/gemma-3-12b-it • Image-Text-to-Text • Updated Mar 21, 2025 • 1.77M • 676
pavankumarbalijepalli/telLM-gemma2-9b-16bit • Text Generation • 9B • Updated May 15, 2025 • 1 • 1
pavankumarbalijepalli/phi2-nl2sql-lora • Text Generation • 3B • Updated Feb 28, 2025 • 8 • 1
pavankumarbalijepalli/telLM-gemma2-9b • Updated Mar 1, 2025 • 1
liked a dataset about 1 year ago
eswardivi/telugu_instruction_dataset • Viewer • Updated Feb 1, 2024 • 145k • 39 • 5
liked 2 models about 1 year ago
sarvamai/sarvam-1 • Text Generation • 3B • Updated Nov 8, 2024 • 8.66k • 132
watt-ai/watt-tool-8B • Updated Dec 20, 2024 • 192k • 117
liked 3 datasets about 1 year ago
Salesforce/xlam-function-calling-60k • Viewer • Updated Jan 24, 2025 • 60k • 6.51k • 580
indiehackers/hellaswag-telugu-custom • Viewer • Updated Apr 22, 2024 • 10k • 9 • 1
indiehackers/Telugu_InstructData • Viewer • Updated Mar 2, 2024 • 33.4k • 7 • 1
liked a model about 1 year ago
microsoft/phi-4 • Text Generation • Updated Nov 24, 2025 • 1M • 2.23k