Pavan Kumar Balijepalli (pavankumarbalijepalli)
5 followers · 6 following
pavan-kumar-balijepalli

AI & ML interests: Learn. Build. Teach.
Recent Activity

new activity · about 1 month ago · huggingface/InferenceSupport: indiehackers/mistral-tenglish-april5_2

posted an update · about 2 months ago:
The quadratic bottleneck of long-context LLMs is a massive speed wall. Processing long-context sequences is computationally expensive due to the quadratic complexity of self-attention, and existing sparse attention methods often rely on sorting or cumulative summation (Top-k/Top-p), which are slow and struggle to prune the "long tail" of irrelevant tokens.

- FlashPrefill achieves a 27.78× speedup on 256K sequences by replacing heavy sorting with a Max-based Dynamic Thresholding mechanism.
- It introduces "Instantaneous Pattern Discovery" using block-level approximations, bypassing expensive full-attention score calculations.
- Unlike previous methods that struggle with shorter contexts, it maintains a 1.71× speedup even at 4K, showing robustness across scales.
- The framework is compatible with existing LLM/VLM architectures and integrates into vLLM for real-world deployment.

This significantly reduces Time-to-First-Token (TTFT) for long-context applications, making massive document analysis and long-video understanding practical and cost-effective. It turns a major performance bottleneck into a streamlined, hardware-efficient process.

How much compute are we wasting on "long-tail" tokens that don't actually matter? FlashPrefill suggests the answer is: a lot.

#AI #LLMs #MachineLearning #DeepLearning #TechInnovation #GPUComputing

Source: https://arxiv.org/pdf/2603.06199
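The post doesn't include the paper's actual kernel, but the core idea — score key blocks with a cheap block-level approximation, then keep only blocks near the maximum score instead of sorting for Top-k — can be sketched in a few lines of NumPy. Everything here (the function name, mean-pooling as the block approximation, the `alpha` parameter) is an illustrative assumption, not FlashPrefill's implementation:

```python
import numpy as np

def max_threshold_block_prune(q_block, k_blocks, alpha=0.5):
    """Hypothetical sketch of max-based dynamic thresholding.

    Instead of sorting all block scores (O(n log n), as in Top-k) or
    cumulatively summing them (Top-p), take one max and one comparison
    pass over the scores (O(n)) and keep every block close to the max.
    """
    # Block-level approximation of attention scores: mean-pool the
    # queries and each key block, then take dot products. This stands in
    # for the paper's "Instantaneous Pattern Discovery" step.
    q_pooled = q_block.mean(axis=0)                            # (d,)
    k_pooled = np.stack([kb.mean(axis=0) for kb in k_blocks])  # (n_blocks, d)
    scores = k_pooled @ q_pooled                               # (n_blocks,)

    # Max-based dynamic threshold: keep a block iff its (unnormalized)
    # softmax weight is within a factor alpha of the best block's, i.e.
    # exp(score - max) >= alpha  <=>  score >= max + log(alpha).
    # For alpha in (0, 1] the top-scoring block is always kept.
    threshold = scores.max() + np.log(alpha)
    keep = scores >= threshold                                 # boolean mask
    return keep, scores
```

The pruned mask would then restrict which key/value blocks the attention kernel actually reads, which is where the prefill speedup comes from; the single `max` plus one comparison pass is what avoids the sorting cost the post calls out.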
updated a Space · 2 months ago · pavankumarbalijepalli/portfolio

Organizations

pavankumarbalijepalli's activity
liked 3 models · 4 months ago
- Disty0/LTX-2-SDNQ-4bit-dynamic · Updated Jan 8 · 50 · 12
- tencent/HunyuanVideo-1.5 · Text-to-Video · Updated Dec 25, 2025 · 1.84k · 982
- meituan-longcat/LongCat-Video · Text-to-Video · Updated Oct 29, 2025 · 1.3k · 470
liked a model · 6 months ago
- neuphonic/neutts-air · Text-to-Speech · 0.7B · Updated Feb 12 · 13.3k · 871
liked a model · 9 months ago
- rednote-hilab/dots.ocr · Image-Text-to-Text · 3B · Updated Oct 31, 2025 · 203k · 1.3k
liked a dataset · about 1 year ago
- NousResearch/hermes-function-calling-v1 · Viewer · Updated Jan 3 · 11.6k · 12.4k · 404
liked 7 models · about 1 year ago
- OpenHands/openhands-lm-32b-v0.1 · Text Generation · 33B · Updated Apr 16, 2025 · 136 · 391
- Qwen/Qwen2.5-Omni-7B · Any-to-Any · Updated Apr 30, 2025 · 592k · 1.89k
- rasbt/llama-3.2-from-scratch · Updated Jun 12, 2025 · 284
- google/gemma-3-12b-it · Image-Text-to-Text · Updated Mar 21, 2025 · 2.79M · 713
- pavankumarbalijepalli/telLM-gemma2-9b-16bit · Text Generation · 9B · Updated May 15, 2025 · 7 · 1
- pavankumarbalijepalli/phi2-nl2sql-lora · Text Generation · 3B · Updated Feb 28, 2025 · 5 · 1
- pavankumarbalijepalli/telLM-gemma2-9b · Updated Mar 1, 2025 · 1
liked a dataset · about 1 year ago
- eswardivi/telugu_instruction_dataset · Viewer · Updated Feb 1, 2024 · 145k · 36 · 5
liked a model · about 1 year ago
- sarvamai/sarvam-1 · Text Generation · 3B · Updated Nov 8, 2024 · 6.49k · 133
liked a model · over 1 year ago
- watt-ai/watt-tool-8B · Updated Dec 20, 2024 · 56.1k · 117
liked 3 datasets · over 1 year ago
- Salesforce/xlam-function-calling-60k · Viewer · Updated Jan 24, 2025 · 60k · 13.8k · 610
- indiehackers/hellaswag-telugu-custom · Viewer · Updated Apr 22, 2024 · 10k · 7 · 1
- indiehackers/Telugu_InstructData · Viewer · Updated Mar 2, 2024 · 33.4k · 8 · 1
liked a model · over 1 year ago
- microsoft/phi-4 · Text Generation · Updated Nov 24, 2025 · 752k · 2.24k