ct2

ct-2

AI & ML interests

None yet

Recent Activity

Organizations

None yet

upvoted 2 papers about 15 hours ago

Variable-Width Transformers

Paper • 2606.18246 • Published 10 days ago • 15

Tapered Language Models

Paper • 2606.23670 • Published 4 days ago • 7

upvoted a collection about 21 hours ago

EdgeRazor-Nbit

Collection

16 items • Updated May 7 • 9

upvoted a paper 5 days ago

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

Paper • 2606.20381 • Published 8 days ago • 9

upvoted a paper 15 days ago

Kwai Keye-VL-2.0 Technical Report

Paper • 2606.10651 • Published 17 days ago • 189

upvoted a paper 19 days ago

The Shape of Addition: Geometric Structures of Arithmetic in Large Language Models

Paper • 2606.03645 • Published 28 days ago • 5

upvoted a paper 20 days ago

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Paper • 2606.02553 • Published 25 days ago • 19

upvoted a paper about 1 month ago

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Paper • 2605.23901 • Published May 22 • 13

upvoted a collection about 1 month ago

BitCPM-CANN

Collection

Full-pipeline ternary quantized model trained on CANN. • 12 items • Updated May 24 • 28

upvoted 5 papers about 1 month ago

StateSMix: Online Lossless Compression via Mamba State Space Models and Sparse N-gram Context Mixing

Paper • 2605.02904 • Published Apr 5 • 8

upvoted a paper 2 months ago

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82

upvoted a collection 3 months ago

Trinity-Large-Thinking

Collection

5 items • Updated Apr 10 • 32

upvoted 3 papers 3 months ago

InCoder-32B: Code Foundation Model for Industrial Scenarios

Paper • 2603.16790 • Published Mar 17 • 312

FineRMoE: Dimension Expansion for Finer-Grained Expert with Its Upcycling Approach

Paper • 2603.13364 • Published Mar 9 • 9

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Paper • 2603.10444 • Published Mar 11 • 12

upvoted a paper 4 months ago

Mixture of Attention Heads: Selecting Attention Heads Per Token

Paper • 2210.05144 • Published Oct 11, 2022 • 3
