---
title: Cache-to-Cache Communication Demo
emoji: 🔗
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
- llm
- cache-to-cache
- model-communication
- kv-cache
short_description: Compare Single, Text-to-Text, and Cache-to-Cache inference
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6445fd9ba56444c355dcbcba/R5YOyw0aoBENYJs8Ugnbi.png
---
# Cache-to-Cache Communication Demo

This Space demonstrates **Cache-to-Cache (C2C)** communication between Large Language Models, comparing three inference approaches side-by-side:

1. **Single Model**: Standard inference with one model
2. **Text-to-Text (T2T)**: Two-stage communication where the Sharer model generates text → the Receiver model reads that text as input (see the sketch below)
3. **Cache-to-Cache (C2C)**: Direct KV-cache communication between the Sharer and the Receiver
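
As a rough illustration of the T2T baseline, here is a minimal, hypothetical sketch. The model checkpoint and prompt format are placeholders for illustration, not this Space's actual code:

```python
# Hypothetical T2T sketch: the Sharer answers in plain text, and that text
# is pasted into the Receiver's prompt. Checkpoint and prompt format are
# illustrative placeholders, not the code this Space runs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(name)
sharer = AutoModelForCausalLM.from_pretrained(name)
receiver = AutoModelForCausalLM.from_pretrained(name)

question = "Which planet is known as the Red Planet?"

def generate(model, prompt, max_new_tokens=64):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model.generate(ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

# Stage 1: the Sharer writes its answer as text.
sharer_text = generate(sharer, question)

# Stage 2: the Receiver re-reads the question plus the Sharer's text.
answer = generate(receiver, f"{question}\nHint: {sharer_text}\nAnswer:")
print(answer)
```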

## What is Cache-to-Cache?

It makes language models talk without words.

Cache-to-Cache (C2C) lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation. 

The payoff, per the paper: up to 10% higher accuracy than the individual models, 3–5% gains over text-based communication, and roughly 2× lower latency.
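
To make the cache-transfer idea concrete, here is a minimal, hypothetical sketch using Hugging Face `transformers` prefix caching: one model pre-fills the prompt, and a second, cache-compatible model generates directly on the transferred KV-cache. The real C2C method additionally trains a neural fuser that projects the Sharer's cache into the Receiver's representation space, which this sketch omits:

```python
# Hypothetical sketch of cache-level communication. It assumes the two
# models share a tokenizer and cache layout; the actual C2C method uses a
# trained fuser to project the Sharer's KV-cache into the Receiver's space.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder checkpoint
tok = AutoTokenizer.from_pretrained(name)
sharer = AutoModelForCausalLM.from_pretrained(name)
receiver = AutoModelForCausalLM.from_pretrained(name)

prompt = "Context: the Eiffel Tower is in Paris.\nQuestion: where is it?\nAnswer:"
ids = tok(prompt, return_tensors="pt").input_ids

# 1) The Sharer pre-fills all but the last token, producing a KV-cache that
#    encodes its reading of the prompt.
with torch.no_grad():
    shared_cache = sharer(ids[:, :-1], use_cache=True).past_key_values

# 2) The Receiver generates on top of the transferred cache; no intermediate
#    text is produced or re-read token by token.
out = receiver.generate(ids, past_key_values=shared_cache, max_new_tokens=32)
print(tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True))
```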

## Citation

```bibtex
@article{fu2025c2c,
    title={Cache-to-Cache: Direct Semantic Communication Between Large Language Models}, 
    author={Tianyu Fu and Zihan Min and Hanling Zhang and Jichao Yan and Guohao Dai and Wanli Ouyang and Yu Wang},
    journal={arXiv preprint arXiv:2510.03215},
    year={2025},
}
```