---
title: Cache-to-Cache Communication Demo
emoji: π
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - llm
  - cache-to-cache
  - model-communication
  - kv-cache
short_description: Compare Single, Text-to-Text, and Cache-to-Cache inference
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6445fd9ba56444c355dcbcba/R5YOyw0aoBENYJs8Ugnbi.png
---
# Cache-to-Cache Communication Demo
This Space demonstrates Cache-to-Cache (C2C) communication between Large Language Models, comparing three inference approaches side by side:

- **Single Model**: standard inference with one model
- **Text-to-Text (T2T)**: two-stage communication where the Sharer model generates text → the Receiver model processes that text
- **Cache-to-Cache (C2C)**: direct KV-cache communication between the Sharer and the Receiver
## What is Cache-to-Cache?
It makes language models talk without words.

Cache-to-Cache (C2C) lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation.

The payoff: up to 10% higher accuracy, 3–5% gains over text-based communication, and 2× faster responses.
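The T2T-vs-C2C contrast can be sketched with a toy model (all helper names here are hypothetical; the real C2C system maps KV tensors between models with a trained neural projector, not the identity copy used below):

```python
# Toy illustration of why C2C saves work: in T2T the Receiver must
# re-encode the Sharer's generated text, while in C2C the Sharer's
# KV-cache entries are projected into the Receiver's cache directly.
# "Encoding" a token stands in for one forward pass over that token.

def encode(tokens):
    """Pretend forward pass: each token yields one KV-cache entry."""
    return [("kv", t) for t in tokens]

def text_to_text(context):
    # Sharer encodes the context, then emits a text message token by
    # token (here, simply the context itself); the Receiver must then
    # encode that text from scratch.
    sharer_cache = encode(context)
    message = list(context)                 # token-by-token generation
    receiver_cache = encode(message)        # redundant re-encoding
    steps = len(sharer_cache) + len(message) + len(receiver_cache)
    return receiver_cache, steps

def cache_to_cache(context):
    # Sharer encodes once; a projector (identity here, a neural module
    # in the real system) transfers its KV entries to the Receiver,
    # skipping text generation and re-encoding entirely.
    sharer_cache = encode(context)
    receiver_cache = [kv for kv in sharer_cache]
    steps = len(sharer_cache)
    return receiver_cache, steps

context = ["the", "answer", "is", "42"]
_, t2t_steps = text_to_text(context)
_, c2c_steps = cache_to_cache(context)
print(t2t_steps, c2c_steps)  # → 12 4
```

The step counts are only a cartoon, but they show the structural saving: C2C removes both the Sharer's token-by-token generation and the Receiver's re-encoding pass.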
## Citation
```bibtex
@article{fu2025c2c,
  title={Cache-to-Cache: Direct Semantic Communication Between Large Language Models},
  author={Tianyu Fu and Zihan Min and Hanling Zhang and Jichao Yan and Guohao Dai and Wanli Ouyang and Yu Wang},
  journal={arXiv preprint arXiv:2510.03215},
  year={2025},
}
```