---
title: Cache-to-Cache Communication Demo
emoji: 🔗
colorFrom: blue
colorTo: blue
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - llm
  - cache-to-cache
  - model-communication
  - kv-cache
short_description: Compare Single, Text-to-Text, and Cache-to-Cache inference
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6445fd9ba56444c355dcbcba/R5YOyw0aoBENYJs8Ugnbi.png
---

# Cache-to-Cache Communication Demo

This Space demonstrates **Cache-to-Cache (C2C)** communication between Large Language Models, comparing three inference approaches side-by-side:

1. **Single Model**: Standard inference with one model
2. **Text-to-Text (T2T)**: Two-stage communication in which the Sharer model generates text that the Receiver model then processes
3. **Cache-to-Cache (C2C)**: Direct KV-cache communication between the Sharer and the Receiver

## What is Cache-to-Cache?

Cache-to-Cache (C2C) makes language models talk without words: multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation. The payoff: up to 10% higher accuracy than a single model, 3–5% gains over text-based communication, and 2× faster responses. A conceptual sketch of the C2C flow appears at the end of this README.

## Citation

```bibtex
@article{fu2025c2c,
  title={Cache-to-Cache: Direct Semantic Communication Between Large Language Models},
  author={Tianyu Fu and Zihan Min and Hanling Zhang and Jichao Yan and Guohao Dai and Wanli Ouyang and Yu Wang},
  journal={arXiv preprint arXiv:2510.03215},
  year={2025},
}
```
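
## Appendix: Conceptual C2C Sketch

The snippet below is a minimal sketch of the C2C data flow using Hugging Face `transformers`, not this project's actual implementation. The checkpoint name is a stand-in for whatever Sharer/Receiver models the demo loads, and `fuse_kv` is a hypothetical placeholder for the trained fuser module described in the paper: naive averaging only makes the plumbing visible and would not reproduce the paper's results.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

# Checkpoint is illustrative; the demo's actual Sharer/Receiver models may differ.
NAME = "Qwen/Qwen2.5-0.5B-Instruct"
sharer = AutoModelForCausalLM.from_pretrained(NAME)
receiver = AutoModelForCausalLM.from_pretrained(NAME)
tok = AutoTokenizer.from_pretrained(NAME)

enc = tok("Question: What causes ocean tides? Answer:", return_tensors="pt")

def prefill(model, input_ids):
    """Run one forward pass and return the KV-cache as legacy (key, value) tuples."""
    with torch.no_grad():
        out = model(input_ids, use_cache=True)
    cache = out.past_key_values
    if hasattr(cache, "to_legacy_cache"):  # newer transformers return a Cache object
        cache = cache.to_legacy_cache()
    return cache

# 1. Sharer prefills the prompt and exposes its KV-cache: no text is generated.
# 2. Receiver prefills the same prompt (all but the last token, which `generate`
#    will process against the fused cache).
sharer_cache = prefill(sharer, enc.input_ids[:, :-1])
receiver_cache = prefill(receiver, enc.input_ids[:, :-1])

# 3. HYPOTHETICAL fusion step. In the paper this is a trained projection from the
#    Sharer's cache space into the Receiver's; per-layer averaging is a stand-in.
def fuse_kv(recv_cache, shr_cache):
    return tuple(
        ((rk + sk) / 2, (rv + sv) / 2)
        for (rk, rv), (sk, sv) in zip(recv_cache, shr_cache)
    )

fused = DynamicCache.from_legacy_cache(fuse_kv(receiver_cache, sharer_cache))

# 4. Receiver generates directly from the fused cache, skipping the intermediate
#    text message that T2T communication would require.
output = receiver.generate(**enc, past_key_values=fused, max_new_tokens=32)
print(tok.decode(output[0], skip_special_tokens=True))
```

Averaging works here only because Sharer and Receiver share one architecture; across heterogeneous models, cache shapes differ, which is exactly why the paper learns a projection instead.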