Add pipeline tag: feature-extraction
#1
by nielsr (HF Staff) - opened
README.md CHANGED
@@ -1,13 +1,14 @@
 ---
-license: cc-by-4.0
 language:
 - en
+license: cc-by-4.0
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
+pipeline_tag: feature-extraction
 ---
 
-
+# ARC-Encoder models
 
 This page houses `ARC8-Encoder_Llama`, one of three released versions of pretrained ARC-Encoders. The architectures and training methods are described in the paper *ARC-Encoder: learning compressed text representations for large language models*, available [here](https://arxiv.org/abs/2510.20535). Code to reproduce the pretraining, further fine-tune the encoders, or evaluate them on downstream tasks is available in the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder/tree/main).
 
@@ -15,7 +16,7 @@ tags:
 
 All the encoders released here are trained on web-crawl data filtered with [Dactory](https://github.com/kyutai-labs/dactory), starting from a [Llama3.2-3B](https://github.com/meta-llama/llama-cookbook) base backbone. The release consists of two ARC-Encoders, each trained specifically for one decoder, and one trained for two decoders at the same time:
 - `ARC8-Encoder_Llama`, trained on 2.6B tokens specifically for the [Llama3.1-8B](https://github.com/meta-llama/llama-cookbook) base model, with a pooling factor of 8.
-- `ARC8-Encoder_Mistral`, trained on 2.6B tokens on [Mistral-7B](https://
+- `ARC8-Encoder_Mistral`, trained on 2.6B tokens specifically for the [Mistral-7B](https://www.mistralai.com/tech/mistral-7b/) base model, with a pooling factor of 8.
 - `ARC8-Encoder_multi`, trained by sampling among the two decoders, with a pooling factor of 8.
 
 ### Uses
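
For context on the pooling factor mentioned in the bullets above: a pooling factor of 8 means the encoder compresses a passage's token representations roughly 8:1 before the decoder consumes them. A minimal sketch of that arithmetic, where the hidden size and the mean-pooling choice are illustrative assumptions, not the released architecture:

```python
import torch

POOLING_FACTOR = 8  # illustrative, matching the "pooling factor of 8" above

def compress(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Pool each group of POOLING_FACTOR token vectors into one vector."""
    batch, seq_len, dim = token_embeddings.shape
    usable = seq_len - seq_len % POOLING_FACTOR  # drop the ragged tail
    grouped = token_embeddings[:, :usable].reshape(
        batch, usable // POOLING_FACTOR, POOLING_FACTOR, dim
    )
    return grouped.mean(dim=2)  # (batch, seq_len // 8, dim)

x = torch.randn(1, 256, 3072)   # 256 token embeddings in
print(compress(x).shape)        # torch.Size([1, 32, 3072]): 8x fewer vectors
```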
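
The `model_hub_mixin` / `pytorch_model_hub_mixin` tags indicate the checkpoint was saved with `huggingface_hub`'s `PyTorchModelHubMixin`, and `pipeline_tag: feature-extraction` fits because the model maps text to compressed vectors rather than generating text. A minimal loading sketch; the class body, config values, and repo id are placeholders, since the real encoder class lives in the [ARC-Encoder repository](https://github.com/kyutai-labs/ARC-Encoder):

```python
import torch
from huggingface_hub import PyTorchModelHubMixin

# Hypothetical stand-in for the real encoder class from the ARC-Encoder
# GitHub repository; hidden_size and the layer are placeholders.
class ARCEncoder(torch.nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 3072):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

# The mixin adds save_pretrained/from_pretrained/push_to_hub to any nn.Module,
# so once the real class is importable, loading is a one-liner:
# encoder = ARCEncoder.from_pretrained("<org>/ARC8-Encoder_Llama")
```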