	---
	library_name: mlx
	license: mit
	pipeline_tag: feature-extraction
	base_model: intfloat/multilingual-e5-large
	tags:
	- mlx
	- embeddings
	- sentence-transformers
	- xlm-roberta
	- multilingual
	- quantized
	- int8
	- q8
	- revis
	---

	# mavis-ai/Multilingual-e5-large-Q8

	This repository contains an 8-bit quantized, MLX-compatible distribution of `intfloat/multilingual-e5-large`, prepared for use with R.E.V.I.S. as its local semantic embedding model.

	The model is intended for local text embedding, semantic recall, RAG retrieval, and multilingual semantic search workflows.

	## Important Notice

	This repository is hosted primarily as a dedicated download source for the R.E.V.I.S. application ecosystem. You are free to download and use this model package for your own local embedding or MLX workflows, subject to the MIT License and the attribution notices included in this repository.

	This package is not a new embedding model and has not been fine-tuned. It is a quantized redistribution of `intfloat/multilingual-e5-large`.

	For the original model card, training details, intended usage, and evaluation information, refer to the official upstream model:

	- Original model: <https://huggingface.co/intfloat/multilingual-e5-large>
	- Base architecture: XLM-RoBERTa large
	- Embedding size: 1024

	## Quantization

	This package stores selected 2D weight tensors using a R.E.V.I.S. Q8 format:

	- Quantization type: symmetric per-row int8
	- Scale format: per-row scale tensor
	- Expected dequantization: `weight = qweight.astype(float16) * scale[:, None].astype(float16)`
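
The dequantization rule above can be exercised with a small NumPy round trip. This is a minimal sketch, not the R.E.V.I.S. packaging code: the `quantize_per_row` helper, its choice of `max(|row|) / 127` as the scale, and the use of NumPy instead of MLX are assumptions made for illustration. Only the expression inside `dequantize` comes from this card.

```python
import numpy as np


def quantize_per_row(weight: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical symmetric per-row int8 quantizer (assumed, not from this card)."""
    w = weight.astype(np.float32)
    # One scale per row: map the largest magnitude in the row to 127.
    max_abs = np.max(np.abs(w), axis=1)
    # Avoid division by zero for all-zero rows.
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0).astype(np.float32)
    qweight = np.clip(np.rint(w / scale[:, None]), -127, 127).astype(np.int8)
    return qweight, scale


def dequantize(qweight: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Dequantization as stated in this card."""
    return qweight.astype(np.float16) * scale[:, None].astype(np.float16)


# Round trip on a small random matrix (4 rows, 8 columns).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32) * 0.05
q, s = quantize_per_row(w)
w_hat = dequantize(q, s)
max_err = float(np.max(np.abs(w - w_hat.astype(np.float32))))
```

Because the scheme is symmetric, no zero-point tensor is needed: each row is described by its int8 values and a single scale.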

	Typical tensor layout:

	```text
	encoder.layer.0.attention.self.query.weight.qweight
	encoder.layer.0.attention.self.query.weight.scale
	```

	Non-quantized tensors, such as LayerNorm parameters, bias tensors, and other small metadata tensors, are preserved in their original floating-point representation.
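
With this naming, a loader can pair each `.qweight` entry with its `.scale` entry and pass every other tensor through unchanged. The sketch below assumes the tensors have already been read into a plain dictionary of NumPy arrays (for example from the safetensors file); `dequantize_state_dict` is a hypothetical helper, not the R.E.V.I.S. loader, and the example values are made up.

```python
import numpy as np

QWEIGHT_SUFFIX = ".qweight"
SCALE_SUFFIX = ".scale"


def dequantize_state_dict(tensors: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Hypothetical helper: rebuild float16 weights from qweight/scale pairs."""
    restored: dict[str, np.ndarray] = {}
    for name, value in tensors.items():
        if name.endswith(QWEIGHT_SUFFIX):
            base = name[: -len(QWEIGHT_SUFFIX)]
            scale = tensors[base + SCALE_SUFFIX]
            # Same expression as the "Expected dequantization" rule in this card.
            restored[base] = value.astype(np.float16) * scale[:, None].astype(np.float16)
        elif name.endswith(SCALE_SUFFIX) and name[: -len(SCALE_SUFFIX)] + QWEIGHT_SUFFIX in tensors:
            continue  # consumed together with its qweight entry
        else:
            restored[name] = value  # non-quantized tensors pass through unchanged
    return restored


# Made-up tensors that follow the layout shown above.
prefix = "encoder.layer.0.attention.self.query"
example = {
    f"{prefix}.weight.qweight": np.array([[127, -64], [10, 0]], dtype=np.int8),
    f"{prefix}.weight.scale": np.array([0.5, 0.25], dtype=np.float32),
    f"{prefix}.bias": np.zeros(2, dtype=np.float32),
}
restored = dequantize_state_dict(example)
```

After the call, `restored` holds a float16 `...query.weight` tensor and the original `...query.bias` tensor, so downstream code can treat the result like an ordinary floating-point checkpoint.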

	This format reduces download and storage size. In the current R.E.V.I.S. runtime, Q8 tensors may be dequantized to floating point at load time for compatibility with the existing embedding forward path, so runtime memory use and compute may match those of the floating-point model.
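
As a rough illustration of the storage saving, the arithmetic below compares one 1024 x 1024 weight matrix stored as float16, as float32, and as int8 with one scale per row. The 2-byte scale is an assumption, since this card does not state the stored scale dtype, and the figures ignore file headers and non-quantized tensors.

```python
def dense_bytes(rows: int, cols: int, bytes_per_value: int) -> int:
    """Size of an unquantized matrix."""
    return rows * cols * bytes_per_value


def q8_bytes(rows: int, cols: int, scale_bytes: int = 2) -> int:
    """int8 payload plus one scale per row (scale size is an assumption)."""
    return rows * cols + rows * scale_bytes


rows, cols = 1024, 1024
fp16_size = dense_bytes(rows, cols, 2)
fp32_size = dense_bytes(rows, cols, 4)
q8_size = q8_bytes(rows, cols)

ratio_vs_fp16 = q8_size / fp16_size
ratio_vs_fp32 = q8_size / fp32_size
```

Under these assumptions the per-row scales add about 0.2% on top of the int8 payload, so the quantized matrix is close to half the size of a float16 copy and close to a quarter of a float32 copy.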

	## Optimized for R.E.V.I.S. (Local Cognitive OS)

	We host this model package to serve as the local semantic embedding engine for R.E.V.I.S.

	R.E.V.I.S. is a 100% local Cognitive OS for Multi-Agentic AI. It transforms your Mac devices into a distributed Agentic Swarm via zero-config Wi-Fi clustering, allowing you to run heavy AI workloads, such as recursive web research, dynamic RAG generation, and multi-step logic, without degrading single-machine performance.

	If you are interested in pushing the absolute limits of local AI and open-weight models, check out our project.

	- Official Website: <https://mavis-ai.co.jp/revis/>
	- Watch the 13-min Raw Demo (Multi-node Dynamic RAG): <https://x.gd/LxaBF>
	- Follow our updates on X: <https://x.com/mavis_ai_jp>

	## Usage Notes

	For retrieval-style tasks, E5 models typically use different text prefixes for queries and passages. R.E.V.I.S. applies its own canonical query and passage formatting internally.

	If you use this package outside R.E.V.I.S., refer to the upstream E5 instructions for recommended prompt prefixes and pooling behavior.
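
For orientation, the upstream E5 model card prefixes inputs with `query: ` or `passage: ` and builds each embedding by averaging the last hidden states over non-padding tokens, followed by L2 normalization. The sketch below reproduces that pooling step in NumPy on dummy data. The function names are illustrative, the encoder forward pass is not shown, and the formatting R.E.V.I.S. applies internally is not documented here and may differ.

```python
import numpy as np


def format_query(text: str) -> str:
    """Prefix used by upstream E5 for search queries."""
    return f"query: {text}"


def format_passage(text: str) -> str:
    """Prefix used by upstream E5 for documents to be retrieved."""
    return f"passage: {text}"


def average_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean over non-padding tokens, then L2-normalize each embedding."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against empty rows
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Dummy encoder output: batch of 2, 3 tokens, hidden size 4 (the real model uses 1024).
hidden = np.ones((2, 3, 4), dtype=np.float32)
hidden[0, 2] = 100.0  # padding position in the first sequence; must be ignored
mask = np.array([[1, 1, 0], [1, 1, 1]])
emb = average_pool(hidden, mask)
```

Since the embeddings are unit length, cosine similarity between a query and a passage reduces to a dot product of their rows.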

	## Files

	Recommended repository files:

	```text
	README.md
	LICENSE
	NOTICE
	weights.00.safetensors
	config.json
	tokenizer.json
	tokenizer_config.json
	special_tokens_map.json
	quantization.json
	```
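
After downloading, a quick check that a local copy contains the files listed above can catch incomplete transfers. This is a minimal sketch: `missing_files` is a hypothetical helper, and the directory in the usage comment is a placeholder path.

```python
from pathlib import Path

# File names taken from the list in this section.
EXPECTED_FILES = [
    "README.md",
    "LICENSE",
    "NOTICE",
    "weights.00.safetensors",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "quantization.json",
]


def missing_files(model_dir: Path) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]


# Usage (placeholder path):
# print(missing_files(Path("./Multilingual-e5-large-Q8")))
```

An empty result means every listed file is present; it does not verify file contents or checksums.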

	## License

	This repository redistributes a quantized package derived from `intfloat/multilingual-e5-large`, which is released under the MIT License.

	The upstream copyright notice and MIT License text are preserved in `LICENSE`.

	Additional attribution and redistribution notes are included in `NOTICE`.

	## Attribution

	Original model:

	```text
	intfloat/multilingual-e5-large
	https://huggingface.co/intfloat/multilingual-e5-large
	```

	Original authors / associated paper:

	```text
	Liang Wang, Nan Yang, Xiaolong Huang, Linjun Yang, Rangan Majumder, Furu Wei
	Multilingual E5 Text Embeddings: A Technical Report
	```

	R.E.V.I.S. Q8 package:

	```text
	Prepared and redistributed by MAVIS / R.E.V.I.S.
	Quantization: symmetric per-row int8 Q8 package for local MLX embedding runtime
	```

	## Modification Notice

	Compared with the upstream `intfloat/multilingual-e5-large` release, this repository applies the following packaging modification:

	```text
	Selected 2D weight tensors were quantized to a symmetric per-row int8 (Q8) representation.
	```

	No fine-tuning, additional training, or architecture-level modification has been applied.