leafspark/Mistral-Large-218B-Instruct

@@ -3,103 +3,53 @@ license: other
 license_name: mrl
 license_link: https://mistral.ai/licenses/MRL-0.1.md
 language:
-  - en
-  - fr
-  - de
-  - es
-  - it
-  - pt
-  - zh
-  - ja
-  - ru
-  - ko
 ---
 # Mistral-Large-218B-Instruct
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
-Mistral-Large-218B-Instruct is an advanced dense Large Language Model (LLM) with 218 billion parameters, featuring state-of-the-art reasoning, knowledge, and coding capabilities.
-Self-merged from the original Mistral Large 2, see mergekit config below.
 ## Key features
-- Massive scale: With 218 billion parameters, this model pushes the boundaries of language model capabilities.
-- Multi-lingual by design: Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
-- Proficient in coding: Trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash, as well as more specific languages like Swift and Fortran.
-- Agentic-centric: Best-in-class agentic capabilities with native function calling and JSON outputting.
-- Advanced Reasoning: State-of-the-art mathematical and reasoning capabilities.
-- Mistral Research License: Allows usage and modification for research and non-commercial purposes.
-- Large Context: Features a large 128k context window for handling extensive input.
-## Metrics
-Note: The following metrics are based on the original model and may differ for this 218B parameter version. Updated benchmarks will be provided when available.
-**Base Pretrained Benchmarks**
-| Benchmark | Score |
-| --- | --- |
-| MMLU | 84.0% |
-**Base Pretrained Multilingual Benchmarks (MMLU)**
-| Benchmark | Score |
-| --- | --- |
-| French | 82.8% |
-| German | 81.6% |
-| Spanish | 82.7% |
-| Italian | 82.7% |
-| Dutch | 80.7% |
-| Portuguese | 81.6% |
-| Russian | 79.0% |
-| Korean | 60.1% |
-| Japanese | 78.8% |
-| Chinese | 74.8% |
-**Instruction Benchmarks**
-| Benchmark | Score |
-| --- | --- |
-| MT Bench | 8.63 |
-| Wild Bench | 56.3 |
-| Arena Hard| 73.2 |
-**Code & Reasoning Benchmarks**
-| Benchmark | Score |
-| --- | --- |
-| Human Eval | 92% |
-| Human Eval Plus| 87% |
-| MBPP Base| 80% |
-| MBPP Plus| 69% |
-**Math Benchmarks**
-| Benchmark | Score |
-| --- | --- |
-| GSM8K | 93% |
-| Math Instruct (0-shot, no CoT) | 70% |
-| Math Instruct (0-shot, CoT)| 71.5% |
-## Usage
-This model can be used with standard LLM frameworks and libraries. Specific usage instructions will be provided upon release.
 ## Hardware Requirements
 Given the size of this model (218B parameters), it requires substantial computational resources for inference:
 - Recommended: 8xH100 (640GB)
-- Alternatively: Distributed inference setup across multiple machines.
 ## Limitations
-- This model does not have built-in moderation mechanisms. Users should implement appropriate safeguards for deployment in production environments.
-- Due to its size, inference may be computationally expensive and require significant hardware resources.
-- As with all large language models, it may exhibit biases present in its training data.
-- The model's outputs should be critically evaluated, especially for sensitive applications.
 ## Notes
-This was just a fun testing model, merged with the `merge.py` script in the base of the repo. Find GGUFs at [mradermacher/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/mradermacher/Mistral-Large-218B-Instruct-GGUF)
 Compatible `mergekit` config:
 ```yaml

 license_name: mrl
 license_link: https://mistral.ai/licenses/MRL-0.1.md
 language:
+- en
+- fr
+- de
+- es
+- it
+- pt
+- zh
+- ja
+- ru
+- ko
+pipeline_tag: text-generation
 ---
 # Mistral-Large-218B-Instruct
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6604e5b21eb292d6df393365/P-BGJ5Ba2d1NkpdGXNThe.png)
+Mistral-Large-218B-Instruct is a dense Large Language Model (LLM) with 218 billion parameters, self-merged from the original Mistral Large 2.
 ## Key features
+- 218 billion parameters
+- Multilingual support for dozens of languages
+- Trained on 80+ programming languages
+- 128k context window
+- Mistral Research License: Allows usage and modification for research and non-commercial purposes
 ## Hardware Requirements
 Given the size of this model (218B parameters), it requires substantial computational resources for inference:
 - Recommended: 8xH100 (640GB)
+- Alternatively: Distributed inference setup across multiple machines
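The 8xH100 (640GB) recommendation can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is not part of the model card, and it makes three assumptions:

- Weights are stored in bf16 (2 bytes per parameter).
- Each H100 has 80 GB of memory.
- Only the weights are counted. KV cache for long contexts, activations and framework overhead come on top.

The 8-bit and 4-bit rows are idealized widths and ignore the per-block scale overhead that real GGUF quant formats add.

```python
# Back-of-the-envelope weight-memory estimate for a dense 218B model.
# Assumptions (not from the model card): 80 GB H100s, decimal GB
# (1 GB = 1e9 bytes), and weights only. KV cache, activations and
# framework overhead are NOT counted.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 218e9            # parameter count stated in the card
GPU_MEMORY_GB = 80        # assumed per-GPU memory (H100 80 GB variant)
NUM_GPUS = 8              # "8xH100" from the card

# Idealized storage widths; real GGUF quants add per-block scale overhead.
PRECISIONS = {"bf16": 2.0, "8-bit": 1.0, "4-bit": 0.5}

total_gb = GPU_MEMORY_GB * NUM_GPUS
results = {}
for name, width in PRECISIONS.items():
    weights_gb = weight_memory_gb(PARAMS, width)
    headroom_gb = total_gb - weights_gb
    results[name] = (weights_gb, headroom_gb)
    print(f"{name:>5}: weights {weights_gb:.0f} GB, "
          f"headroom on {NUM_GPUS}x{GPU_MEMORY_GB} GB = {headroom_gb:.0f} GB")
```

Under these assumptions, bf16 weights need about 436 GB. That leaves roughly 204 GB across the eight GPUs for KV cache and activations, which is consistent with the 640GB figure.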
 ## Limitations
+- No built-in moderation mechanisms
+- Computationally expensive inference
+- May exhibit biases present in training data
+- Outputs should be critically evaluated for sensitive applications
 ## Notes
+This was just a fun testing model, merged with the `merge.py` script in the root of the repo.
+## Quants
+GGUF: [mradermacher/Mistral-Large-218B-Instruct-GGUF](https://huggingface.co/mradermacher/Mistral-Large-218B-Instruct-GGUF)
+imatrix GGUF: [mradermacher/Mistral-Large-218B-Instruct-i1-GGUF](https://huggingface.co/mradermacher/Mistral-Large-218B-Instruct-i1-GGUF)
 Compatible `mergekit` config:
 ```yaml