Files changed (4)
  1. README.md +2 -57
  2. tokenizer.json +0 -0
  3. tokenizer.model +2 -2
  4. tokenizer_config.json +0 -0
README.md CHANGED
````diff
@@ -1,72 +1,17 @@
 ---
-library_name: vllm
 language:
 - code
 license: other
 tags:
 - code
-- mistral-common
 inference: false
 license_name: mnpl
 license_link: https://mistral.ai/licences/MNPL-0.1.md
-
-extra_gated_description: If you want to learn more about how we process your personal data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
 ---
 
 # Model Card for Codestral-22B-v0.1
 
-
-## Encode and Decode with `mistral_common`
-
-```py
-from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
-from mistral_common.protocol.instruct.messages import UserMessage
-from mistral_common.protocol.instruct.request import ChatCompletionRequest
-
-mistral_models_path = "MISTRAL_MODELS_PATH"
-
-tokenizer = MistralTokenizer.v3()
-
-completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
-
-tokens = tokenizer.encode_chat_completion(completion_request).tokens
-```
-
-## Inference with `mistral_inference`
-
-```py
-from mistral_inference.transformer import Transformer
-from mistral_inference.generate import generate
-
-model = Transformer.from_folder(mistral_models_path)
-out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
-
-result = tokenizer.decode(out_tokens[0])
-
-print(result)
-```
-
-## Inference with hugging face `transformers`
-
-```py
-from transformers import AutoModelForCausalLM
-
-model = AutoModelForCausalLM.from_pretrained("mistralai/Codestral-22B-v0.1")
-model.to("cuda")
-
-generated_ids = model.generate(tokens, max_new_tokens=1000, do_sample=True)
-
-# decode with mistral tokenizer
-result = tokenizer.decode(generated_ids[0].tolist())
-print(result)
-```
-
-> [!TIP]
-> PRs to correct the `transformers` tokenizer so that it gives 1-to-1 the same results as the `mistral_common` reference implementation are very welcome!
-
----
-
-Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash (more details in the [Blogpost](https://mistral.ai/news/codestral/)). The model can be queried:
+Codestrall-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash (more details in the [Blogpost](https://mistral.ai/news/codestral/)). The model can be queried:
 - As instruct, for instance to answer any questions about a code snippet (write documentation, explain, factorize) or to generate code following specific indications
 - As Fill in the Middle (FIM), to predict the middle tokens between a prefix and a suffix (very useful for software development add-ons like in VS Code)
 
@@ -126,7 +71,7 @@ This function uses recursion to calculate the Fibonacci number. However, it's no
 After installing `mistral_inference` and running `pip install --upgrade mistral_common` to make sure to have mistral_common>=1.2 installed:
 
 ```py
-from mistral_inference.transformer import Transformer
+from mistral_inference.model import Transformer
 from mistral_inference.generate import generate
 from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
 from mistral_common.tokens.instruct.request import FIMRequest
````
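The second README hunk changes the `Transformer` import path from `mistral_inference.transformer` to `mistral_inference.model`. Downstream code that has to run against either layout of the package can probe the candidate paths and take the first one that imports; a minimal stdlib sketch (the `import_first` helper is ours for illustration, not part of `mistral_inference`):

```python
import importlib

def import_first(*module_paths: str):
    """Return the first module from `module_paths` that imports cleanly."""
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError:
            continue  # try the next candidate path
    raise ImportError(f"none of {module_paths} could be imported")

# Hypothetical usage mirroring the import change in the diff:
# Transformer = import_first("mistral_inference.model",
#                            "mistral_inference.transformer").Transformer
```

This keeps scripts working across `mistral_inference` versions released before and after the rename, at the cost of masking a genuinely missing dependency until all candidates fail.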
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer.model CHANGED
```diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
-size 587591
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404
```
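What the `tokenizer.model` diff shows is not the binary tokenizer itself but its Git LFS pointer file: only the `oid` (SHA-256 of the blob) and `size` lines change between revisions. A pointer file is just `key value` lines, so it can be inspected with a few lines of stdlib Python (the `parse_lfs_pointer` helper is illustrative, not part of Git or any Mistral tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The new pointer content, verbatim from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
size 587404
"""

fields = parse_lfs_pointer(pointer)
print(fields["size"])  # prints 587404, the byte size of the real blob
```

The `size` drop from 587591 to 587404 bytes is consistent with a small edit to the underlying SentencePiece model rather than a wholesale replacement.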
tokenizer_config.json CHANGED
The diff for this file is too large to render. See raw diff
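The `+2 -57`-style counts in the file list at the top are plain tallies of added and removed lines across each file's hunks. For files whose diffs do render, the tally can be recomputed from the unified diff text; a minimal sketch (illustrative only, not how the diff viewer computes it):

```python
def diff_stats(diff_text: str) -> tuple:
    """Count added/removed content lines in a unified diff."""
    added = removed = 0
    for line in diff_text.splitlines():
        # '---'/'+++' are file headers, not removed/added content lines.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed

# The tokenizer.model change from this commit:
sample = """\
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9addc8bdce5988448ae81b729336f43a81262160ae8da760674badab9d4c7d33
-size 587591
+oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+size 587404
"""

print(diff_stats(sample))  # (2, 2), matching the +2 -2 listed for tokenizer.model
```

A removed or added line whose content itself begins with `---`/`+++` would be miscounted by this sketch; real tooling parses hunk headers instead of pattern-matching prefixes.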