llmat committed
Commit 6869cb2 · verified · 1 parent: 8d666bb

Update README.md

Files changed (1):
1. README.md (+20 −2)
README.md CHANGED
@@ -1,3 +1,21 @@
+---
+tags:
+- fp4
+- vllm
+language:
+- en
+- de
+- fr
+- it
+- pt
+- hi
+- es
+- th
+pipeline_tag: text-generation
+license: apache-2.0
+base_model: mistralai/Mistral-7B-Instruct-v0.3
+---
+
 # Mistral-7B-Instruct-v0.3-NVFP4
 
 NVFP4-quantized version of `mistralai/Mistral-7B-Instruct-v0.3` produced with [llmcompressor](https://github.com/vllm-project/llm-compressor).
@@ -18,7 +36,7 @@ from vllm import LLM, SamplingParams
 from transformers import AutoTokenizer
 
 model_id = "llmat/Mistral-7B-Instruct-v0.3-NVFP4"
-number_gpus = 2
+number_gpus = 1
 
 sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
 
@@ -39,4 +57,4 @@ generated_text = outputs[0].outputs[0].text
 print(generated_text)
 ```
 
-vLLM aslo supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
+vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
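
For context on the `llmcompressor` mention in the README: below is a minimal sketch of how an NVFP4 checkpoint like this one is typically produced. The import paths, the calibration dataset, and the hyperparameters are assumptions (they vary across llm-compressor versions) and are not recorded anywhere in this commit.

```python
# Hedged sketch: one way to produce an NVFP4 checkpoint with llm-compressor.
# Import paths, the calibration dataset ("open_platypus"), and the sample
# counts below are assumptions, not details taken from this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"
SAVE_DIR = "Mistral-7B-Instruct-v0.3-NVFP4"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize all Linear layers to the NVFP4 scheme, keeping lm_head in
# higher precision, following llm-compressor's published examples.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

# NVFP4 activation scales are calibrated on a small sample set.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```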
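The diff shows the README's usage snippet only in fragments (`model_id`, `number_gpus`, `sampling_params`, and the final `print`). A plausible end-to-end version is sketched below; the prompt text, the `apply_chat_template` step, and the `LLM`/`generate` calls are assumptions filled in around the lines the diff does show.

```python
# Sketch of the full usage snippet; lines not visible in the diff
# (prompt construction, LLM instantiation, generate call) are assumed.
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_id = "llmat/Mistral-7B-Instruct-v0.3-NVFP4"
number_gpus = 1  # the commit lowers this from 2; a 7B FP4 model fits on one GPU

sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)

# Build a chat-formatted prompt with the model's own template.
tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "Give me a short introduction to FP4 quantization."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_id, tensor_parallel_size=number_gpus)
outputs = llm.generate(prompt, sampling_params)

generated_text = outputs[0].outputs[0].text  # as in the diff's closing lines
print(generated_text)
```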
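On the README's closing line about OpenAI-compatible serving: the server is typically started with `vllm serve llmat/Mistral-7B-Instruct-v0.3-NVFP4`, after which any OpenAI client can talk to it. A minimal client sketch follows, assuming vLLM's default port 8000 and no authentication.

```python
# Client-side sketch against a local vLLM OpenAI-compatible server,
# assumed started with: vllm serve llmat/Mistral-7B-Instruct-v0.3-NVFP4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="llmat/Mistral-7B-Instruct-v0.3-NVFP4",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.6,
    top_p=0.9,
    max_tokens=256,
)
print(response.choices[0].message.content)
```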