Update README.md
README.md CHANGED
@@ -76,7 +76,16 @@ vLLM also supports OpenAI-compatible serving. See the [documentation](https://do
 
 This model was created by applying [LLM Compressor](https://github.com/vllm-project/llm-compressor), as presented in the code snippet below.
 
+
 <details>
+<summary>Creation details</summary>
+
+Install the specific llm-compressor version:
+```
+uv pip install git+https://github.com/vllm-project/llm-compressor.git
+uv pip install --upgrade torchvision --break-system-packages --no-cache
+```
+
 ```python
 from compressed_tensors.offload import dispatch_model
 from transformers import AutoModelForCausalLM, AutoTokenizer
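
The hunk ends at the diff's context window, so only the imports of the creation snippet are visible. A hedged sketch of how such a snippet typically continues is below: the model ID is hypothetical (the real one is not visible in this hunk), and `dispatch_model` is assumed to take the loaded model and place its weights on the available devices before any calibration or quantization step.

```python
from compressed_tensors.offload import dispatch_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical base-model ID; the actual ID does not appear in this hunk.
MODEL_ID = "your-org/your-base-model"

# Load the full-precision model and its tokenizer with transformers.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Assumption: dispatch_model distributes the model's weights across the
# available devices (GPU/CPU offload) ahead of compression.
model = dispatch_model(model)
```

The remainder of the snippet would then apply an llm-compressor recipe to `model` and save the compressed checkpoint; those steps fall outside the lines shown in this diff.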