Instructions for using amazon/MistralLite with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use amazon/MistralLite with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="amazon/MistralLite")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("amazon/MistralLite")
model = AutoModelForCausalLM.from_pretrained("amazon/MistralLite")
```
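The model's README (see the diff below) documents MistralLite's prompt template, `<|prompter|>...</s><|assistant|>`. The following is a minimal sketch of a generation call using that template with the pipeline above; the decoding parameters are illustrative assumptions, not values from the model card:

```python
# Minimal sketch: one generation call using MistralLite's documented prompt
# template. The decoding parameters below are illustrative, not official.
from transformers import pipeline

# device_map="auto" requires the `accelerate` package
pipe = pipeline("text-generation", model="amazon/MistralLite", device_map="auto")

prompt = (
    "<|prompter|>What are the main challenges to support a long context "
    "for LLM?</s><|assistant|>"
)
outputs = pipe(
    prompt,
    max_new_tokens=400,      # cap on generated tokens
    do_sample=False,         # greedy decoding for reproducibility
    return_full_text=False,  # return only the completion, not the prompt
)
print(outputs[0]["generated_text"])
```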
- Inference Providers
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use amazon/MistralLite with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "amazon/MistralLite"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "amazon/MistralLite",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:
```shell
docker model run hf.co/amazon/MistralLite
```
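Because the vLLM server exposes an OpenAI-compatible API, it can also be called from Python instead of curl. A minimal sketch, assuming the pip-installed server above is running on `localhost:8000` (the `openai` package is a separate install; the API key is a placeholder, since vLLM does not require one by default):

```python
# Minimal sketch: query the local vLLM server via the OpenAI client.
# Assumes `vllm serve "amazon/MistralLite"` is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="amazon/MistralLite",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```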
- SGLang
How to use amazon/MistralLite with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "amazon/MistralLite" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "amazon/MistralLite",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
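The SGLang endpoint is likewise OpenAI-compatible, so the same request can be made from Python. A minimal sketch with `requests`, assuming the server above is listening on port 30000:

```python
# Minimal sketch: POST the same completion request to the SGLang server
# with plain requests. Assumes the launch_server command above is running.
import requests

response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "amazon/MistralLite",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
    timeout=120,  # first calls can be slow during the model's warm-up period
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```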
Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "amazon/MistralLite" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "amazon/MistralLite",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use amazon/MistralLite with Docker Model Runner:
```shell
docker model run hf.co/amazon/MistralLite
```
Commit 114c6dc · Update README.md
Parent(s): 52d2d5a

README.md CHANGED
````diff
@@ -81,7 +81,7 @@ there were some limitations on its performance on longer context. Motivated by i
 - **Contact:** [GitHub issues](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/issues)
 - **Inference Code** [Github Repo](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/)
 
-## How to Use
+## How to Use MistralLite from Python Code (HuggingFace transformers) ##
 
 **Important** - For an end-to-end example Jupyter notebook, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/huggingface-transformers/example_usage.ipynb).
 
@@ -132,7 +132,7 @@ for seq in sequences:
 <|prompter|>What are the main challenges to support a long context for LLM?</s><|assistant|>
 ```
 
-## How to Serve
+## How to Serve MistralLite on TGI ##
 **Important:**
 - For an end-to-end example Jupyter notebook using the native TGI container, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/tgi/example_usage.ipynb).
 - If the **input context length is greater than 12K tokens**, it is recommended using a custom TGI container, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/tgi-custom/example_usage.ipynb).
@@ -199,7 +199,7 @@ result = invoke_tgi(prompt)
 **Important** - When using MistralLite for inference for the first time, it may require a brief 'warm-up' period that can take 10s of seconds. However, subsequent inferences should be faster and return results in a more timely manner. This warm-up period is normal and should not affect the overall performance of the system once the initialisation period has been completed.
 
 
 
-## How to Deploy
+## How to Deploy MistralLite on Amazon SageMaker ##
 **Important:**
 - For an end-to-end example Jupyter notebook using the SageMaker built-in container, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/sagemaker-tgi/example_usage.ipynb).
 - If the **input context length is greater than 12K tokens**, it is recommended using a custom docker container, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/sagemaker-tgi-custom/example_usage.ipynb).
@@ -307,7 +307,7 @@ print(result)
 ```
 
 
 
-## How to Serve
+## How to Serve MistralLite on vLLM ##
 Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
 
 **Important** - For an end-to-end example Jupyter notebook, please refer to [this link](https://github.com/awslabs/extending-the-context-length-of-open-source-llms/blob/main/MistralLite/vllm/example_usage.ipynb).
````