# Mistral on AWS Inf2 with FastAPI
Use FastAPI to quickly serve the Mistral model on an AWS Inferentia2 (Inf2) instance 🚀
Supports the multimodal input type (`input_embeds`) 🖼️

## Environment Setup
Follow the instructions in the Neuron docs, [PyTorch Neuron Setup](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/torch-setup.html), for basic environment setup.
## Install Packages
Activate the virtual environment and install the extra packages:
```
cd app
pip install -r requirements.txt
```
## Run the App
```
uvicorn main:app --host 0.0.0.0 --port 8000
```
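The app accepts two request flavors, one per input path. As a stdlib-only sketch of the request contract the server might enforce — the field names (`input_ids`, `input_embeds`, `max_new_tokens`) are assumptions here; the actual request models live in `app/main.py`:

```python
# Hypothetical request contract for the generate endpoint; field names are
# assumptions, not the app's confirmed schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GenerateRequest:
    input_ids: Optional[list] = None      # plain-prompt path: token IDs
    input_embeds: Optional[list] = None   # multimodal path: precomputed embeddings
    max_new_tokens: int = 64

def input_kind(req: GenerateRequest) -> str:
    """Return which input path a request uses; exactly one must be set."""
    if (req.input_ids is None) == (req.input_embeds is None):
        raise ValueError("provide exactly one of input_ids or input_embeds")
    return "input_ids" if req.input_ids is not None else "input_embeds"
```

Exactly one of the two fields should be populated per request; the two client scripts below exercise one path each.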
## Send the Request
Test the `input_ids` (plain prompt) version:
```
cd client
python client.py
```
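A sketch of what a client like `client.py` might send, assuming the server takes a JSON body with an `input_ids` field on a POST endpoint — the exact URL path and field names are assumptions; see `client/client.py` for the real ones:

```python
# Hypothetical input_ids client sketch; endpoint path and field names are
# assumptions, not copied from client/client.py.
import json
import urllib.request

def build_payload(input_ids, max_new_tokens=64):
    """Serialize a tokenized prompt for the assumed /generate endpoint."""
    return json.dumps({"input_ids": input_ids,
                       "max_new_tokens": max_new_tokens}).encode("utf-8")

def send(url, body):
    """POST the JSON body and return the decoded JSON response."""
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server from `uvicorn main:app` running, a call would look like:
#   send("http://localhost:8000/generate", build_payload([1, 3323, 1117]))
```

The token IDs would normally come from the Mistral tokenizer rather than being written by hand.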
Test the `input_embeds` version (a common multimodal input format; it skips the server-side embedding layer):
```
cd client
python embeds_client.py
```
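In the `input_embeds` variant, the client ships precomputed embedding vectors — e.g. produced by a vision encoder — instead of token IDs, so the server bypasses its embedding layer. A sketch of the payload, where the field name and the embedding dimension (4096, Mistral-7B's hidden size) are assumptions; `client/embeds_client.py` has the real details:

```python
# Hypothetical input_embeds payload sketch; the field name and the hidden size
# (assumed 4096 for Mistral-7B) should be checked against the deployed model.
import json

HIDDEN_SIZE = 4096  # assumed hidden size of the served Mistral model

def build_embeds_payload(embeds, max_new_tokens=64):
    """embeds: list of per-position embedding vectors, shape [seq_len][hidden]."""
    for vec in embeds:
        if len(vec) != HIDDEN_SIZE:
            raise ValueError(f"each embedding must have {HIDDEN_SIZE} dims")
    return json.dumps({"input_embeds": embeds,
                       "max_new_tokens": max_new_tokens}).encode("utf-8")

# Two dummy positions; a real multimodal client would compute these vectors.
dummy = [[0.0] * HIDDEN_SIZE for _ in range(2)]
body = build_embeds_payload(dummy)
```

Floats serialized as nested JSON lists are simple but bulky; a real client might prefer a compact encoding for long sequences.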
## Container
You can build a container image with the Dockerfile, or run the pre-built image:
```
docker run --rm --name mistral -d -p 8000:8000 --device=/dev/neuron0 public.ecr.aws/shtian/fastapi-mistral
```