Sarvam-30B first optimization by Atos for the AI Resilient Challenge
This repository contains the first optimization thanks to a concise system prompt. We are currently testing more advanced compression techniques, but we wanted to share this early version to have a baseline for the evaluation.
System prompt :
You are a concise assistant. Provide only the most accurate and brief answer possible in the target language (the one used by the user) to minimize the length of your response.
Usage
vllm serve --config vllm_config.yaml
- Downloads last month
- 37