4, 5 and 8-bit GGUF models for CPU+GPU inference
Use the following dataset to fine-tune deepseek-ai/deepseek-coder-6.7b in order to improve the model's reasoning and planning abilities.
Context window length: 8192; max_tokens > 128 && < 8192
Total: 185,193 samples (426 MB)
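The length condition above can be read as a filter that keeps only samples whose token count lies strictly between 128 and 8192. The following is a minimal sketch of that reading, not the author's preprocessing code: `count_tokens_whitespace` and `filter_by_length` are hypothetical helpers, and the whitespace-based counter is only a stand-in for the model's real tokenizer.

```python
from typing import Callable, Iterable, List

MIN_TOKENS = 128   # exclusive lower bound, from "max_tokens > 128"
MAX_TOKENS = 8192  # exclusive upper bound, from "< 8192" (the context window length)


def count_tokens_whitespace(text: str) -> int:
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def filter_by_length(
    samples: Iterable[str],
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
) -> List[str]:
    """Keep samples whose token count is strictly inside (MIN_TOKENS, MAX_TOKENS)."""
    return [s for s in samples if MIN_TOKENS < count_tokens(s) < MAX_TOKENS]


# Illustrative data only: one sample too short, one in range, one too long.
samples = ["short sample", "word " * 200, "word " * 9000]
kept = filter_by_length(samples)
print(len(kept))
```

Passing a different `count_tokens` callable (for example one backed by the actual model tokenizer) changes the counts without changing the filter logic.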
Generation settings: 50 samples, T=0.2, MaxTokens=512, Top_P=0.95
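Settings of this kind (many samples per problem at a low temperature) are commonly paired with a pass@k estimate on HumanEval-style benchmarks. The source does not state which estimator was used, so the sketch below is an assumption: it shows the standard unbiased pass@k formula, and the dictionary keys and the count of 10 correct samples are hypothetical values chosen for illustration.

```python
from math import comb

# Settings copied from the line above; the key names are hypothetical.
GENERATION_SETTINGS = {
    "n_samples": 50,
    "temperature": 0.2,
    "max_tokens": 512,
    "top_p": 0.95,
}


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: samples generated per problem
    c: samples that passed the tests
    k: number of attempts allowed
    """
    if n - c < k:
        # Fewer than k failing samples, so any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative only: 10 of 50 samples passing gives a pass@1 of about 0.2.
score = pass_at_k(GENERATION_SETTINGS["n_samples"], c=10, k=1)
print(round(score, 4))
```

For k = 1 the estimate reduces to the fraction of passing samples, c / n, averaged over problems in a full evaluation.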
Code: https://github.com/uukuguy/speechless
This model accepts the Alpaca instruction format.
For example:
You are an intelligent programming assistant.
### Instruction:
Implement a linked list in C++
### Response:
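The prompt layout above can be assembled programmatically. This is a minimal sketch: `build_alpaca_prompt` is a hypothetical helper, and the exact whitespace between sections (single newlines, as in the example) is an assumption rather than something the model card specifies.

```python
DEFAULT_SYSTEM = "You are an intelligent programming assistant."


def build_alpaca_prompt(instruction: str, system: str = DEFAULT_SYSTEM) -> str:
    """Assemble a prompt in the Alpaca-style layout shown in the example above."""
    return (
        f"{system}\n"
        "### Instruction:\n"
        f"{instruction.strip()}\n"
        "### Response:\n"
    )


prompt = build_alpaca_prompt("Implement a linked list in C++")
print(prompt)
```

The model's completion is expected to follow the final `### Response:` line, so the prompt ends there.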
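| Metric | Value |
|---|---|
| humaneval-python | |

For comparison, humaneval-python scores of CodeLlama models:

| Model | humaneval-python |
|---|---|
| CodeLlama-34B-Python | 53.29 |
| CodeLlama-34B-Instruct | 50.79 |
| CodeLlama-13B-Instruct | 50.6 |
| CodeLlama-34B | 45.11 |
| CodeLlama-13B-Python | 42.89 |
| CodeLlama-13B | 35.07 |

0.314188
0.390111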
| Metric | Value |
|---|---|
| ARC | |
| HellaSwag | |
| MMLU | |
| TruthfulQA | |
| Average | |

GGUF quantizations: 4-bit, 5-bit, 8-bit