---
language:
- ko
- en
base_model:
- LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
pipeline_tag: text-generation
tags:
- llm
- exaone
- instruction-tuned
- quantized
- awq
- vllm
- medical
---

# Exaone3.5-7.8B_ReST_V0_Quantized

This model is a fine-tuned and AWQ-quantized version of EXAONE 3.5 7.8B (Instruct), optimized for efficient inference and structured text generation.

## Overview

- Base Model: EXAONE 3.5 7.8B (Instruct)
- Fine-tuning: Supervised fine-tuning on domain-specific data
- Quantization: 4-bit AWQ
- Inference: Optimized for vLLM
- Context Length: up to 32K tokens

## Model Details

- Architecture: ExaoneForCausalLM  
- Hidden Size: 4096  
- Layers: 32  
- Attention Heads: 32  
- Max Position Embeddings: 32768  
- Quantization: 4-bit AWQ  
- Torch dtype: float16  
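
The figures above allow a quick back-of-the-envelope check, sketched here. The per-head dimension follows directly from hidden size divided by attention heads. The KV-cache estimate also needs the number of key/value heads, which this card does not state, so `num_kv_heads` is a hypothetical parameter: the loop tries 32 (plain multi-head attention) and 8 (an assumed grouped-query setting). It further assumes the cache is stored in float16 (2 bytes per value), since AWQ quantizes the weights rather than the cache.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension; the hidden size must split evenly across heads."""
    if hidden_size % num_heads != 0:
        raise ValueError("hidden_size must be divisible by num_heads")
    return hidden_size // num_heads


def kv_cache_bytes_per_token(
    num_layers: int,
    num_kv_heads: int,  # hypothetical: not listed in this model card
    dim: int,
    bytes_per_value: int = 2,  # assumption: float16 cache entries
) -> int:
    """Bytes needed to cache one token's keys and values across all layers."""
    # The leading factor 2 accounts for storing both a key and a value vector.
    return 2 * num_layers * num_kv_heads * dim * bytes_per_value


# Values taken from the Model Details list.
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_HEADS = 32
MAX_POSITIONS = 32768

dim = head_dim(HIDDEN_SIZE, NUM_HEADS)

for num_kv_heads in (32, 8):
    per_token = kv_cache_bytes_per_token(NUM_LAYERS, num_kv_heads, dim)
    full_context_gib = per_token * MAX_POSITIONS / 2**30
    print(
        f"kv_heads={num_kv_heads}: {per_token // 1024} KiB per token, "
        f"{full_context_gib:.1f} GiB for one full-length sequence"
    )
```

The estimate covers a single sequence only; serving several long sequences at once scales the cache requirement accordingly.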

## Intended Use

- Instruction-based text generation  
- Structured output generation (JSON)  
- LLM-based data pipelines  
- RAG systems  
- Efficient inference  
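
For the structured-output use case, downstream code usually has to tolerate extra text around the JSON. The following is a minimal sketch of one way to do that, not part of this model's tooling: `extract_first_json_object` is a hypothetical helper that scans generated text for the first substring that parses as a JSON object, and the field names in the usage line are invented for illustration.

```python
import json
from typing import Any, Dict, Optional


def extract_first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever follows it.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # This brace did not open a valid object; try the next one.
        start = text.find("{", start + 1)
    return None


# Illustrative input: a model reply that wraps the JSON in extra prose.
raw = 'Here is the result:\n{"label": "ok", "score": 0.9}\nLet me know.'
print(extract_first_json_object(raw))
```

Returning `None` instead of raising lets a pipeline decide whether to retry the generation, log the failure, or fall back to a default.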

## Example Usage

```python
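from vllm import LLM, SamplingParams

# Load the AWQ-quantized checkpoint; vLLM names the method in lowercase.
llm = LLM(
    model="cococoomo/Exaone3.5-7.8B_ReST_V0_Quantized",
    quantization="awq",
)

# A low temperature and a tighter top_p favor focused, less random output.
sampling_params = SamplingParams(
    temperature=0.2,
    top_p=0.8,
    max_tokens=1024,
)

# generate() takes a list of prompts and returns one result per prompt.
outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)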
```

## Training

Fine-tuned using supervised learning on domain-specific data.  
Dataset is not included due to privacy constraints.

## Limitations

- May produce incorrect outputs  
- Sensitive to prompt quality  
- Domain bias may exist  

## Safety

Not intended for critical decision-making without human validation.

## Evaluation

- BLEU  
- ROUGE  
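
To make the two metric names concrete, here is a simplified sketch of the unigram overlap that both build on. It is not the evaluation setup used for this model, which the card does not describe. It assumes whitespace tokenization and lowercasing, computes ROUGE-1 F1, and computes only the clipped unigram precision that forms one component of BLEU (full BLEU also combines higher-order n-grams and a brevity penalty). All function names are hypothetical.

```python
from collections import Counter
from typing import List


def _tokens(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def _clipped_overlap(candidate: str, reference: str) -> int:
    """Count shared unigrams, each capped at its count in the reference."""
    cand_counts = Counter(_tokens(candidate))
    ref_counts = Counter(_tokens(reference))
    return sum((cand_counts & ref_counts).values())


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """BLEU-style modified unigram precision (no brevity penalty)."""
    cand = _tokens(candidate)
    if not cand:
        return 0.0
    return _clipped_overlap(candidate, reference) / len(cand)


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand, ref = _tokens(candidate), _tokens(reference)
    overlap = _clipped_overlap(candidate, reference)
    if not cand or not ref or overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat", "the cat sat on"))
print(clipped_unigram_precision("the the the", "the cat"))
```

For reported results, an established implementation with a documented tokenizer is preferable, since scores from different tokenization choices are not comparable.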

## Deployment

Optimized for vLLM and GPU-efficient inference.