cococoomo
/

Exaone3.5-7.8B_ReST_V0_Quantized

Text Generation

instruction-tuned

4-bit precision

Model card Files Files and versions

Exaone3.5-7.8B_ReST_V0_Quantized / README.md

cococoomo's picture

Create README.md

0a52707 verified about 2 months ago

|

history blame contribute delete

1.72 kB

	---
	language:
	- ko
	- en
	base_model:
	- LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
	pipeline_tag: text-generation
	tags:
	- llm
	- exaone
	- instruction-tuned
	- quantized
	- awq
	- vllm
	- medical
	---

	# Exaone3.5-7.8B_ReST_V0_Quantized

	This model is a fine-tuned and AWQ-quantized version of EXAONE 3.5 7.8B (Instruct), optimized for efficient inference and structured text generation.

	## Overview

	- Base Model: EXAONE 3.5 7.8B (Instruct)
	- Fine-tuning: Supervised fine-tuning on domain-specific data
	- Quantization: 4-bit AWQ
	- Inference: Optimized for vLLM
	- Context Length: up to 32K tokens

	## Model Details

	- Architecture: ExaoneForCausalLM
	- Hidden Size: 4096
	- Layers: 32
	- Attention Heads: 32
	- Max Position Embeddings: 32768
	- Quantization: 4-bit AWQ
	- Torch dtype: float16

	## Intended Use

	- Instruction-based text generation
	- Structured output generation (JSON)
	- LLM-based data pipelines
	- RAG systems
	- Efficient inference

	## Example Usage

	```python
	from vllm import LLM, SamplingParams

	llm = LLM(
	model="cococoomo/Exaone3.5-7.8B_ReST_V0_Quantized",
	quantization="AWQ",
	)

	sampling_params = SamplingParams(
	temperature=0.2,
	top_p=0.8,
	max_tokens=1024,
	)

	outputs = llm.generate(["Your prompt here"], sampling_params)
	print(outputs[0].outputs[0].text)
	```

	## Training

	Fine-tuned using supervised learning on domain-specific data.
	Dataset is not included due to privacy constraints.

	## Limitations

	- May produce incorrect outputs
	- Sensitive to prompt quality
	- Domain bias may exist

	## Safety

	Not intended for critical decision-making without human validation.

	## Evaluation

	- BLEU
	- ROUGE

	## Deployment

	Optimized for vLLM and GPU-efficient inference.