---
language:
- ko
- en
base_model:
- LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct
pipeline_tag: text-generation
tags:
- llm
- exaone
- instruction-tuned
- quantized
- awq
- vllm
- medical
---

# Exaone3.5-7.8B_ReST_V0_Quantized

This model is a fine-tuned, AWQ-quantized version of EXAONE 3.5 7.8B (Instruct), optimized for efficient inference and structured text generation.

## Overview

- Base Model: EXAONE 3.5 7.8B (Instruct)
- Fine-tuning: Supervised fine-tuning on domain-specific data
- Quantization: 4-bit AWQ
- Inference: Optimized for vLLM
- Context Length: up to 32K tokens

## Model Details

- Architecture: ExaoneForCausalLM
- Hidden Size: 4096
- Layers: 32
- Attention Heads: 32
- Max Position Embeddings: 32768
- Quantization: 4-bit AWQ
- Torch dtype: float16

## Intended Use

- Instruction-based text generation
- Structured output generation (JSON)
- LLM-based data pipelines
- RAG systems
- Efficient inference

## Example Usage

```python
from vllm import LLM, SamplingParams

# Load the AWQ-quantized checkpoint with vLLM.
llm = LLM(
    model="cococoomo/Exaone3.5-7.8B_ReST_V0_Quantized",
    quantization="awq",  # vLLM names quantization methods in lowercase
)

# Low temperature keeps the output close to deterministic.
sampling_params = SamplingParams(
    temperature=0.2,
    top_p=0.8,
    max_tokens=1024,
)

# generate() takes a list of prompts and returns one result per prompt.
outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)
```

## Training

The model was fine-tuned with supervised learning on domain-specific data. The dataset is not included because of privacy constraints.

## Limitations

- May produce incorrect outputs
- Sensitive to prompt quality
- May reflect bias from the fine-tuning domain

## Safety

Not intended for critical decision-making without human validation.

## Evaluation

- BLEU
- ROUGE

## Deployment

Optimized for vLLM and GPU-efficient inference.
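
For the structured (JSON) output use case listed under Intended Use, downstream code usually has to tolerate extra text around the JSON object and outputs that contain no valid JSON at all. The following is a minimal sketch using only the Python standard library. `extract_json` is a hypothetical helper written for illustration; it is not part of vLLM or of this model repository, and the sample string is made up.

```python
import json
from typing import Any, Optional


def extract_json(text: str) -> Optional[Any]:
    """Return the first decodable JSON object found in `text`, or None.

    Hypothetical helper: scans for each '{' and tries to decode a JSON
    object starting there, so surrounding prose is ignored.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


# Illustrative input standing in for generated text.
raw = 'Here is the result: {"label": "ok", "score": 0.4} Let me know.'
parsed = extract_json(raw)
if parsed is None:
    print("No valid JSON found; route this output to human review.")
else:
    print(parsed["label"], parsed["score"])
```

Returning `None` instead of raising keeps the failure case explicit, which fits the recommendation above to keep a human in the loop for outputs that cannot be validated.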