---
license: other
license_name: hyperclovax-seed
license_link: LICENSE
base_model:
- exp-models/HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied
---

## Overview

HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied is based on a model developed by NAVER that can understand and generate text. It performs competitively on major benchmarks covering Korean language and culture, and it supports a context length of up to 16k tokens, which lets it handle a wide range of tasks.

## Basic Information

- Model Architecture: Transformer-based architecture (Dense Model)
- Number of Parameters: 3.26B
- Input/Output Format: Text / Text (both input and output are in text format)
- Context Length: 16k
- Knowledge Cutoff Date: The model was trained on data prior to August 2024.

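
The context length listed above is shared between the prompt and the tokens the model generates. As a rough illustration, the sketch below computes how many tokens remain for generation after a prompt of a given size. It assumes that "16k" means 16,384 tokens, which this card does not state explicitly, and `remaining_generation_budget` is a hypothetical helper name, not part of any library.

```python
# Assumption: "16k" is read as 16 * 1024 = 16,384 tokens.
CONTEXT_LENGTH = 16 * 1024


def remaining_generation_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after the prompt.

    The result is clamped at zero when the prompt alone fills or exceeds
    the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)
```

In practice the prompt token count could be taken from the tokenized chat, and the result used as an upper bound when choosing a generation length.
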
## Training and Data

The training data for HyperCLOVA-X-SEED-Vision-Instruct-3B-Llamafied is drawn from diverse sources, including high-quality datasets. Training was carried out in four main stages: Pretraining Stage 1, where the model learns from a large volume of documents; Pretraining Stage 2, which continues training on high-quality data; Rejection sampling Fine-Tuning (RFT), aimed at enhancing the model's knowledge across various domains and its complex reasoning abilities; and Supervised Fine-Tuning (SFT), which improves the model's instruction-following capabilities.

Because smaller models tend to struggle with long contexts, weaknesses in long-context handling were observed. To address this, long-context understanding was reinforced from the pretraining stages through to the SFT stage, enabling the model to stably support context lengths of up to 16k tokens.

## Hugging Face Usage Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Point both calls at the checkpoint directory.
model = AutoModelForCausalLM.from_pretrained("/path/to/ckpt")
tokenizer = AutoTokenizer.from_pretrained("/path/to/ckpt")

chat = [
    {"role": "tool_list", "content": ""},
    # System prompt, in English: '- The AI language model is named "CLOVA X" and was made by NAVER.
    # - Today is Thursday, April 24, 2025.'
    {"role": "system", "content": "- AI 언어모델의 이름은 \"CLOVA X\" 이며 네이버에서 만들었다.\n- 오늘은 2025년 04월 24일(목)이다."},
    # User prompt, in English: "Explain the relationship between the Schrödinger equation
    # and quantum mechanics in as much detail as possible."
    {"role": "user", "content": "슈뢰딩거 방정식과 양자역학의 관계를 최대한 자세히 알려줘."},
]

# Render the chat with the model's chat template and tokenize it into PyTorch tensors.
inputs = tokenizer.apply_chat_template(chat, add_generation_prompt=True, return_dict=True, return_tensors="pt")
# max_length counts prompt plus generated tokens; stop_strings requires passing the tokenizer.
output_ids = model.generate(**inputs, max_length=1024, stop_strings=["<|endofturn|>", "<|stop|>"], tokenizer=tokenizer)
print(tokenizer.batch_decode(output_ids))
```
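
`batch_decode(output_ids)` returns the prompt together with the continuation, and the matched stop string typically remains at the end of the generated text. The following is a minimal sketch of post-processing, assuming the continuation has already been decoded on its own. `extract_reply` is a hypothetical helper written for illustration, not a `transformers` function.

```python
from typing import Sequence

# Stop strings taken from the usage example above.
DEFAULT_STOP_STRINGS = ("<|endofturn|>", "<|stop|>")


def extract_reply(decoded: str, stop_strings: Sequence[str] = DEFAULT_STOP_STRINGS) -> str:
    """Cut the decoded continuation at the earliest stop string, if any.

    Text without a stop string is returned unchanged apart from
    surrounding whitespace being stripped.
    """
    cut = len(decoded)
    for stop in stop_strings:
        idx = decoded.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return decoded[:cut].strip()
```

One way to obtain the continuation alone is to slice off the prompt tokens before decoding, for example `tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:])`, and pass the result to `extract_reply`. Applying the helper to the full decoded output would likely cut too early, since the rendered prompt may itself contain turn-end markers.
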