---
library_name: transformers
tags:
- text-generation-inference
license: apache-2.0
language:
- en
base_model:
- amd/Instella-3B-Instruct
pipeline_tag: text-generation
---

# **Instella-3B-Instruct-Abliterated**

> The Instella models are text-only, autoregressive transformer-based LMs with 3 billion parameters. Architecture-wise, Instella comprises 36 decoder layers, each with 32 attention heads. These models support a sequence length of up to 4,096 tokens and have a vocabulary size of ~50,000 tokens using the OLMo tokenizer. During both pre-training and fine-tuning, we utilized FlashAttention-2, Torch Compile, and bfloat16 mixed-precision training to reduce memory usage, leading to computational speedups and optimal resource utilization. To balance inter-node memory efficiency and intra-node communication overhead within our cluster, we employed fully sharded data parallelism (FSDP) with hybrid sharding: model parameters, gradients, and optimizer states are sharded within a node and replicated across nodes.

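The bfloat16 choice above largely determines the memory footprint. Below is a back-of-the-envelope sketch of why sharding optimizer state via FSDP matters at this scale; it assumes exactly 3.0 × 10⁹ parameters (the true count differs slightly) and a typical mixed-precision Adam layout of roughly 16 extra bytes per parameter, neither of which is stated in this card:

```python
# Rough memory estimate for a 3B-parameter model in bfloat16.
# Assumption: exactly 3.0e9 parameters (approximate, not the exact count).
PARAMS = 3_000_000_000
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 2 bytes

weights_gb = PARAMS * BYTES_PER_PARAM_BF16 / 1024**3
print(f"Weights alone for inference: ~{weights_gb:.1f} GiB")

# Assumption: mixed-precision Adam adds roughly 16 bytes/param
# (fp32 master weights + two fp32 Adam moments + bf16 gradients),
# which is why FSDP shards gradients and optimizer states across GPUs.
train_gb = PARAMS * (BYTES_PER_PARAM_BF16 + 16) / 1024**3
print(f"Rough unsharded training footprint: ~{train_gb:.1f} GiB")
```

Under these assumptions, the unsharded training state is roughly 9× the inference weights, which motivates the hybrid-sharding FSDP setup described above.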
### Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "prithivMLmods/Instella-3B-Instruct-abliterated"

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", trust_remote_code=True)

# Format the conversation with the model's chat template and tokenize it
prompt = [{"role": "user", "content": "What are the benefits of open-source AI research?"}]
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    return_tensors='pt'
)

# Sample up to 1,024 new tokens
tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.8,
    do_sample=True
)

print(tokenizer.decode(tokens[0], skip_special_tokens=False))
```

> Overall, Instella-3B-Instruct excels at instruction-following and multi-turn QA tasks such as TruthfulQA, GPQA, IFEval, and MT-Bench, and remains highly competitive with existing state-of-the-art open-weight models on other knowledge-recall and math benchmarks, despite being trained on significantly fewer tokens.