---
library_name: transformers
tags:
- text-generation-inference
license: apache-2.0
language:
- en
base_model:
- amd/Instella-3B-Instruct
pipeline_tag: text-generation
---

# **Instella-3B-Instruct-Abliterated**

> The Instella models are text-only, autoregressive, transformer-based language models with 3 billion parameters. Architecturally, Instella consists of 36 decoder layers, each with 32 attention heads. The models support a sequence length of up to 4,096 tokens and use a vocabulary of roughly 50,000 tokens via the OLMo tokenizer. During both pre-training and fine-tuning, we used FlashAttention-2, Torch Compile, and bfloat16 mixed-precision training to reduce memory usage, speed up computation, and improve resource utilization. To balance memory efficiency against inter-node communication overhead within our cluster, we employed fully sharded data parallelism (FSDP) with hybrid sharding: model parameters, gradients, and optimizer states are sharded within a node and replicated across nodes.

### Example Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model (custom model code requires trust_remote_code=True)
checkpoint = "prithivMLmods/Instella-3B-Instruct-abliterated"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", trust_remote_code=True
)

# Build a chat-formatted prompt and generate a response
prompt = [{"role": "user", "content": "What are the benefits of open-source AI research?"}]
inputs = tokenizer.apply_chat_template(
    prompt, add_generation_prompt=True, return_tensors="pt"
)

tokens = model.generate(
    inputs.to(model.device),
    max_new_tokens=1024,
    temperature=0.8,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=False))
```

> Overall, Instella-3B-Instruct excels at instruction-following and multi-turn QA benchmarks such as TruthfulQA, GPQA, IFEval, and MT-Bench, and remains highly competitive with existing state-of-the-art open-weight models on other knowledge-recall and math benchmarks, despite being trained on significantly fewer tokens.
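
For illustration, the hybrid-sharded FSDP setup with bfloat16 mixed precision described above can be approximated in PyTorch roughly as follows. This is a minimal sketch under assumed defaults, not the actual Instella training code; the base checkpoint, process-group setup, and settings shown here are placeholders.

```python
# Illustrative sketch of hybrid-sharded FSDP with bf16 mixed precision.
# NOT the Instella training code -- model choice and settings are placeholders.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)
from transformers import AutoModelForCausalLM

# Assumes launch via torchrun, which sets the rank/world-size environment variables.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained(
    "amd/Instella-3B-Instruct", trust_remote_code=True
)

# bfloat16 for parameters, gradient reduction, and buffers
bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)

# HYBRID_SHARD shards parameters, gradients, and optimizer state within a node
# and replicates the sharded groups across nodes, matching the description above.
model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    mixed_precision=bf16,
    device_id=torch.cuda.current_device(),
)

# Torch Compile, as mentioned in the card, for additional speedups
model = torch.compile(model)
```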