Update README.md
README.md (CHANGED)
```diff
@@ -1,5 +1,5 @@
 ---
-license:
+license: apache-2.0
 tags:
 - jamba
 datasets:
@@ -7,4 +7,65 @@ datasets:
 pipeline_tag: text-generation
 ---
 
-#
```

The 62 added lines (new lines 10–71) are the README body, rendered below:
# This is highly experimental and should be viewed as purely a test right now. Jamba has been very hard to train, but I wanted to see how it does on one of the best datasets we have access to. I believe in transparent development, so all *best* working iterations, even the slightly wonky ones, will be pushed here.

---
## Training

### Open-Hermes-2.0 (only the first 1500 examples)

Trainer progress readout: **[ 1530/125193 4:46:45 < 386:48:08, 0.09 it/s, Epoch 0.01/1]**

```py
import os
# The CUDA allocator only reads this before torch initializes CUDA, so it
# must be set at the very top of the script to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128,expandable_segments:True"

import torch
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# Initialize or load your tokenizer and model here
tokenizer = AutoTokenizer.from_pretrained("ai21labs/Jamba-v0.1")
tokenizer.padding_side = 'right'  # right-padding is the usual choice for causal-LM training
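
# --- Sketch, not from the original card: the trainer below uses `model` and
# `train_dataset`, which the snippet never defines. One plausible setup that
# puts the otherwise-unused BitsAndBytesConfig import to work; the dataset id
# and the 1500-example slice are assumptions taken from the section heading.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ai21labs/Jamba-v0.1",
    quantization_config=quant_config,
    device_map="auto",
)

from datasets import load_dataset
train_dataset = load_dataset("teknium/openhermes", split="train[:1500]")  # hypothetical id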

max_seq_length = 4096

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    # target the embeddings and Jamba's Mamba-block projections rather than
    # the usual attention q/k/v/o modules
    target_modules=["embed_tokens", "x_proj", "in_proj", "out_proj"],
    lora_dropout=0.2,
    task_type="CAUSAL_LM",
    bias="none"
)
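
# Sketch, not in the original card: a quick sanity check that the module
# names targeted above actually exist inside the loaded model.
leaf_names = {name.split(".")[-1] for name, _ in model.named_modules()}
assert {"embed_tokens", "x_proj", "in_proj", "out_proj"} <= leaf_names, "unexpected Jamba layer names"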

trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset,
    peft_config=lora_config,  # without this, the LoRA config above is never applied
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        num_train_epochs=1,
        lr_scheduler_type='linear',
        learning_rate=2e-5,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        gradient_checkpointing=True,
        warmup_steps=10,
        weight_decay=0.2,
        fp16=not torch.cuda.is_bf16_supported(),  # fp16 only where bf16 is unsupported
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        save_steps=100,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        seed=42,
    ),
)

trainer.train()
```
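
The snippet stops at training; nothing saves or uploads the adapter, even though the card says working iterations get pushed to this repo. A minimal sketch of that final step, assuming the stock `transformers` hub integration (the local path is a placeholder):

```py
# Sketch: persist the LoRA adapter from the run above, then upload it to the
# repo configured through TrainingArguments' hub settings.
trainer.save_model("outputs/final")
trainer.push_to_hub()
```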