Severian/Jamba-Hercules

Text Generation

4-bit precision

Severian committed on Apr 2, 2024

Commit 8a199a7 (verified) · 1 Parent(s): 7f2bcbe

Update README.md

Files changed (1)

README.md +8 -6

@@ -3,17 +3,19 @@ license: apache-2.0
 tags:
 - jamba
 datasets:
-- teknium/OpenHermes-2.5
 base_model: ai21labs/Jamba-v0.1
 pipeline_tag: text-generation
 ---
-# Jamba-Open-Hermes
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/Ph6ZvxwF7a0m_B5Su_EK7.webp" width="500" height="500">
-# Current version works but it is very particular about having the right ChatML format and settings. Jamba has been somewhat difficult and expensive to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here.
 ---
 ## Example Output:
@@ -97,11 +99,11 @@ print(tokenizer.batch_decode(outputs)[0])
 ## Training
-### **Open-Hermes-2.0:**
 **FIRST TEST:**
-- *1000 Steps (5 hours x A100)*
-- *Final Loss: 3.48*
 ### Hyperparameters

 tags:
 - jamba
 datasets:
+- Locutusque/hercules-v4.0
 base_model: ai21labs/Jamba-v0.1
 pipeline_tag: text-generation
 ---
+# Jamba-Hercules
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/Ph6ZvxwF7a0m_B5Su_EK7.webp" width="500" height="500">
+# *Name was changed from Open-Hermes to Hercules. Across multiple training and testing runs with many different datasets, I found that Jamba has responded BY FAR the best to this dataset. It contains Open-Hermes-2.0 examples but offers A LOT more diversity and complexity. Thanks to @Locutusque for the amazing work!*
+## Dataset used: Locutusque/hercules-v4.0
+*- First 10k Examples*
 ---
 ## Example Output:
 ## Training
+### **Hercules-v4.0:**
 **FIRST TEST:**
+- *1250 Steps (5 hours x A100)*
+- *Final Loss: 0.98*
 ### Hyperparameters