danielpark/asp-9b-inst-base

Text Generation

Mixture of Experts

danielpark committed on Apr 19, 2024

Commit 8ff94ab · verified · 1 parent: 5610aae

doc: update model cards

Files changed (1)

README.md +7 -4

README.md CHANGED

@@ -7,11 +7,14 @@ tags:
 - moe
 ---
-# Jamba-v0.1-9B
-A dense version of [Jamba-v0.1](https://huggingface.co/ai21labs/Jamba-v0.1), which extracts the weights of the first expert.
-It no longer uses MoE. Please refer to [this script](https://github.com/TechxGenus/Jamba-utils/blob/main/dense_downcycling.py) for details.
-It can use single 3090/4090 for inference, and the usage method is exactly the same as Jamba-v0.1.
+### Required Weights for Follow-up Research
+The original model is **AI21 Labs' Jamba-v0.1**, which requires an **A100 80GB GPU**. Unfortunately, this was not available via Google Colab or cloud computing services. Attempts were made to perform **MoE (Mixture of Experts) splitting**, using the following resources as a basis:
+- **Base creation**: Referenced for subsequent tasks.
+- **MoE Layer Separation**: Consult [this script](https://github.com/TechxGenus/Jamba-utils/blob/main/dense_downcycling.py) from [TechxGenus/Jamba-v0.1-9B](https://huggingface.co/TechxGenus/Jamba-v0.1-9B).
 ---
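
Note on the change above: the removed README text describes the dense variant as keeping only the weights of the first expert, and the added text refers to this as MoE layer separation. The idea can be sketched on a plain dictionary of parameter names. This is a minimal illustration only, not the linked `dense_downcycling.py` script: the key layout (`...experts.<index>...`, `.router.`) and the helper name `downcycle_to_dense` are assumptions made for the example, and a real checkpoint may name its parameters differently.

```python
import re

# Assumed (hypothetical) key layout: "<prefix>.experts.<index>.<rest>".
EXPERT_KEY = re.compile(r"^(?P<prefix>.*\.)experts\.(?P<idx>\d+)\.(?P<rest>.+)$")


def downcycle_to_dense(state_dict, keep_expert=0, router_marker=".router."):
    """Return a new dict that keeps one expert per MoE layer.

    Keys of the kept expert are renamed so the expert index disappears,
    keys of all other experts and of the router are dropped, and every
    other key is copied unchanged.
    """
    dense = {}
    for key, value in state_dict.items():
        if router_marker in key:
            continue  # a dense layer has no router
        match = EXPERT_KEY.match(key)
        if match is None:
            dense[key] = value  # not an expert parameter
        elif int(match.group("idx")) == keep_expert:
            dense[match.group("prefix") + match.group("rest")] = value
    return dense


# Toy stand-in for a checkpoint; strings replace real tensors.
toy = {
    "model.layers.0.mamba.in_proj.weight": "w0",
    "model.layers.1.feed_forward.router.weight": "r",
    "model.layers.1.feed_forward.experts.0.gate_proj.weight": "e0",
    "model.layers.1.feed_forward.experts.1.gate_proj.weight": "e1",
}
dense = downcycle_to_dense(toy)
```

With real weights the values would be tensors rather than strings, and the model configuration would also need to be changed so that it no longer declares multiple experts. That step is not shown here.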