learning-unit/L1-16B-A3B

Text Generation · mixture-of-experts


jeffkang-lunit committed 6 days ago

Commit 2d5fea2 · verified · 1 parent: 3e93565

Delete README.md with huggingface_hub

Files changed (1)

README.md +0 -69

README.md DELETED

@@ -1,69 +0,0 @@
----
-license: apache-2.0
-language:
-  - en
-tags:
-  - medical
-  - clinical
-  - moe
-  - mixture-of-experts
-  - gravity-moe
-  - sft
-library_name: transformers
-pipeline_tag: text-generation
----
-# L1
-L1 is a clinical language model built on the **GravityMoE** (Mixture-of-Experts) architecture, fine-tuned for medical and clinical decision support tasks.
-## Model Details
-| Property | Value |
-|---|---|
-| Architecture | GravityMoE (Mixture-of-Experts) |
-| Total Parameters | ~16B |
-| Active Parameters | ~4.5B per token |
-| Routed Experts | 64 |
-| Shared Experts | 1 |
-| Experts per Token | 8 |
-| Hidden Size | 2048 |
-| Layers | 28 |
-| Attention Heads | 16 |
-| KV LoRA Rank | 512 |
-| Max Context Length | 32,768 tokens |
-| Precision | bfloat16 |
-| Vocab Size | 151,552 |
-## Usage
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_name = "learning-unit/L1"
-tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    torch_dtype="auto",
-    device_map="auto",
-    trust_remote_code=True,
-)
-messages = [
-    {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
-]
-inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
-inputs = inputs.to(model.device)
-outputs = model.generate(inputs, max_new_tokens=512)
-print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
-```
-## Training
-- **Method**: Supervised Fine-Tuning (SFT)
-- **Epochs**: 3
-- **Final Training Loss**: 0.247
-## License
-Apache 2.0
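The model-details table in the deleted card specifies 64 routed experts, 1 shared expert, and 8 experts per token, but not how GravityMoE's router scores or weights them. The following is a minimal pure-Python sketch of that top-k routing pattern, assuming a linear router with softmax gating renormalized over the 8 selected experts. The toy experts, the 4-dimensional hidden size, and the helper names (`route`, `moe_layer`, `make_expert`) are hypothetical and do not come from the model's code.

```python
import math
import random

# Expert counts from the card; HIDDEN is a hypothetical toy size for illustration.
NUM_ROUTED = 64
TOP_K = 8
HIDDEN = 4


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def make_expert(seed):
    # Hypothetical toy expert: a fixed random HIDDEN x HIDDEN linear map.
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

    def expert(x):
        return [sum(w[i][j] * x[j] for j in range(HIDDEN)) for i in range(HIDDEN)]

    return expert


def route(x, router_w, top_k=TOP_K):
    # One router logit per routed expert, then keep the top_k highest.
    logits = [sum(r[j] * x[j] for j in range(HIDDEN)) for r in router_w]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Assumed gating: softmax renormalized over the selected experts only.
    weights = softmax([logits[i] for i in top])
    return top, weights, logits


def moe_layer(x, router_w, routed, shared):
    top, weights, logits = route(x, router_w)
    out = shared(x)  # the shared expert processes every token unconditionally
    for idx, g in zip(top, weights):
        y = routed[idx](x)
        out = [o + g * yi for o, yi in zip(out, y)]  # gate-weighted sum
    return out, top, weights, logits


# Usage: route a single toy token through one MoE layer.
rng = random.Random(0)
router_w = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_ROUTED)]
routed = [make_expert(i) for i in range(NUM_ROUTED)]
shared = make_expert(1000)
x = [0.5, -1.0, 0.25, 2.0]
out, top, weights, logits = moe_layer(x, router_w, routed, shared)
print(len(top), round(sum(weights), 6))  # 8 1.0
```

Only 9 of the 65 expert networks run for each token (8 routed plus the shared one), which is how an MoE model keeps its per-token active parameter count well below its total parameter count.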
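The table also lists a KV LoRA rank of 512. If, as the name suggests, this denotes a DeepSeek-V2-style compressed (latent) KV cache, each layer caches one 512-dimensional latent per token instead of full keys and values. The arithmetic below is a rough sketch under that assumption. It compares against a hypothetical dense multi-head-attention baseline that caches 2 × 2048 values per layer (assuming head dimension = hidden size / heads), ignores any decoupled RoPE key dimension (not listed in the card, exposed here as the hypothetical `rope_dim` parameter), and counts bfloat16 at 2 bytes per value.

```python
# Figures taken from the model card table.
LAYERS = 28
HIDDEN = 2048
KV_LORA_RANK = 512
MAX_CTX = 32_768
BYTES_BF16 = 2
GIB = 1024 ** 3


def mha_kv_bytes_per_token(layers=LAYERS, hidden=HIDDEN, dtype_bytes=BYTES_BF16):
    # Hypothetical dense MHA baseline: full K and V (2 * hidden values) per layer.
    return layers * 2 * hidden * dtype_bytes


def latent_kv_bytes_per_token(layers=LAYERS, rank=KV_LORA_RANK, rope_dim=0,
                              dtype_bytes=BYTES_BF16):
    # Assumed latent cache: one rank-sized vector per layer, plus an optional
    # decoupled RoPE key of size rope_dim (unknown for this model, default 0).
    return layers * (rank + rope_dim) * dtype_bytes


mha = mha_kv_bytes_per_token()
latent = latent_kv_bytes_per_token()
print(mha, latent)                              # 229376 28672
print(mha * MAX_CTX / GIB, latent * MAX_CTX / GIB)  # 7.0 0.875
```

Under these assumptions the latent cache is 8× smaller, about 0.875 GiB versus 7 GiB for one full 32,768-token sequence; a nonzero `rope_dim` would add `28 × rope_dim × 2` bytes per token to the latent figure.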