Sidharthan committed (verified)
Commit cc79560 · Parent(s): 0ed2b3d

Update README.md

Files changed (1): README.md (+40 -22)
README.md CHANGED
@@ -1,3 +1,4 @@
+
---
license: apache-2.0
language:
 
@@ -10,6 +11,7 @@ tags:
- pytorch
- text-generation
- openwebtext
+ - custom_code
---

# Q-MoE-400
 
@@ -18,6 +20,39 @@ tags:

This model serves as a research artifact for studying the compute efficiency of sparse architectures compared to dense transformers. It demonstrates how routing mechanisms can enable high-capacity models with lower inference costs.

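To make the routing mechanism concrete: in a sparse MoE layer each token is sent to only `k` of `num_experts` expert MLPs, so parameter capacity grows with the expert count while per-token compute stays close to a single dense MLP. Below is a minimal sketch of top-k routing with the load-balancing auxiliary term (the "Aux" in the loss table further down). This is illustrative only, not the actual Q-MoE-400 modeling code; the class name, sizes, and k=2 are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative sparse MoE layer: each token is routed to k of num_experts MLPs."""

    def __init__(self, d_model=256, d_ff=1024, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.num_experts = num_experts
        self.router = nn.Linear(d_model, num_experts)  # per-token gating scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (num_tokens, d_model); flatten (batch, seq) into tokens before calling
        probs = F.softmax(self.router(x), dim=-1)     # (T, E)
        gate, idx = probs.topk(self.k, dim=-1)        # keep the k best experts per token
        gate = gate / gate.sum(dim=-1, keepdim=True)  # renormalize the kept weights

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token, slot = (idx == e).nonzero(as_tuple=True)  # tokens that chose expert e
            if token.numel():
                # only the selected tokens pay for this expert's compute
                out[token] += gate[token, slot].unsqueeze(-1) * expert(x[token])

        # Load-balancing auxiliary loss: pushes the router to spread tokens
        # evenly across experts instead of collapsing onto a few of them.
        importance = probs.mean(dim=0)                                    # mean gate prob per expert
        load = F.one_hot(idx, self.num_experts).float().mean(dim=(0, 1))  # fraction of slots per expert
        aux = self.num_experts * (importance * load).sum()
        return out, aux

y, aux = TopKMoE()(torch.randn(10, 256))  # 10 tokens
print(y.shape, aux.item())
```

At inference only the selected k experts run per token, which is where the lower inference cost described above comes from.
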
+ ## 💻 Usage
+
+ You can use this model directly with the Hugging Face `transformers` library. Since this model uses a custom architecture, `trust_remote_code=True` is required.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ path = "QuarkML/Q-MoE-400"
+
+ # Load the tokenizer and model
+ tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     path,
+     trust_remote_code=True,
+     dtype=torch.float16,  # optional, but recommended on GPU
+     device_map="auto",    # automatically map to the available device (CUDA/CPU)
+ )
+
+ # Generate text
+ inputs = tok("Artificial neural networks are ", return_tensors="pt")
+ inputs = {k: v.to(model.device) for k, v in inputs.items()}
+
+ out = model.generate(
+     **inputs,
+     max_new_tokens=50,
+     do_sample=True,
+     temperature=0.8,
+ )
+
+ print(tok.decode(out[0], skip_special_tokens=True))
+ ```
+
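For quick experiments, the high-level `pipeline` API should work as well; a brief sketch, on the assumption that the repository's remote code registers the model with `AutoModelForCausalLM` as the snippet above implies (untested here):

```python
from transformers import pipeline

# trust_remote_code=True again pulls in the custom Q-MoE modeling code
generator = pipeline(
    "text-generation",
    model="QuarkML/Q-MoE-400",
    trust_remote_code=True,
)

result = generator(
    "Artificial neural networks are",
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```
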
## 🎯 Project Goal

The primary goal of the Q-MoE project is to investigate:
 
@@ -31,7 +66,7 @@ The model was evaluated at step **79,100**. The final validation metrics indicat

| Metric | Value | Description |
| :--- | :--- | :--- |
- | **Step** | 90,000 | Total training steps |
+ | **Step** | 79,100 | Total training steps |
| **Train Loss** | 3.2190 | Total training loss (CE + Aux) |
| **Train CE** | 3.0987 | Cross-Entropy loss on training data |
| **Val Loss** | 3.2028 | Total validation loss |
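
Cross-entropy converts directly to perplexity via PPL = exp(CE), assuming, as is standard, mean per-token CE in nats. A quick check against the Train CE value above (the validation CE row falls outside this hunk, so only the train figure is visible; note that Val Loss includes the Aux term and is not a CE value):

```python
import math

train_ce = 3.0987  # Train CE from the table, in nats
print(f"train perplexity ≈ {math.exp(train_ce):.1f}")  # ≈ 22.2
```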
 
@@ -41,7 +76,7 @@ The model was evaluated at step **79,100**. The final validation metrics indicat

### Training Progress

- ![image](https://cdn-uploads.huggingface.co/production/uploads/64054e5e0ab5e22719fc179f/CALqiEjv1HahbLnZrbLPi.png)
+ ![Training Curve](https://cdn-uploads.huggingface.co/production/uploads/64054e5e0ab5e22719fc179f/CALqiEjv1HahbLnZrbLPi.png)

## 📝 Generation Example
 
@@ -57,24 +92,6 @@ The following example demonstrates the model's generation capabilities after tra
> While this is an obvious disadvantage for software development, the reality is that there are many aspects of software that are highly important to a programmer's day-to-day life. This is why even a moderately experienced programmer should never be concerned about this.
>
> The best way to learn about your code is by going through its source code. That way, it's always safe to do something new when writing code. This gives your programmer freedom and confidence.
- >
- > One of the most popular techniques for coding small functions in your computer is "code reuse." The same technique can be used by programmers in any number of different ways. Some programmers might write code to get the job done, and others develop it to get it to the end. They use the same tools as most programmers to get the job done.
- >
- > The best way to learn about your code is by going through its source code. That way, it's always safe to do something new when writing code. That technique can help you get the job done. It provides a way to write code that is easy to understand and maintain, and makes debugging easier.
-
- ## 🛠️ Repository Contents
-
- This repository contains checkpoints compatible with both major frameworks:
- - **JAX/Flax:** The original training checkpoints (Orbit/Orbax format).
- - **PyTorch:** Converted weights for easier integration with the Hugging Face ecosystem (Safetensors).
-
- ## 💻 Inference & Usage
-
- For inference code, architectural details, training pipeline and conversion scripts, please visit the official GitHub repository:
-
- 👉 **[https://github.com/sidharth72/Q-MoE-400 ]**
-
- To run the model, you will likely need the custom modeling code provided in the GitHub repo, as this uses a specialized sparse MoE architecture.

## ⚙️ Training Details
 
@@ -95,5 +112,6 @@ If you find this model or the associated research useful, please cite:
year = {2025},
publisher = {Hugging Face},
journal = {Hugging Face Repository},
- howpublished = {\url{[https://huggingface.co/QuarkML/Q-MoE-400](https://huggingface.co/QuarkML/Q-MoE-400)}}
- }
+ howpublished = {\url{https://huggingface.co/QuarkML/Q-MoE-400}}
+ }
+ ```