nimafathi committed
Commit a11c2c2 · verified · 1 Parent(s): 7151033

Upload README.md with huggingface_hub

Files changed (1): README.md (+138 −5)
README.md CHANGED
@@ -1,9 +1,142 @@

 ---
 tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
 ---

-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Library: [More Information Needed]
-- Docs: [More Information Needed]
---
language:
- en
tags:
- text-generation
- diffusion
- language-model
license: mit
---

# hdlm-group/hdlm-base-epsilon-0.0

This is an epsilon_hybrid diffusion language model trained on text data.

## Model Details

- **Model Type**: epsilon_hybrid
- **Architecture**: Diffusion-based language model
- **Training Method**: Epsilon-hybrid diffusion training

## Configuration

```yaml
ngpus: 4
type: aligned
gradient_accumulation_steps: 2
tokenizer:
  tokens: 50257
  model: gpt2
training:
  batch_size: 128
  accum: ${gradient_accumulation_steps}
  n_iters: 1250000
  snapshot_freq: 10000
  log_freq: 500
  eval_freq: 10000
  snapshot_freq_for_preemption: 3000
  snapshot_sampling: true
  ema: 0.9999
  warmup_iter: -1
  loss_type: hybrid
  epsilon: 0.0
  lambda: 0.0
data:
  train: openwebtext-train
  valid: wikitext103
  cache_dir: /home/toolkit/research-diffcodegen/data
  debug: false
graph:
  type: absorb
  gamma: 1.0
  file: /home/toolkit/research-diffcodegen/data
  report_all: false
  expanded_sigma: true
noise:
  type: loglinear
  sigma_min: 0.0001
  sigma_max: 2.0
  ar_diffusion: false
  expanded_sigma: ${graph.expanded_sigma}
sampling:
  predictor: analytic
  steps_per_level: 1
  noise_removal: true
  strategy: direct
  strategy_param: 0.9
annealing:
  type: none
  efficient: false
  width: 1024
  tau: 1024
  eval_tau: 1024
  steps_per_level: ${sampling.steps_per_level}
  sampling_method: sdlm
  diffusion_loss_weight: 1.0
  ce_loss_weight: 1.0
  sampling_eps: 0.0001
attention:
  context_type: block_causal
  block_type: full
  match_inference: false
eval:
  batch_size: 16
  perplexity: true
  perplexity_batch_size: 8
optim:
  weight_decay: 0.1
  optimizer: AdamW
  lr: 0.0002
  beta1: 0.9
  beta2: 0.95
  eps: 1.0e-08
  warmup: 10000
  grad_clip: 1.0
  scheduler: cosine
experiment:
  name: MDLM
  wandb_project: Hybrid-SDLM-ALIGNED
model:
  name: HDLM
  type: ddit
  hidden_size: 768
  cond_dim: 128
  length: 1024
  n_blocks: 12
  n_heads: 12
  dropout: 0.1
  scale_by_sigma: false
  transformer_sigma_conditioning: false
  hybrid_sigma_embedding: false
  post_process_logits: false
  use_timestep_embedding: false
model_type: epsilon_hybrid
```

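The `${...}` values are OmegaConf-style interpolations (e.g. `accum` mirrors `gradient_accumulation_steps`). Below is a minimal sketch of loading and resolving them, assuming the block above is saved as `config.yaml`; the filename, and the nesting of `expanded_sigma` under `noise`, are assumptions made for this card rather than details confirmed by the repository.

```python
# Minimal sketch: load the config and resolve ${...} interpolations with OmegaConf.
# Assumes the YAML above was saved locally as config.yaml (illustrative filename).
from omegaconf import OmegaConf

cfg = OmegaConf.load("config.yaml")

# Interpolations resolve on access:
print(cfg.training.accum)        # ${gradient_accumulation_steps} -> 2
print(cfg.noise.expanded_sigma)  # ${graph.expanded_sigma} -> True (assumed nesting)
print(cfg.model.hidden_size)     # 768
```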

## Usage

```python
from our.hf_utils import smart_model_loader

# Load the model
model, config, device, accelerator, metaschedule = smart_model_loader(
    "hdlm-group/hdlm-base-epsilon-0.0",
    model_type="epsilon_hybrid",
)

# Use the model for text generation
# (Add specific usage examples based on your model's capabilities)
```
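
`smart_model_loader` comes from the research-diffcodegen codebase. If that package is not available, the raw checkpoint files can still be fetched with the standard `huggingface_hub` client; this is a generic download sketch, not a loader specific to this model.

```python
# Generic sketch: download the repository contents with huggingface_hub.
# Models pushed via PyTorchModelHubMixin typically include config.json and
# model.safetensors; inspect the returned directory to see what is present.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("hdlm-group/hdlm-base-epsilon-0.0")
print(local_dir)  # local path containing the downloaded files
```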

## Training Details

This model was trained using the research-diffcodegen framework.

## Citation

If you use this model in your research, please cite the original paper and this implementation.

## License

This model is released under the MIT License.