marksverdhei committed on
Commit 02419e3 · verified · 1 Parent(s): 82cc0ee

Fix license to MIT

Files changed (1)
  1. README.md +49 -17
README.md CHANGED
@@ -1,27 +1,46 @@
  ---
- license: other
- license_name: glm-4
- license_link: https://huggingface.co/THUDM/glm-4-9b/blob/main/LICENSE
+ language:
+ - en
+ - zh
+ library_name: transformers
+ license: mit
  base_model: zai-org/GLM-4.7-Flash
- base_model_relation: quantized
  tags:
- - fp8
- - quantized
- - glm4
- library_name: transformers
+ - fp8
+ - quantized
+ - glm4
+ - vllm
+ pipeline_tag: text-generation
  ---

- # GLM-4.7-Flash FP8 (Work in progress)
+ # GLM-4.7-Flash FP8

- This is an FP8 (E4M3) quantized version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash).
+ FP8 quantized version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) for efficient inference.

- ## Quantization Details
+ ## Model Details

- - **Format**: FP8 E4M3
- - **Quantized layers**: All Linear layers except embeddings and layer norms
- - **Scale storage**: Per-tensor scales stored alongside weights
+ - **Base Model**: zai-org/GLM-4.7-Flash
+ - **Quantization**: FP8 (E4M3) weight quantization
+ - **Architecture**: GLM-4 MoE Lite (47 layers)

- ## Usage
+ ## Usage with vLLM
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ llm = LLM(
+     model="marksverdhei/GLM-4.7-Flash-fp8",
+     trust_remote_code=True,
+     dtype="bfloat16",
+     quantization="fp8"
+ )
+
+ sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
+ outputs = llm.generate(["Hello, how are you?"], sampling_params)
+ print(outputs[0].outputs[0].text)
+ ```
+
+ ## Usage with Transformers

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -33,9 +52,22 @@ model = AutoModelForCausalLM.from_pretrained(
      device_map="auto",
      trust_remote_code=True
  )
- tokenizer = AutoTokenizer.from_pretrained("marksverdhei/GLM-4.7-Flash-fp8", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(
+     "marksverdhei/GLM-4.7-Flash-fp8",
+     trust_remote_code=True
+ )
  ```

  ## Original Model

- See the original model card at [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) for full details on capabilities and usage.
+ See the original model at [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) for full capabilities and benchmarks.
+
+ GLM-4.7 features improvements in:
+ - Core coding and agentic tasks
+ - UI/Vibe coding
+ - Tool using
+ - Complex reasoning
+
+ ## License
+
+ MIT License (same as base model)
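
The README names "FP8 (E4M3) weight quantization", and the section removed in this commit described per-tensor scales stored alongside the weights. As a rough illustration of what that combination means, here is a minimal pure-Python sketch. It is not the procedure used to produce this checkpoint: the function names (`round_to_e4m3`, `quantize_per_tensor`, `dequantize`) are hypothetical, and the sketch assumes the E4M3 variant whose largest finite value is 448, round-to-nearest-even, and saturation on overflow.

```python
import math

E4M3_MAX = 448.0  # assumed largest finite value (E4M3 "fn" variant)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)      # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)      # below 2**-6 the format is subnormal (fixed step)
    step = 2.0 ** (exp - 3)   # 3 mantissa bits give 8 steps per power of two
    # Python's round() rounds halves to even, matching round-to-nearest-even.
    return sign * min(round(a / step) * step, E4M3_MAX)


def quantize_per_tensor(weights):
    """Return (quantized values, scale) using one scale for the whole tensor."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate weights from the quantized values and their scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 2.0]
quantized, scale = quantize_per_tensor(weights)
restored = dequantize(quantized, scale)
print(quantized, scale, restored)
```

The single scale maps the largest absolute weight onto 448, so the full E4M3 range is used; the cost is that one outlier weight reduces the resolution available to every other value in the tensor.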