---
library_name: easydel
pipeline_tag: text-generation
tags:
- easydel
- jax
- "glm4v"
- "CausalLM"
- "blocksparse"
---
<p align="center">
<img alt="easydel" src="https://raw.githubusercontent.com/erfanzar/easydel/main/images/easydel-logo-with-text.png">
</p>
<h1 align="center">GLM-4.6V-Flash</h1>
<div align="center">
A model compatible with the EasyDeL JAX stack.
</div>
## Overview
This checkpoint is intended to be loaded with EasyDeL on JAX (CPU/GPU/TPU). It supports sharded loading with `auto_shard_model=True` and configurable precision via `dtype`, `param_dtype`, and `precision`.
## Quickstart
```python
import easydel as ed
from jax import numpy as jnp, lax
repo_id = "EasyDeL/GLM-4.6V-Flash"
dtype = jnp.bfloat16 # try jnp.float16 on many GPUs
model = ed.AutoEasyDeLModelForCausalLM.from_pretrained(
    repo_id,
    dtype=dtype,
    param_dtype=dtype,
    precision=lax.Precision("fastest"),
    sharding_axis_names=("dp", "fsdp", "ep", "tp", "sp"),
    sharding_axis_dims=(1, -1, 1, 1, 1),
    config_kwargs=ed.EasyDeLBaseConfigDict(
        attn_dtype=dtype,
        attn_mechanism=ed.AttentionMechanisms.SPLASH,
        fsdp_is_ep_bound=True,
        sp_is_ep_bound=True,
        moe_method=ed.MoEMethods.FUSED_MOE,
    ),
    auto_shard_model=True,
    partition_axis=ed.PartitionAxis(),
)
```
If the repository only provides PyTorch weights, pass `from_torch=True` to `from_pretrained(...)`.
## Sharding & Parallelism (Multi-Device)
EasyDeL can scale to multiple devices by creating a logical device mesh. Most EasyDeL loaders use a 5D mesh:
- `dp`: data parallel (replicated parameters, different batch shards)
- `fsdp`: parameter sharding (memory saver; often the biggest axis)
- `ep`: expert parallel (MoE; keep `1` for non-MoE models)
- `tp`: tensor parallel (splits large matmuls)
- `sp`: sequence parallel (splits sequence dimension)
Use `sharding_axis_names=("dp","fsdp","ep","tp","sp")` and choose `sharding_axis_dims` so that the product of its entries equals your device count.
A single `-1` entry tells EasyDeL to infer that dimension from the remaining devices.
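The `-1` inference follows the usual mesh rule: the product of all axis dimensions must equal the device count, so the single `-1` entry resolves to `device_count // product(fixed dims)`. A minimal sketch of that rule in plain Python (illustrative only, not EasyDeL's internal implementation):

```python
from math import prod

def resolve_axis_dims(axis_dims, device_count):
    """Replace a single -1 entry so the product of dims equals device_count."""
    fixed = [d for d in axis_dims if d != -1]
    if len(fixed) == len(axis_dims):
        # No -1 present: the dims must already cover every device exactly.
        assert prod(axis_dims) == device_count, "mesh does not cover all devices"
        return tuple(axis_dims)
    inferred, leftover = divmod(device_count, prod(fixed))
    assert leftover == 0, "device count not divisible by the fixed axes"
    return tuple(inferred if d == -1 else d for d in axis_dims)

# On an 8-device host, (1, -1, 1, 1, 1) resolves to pure 8-way FSDP:
print(resolve_axis_dims((1, -1, 1, 1, 1), 8))  # (1, 8, 1, 1, 1)
print(resolve_axis_dims((2, -1, 1, 2, 1), 8))  # (2, 2, 1, 2, 1)
```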
<details>
<summary>Example sharding configs</summary>
```python
# 8 devices, pure FSDP
sharding_axis_dims = (1, 8, 1, 1, 1)
# 8 devices, 2-way DP x 4-way FSDP
sharding_axis_dims = (2, 4, 1, 1, 1)
# 8 devices, 4-way FSDP x 2-way TP
sharding_axis_dims = (1, 4, 1, 2, 1)
```
</details>
## Using via `eLargeModel` (ELM)
`eLargeModel` is a higher-level interface that wires together loading, sharding, training, and eSurge inference from a single config.
```python
from easydel import eLargeModel
repo_id = "EasyDeL/GLM-4.6V-Flash"
elm = eLargeModel.from_pretrained(repo_id) # task is auto-detected
elm.set_dtype("bf16")
elm.set_sharding(axis_names=("dp", "fsdp", "ep", "tp", "sp"), axis_dims=(1, -1, 1, 1, 1))
model = elm.build_model()
# Optional: build an inference engine
# engine = elm.build_esurge()
```
<details>
<summary>ELM YAML config example</summary>
```yaml
model:
  name_or_path: "EasyDeL/GLM-4.6V-Flash"
  loader:
    dtype: bf16
    param_dtype: bf16
  sharding:
    axis_dims: [1, -1, 1, 1, 1]
    auto_shard_model: true
```
</details>
## Features
**EasyDeL:**
- JAX native implementation and sharded execution
- Configurable attention backends via `AttentionMechanisms.*`
- Precision control via `dtype`, `param_dtype`, and `precision`
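The `precision` argument shown in the Quickstart is plain JAX: `lax.Precision` accepts the strings `"fastest"` (an alias for `DEFAULT`), `"high"`, and `"highest"`, and is passed to matmul-like ops to trade accumulation accuracy for speed. A quick illustration, independent of EasyDeL:

```python
from jax import lax
from jax import numpy as jnp

a = jnp.ones((4, 4), dtype=jnp.bfloat16)
b = jnp.ones((4, 4), dtype=jnp.bfloat16)

# "fastest" is an alias for DEFAULT; "highest" requests full-precision accumulation.
fast = jnp.matmul(a, b, precision=lax.Precision("fastest"))
exact = jnp.matmul(a, b, precision=lax.Precision("highest"))
# On this trivial all-ones input both give 4.0; on large, non-trivial
# bfloat16 inputs the two settings can differ measurably.
print(fast[0, 0], exact[0, 0])
```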
## Installation
```bash
pip install easydel
```
## Links
- EasyDeL GitHub: https://github.com/erfanzar/EasyDeL
- Docs: https://easydel.readthedocs.io/en/latest/
## Supported Tasks
- CausalLM
## Limitations
- Refer to the original model card for training data, evaluation, and intended use.
## License
EasyDeL is released under the Apache-2.0 license. The license for this model's weights may differ; please consult the original repository.
## Citation
```bibtex
@misc{ZareChavoshi_2023,
title={EasyDeL: An open-source library for enhancing and streamlining the training process of machine learning models},
url={https://github.com/erfanzar/EasyDeL},
author={Zare Chavoshi, Erfan},
year={2023}
}
```