Maxtimer97/GLM2NSA

GLM2NSA / compressed_attention.py

Commit History

Added num stages tuning

42f4907

Maxtimer97 committed on Oct 6

Changed to autotune triton for 48G GPU deployment

4ee9d9e

Maxtimer97 committed on Oct 3

Removed assertion

22ba83b

Maxtimer97 committed on Oct 3

Removed wrong edge case assertion

bb26ab9

Maxtimer97 committed on Oct 3

Flattened repo

a2f57c7

Maxtimer97 committed on Sep 29