daslab-testing/CloverLM

mansaripo committed on Mar 20

Commit afd6f57 · verified · 1 Parent(s): ac62373

Update README.md

Files changed (1)

README.md +6 -5

@@ -122,7 +122,7 @@ model = AutoModelForCausalLM.from_pretrained(
     "daslab-testing/CloverLM",
     trust_remote_code=True,
     dtype="bfloat16",
-    quartet_2_impl="quartet2",  # native NVFP4 kernel or "pseudoquant" on non-Blackwell GPUs
+    quartet_2_impl="pseudoquant",  # on non-Blackwell GPUs or "quartet2" for native NVFP4 kernel
 ).to("cuda")  # for GPU usage or "cpu" for CPU usage
 tokenizer = AutoTokenizer.from_pretrained(
@@ -134,6 +134,7 @@ input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
 output = model.generate(input_ids.to(model.device), max_new_tokens=32)
 print(tokenizer.decode(output[0]))
 ```
+Note that `quartet_2_impl="quartet2"` only supports inputs with `(micro_batch_size * seq_length) % 128 == 0`.
 ### Running Evaluations
@@ -164,7 +165,7 @@ Attention backend options: `pytorch` (default), `flash2`, `flash3`, `flash4`.
 - PyTorch 2.10+ with CUDA 13.0
 - `transformers ≥ 5.3.0`
 - `tokenmonster ≥ 1.1.12`
-- [Quartet II kernels](https://github.com/IST-DASLab/Quartet-II) (for native FP4; `pseudoquant` mode works without them)
+- [Quartet II kernels](https://github.com/IST-DASLab/Quartet-II)
 ## Architecture Details
@@ -190,8 +191,8 @@ The model uses 264 weight tensors totaling ~4.14 B parameters.
 @article{cloverlm2026,
   title   = {Speedrunning GPT3: Pretraining an OPT-175B-Quality Model Cheaply
              by Leveraging Native NVFP4},
-  author  = {Erik Schultheis and Matin Ansaripour and Andrei Panferov and
-             Georgios Vlassis and Dan Alistarh},
+  author  = {Erik Schultheis and Georgios Vlassis and Matin Ansaripour and
+             Andrei Panferov and Dan Alistarh},
   year    = {2026},
 }
-```
+```
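
The divisibility constraint this commit adds for `quartet_2_impl="quartet2"`, `(micro_batch_size * seq_length) % 128 == 0`, can be checked before inputs reach the model. The following is a minimal sketch of that arithmetic only. The helper names `is_aligned` and `padded_seq_length` are hypothetical and are not part of CloverLM or Quartet II, and how (or whether) inputs should be padded to the computed length is an assumption the README does not address.

```python
import math

# Alignment taken from the README note:
# (micro_batch_size * seq_length) % 128 == 0
ALIGNMENT = 128


def is_aligned(micro_batch_size: int, seq_length: int,
               alignment: int = ALIGNMENT) -> bool:
    """Return True if the total token count meets the divisibility constraint.

    Hypothetical helper, not part of the CloverLM or Quartet II code.
    """
    return (micro_batch_size * seq_length) % alignment == 0


def padded_seq_length(micro_batch_size: int, seq_length: int,
                      alignment: int = ALIGNMENT) -> int:
    """Smallest length >= seq_length whose product with the batch size
    is divisible by `alignment`.

    Hypothetical helper: it only computes a target length and does not
    pad any tensors.
    """
    # batch * length is divisible by `alignment` exactly when length is a
    # multiple of alignment / gcd(alignment, batch).
    step = alignment // math.gcd(alignment, micro_batch_size)
    # Round seq_length up to the next multiple of `step`.
    return -(-seq_length // step) * step
```

For example, `is_aligned(1, 5)` is `False`, and `padded_seq_length(4, 100)` returns `128`, since 4 × 128 = 512 is divisible by 128. When the constraint cannot be met, `quartet_2_impl="pseudoquant"` remains the default set in this commit.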