geetu040/Timer-S1-quantized-4bit

Time Series Forecasting · text-generation · foundation models · pretrained models · time series foundation models · 4-bit precision


geetu040 committed 3 days ago

Commit f3a92fc · verified · 1 Parent(s): f8acd96

Add files using upload-large-folder tool

README.md +6 -2

@@ -55,12 +55,12 @@ The checkpoint configuration records the following quantization settings:
   "quant_method": "bitsandbytes",
   "bnb_4bit_quant_type": "fp4",
   "bnb_4bit_quant_storage": "uint8",
-  "bnb_4bit_compute_dtype": "float32",
   "bnb_4bit_use_double_quant": false
 }
 ```
-The model config also sets `use_cache` to `false`, matching the local quantized checkpoint.
 ## Quickstart
@@ -82,6 +82,10 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto",
 )
 batch_size, lookback_length = 1, 2880
 seqs = torch.randn(batch_size, lookback_length).to(model.device)
@@ -55,12 +55,12 @@ The checkpoint configuration records the following quantization settings:
   "quant_method": "bitsandbytes",
   "bnb_4bit_quant_type": "fp4",
   "bnb_4bit_quant_storage": "uint8",
+  "bnb_4bit_compute_dtype": "bfloat16",
   "bnb_4bit_use_double_quant": false
 }
 ```
+The model config also sets `use_cache` to `true`, matching the local quantized checkpoint. For lower memory usage during generation, set `model.config.use_cache = False` after loading the model.
 ## Quickstart
@@ -82,6 +82,10 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto",
 )
+# Optional: reduce generation memory usage by disabling the KV cache.
+# This can be useful on smaller GPUs or for longer lookback windows.
+model.config.use_cache = False
 batch_size, lookback_length = 1, 2880
 seqs = torch.randn(batch_size, lookback_length).to(model.device)
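
The README change above documents two values: `bnb_4bit_compute_dtype` (`float32` to `bfloat16`) and `use_cache` (`false` to `true`). As a minimal sketch, a downloaded `config.json` could be checked against the documented values with the standard library alone. The helper name `find_mismatches`, the enclosing `quantization_config` key, and the inline sample config are assumptions made for illustration; they are not taken from this commit.

```python
import json

# Quantization settings as documented on the new side of the diff.
EXPECTED_QUANT = {
    "quant_method": "bitsandbytes",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_use_double_quant": False,
}
EXPECTED_USE_CACHE = True


def find_mismatches(config):
    """Return {setting: (found, expected)} for each value that differs."""
    # Assumption: the settings live under a "quantization_config" key.
    quant = config.get("quantization_config", {})
    mismatches = {
        key: (quant.get(key), want)
        for key, want in EXPECTED_QUANT.items()
        if quant.get(key) != want
    }
    if config.get("use_cache") != EXPECTED_USE_CACHE:
        mismatches["use_cache"] = (config.get("use_cache"), EXPECTED_USE_CACHE)
    return mismatches


# Hypothetical stand-in for a config.json carrying the values the old README described.
old_config = json.loads("""
{
  "use_cache": false,
  "quantization_config": {
    "quant_method": "bitsandbytes",
    "bnb_4bit_quant_type": "fp4",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_compute_dtype": "float32",
    "bnb_4bit_use_double_quant": false
  }
}
""")

for key, (found, want) in find_mismatches(old_config).items():
    print(f"{key}: found {found!r}, expected {want!r}")
```

In practice the sample dictionary would be replaced by `json.load` on the checkpoint's own `config.json`; an empty result means the file agrees with the README.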