{
"model_id": "1bitLLM/bitnet_b1_58-large",
"downloads": 10843,
"tags": [
"transformers",
"safetensors",
"llama",
"text-generation",
"arxiv:2402.17764",
"license:mit",
"autotrain_compatible",
"text-generation-inference",
"endpoints_compatible",
"region:us"
],
"description": "--- license: mit --- This is a reproduction of the BitNet b1.58 paper (arXiv:2402.17764). The models are trained for 100B tokens. The hyperparameters, as well as the two-stage LR and weight decay, are implemented as suggested in their following