{ "model_id": "1bitLLM/bitnet_b1_58-large", "downloads": 10843, "tags": [ "transformers", "safetensors", "llama", "text-generation", "arxiv:2402.17764", "license:mit", "autotrain_compatible", "text-generation-inference", "endpoints_compatible", "region:us" ], "description": "--- license: mit --- This is a reproduction of the BitNet b1.58 paper (arXiv:2402.17764). The models are trained for 100B tokens. The hyperparameters, as well as the two-stage LR and weight decay schedules, are implemented as suggested in their following