This is a d-Matrix functional reference of the Llama-3.1-70B model. The reference provides the following functional configurations:
| Configuration | Explanation |
|---|---|
| BASELINE | a reference functionally equivalent to the original model |
| BASIC | all linear algebraic operands quantized to MXINT8-64 |
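MXINT8-64 is a block floating-point format: each block of 64 elements shares a single power-of-two scale, while each element keeps an 8-bit integer mantissa. The sketch below illustrates the idea of this block-wise quantization in NumPy; it is not the dmx_compressor implementation, and the function name and rounding details are illustrative assumptions.

```python
import numpy as np

def mxint8_64_quantize(x, block=64):
    """Illustrative MXINT8-style quantization of a 1-D array:
    each block of `block` elements shares one power-of-two scale,
    and each element is rounded to an 8-bit integer mantissa."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        chunk = x[i:i + block]
        amax = np.abs(chunk).max()
        if amax == 0.0:
            out[i:i + block] = 0.0
            continue
        # Shared exponent chosen so the largest magnitude in the block
        # fits in the int8 mantissa range
        exp = np.floor(np.log2(amax)) - 6
        scale = 2.0 ** exp
        mant = np.clip(np.round(chunk / scale), -128, 127)
        out[i:i + block] = mant * scale
    return out
```

The largest element in each block determines the shared scale, so quantization error is bounded by half the block's scale; values much smaller than the block maximum lose relative precision, which is the usual trade-off of shared-exponent formats.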
## Usage

Install the d-Matrix Dmx_Compressor package first:

```sh
pip install dmx_compressor
```
The following is an example of how to instantiate the model. Note that this repo only contains custom modeling code and still uses weights from the official model repo.
```python
from transformers import AutoModelForCausalLM, AutoConfig
from dmx.compressor import DmxModel
import torch

model_name = "d-matrix/Llama-3.1-70B"
official_model = "meta-llama/Llama-3.1-70B"

# Load the d-Matrix config but the official weights
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    official_model,
    config=config,
    trust_remote_code=True,
    device_map="auto",
)

# Wrap the model so d-Matrix quantization configurations can be applied
model = DmxModel.from_torch(model)

# A causal LM expects integer token ids, not floats
x = torch.randint(0, config.vocab_size, (1, 1024))
model(x)
```
## Evaluation results

- perplexity (BASELINE) on Wikitext, self-reported: 3.193
- perplexity (BASIC) on Wikitext, self-reported: 3.238
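Perplexity is the exponential of the mean token-level negative log-likelihood, so the small gap between the two figures above indicates the MXINT8-64 quantization costs little modeling quality. A minimal sketch of the computation from per-token probabilities (illustrative only; the reported numbers come from the actual Wikitext evaluation):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood over tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning probability 0.25 to every token has perplexity 4
print(perplexity([0.25] * 10))  # → 4.0
```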