This is a d-Matrix functional reference of the Llama-3.1-70B model. The reference provides the following functional configurations:
| Configuration | Explanation |
|---|---|
| BASELINE | a reference functionally equivalent to the original model |
| BASIC | all linear algebraic operands quantized to MXINT8-64 |
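MXINT8-64 is a block floating-point format: each block of 64 elements shares a single power-of-two scale, while each element keeps an 8-bit integer mantissa. The sketch below illustrates the idea of this block-wise quantization in NumPy; it is not the dmx_compressor implementation, and the function name and rounding details are illustrative assumptions.

```python
import numpy as np

def mxint8_64_quantize(x, block=64):
    """Illustrative MXINT8-style quantization of a 1-D array:
    each block of `block` elements shares one power-of-two scale,
    and each element is rounded to an 8-bit integer mantissa."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for i in range(0, x.size, block):
        chunk = x[i:i + block]
        amax = np.abs(chunk).max()
        if amax == 0.0:
            out[i:i + block] = 0.0
            continue
        # Shared exponent chosen so the largest magnitude in the block
        # fits in the int8 mantissa range
        exp = np.floor(np.log2(amax)) - 6
        scale = 2.0 ** exp
        mant = np.clip(np.round(chunk / scale), -128, 127)
        out[i:i + block] = mant * scale
    return out
```

The largest element in each block determines the shared scale, so quantization error is bounded by half the block's scale; values much smaller than the block maximum lose relative precision, which is the usual trade-off of shared-exponent formats.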
## Usage

Install the d-Matrix Dmx_Compressor package first:

```sh
pip install dmx_compressor
```
The following is an example of how to instantiate the model. Note that this repo only contains custom modeling code and still uses weights from the official model repo.
```python
from transformers import AutoModelForCausalLM, AutoConfig
from dmx.compressor import DmxModel
import torch

model_name = "d-matrix/Llama-3.1-70B"
official_model = "meta-llama/Llama-3.1-70B"

# Load the d-Matrix config but the official weights
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    official_model,
    config=config,
    trust_remote_code=True,
    device_map="auto",
)

# Wrap the model so d-Matrix quantization configurations can be applied
model = DmxModel.from_torch(model)

# A causal LM expects integer token ids, not floats
x = torch.randint(0, config.vocab_size, (1, 1024))
model(x)
```
## Evaluation results

- perplexity (BASELINE) on Wikitext, self-reported: 3.193
- perplexity (BASIC) on Wikitext, self-reported: 3.238
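Perplexity is the exponential of the mean token-level negative log-likelihood, so the small gap between the two figures above indicates the MXINT8-64 quantization costs little modeling quality. A minimal sketch of the computation from per-token probabilities (illustrative only; the reported numbers come from the actual Wikitext evaluation):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood over tokens)."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model assigning probability 0.25 to every token has perplexity 4
print(perplexity([0.25] * 10))  # → 4.0
```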