---
base_model: google/gemma-3-4b-it
tags:
- L-Mul
- optimization
- quantization
- text-generation
- research
- experimental
license: gemma
---

# L-Mul Optimized: google/gemma-3-4b-it

This is a modified version of Google's [gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it) model. The modification replaces the standard attention mechanism with one that uses a custom, approximate matrix multiplication algorithm termed "L-Mul".

This work was performed as part of a research project to evaluate the performance and accuracy trade-offs of algorithmic substitutions in transformer architectures.

**This model is intended strictly for educational and scientific purposes.**

## Model Description

The core architecture of `google/gemma-3-4b-it` is preserved. However, the standard `Gemma3Attention` modules have been dynamically replaced with a custom version that uses the `l_mul_attention` function for its core computations. This function is defined in the `lmul.py` file included in this repository.

- **Base Model:** [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- **Modification:** Replacement of standard attention with L-Mul approximate attention.
- **Primary Use-Case:** Research and educational analysis of algorithmic impact on LLMs.
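
The exact implementation inside `lmul.py` is not reproduced here, but the dynamic-replacement pattern described above can be illustrated with a self-contained sketch: walk the module tree and rebind each attention module's `forward` method. The `ToyAttention` module and `approx_forward` function below are hypothetical stand-ins, not the real Gemma 3 classes:

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Hypothetical stand-in for a standard attention module."""
    def forward(self, x):
        return x @ x.transpose(-2, -1)

def approx_forward(self, x):
    """Hypothetical stand-in for an approximate (L-Mul-style) computation."""
    return torch.round(x @ x.transpose(-2, -1))

def patch_attention(model: nn.Module) -> int:
    """Rebind `forward` on every ToyAttention instance; return the count."""
    patched = 0
    for module in model.modules():
        if isinstance(module, ToyAttention):
            # Bind the replacement function to this instance so `self` works.
            module.forward = approx_forward.__get__(module, ToyAttention)
            patched += 1
    return patched

model = nn.Sequential(ToyAttention(), nn.Identity(), ToyAttention())
print(patch_attention(model))  # number of attention modules patched
```

The real repository applies the same idea to `Gemma3Attention`, substituting the `l_mul_attention` computation rather than this toy rounding step.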

## How to Get Started

To use this model, you must pass the `trust_remote_code=True` flag when loading it. This is required to execute the custom `lmul.py` file that defines the new attention mechanism.

You can load the model directly from this repository using the `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Define the repository ID for the specific model
repo_id = "Peacemann/google_gemma-3-4b-it-lmul-attention"  # Replace with the correct repo ID if different

# Load the tokenizer and model, trusting the remote code to load lmul.py
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example usage
prompt = "The L-Mul algorithm is an experimental method for..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
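
After loading, it can be worth sanity-checking that the custom attention modules are actually in place by inspecting the module tree. The helper below is a generic, heuristic sketch (the exact class name registered by `lmul.py` is not documented here):

```python
from collections import Counter
import torch.nn as nn

def attention_class_counts(model: nn.Module) -> Counter:
    """Count module classes whose name contains 'Attention'."""
    return Counter(
        type(m).__name__ for m in model.modules() if "Attention" in type(m).__name__
    )

# Demo on a stand-in class; with the real model, pass the loaded `model`
# object and look for the custom class from lmul.py rather than the default.
class DummyAttention(nn.Module):
    pass

print(attention_class_counts(nn.Sequential(DummyAttention(), DummyAttention())))
```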

## Intended Uses & Limitations

This model is intended for researchers and students exploring the internal workings of LLMs. It is a tool for visualizing and analyzing the effects of fundamental algorithmic changes.

**This model is NOT intended for any commercial or production application.**

The modification is experimental. The impact on the model's performance, safety alignment, accuracy, and potential for generating biased or harmful content is **unknown and untested**.
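
Because the accuracy impact is untested, a sensible first experiment is to compare token-level perplexity between this model and the unmodified base model on the same text. A minimal helper, assuming you already have next-token logits and target token ids (the function name and shapes are illustrative, not part of this repository):

```python
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Perplexity from next-token logits.

    logits: (seq_len, vocab_size) scores predicting each position
    labels: (seq_len,) the token id actually observed at each position
    """
    nll = F.cross_entropy(logits, labels)  # mean negative log-likelihood
    return torch.exp(nll).item()

# Sanity check: uniform logits over a vocabulary of size V give perplexity V.
uniform = torch.zeros(4, 10)           # 4 positions, vocab size 10
targets = torch.tensor([1, 2, 3, 4])
print(perplexity(uniform, targets))
```

Running both models over the same evaluation set and comparing the two scores gives a first-order estimate of how much the L-Mul substitution degrades (or preserves) accuracy.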

## Licensing Information

The use of this model is subject to the original **Gemma 3 Community License**. By using this model, you agree to the terms outlined in the license.