theprint committed
Commit 7f5430b (verified)
1 parent: e612519

Upload folder using huggingface_hub

Files changed (3):
1. README.md +125 -0
2. config.json +97 -0
3. generation_config.json +13 -0

README.md ADDED
---
base_model: google/gemma-3-4b-it
library_name: peft
pipeline_tag: text-generation
language: en
license: apache-2.0
tags:
- lora
- sft
- transformers
- trl
- unsloth
- fine-tuned
datasets:
- theprint/Zeth
---
# Zeth-Gemma3-4B

A fine-tuned Gemma 3 4B model specialized in pragmatic empathy. Or perhaps it is empathic pragmatism?

## Model Details

This model is a fine-tuned version of google/gemma-3-4b-it. It was trained with the Unsloth framework using LoRA (Low-Rank Adaptation) for parameter-efficient training.

- **Developed by:** theprint
- **Model type:** Causal Language Model (fine-tuned with LoRA)
- **Language:** en
- **License:** apache-2.0
- **Base model:** google/gemma-3-4b-it
- **Fine-tuning method:** LoRA with rank 128

## Intended Use

Conversation, brainstorming, and general instruction following.

## Training Details

### Training Data

The Zeth dataset was created specifically for fine-tuning models on empathic explanation. It was built from existing datasets whose replies were reworded to match the Zeth style.

- **Dataset:** theprint/Zeth
- **Format:** alpaca

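The card gives the data format as Alpaca but does not describe how records were converted into Gemma chat turns. Below is a minimal sketch of one plausible mapping. It assumes the standard Alpaca fields `instruction`, `input`, and `output`. `alpaca_to_messages` is a hypothetical helper and is not part of the actual training code.

```python
def alpaca_to_messages(record: dict) -> list[dict]:
    # Hypothetical helper: maps one Alpaca-style record to chat messages.
    # Assumes the standard Alpaca keys "instruction", "input" (may be empty), "output".
    instruction = record["instruction"].strip()
    extra_input = record.get("input", "").strip()
    # Append the optional input to the instruction, separated by a blank line.
    user_text = f"{instruction}\n\n{extra_input}" if extra_input else instruction
    return [
        {"role": "user", "content": user_text},
        # Gemma chat templates render the "assistant" role as "model".
        {"role": "assistant", "content": record["output"].strip()},
    ]


# Illustrative record; not taken from the Zeth dataset.
example = {
    "instruction": "Explain why my code review felt harsh.",
    "input": "",
    "output": "Feedback can sting even when it is fair; let's look at what was actually said.",
}
print(alpaca_to_messages(example))
```

You can pass the resulting list to `tokenizer.apply_chat_template`. The card does not document whether the Zeth run used this exact mapping, for example how it handled an empty `input`.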
### Training Procedure

- **Training epochs:** 3
- **LoRA rank:** 128
- **Learning rate:** 0.0002
- **Batch size:** 4
- **Framework:** Unsloth + transformers + PEFT
- **Hardware:** NVIDIA RTX 5090

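To give a sense of scale, the sketch below estimates how many trainable parameters a rank-128 LoRA adapter adds to the text decoder. It uses the dimensions from config.json. The card does not list the target modules. This sketch assumes all attention and MLP projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`), which is the set used in many Unsloth notebooks, and it ignores the vision tower.

```python
# Dimensions copied from text_config in config.json.
HIDDEN = 2560
INTERMEDIATE = 10240
NUM_HEADS = 8
NUM_KV_HEADS = 4
HEAD_DIM = 256
NUM_LAYERS = 34
RANK = 128  # LoRA rank stated in the model card


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds A (r x d_in) and B (d_out x r) beside a frozen d_out x d_in weight.
    return r * (d_in + d_out)


def lora_params_per_layer(r: int) -> int:
    # Assumed target modules: every attention and MLP projection in a decoder layer.
    q_out = NUM_HEADS * HEAD_DIM      # 2048
    kv_out = NUM_KV_HEADS * HEAD_DIM  # 1024
    shapes = {  # module name -> (d_in, d_out)
        "q_proj": (HIDDEN, q_out),
        "k_proj": (HIDDEN, kv_out),
        "v_proj": (HIDDEN, kv_out),
        "o_proj": (q_out, HIDDEN),
        "gate_proj": (HIDDEN, INTERMEDIATE),
        "up_proj": (HIDDEN, INTERMEDIATE),
        "down_proj": (INTERMEDIATE, HIDDEN),
    }
    return sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())


per_layer = lora_params_per_layer(RANK)
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}")  # per layer: 7,012,352
print(f"total: {total:,}")  # total: 238,419,968
```

Each adapter costs `r * (d_in + d_out)` parameters, so the count scales linearly with rank. Halving the rank to 64 halves the total.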
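## Usage

```python
from unsloth import FastLanguageModel
import torch

# Load model and tokenizer (4-bit quantized to reduce VRAM use)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="theprint/Zeth-Gemma3-4B",
    max_seq_length=4096,
    dtype=None,  # auto-detect
    load_in_4bit=True,
)

# Enable Unsloth's inference mode
FastLanguageModel.for_inference(model)

# Format the prompt with the model's chat template, then tokenize it.
# add_special_tokens=False avoids a duplicate <bos>; the template already adds one.
messages = [{"role": "user", "content": [{"type": "text", "text": "Your prompt here"}]}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=[prompt], return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, not the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
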
### Alternative Usage (Standard Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "theprint/Zeth-Gemma3-4B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("theprint/Zeth-Gemma3-4B")

# Example usage
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"},
]

# Apply the chat template and move the token IDs to the model's device
inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
# Slice off the prompt so only the model's reply is decoded
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
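
Both examples above rely on the tokenizer's chat template. The sketch below is a rough pure-Python approximation of the Gemma 3 turn format for text-only messages, so you can see the kind of prompt string the template produces. It is based on the Gemma turn markers and is not the template shipped with the tokenizer. `format_gemma_prompt` is a hypothetical name; in practice, use `apply_chat_template`.

```python
def format_gemma_prompt(messages: list[dict], bos: str = "<bos>") -> str:
    # Approximation of the Gemma 3 chat template (assumption; text-only content).
    # A leading system message is folded into the first user turn.
    system = ""
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"].strip() + "\n\n"
        messages = messages[1:]
    parts = [bos]
    for i, msg in enumerate(messages):
        role = "model" if msg["role"] == "assistant" else msg["role"]
        content = msg["content"].strip()
        if i == 0 and system:
            content = system + content
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    # Equivalent of add_generation_prompt=True: open a model turn for the reply.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)


print(format_gemma_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"},
]))
```

The `<end_of_turn>` marker is likely why the model's configs list two end-of-sequence ids (1 and 106). In the Gemma 3 vocabulary these should correspond to `<eos>` and `<end_of_turn>`, so generation stops when the model's turn ends.
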

## Limitations

The model may hallucinate or provide incorrect information.

## Citation

If you use this model, please cite:

```bibtex
@misc{zeth_gemma3_4b,
  title={Zeth-Gemma3-4B: Fine-tuned google/gemma-3-4b-it},
  author={theprint},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/theprint/Zeth-Gemma3-4B}
}
```

## Acknowledgments

- Base model: [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- Training dataset: [theprint/Zeth](https://huggingface.co/datasets/theprint/Zeth)
- Fine-tuning framework: [Unsloth](https://github.com/unslothai/unsloth)
- Quantization: [llama.cpp](https://github.com/ggerganov/llama.cpp)
config.json ADDED
{
  "architectures": [
    "Gemma3ForConditionalGeneration"
  ],
  "boi_token_index": 255999,
  "eoi_token_index": 256000,
  "eos_token_id": [
    1,
    106
  ],
  "image_token_index": 262144,
  "initializer_range": 0.02,
  "mm_tokens_per_image": 256,
  "model_type": "gemma3",
  "text_config": {
    "_sliding_window_pattern": 6,
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attn_logit_softcapping": null,
    "final_logit_softcapping": null,
    "head_dim": 256,
    "hidden_activation": "gelu_pytorch_tanh",
    "hidden_size": 2560,
    "initializer_range": 0.02,
    "intermediate_size": 10240,
    "layer_types": [
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "full_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "full_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "full_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "full_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "full_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention",
      "sliding_attention"
    ],
    "max_position_embeddings": 131072,
    "model_type": "gemma3_text",
    "num_attention_heads": 8,
    "num_hidden_layers": 34,
    "num_key_value_heads": 4,
    "query_pre_attn_scalar": 256,
    "rms_norm_eps": 1e-06,
    "rope_local_base_freq": 10000.0,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "rope_theta": 1000000.0,
    "sliding_window": 1024,
    "torch_dtype": "float16",
    "use_cache": true,
    "vocab_size": 262208
  },
  "torch_dtype": "float16",
  "transformers_version": "4.53.2",
  "vision_config": {
    "attention_dropout": 0.0,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 1152,
    "image_size": 896,
    "intermediate_size": 4304,
    "layer_norm_eps": 1e-06,
    "model_type": "siglip_vision_model",
    "num_attention_heads": 16,
    "num_channels": 3,
    "num_hidden_layers": 27,
    "patch_size": 14,
    "torch_dtype": "float16",
    "vision_use_head": false
  }
}
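
In `text_config`, `layer_types` interleaves five sliding-window layers with one full-attention layer, which matches `_sliding_window_pattern: 6`. The sketch below rebuilds that list from the pattern and then makes a back-of-the-envelope estimate of the KV-cache size for a single sequence. It assumes a float16 cache with no cache quantization, and that sliding layers keep only the last `sliding_window` tokens. The helper names are hypothetical, and real memory use will vary by implementation.

```python
# Values copied from text_config in config.json.
NUM_LAYERS = 34
SLIDING_WINDOW_PATTERN = 6  # "_sliding_window_pattern"
SLIDING_WINDOW = 1024       # "sliding_window", in tokens
NUM_KV_HEADS = 4
HEAD_DIM = 256
BYTES_PER_VALUE = 2         # float16, per "torch_dtype" (assumed cache dtype)


def layer_types(num_layers: int, pattern: int) -> list[str]:
    # Every `pattern`-th layer (counting from 1) uses full attention; the rest are local.
    return [
        "full_attention" if (i + 1) % pattern == 0 else "sliding_attention"
        for i in range(num_layers)
    ]


def kv_cache_bytes(seq_len: int) -> int:
    # Keys plus values for one token in one layer.
    per_token = 2 * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    total = 0
    for kind in layer_types(NUM_LAYERS, SLIDING_WINDOW_PATTERN):
        # Assumption: sliding layers only cache the most recent SLIDING_WINDOW tokens.
        cached = seq_len if kind == "full_attention" else min(seq_len, SLIDING_WINDOW)
        total += cached * per_token
    return total


kinds = layer_types(NUM_LAYERS, SLIDING_WINDOW_PATTERN)
print([i for i, k in enumerate(kinds) if k == "full_attention"])  # [5, 11, 17, 23, 29]
print(kv_cache_bytes(4096) / 2**20, "MiB")  # 196.0 MiB
```

Only 5 of the 34 layers grow with sequence length. This is why the hybrid layout keeps the cache small compared with a model where every layer uses full attention.
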
generation_config.json ADDED
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.53.2"
}
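
This generation config enables sampling by default with `top_k: 64` and `top_p: 0.95`, and the README examples pass `temperature=0.7`. The pure-Python sketch below shows how these three settings narrow the next-token distribution. It applies temperature first, then top-k, then top-p (nucleus) filtering. That order is assumed to mirror transformers' default logits-warper order, and the function names are hypothetical.

```python
import math
import random


def filter_logits(logits: list[float], temperature: float, top_k: int, top_p: float) -> dict[int, float]:
    """Renormalized distribution left after temperature, top-k, then top-p (assumes top_k >= 1)."""
    # Temperature scaling, then softmax (shifted by the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    # Top-k: keep the k highest-probability token ids, renormalized.
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:top_k]
    z = sum(exps[i] for i in ranked)
    probs = [(i, exps[i] / z) for i in ranked]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cumulative += p
        if cumulative >= top_p:
            break
    z = sum(p for _, p in kept)
    return {i: p / z for i, p in kept}


def sample(logits, temperature=1.0, top_k=64, top_p=0.95, rng=random):
    # Draw one token id from the filtered distribution.
    dist = filter_logits(logits, temperature, top_k, top_p)
    ids, weights = zip(*dist.items())
    return rng.choices(ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # illustrative values, not model output
print(filter_logits(toy_logits, temperature=1.0, top_k=3, top_p=0.8))
print(sample(toy_logits, temperature=0.7, rng=random.Random(0)))
```

With the toy logits, top-k keeps three candidates and top-p then trims the set to the two most likely. Lowering the temperature sharpens the distribution before either filter runs.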