Naphula committed (verified)
Commit 567bab0 · Parent: 9371685

Upload 2 files

Files changed (2):
  1. fp32_to_bf16.py (+45, -0)
  2. model_tools.md (+66, -63)
fp32_to_bf16.py ADDED
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import os

# --- YOU MUST UPDATE THESE TWO PATHS ---
# Path to the directory where your FP32 model is stored locally
# (using r"" strings so Windows backslashes are handled correctly)
input_dir = r"A:\LLM\.cache\huggingface\hub\models--wzhouad--gemma-2-9b-it-WPO-HB"

# Path to the directory where the converted BF16 model will be saved
output_dir = r"A:\LLM\.cache\huggingface\hub\models--wzhouad--gemma-2-9b-it-WPO-HB_BF16"
# -------------------------------------

# Make sure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Load the tokenizer from the local path
print(f"Loading tokenizer from {input_dir}...")
tokenizer = AutoTokenizer.from_pretrained(input_dir)

# Load the model in FP32 from the local path
print(f"Loading FP32 model from {input_dir}...")
model = AutoModelForCausalLM.from_pretrained(
    input_dir,
    torch_dtype=torch.float32,
    device_map="cpu",
    # device_map="auto"  # use this instead if you have enough GPU VRAM
)

# Convert the model to BF16 and save it to the new local directory
print("Converting model to BF16 and saving to disk...")

# Note: .half() converts to float16; use .to(torch.bfloat16) for bfloat16.
model = model.to(torch.bfloat16)

model.save_pretrained(
    output_dir,
    safe_serialization=True,
    max_shard_size="5GB",
)
tokenizer.save_pretrained(output_dir)

print(f"Model successfully converted and saved to {output_dir}")
print("You can now use this new BF16 model in your mergekit config.yaml.")
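
For intuition about what the `model.to(torch.bfloat16)` cast above does: BF16 keeps only the top 16 bits of an FP32 value (same 8-bit exponent, truncated mantissa), so the conversion is a truncation with round-to-nearest-even. A pure-Python sketch, independent of the script (the helper names are illustrative):

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float as its 32-bit pattern, then keep the top 16 bits,
    # rounding to nearest-even (the rounding PyTorch uses for this cast).
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    # Widening BF16 -> FP32 is exact: pad the low 16 bits with zeros.
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x

print(bf16_bits_to_fp32(fp32_to_bf16_bits(1.0)))         # exactly representable: 1.0
print(bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265)))  # rounds to 3.140625
```

The round trip shows why BF16 is popular for model weights: the dynamic range matches FP32, but only about three decimal digits of precision survive.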
model_tools.md CHANGED
---
title: Model Tools
emoji: 📚
colorFrom: pink
colorTo: yellow
sdk: static
pinned: false
---

# Model Tools by Naphula
Tools to enhance LLM quantization and merging

# [graph_v18.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py)
- Merge models in minutes instead of hours on low VRAM. For a 3060/3060 Ti user, this script enables functionality that is otherwise impossible without OOM: merging 70B models, or large 7B merges with `--cuda`. [More details here](https://huggingface.co/spaces/Naphula/model_tools/blob/main/mergekit_low-VRAM-graph_patch.md)
- Update: v18 is much faster than v4 and replaces the trial-and-error loop with an adaptive, math-based calculator (using GrimJim's measure.py logic)

# config.py
- To allow custom filepath strings within parameter settings, replace line 13: change `ScalarOrGradient: TypeAlias = Union[float, List[float]]` to `ScalarOrGradient: TypeAlias = Union[float, List[float], str, bool]`.

# [metadata_audit.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/metadata_audit.py)
- Checks multiple models within subdirectories for vocab or RoPE mismatches (useful for large merges). Calibrated for Mistral Nemo 12B by default.
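
A minimal sketch of the kind of check such an audit performs, reading each model's `config.json` (the expected values and the tiny demo below are illustrative assumptions, not the script's actual defaults):

```python
import json
import os
import tempfile

def audit(root: str, expected_vocab: int = 131072, expected_rope: float = 1_000_000.0) -> dict:
    """Flag subdirectory models whose config.json disagrees on vocab or rope."""
    mismatches = {}
    for sub in sorted(os.listdir(root)):
        cfg_path = os.path.join(root, sub, "config.json")
        if not os.path.isfile(cfg_path):
            continue
        with open(cfg_path, encoding="utf-8") as f:
            cfg = json.load(f)
        problems = []
        if cfg.get("vocab_size") != expected_vocab:
            problems.append(f"vocab_size={cfg.get('vocab_size')}")
        if cfg.get("rope_theta") != expected_rope:
            problems.append(f"rope_theta={cfg.get('rope_theta')}")
        if problems:
            mismatches[sub] = problems
    return mismatches

# Tiny demo in a throwaway folder: model_b has a mismatched vocab.
demo = tempfile.mkdtemp()
for name, cfg in [("model_a", {"vocab_size": 131072, "rope_theta": 1000000.0}),
                  ("model_b", {"vocab_size": 32000, "rope_theta": 1000000.0})]:
    os.makedirs(os.path.join(demo, name))
    with open(os.path.join(demo, name, "config.json"), "w", encoding="utf-8") as f:
        json.dump(cfg, f)
print(audit(demo))  # only model_b is flagged
```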

# [fp32_to_bf16.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/fp32_to_bf16.py)
- Converts FP32 safetensors to BF16

# [fp32_to_fp16.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/fp32_to_fp16.py)
- Converts FP32 safetensors to FP16

# [textonly_ripper_v2.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/textonly_ripper_v2.py)
- Converts a sharded, multimodal (text and vision) model into a text-only version. Readme at [textonly_ripper.md](https://huggingface.co/spaces/Naphula/model_tools/blob/main/textonly_ripper.md)

# [vocab_resizer.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/vocab_resizer.py)
- Converts models with larger vocab_sizes to a standard size (default 131072, for Mistral 24B) for use with mergekit. Note that `tokenizer.model` must be manually copied into the `/fixed/` folder.

# [lm_head_remover.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/lm_head_remover.py)
- Loads a "fat" 18.9GB model (default Gemma 9B), forces it to tie the weights (deduplicating the lm_head), and re-saves it. This drops the file size to ~17.2GB and makes it compatible with the others.
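
The underlying idea, that the output head duplicates the input embedding and can be dropped then re-tied on load, in a toy sketch (the tensor names follow the common convention; the values and `drop_tied_lm_head` helper are hypothetical, and the real script works through transformers' weight tying rather than dict surgery):

```python
# Toy "state dict": nested lists stand in for real tensors.
state = {
    "model.embed_tokens.weight": [[0.1, 0.2], [0.3, 0.4]],
    "model.norm.weight": [1.0, 1.0],
    "lm_head.weight": [[0.1, 0.2], [0.3, 0.4]],  # exact copy of the embedding
}

def drop_tied_lm_head(sd: dict) -> dict:
    """Drop lm_head.weight when it duplicates the input embedding.

    A loader that sees tie_word_embeddings=True in config.json will
    rebuild the head from the embedding, so nothing is lost on disk.
    """
    if sd.get("lm_head.weight") == sd.get("model.embed_tokens.weight"):
        return {k: v for k, v in sd.items() if k != "lm_head.weight"}
    return sd

slim = drop_tied_lm_head(state)
print(sorted(slim))  # the duplicate head is gone; the embedding remains
```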

# [model_index_json_generator.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/model_index_json_generator.py)
- Generates a missing `model.safetensors.index.json` file. Useful for cases where safetensors may have been sharded at the wrong size.
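
For reference, the index file is just JSON mapping each tensor name to the shard that stores it, plus a total size. A sketch of assembling one (the shard names, tensors, and sizes below are made up; a real generator reads them from each shard's safetensors header):

```python
import json

# Hypothetical shard layout: shard file -> (tensor names it holds, bytes on disk).
shards = {
    "model-00001-of-00002.safetensors": (["model.embed_tokens.weight"], 1_048_576),
    "model-00002-of-00002.safetensors": (["model.norm.weight", "lm_head.weight"], 2_097_152),
}

weight_map, total_size = {}, 0
for shard, (tensor_names, size) in shards.items():
    total_size += size
    for name in tensor_names:
        weight_map[name] = shard  # each tensor points at the shard storing it

index = {"metadata": {"total_size": total_size}, "weight_map": weight_map}
print(json.dumps(index, indent=2))
```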

# [folder_content_combiner_anyfiles.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/folder_content_combiner_anyfiles.py)
- Combines all files in the script's current directory into a single output file, sorted alphabetically.
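
A minimal sketch of that behavior, assuming plain-text files (the function and file names are illustrative, not taken from the script):

```python
import os
import tempfile

def combine_folder(folder: str, output_name: str = "combined_output.txt") -> str:
    """Concatenate every file in `folder` (alphabetical order) into one file."""
    out_path = os.path.join(folder, output_name)
    # Skip the combined file itself so it never ingests its own output.
    names = sorted(
        n for n in os.listdir(folder)
        if n != output_name and os.path.isfile(os.path.join(folder, n))
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(f"--- {name} ---\n")  # header so each source stays identifiable
            with open(os.path.join(folder, name), encoding="utf-8", errors="replace") as f:
                out.write(f.read() + "\n")
    return out_path

# Tiny demo in a throwaway folder.
demo = tempfile.mkdtemp()
for name, text in [("b.txt", "two"), ("a.txt", "one")]:
    with open(os.path.join(demo, name), "w", encoding="utf-8") as f:
        f.write(text)
with open(combine_folder(demo), encoding="utf-8") as f:
    combined = f.read()
print(combined)
```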

# [GGUF Repo Suite](https://huggingface.co/spaces/Naphula/gguf-repo-suite)
- Create and quantize Hugging Face models

# [Markdown Viewer](https://huggingface.co/spaces/Naphula/Portable_Offline_Markdown_Viewer)
- Portable Offline Markdown Viewer

# [Markdown to SMF](https://huggingface.co/spaces/Naphula/model_tools/blob/main/md_to_smf.py)
- Converts a Markdown string to an SMF-compatible BBCode string. Not perfect; it sometimes misses double bold tags.
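
The core of such a conversion is a handful of regex substitutions. A sketch covering only bold, italic, and inline code (the `[tt]` tag choice and the helper name are assumptions; the real script handles more cases, and as noted above, nested or doubled bold tags are where this naive approach breaks down):

```python
import re

def md_to_bbcode(md: str) -> str:
    # Order matters: handle ** before * so bold isn't consumed by italics.
    out = re.sub(r"\*\*(.+?)\*\*", r"[b]\1[/b]", md)
    out = re.sub(r"\*(.+?)\*", r"[i]\1[/i]", out)
    out = re.sub(r"`(.+?)`", r"[tt]\1[/tt]", out)
    return out

print(md_to_bbcode("**bold** and *italic* and `code`"))
```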

# [Quant Clone](https://github.com/electroglyph/quant_clone)
- A tool that lets you recreate UD quants such as Q8_K_XL. Examples: [Mistral 24B](https://huggingface.co/spaces/Naphula/model_tools/raw/main/Mistral-Small-3.2-24B-Instruct-2506-UD-Q8_K_XL_UD.txt), [Mistral 7B](https://huggingface.co/spaces/Naphula/model_tools/raw/main/Warlock-7B-v2-Q8_K_XL.txt)

# [Text Analysis Suite v1.5](https://huggingface.co/spaces/Naphula/TAS_1.5)
- Analyze text files with advanced metrics

---

# Not Functional

# [Failed Experiment gguf_to_safetensors_v2.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/gguf_to_safetensors_v2.py)
- Unsuccessful attempt by Gemini to patch the gguf_to_safetensors script; the missing JSON files are hard to reconstruct. Also see [safetensors_meta_ripper_v1.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/safetensors_meta_ripper_v1.py) and [tokenizer_ripper_v1.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/tokenizer_ripper_v1.py)

# [IQ5_NL.md](https://huggingface.co/spaces/Naphula/model_tools/blob/main/IQ5_NL.md)
- Note: Not functional yet. Includes the code needed to quantize IQ5_NL GGUFs using block size 32.