Elvisaro committed on
Commit 29efa29 · verified · 1 Parent(s): 8e70ea9

Update README.md

Files changed (1)
  1. README.md +188 -1
README.md CHANGED
@@ -325,4 +325,191 @@ def convert_instella_model():
325
  raise
326
 
327
  if __name__ == "__main__":
328
- convert_instella_model()
 
325
  raise
326
 
327
  if __name__ == "__main__":
328
+ convert_instella_model()
329
+
330
+ # Documentation
331
+
332
+ # Instella Model Conversion to GGUF Format Documentation
333
+
334
+ ## Overview
335
+
336
+ This script converts Instella models from the Hugging Face format (safetensors) to GGUF format with float16 precision for use with llama.cpp and other compatible inference engines. The conversion preserves the model architecture while ensuring compatibility with GGUF-based inference systems.
337
+
338
+ ## Script Structure
339
+
340
+ The conversion is split across two files:
341
+
342
+ 1. **convert_instella_bf16.py**: The main script that orchestrates the conversion process
343
+ 2. **convert_instella_f16.py**: The generated conversion script that performs the actual conversion
344
+
345
+ ## Requirements
346
+
347
+ - Python 3.8+
348
+ - PyTorch
349
+ - NumPy
350
+ - safetensors
351
+
352
+ ## Usage
353
+
354
+ ```bash
355
+ python convert_instella_bf16.py
356
+ ```
357
+
358
+ This will:
359
+ 1. Install required dependencies
360
+ 2. Generate the conversion script
361
+ 3. Convert the model in the "huggintuned" directory
362
+ 4. Save the output as "huggintuned/model.gguf"
363
+
364
+ For custom paths, modify the `model_dir` and `output_path` variables in the `convert_instella_model()` function.
365
+
366
+ ## Detailed Function Documentation
367
+
368
+ ### convert_instella_bf16.py
369
+
370
+ #### `create_instella_conversion_script()`
371
+
372
+ Generates the conversion script file with all necessary functions for GGUF conversion.
373
+
374
+ **Returns:**
375
+ - `str`: Path to the generated script file
376
+
377
+ #### `convert_instella_model()`
378
+
379
+ Main function that orchestrates the conversion process.
380
+
381
+ 1. Installs required dependencies
382
+ 2. Generates the conversion script
383
+ 3. Sets input and output paths
384
+ 4. Runs the conversion
385
+ 5. Verifies the output file
386
+
387
+ ### convert_instella_f16.py (Generated Script)
388
+
389
+ #### `write_gguf_header(f, num_tensors, num_kv)`
390
+
391
+ Writes the GGUF header to the output file.
392
+
393
+ **Parameters:**
394
+ - `f`: File object for writing
395
+ - `num_tensors`: Number of tensors in the model
396
+ - `num_kv`: Number of metadata key-value pairs
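The header layout can be sketched as follows. This is a minimal illustration based on the public GGUF file layout (4-byte magic, `uint32` version, `uint64` tensor count, `uint64` key-value count, all little-endian), not the generated script's actual code. The version number 3 and the example counts are assumptions.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"  # 4-byte file magic
GGUF_VERSION = 3      # assumed format version; the generated script may use another


def write_gguf_header(f, num_tensors, num_kv):
    """Write magic, version, tensor count and metadata KV count (little-endian)."""
    f.write(GGUF_MAGIC)
    f.write(struct.pack("<I", GGUF_VERSION))  # uint32 version
    f.write(struct.pack("<Q", num_tensors))   # uint64 tensor count
    f.write(struct.pack("<Q", num_kv))        # uint64 metadata KV count


# Usage: write a header into an in-memory buffer (counts are made up).
buf = io.BytesIO()
write_gguf_header(buf, num_tensors=291, num_kv=12)
header = buf.getvalue()
```

The header is a fixed 24 bytes, so a reader can validate a file before parsing any metadata.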
397
+
398
+ #### `write_metadata_kv(f, key, val_type, val)`
399
+
400
+ Writes a metadata key-value pair to the output file.
401
+
402
+ **Parameters:**
403
+ - `f`: File object for writing
404
+ - `key`: Metadata key name
405
+ - `val_type`: GGUF type identifier
406
+ - `val`: Value to write
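A hedged sketch of how such a key-value writer might look, assuming the GGUF encoding in which strings are stored as a `uint64` length followed by UTF-8 bytes and each value is preceded by a `uint32` type identifier. Only three value types are handled here. The helper `write_string` is a hypothetical name introduced for illustration.

```python
import io
import struct

# Value type identifiers from the GGUF specification (subset).
GGUF_TYPE_UINT32 = 4
GGUF_TYPE_FLOAT32 = 6
GGUF_TYPE_STRING = 8


def write_string(f, s):
    """Hypothetical helper: uint64 byte length followed by UTF-8 data."""
    data = s.encode("utf-8")
    f.write(struct.pack("<Q", len(data)))
    f.write(data)


def write_metadata_kv(f, key, val_type, val):
    """Write key, value type and value for one metadata entry."""
    write_string(f, key)
    f.write(struct.pack("<I", val_type))
    if val_type == GGUF_TYPE_UINT32:
        f.write(struct.pack("<I", val))
    elif val_type == GGUF_TYPE_FLOAT32:
        f.write(struct.pack("<f", val))
    elif val_type == GGUF_TYPE_STRING:
        write_string(f, val)
    else:
        raise ValueError(f"unsupported GGUF value type: {val_type}")


# Usage: the value "instella" is an illustrative choice, not taken from the script.
buf = io.BytesIO()
write_metadata_kv(buf, "general.name", GGUF_TYPE_STRING, "instella")
entry = buf.getvalue()
```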
407
+
408
+ #### `write_tensor_info(f, name, tensor)`
409
+
410
+ Writes tensor information (name, shape, type) to the output file.
411
+
412
+ **Parameters:**
413
+ - `f`: File object for writing
414
+ - `name`: Tensor name
415
+ - `tensor`: PyTorch tensor
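For illustration, one tensor-info record could be written as below. This sketch assumes the GGUF layout (name string, `uint32` dimension count, `uint64` dimensions, `uint32` type, `uint64` data offset) and takes a plain shape tuple instead of a PyTorch tensor so that it runs without PyTorch. The function name, the `offset` parameter and the example shape are hypothetical. The documented `write_tensor_info(f, name, tensor)` presumably tracks offsets elsewhere.

```python
import io
import struct

GGML_TYPE_F16 = 1  # ggml type identifier for float16


def write_tensor_info_from_shape(f, name, shape, offset):
    """Hypothetical variant of write_tensor_info that takes a shape tuple."""
    name_bytes = name.encode("utf-8")
    f.write(struct.pack("<Q", len(name_bytes)))
    f.write(name_bytes)
    f.write(struct.pack("<I", len(shape)))
    # GGUF lists the fastest-varying dimension first, i.e. the reverse
    # of a PyTorch (row-major) shape.
    for dim in reversed(shape):
        f.write(struct.pack("<Q", dim))
    f.write(struct.pack("<I", GGML_TYPE_F16))
    f.write(struct.pack("<Q", offset))  # offset into the tensor data section


buf = io.BytesIO()
write_tensor_info_from_shape(buf, "token_embd.weight", (50304, 4096), offset=0)
info = buf.getvalue()
```

Readers generally expect each tensor's data offset to be aligned (32 bytes by default in GGUF), so a real writer has to pad between tensors.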
416
+
417
+ #### `write_tensor_data(f, tensor)`
418
+
419
+ Writes tensor data to the output file, converting to float16 format.
420
+
421
+ **Parameters:**
422
+ - `f`: File object for writing
423
+ - `tensor`: PyTorch tensor
424
+
425
+ #### `map_tensor_name(name)`
426
+
427
+ Maps Hugging Face tensor names to GGUF tensor names.
428
+
429
+ **Parameters:**
430
+ - `name`: Original tensor name
431
+
432
+ **Returns:**
433
+ - Mapped tensor name for GGUF format
434
+
435
+ #### `get_model_metadata(config_path)`
436
+
437
+ Builds metadata for the GGUF model based on the Instella configuration.
438
+
439
+ **Parameters:**
440
+ - `config_path`: Path to the model's config.json file
441
+
442
+ **Returns:**
443
+ - Dictionary of metadata key-value pairs
444
+
445
+ #### `convert_model(model_dir, output_path)`
446
+
447
+ Main conversion function that processes the model and writes the GGUF file.
448
+
449
+ **Parameters:**
450
+ - `model_dir`: Directory containing the model files
451
+ - `output_path`: Path to save the GGUF model
452
+
453
+ ## Model Architecture Parameters
454
+
455
+ The script handles the following Instella model parameters:
456
+
457
+ | Parameter | Default Value | Description |
458
+ |-----------|---------------|-------------|
459
+ | vocab_size | 50304 | Vocabulary size |
460
+ | hidden_size | 4096 | Dimension of hidden representations |
461
+ | intermediate_size | 11008 | Dimension of MLP representations |
462
+ | num_hidden_layers | 32 | Number of transformer layers |
463
+ | num_attention_heads | 32 | Number of attention heads |
464
+ | num_key_value_heads | 32 | Number of key/value heads for GQA |
465
+ | max_position_embeddings | 2048 | Maximum sequence length |
466
+ | rope_theta | 10000.0 | Base period of RoPE embeddings |
467
+ | rms_norm_eps | 1e-5 | Epsilon for RMS normalization |
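One way these defaults could be combined with a model's `config.json` is sketched below. The function name `load_config_with_defaults` is hypothetical, and the fallback behaviour (use the table value whenever a key or the file itself is missing) is an assumption about how the script behaves.

```python
import json
import os
import tempfile

# Defaults copied from the table above.
DEFAULTS = {
    "vocab_size": 50304,
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 32,
    "max_position_embeddings": 2048,
    "rope_theta": 10000.0,
    "rms_norm_eps": 1e-5,
}


def load_config_with_defaults(config_path):
    """Return the known parameters, preferring values found in config.json."""
    config = dict(DEFAULTS)
    if os.path.isfile(config_path):
        with open(config_path, "r", encoding="utf-8") as fh:
            loaded = json.load(fh)
        # Only keys listed in the table are taken over; others are ignored.
        config.update({k: loaded[k] for k in DEFAULTS if k in loaded})
    return config


# Usage with a throwaway config file and with a missing file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"num_hidden_layers": 36, "unrelated_key": True}, fh)
    from_file = load_config_with_defaults(path)
    fallback = load_config_with_defaults(os.path.join(tmp, "missing.json"))
```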
468
+
469
+ ## Tensor Mapping
470
+
471
+ The script maps tensor names from Hugging Face format to GGUF format:
472
+
473
+ | Hugging Face Name | GGUF Name |
474
+ |-------------------|-----------|
475
+ | model.embed_tokens.weight | token_embd.weight |
476
+ | model.norm.weight | output_norm.weight |
477
+ | lm_head.weight | output.weight |
478
+ | model.layers.{n}.self_attn.q_proj.weight | blk.{n}.attn_q.weight |
479
+ | model.layers.{n}.self_attn.k_proj.weight | blk.{n}.attn_k.weight |
480
+ | model.layers.{n}.self_attn.v_proj.weight | blk.{n}.attn_v.weight |
481
+ | model.layers.{n}.self_attn.o_proj.weight | blk.{n}.attn_output.weight |
482
+ | model.layers.{n}.mlp.gate_proj.weight | blk.{n}.ffn_gate.weight |
483
+ | model.layers.{n}.mlp.up_proj.weight | blk.{n}.ffn_up.weight |
484
+ | model.layers.{n}.mlp.down_proj.weight | blk.{n}.ffn_down.weight |
485
+ | model.layers.{n}.input_layernorm.weight | blk.{n}.attn_norm.weight |
486
+ | model.layers.{n}.post_attention_layernorm.weight | blk.{n}.ffn_norm.weight |
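The mapping in the table can be expressed as a small lookup plus one regular expression. This is a sketch of the idea, not the generated script's code. Passing unknown names through unchanged is an assumption.

```python
import re

# Names that appear once per model.
TOP_LEVEL = {
    "model.embed_tokens.weight": "token_embd.weight",
    "model.norm.weight": "output_norm.weight",
    "lm_head.weight": "output.weight",
}

# Per-layer names, keyed by the part between "model.layers.{n}." and ".weight".
LAYER_PARTS = {
    "self_attn.q_proj": "attn_q",
    "self_attn.k_proj": "attn_k",
    "self_attn.v_proj": "attn_v",
    "self_attn.o_proj": "attn_output",
    "mlp.gate_proj": "ffn_gate",
    "mlp.up_proj": "ffn_up",
    "mlp.down_proj": "ffn_down",
    "input_layernorm": "attn_norm",
    "post_attention_layernorm": "ffn_norm",
}

LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.(.+)\.weight$")


def map_tensor_name(name):
    """Map a Hugging Face tensor name to its GGUF name."""
    if name in TOP_LEVEL:
        return TOP_LEVEL[name]
    match = LAYER_RE.match(name)
    if match and match.group(2) in LAYER_PARTS:
        return f"blk.{match.group(1)}.{LAYER_PARTS[match.group(2)]}.weight"
    return name  # assumption: unrecognised names are kept as-is


mapped = map_tensor_name("model.layers.7.mlp.down_proj.weight")
```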
487
+
488
+ ## Precision Handling
489
+
490
+ The script handles bfloat16 precision models by:
491
+
492
+ 1. Loading the original tensors (which may be in bfloat16)
493
+ 2. Converting to float32 for processing
494
+ 3. Converting to float16 for GGUF compatibility
495
+ 4. Writing the data in binary format
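The steps above can be illustrated with NumPy alone. NumPy has no bfloat16 dtype, so this sketch works on raw bfloat16 bit patterns held in a `uint16` array. That input form and the function name are assumptions made for illustration. With PyTorch tensors, the same path would go through `tensor.float()` and `tensor.half()`. Note that bfloat16 covers the float32 exponent range, while the largest finite float16 value is 65504, so very large weights overflow to infinity in step 3.

```python
import numpy as np


def bf16_bits_to_f16(bits):
    """Convert raw bfloat16 bit patterns (uint16) to a float16 array."""
    # Step 2: bfloat16 is the upper 16 bits of a float32, so shifting the
    # bits into the high half of a uint32 yields the exact float32 value.
    f32 = (bits.astype(np.uint32) << 16).view(np.float32)
    # Step 3: narrow to float16 (values are rounded to the nearest float16).
    return f32.astype(np.float16)


# 0x3F80, 0xC000 and 0x3F00 are the bfloat16 encodings of 1.0, -2.0 and 0.5.
bits = np.array([0x3F80, 0xC000, 0x3F00], dtype=np.uint16)
halves = bf16_bits_to_f16(bits)

# Step 4: little-endian bytes, ready to be written to the output file.
raw = halves.astype("<f2").tobytes()
```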
496
+
497
+ ## Error Handling
498
+
499
+ The script includes error handling for:
500
+ - Missing model files
501
+ - Config file parsing errors
502
+ - Conversion process errors
503
+ - Output file verification
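As an illustration of output file verification, a check could read back the fixed-size header and confirm the magic bytes. What the script actually verifies is not specified here, so the function name `verify_gguf_file` and the checks it performs are assumptions.

```python
import os
import struct
import tempfile


def verify_gguf_file(path):
    """Hypothetical check: file exists and begins with a plausible GGUF header."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"output file not found: {path}")
    with open(path, "rb") as fh:
        header = fh.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"not a GGUF file: {path}")
    version, num_tensors, num_kv = struct.unpack("<IQQ", header[4:24])
    return {"version": version, "num_tensors": num_tensors, "num_kv": num_kv}


# Usage against a throwaway file that contains only a header.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.gguf")
    with open(path, "wb") as fh:
        fh.write(b"GGUF" + struct.pack("<IQQ", 3, 2, 5))
    summary = verify_gguf_file(path)
    try:
        verify_gguf_file(os.path.join(tmp, "missing.gguf"))
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```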
504
+
505
+ ## Notes
506
+
507
+ - The script is specifically designed for Instella models but may work with similar architectures
508
+ - The default parameters are based on the Instella2Config defaults
509
+ - The script reads the model's configuration when it is available and otherwise relies on the default parameters
510
+
511
+ ## Limitations
512
+
513
+ - Only supports safetensors format (not PyTorch .bin files)
514
+ - Does not support quantization (outputs float16 precision only)
515
+ - May require adjustments for significantly different model architectures