This directory contains the outputs of a fine-tuned large language model. Below is an explanation of each file:

- **added_tokens.json**
  Maps any custom tokens that were added to the tokenizer beyond the base vocabulary (e.g., domain-specific words or symbols) to their IDs.

- **config.json**
  Stores the model architecture and hyperparameters (e.g., number of layers, hidden size, attention heads). This is needed to reload the model correctly.

- **merges.txt**
  Used by Byte-Pair Encoding (BPE) tokenizers. Lists the merge rules, in priority order, for combining subword units into larger tokens.

- **model.safetensors**
  The main file containing the model’s learned weights in the safetensors format (a safe and fast alternative to PyTorch’s pickle-based .bin format).

- **model.safetensors.index.json**
  An index file for the safetensors weights, used when the model is sharded across multiple files. It records which shard file holds each tensor.

- **special_tokens_map.json**
  Maps special-token roles (e.g., cls_token, sep_token, pad_token) to the token strings that fill them (e.g., [CLS], [SEP], [PAD]).

- **spm.model**
  SentencePiece model file, used for tokenization if SentencePiece is the tokenizer (common in multilingual or T5-style models).

- **tokenizer_config.json**
  Stores configuration settings for the tokenizer (e.g., lowercasing, normalization, special token handling).

- **tokenizer.json**
  Contains the full tokenizer vocabulary and rules in a single JSON file, so the tokenizer can be loaded quickly without reading the separate vocabulary and merge files.

- **vocab.json**
  The vocabulary file mapping tokens to their integer IDs (used by some tokenizers, especially BPE).

- **vocab.txt**
  A plain-text vocabulary file listing one token per line (used by some tokenizers, especially WordPiece).

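To make the roles of **merges.txt** and **vocab.json** more concrete, here is a minimal sketch of how a BPE tokenizer could combine them. It is a simplified illustration, not the tokenizer that ships with this model: the helper names `load_merges` and `bpe`, the toy merge rules, and the toy vocabulary are all hypothetical, and real tokenizers add steps such as byte-level pre-tokenization.

```python
def load_merges(lines):
    """Parse merges.txt-style lines into a rank table (lower rank = applied first)."""
    ranks = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional "#version" header line.
        if not line or line.startswith("#version"):
            continue
        left, right = line.split()
        ranks[(left, right)] = len(ranks)
    return ranks


def bpe(word, ranks):
    """Repeatedly merge the adjacent symbol pair with the lowest rank."""
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge rule applies any more
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical toy data, not taken from any real model.
merges_txt = ["#version: 0.2", "l o", "lo w", "e r"]
vocab_json = {"low": 0, "er": 1, "l": 2, "o": 3, "w": 4, "e": 5, "r": 6}

ranks = load_merges(merges_txt)
tokens = bpe("lower", ranks)            # merge rules turn characters into subwords
ids = [vocab_json[t] for t in tokens]   # vocab.json turns subwords into integer IDs
print(tokens, ids)
```

The merge rules decide how a word is split into subwords, and the vocabulary then assigns each subword its ID, which is why both files have to match the ones used during training.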
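As an illustration of what **model.safetensors.index.json** is for, the sketch below groups parameter names by the shard file that stores them. It assumes the index uses the layout commonly written by Hugging Face Transformers, with a `weight_map` object mapping each tensor name to a shard file name. The tensor names, shard names, and the helper `params_by_shard` are made up for the example.

```python
import json
from collections import defaultdict

# Hypothetical index content; tensor and shard names are illustrative only.
index_text = """
{
  "metadata": {"total_size": 1024},
  "weight_map": {
    "embed.weight": "model-00001-of-00002.safetensors",
    "layers.0.weight": "model-00001-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors"
  }
}
"""


def params_by_shard(index):
    """Group tensor names by the shard file listed for them in weight_map."""
    shards = defaultdict(list)
    for name, shard in index["weight_map"].items():
        shards[shard].append(name)
    return dict(shards)


index = json.loads(index_text)
shards = params_by_shard(index)
for shard, names in sorted(shards.items()):
    print(f"{shard}: {len(names)} tensor(s)")
```

A loader can use the same lookup in the other direction: given a tensor name, the index says which shard to open, so it never has to scan every file.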
These files together allow you to reload the fine-tuned model, preprocess text in the same way as during training, and ensure compatibility with downstream tasks.
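Before reloading the model, it can help to check which of these files are actually present. The following is a minimal sketch using only the standard library. The helper `summarize` and the grouping of file names into "weights" and "tokenizer" sets are assumptions made for illustration, and the demo runs against a temporary directory with empty placeholder files rather than a real model.

```python
import tempfile
from pathlib import Path

# File names come from the list above; the grouping is an assumption.
WEIGHT_FILES = {"model.safetensors", "model.safetensors.index.json"}
TOKENIZER_FILES = {"tokenizer.json", "vocab.json", "vocab.txt", "spm.model"}


def summarize(model_dir):
    """Report which config, weight, and tokenizer files exist in model_dir."""
    present = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    return {
        "has_config": "config.json" in present,
        "weights": sorted(present & WEIGHT_FILES),
        "tokenizer": sorted(present & TOKENIZER_FILES),
    }


# Demo on a throwaway directory containing placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("config.json", "tokenizer.json", "model.safetensors"):
        (Path(tmp) / name).write_text("{}")
    report = summarize(tmp)

print(report)
```

An empty `weights` or `tokenizer` list, or a missing config, is a quick signal that the directory is incomplete before any heavier loading code runs.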