
Built with Axolotl

See axolotl config

axolotl version: 0.13.0.dev0

# Axolotl config for NeuTTS Danish fine-tuning (FIXED)
# Key changes: more epochs, higher LR; sample_packing kept enabled (safe with flash_attention)

base_model: syvai/plapre-base
model_type: LlamaForCausalLM

# Pre-tokenized dataset
datasets:
  - path: syvai/danish-tts-voice-cloning-tokenized
    ds_type: json
    type:

val_set_size: 0.01

# Output
output_dir: ./outputs/neutts-danish-v2
dataset_prepared_path: last_run_prepared_v2

# Sequence length
sequence_len: 2048

# Training hyperparameters - adjusted
learning_rate: 1e-4
lr_scheduler: cosine
warmup_ratio: 0.03
num_epochs: 3
micro_batch_size: 2
gradient_accumulation_steps: 16

# Memory optimization
bf16: true
tf32: true
gradient_checkpointing: true

resume_from_checkpoint:
logging_steps: 10
flash_attention: true

# Optimizer
optimizer: adamw_bnb_8bit
weight_decay: 0.01

# Logging & saving
save_steps: 5000
eval_steps: 5000
save_total_limit: 3

# wandb
wandb_project: tts
wandb_entity:
wandb_watch:
wandb_name: neutts-danish-v2
wandb_log_model:

# Sample packing is OK with flash_attention: true
# Flash attention uses cu_seqlens to prevent cross-attention between packed samples
sample_packing: true
pad_to_sequence_len: false


special_tokens:
  eos_token: <|SPEECH_GENERATION_END|>
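The packing note in the config above can be sketched: FlashAttention's variable-length path takes an array of cumulative sample boundaries (`cu_seqlens`), so attention never crosses from one packed sample into the next. This is an illustrative sketch, not Axolotl's internal code; the helper name is hypothetical.

```python
# Illustrative sketch (not Axolotl internals): FlashAttention's varlen
# kernels take cumulative sample boundaries so attention stays within
# each packed sample. The helper name below is hypothetical.

def cu_seqlens(sample_lens):
    """[3, 5, 2] -> [0, 3, 8, 10]: boundaries of samples packed into one row."""
    cu = [0]
    for n in sample_lens:
        cu.append(cu[-1] + n)
    return cu

# Three samples packed into one 10-token sequence; tokens in sample i
# may only attend to positions in [cu[i], cu[i+1]).
print(cu_seqlens([3, 5, 2]))  # [0, 3, 8, 10]
```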

outputs/neutts-danish-v2

This model is a fine-tuned version of syvai/plapre-base on the syvai/danish-tts-voice-cloning-tokenized dataset. It achieves the following results on the evaluation set:

  • Loss: 6.9211
  • Ppl: 1013.4261
  • Memory/max active (GiB): 8.93
  • Memory/max allocated (GiB): 8.93
  • Memory/device reserved (GiB): 22.86

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: 8-bit AdamW (bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 716
  • training_steps: 23889
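The derived values above follow directly from the axolotl config; a quick arithmetic check (single GPU assumed, as implied by the reported total batch size of 32):

```python
# Checking the auto-reported hyperparameters against the config above.
micro_batch_size = 2
gradient_accumulation_steps = 16
num_gpus = 1  # assumption: implied by total_train_batch_size = 32

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(total_train_batch_size)  # 32, as reported

warmup_ratio = 0.03
training_steps = 23889
warmup_steps = int(warmup_ratio * training_steps)
print(warmup_steps)  # 716, as reported
```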

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Ppl       | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|---------------|--------|-------|-----------------|-----------|--------------|-----------------|----------------|
| No log        | 0      | 0     | 8.6548          | 5737.6073 | 7.85         | 7.85            | 22.75          |
| 7.0324        | 0.6278 | 5000  | 7.0311          | 1131.2746 | 8.93         | 8.93            | 22.86          |
| 6.9324        | 1.2555 | 10000 | 6.9570          | 1050.5248 | 8.93         | 8.93            | 22.86          |
| 6.9077        | 1.8833 | 15000 | 6.9266          | 1018.9874 | 8.93         | 8.93            | 22.86          |
| 6.8949        | 2.5110 | 20000 | 6.9211          | 1013.4261 | 8.93         | 8.93            | 22.86          |
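The Ppl column is simply the exponential of the validation cross-entropy loss; for the final checkpoint:

```python
import math

# Perplexity = exp(cross-entropy loss); the reported Ppl columns
# match exp(validation loss) up to rounding of the logged loss.
final_val_loss = 6.9211
print(math.exp(final_val_loss))  # ≈ 1013.43 (reported: 1013.4261)
```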

Framework versions

  • Transformers 4.57.6
  • Pytorch 2.9.1+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.2
  • Format: Safetensors
  • Model size: 0.2B params
  • Tensor type: BF16

Model tree for syvai/plapre-voice-clone

  • Base model: syvai/plapre-base (this model is a direct fine-tune)
  • Training dataset: syvai/danish-tts-voice-cloning-tokenized