gemma-3n-E4B-joke55

| Hyperparameter | Value |
|---|---|
| Training Steps | 100 |
| Batch Size | 1 |
| Gradient Accumulation | 4 |
| Learning Rate | 0.0002 |
| LoRA Rank | 8 |
| Quantization | 4-bit (training) / q8_0 (GGUF export) |
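These hyperparameters map directly onto the usual transformers/TRL configuration names. A minimal sketch of that mapping (the key names follow the `TrainingArguments`/LoRA convention; the actual training script is not included in this card, so treat this as an illustration, not the author's code):

```python
# Hyperparameters from the table above, expressed as a plain dict.
# Key names follow the transformers/TRL TrainingArguments convention;
# this is a sketch, not the card author's actual script.
config = {
    "max_steps": 100,                    # Training Steps
    "per_device_train_batch_size": 1,    # Batch Size
    "gradient_accumulation_steps": 4,    # Gradient Accumulation
    "learning_rate": 2e-4,               # Learning Rate (0.0002)
    "lora_r": 8,                         # LoRA Rank
    "load_in_4bit": True,                # 4-bit quantization during training
    "gguf_quantization": "q8_0",         # quantization used for GGUF export
}

# Effective batch size seen by the optimizer: with a per-device batch of 1
# and 4 accumulation steps, gradients are averaged over 4 examples per update.
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # 4
```

So although the per-device batch size is 1, each optimizer step effectively sees 4 examples.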

Training Loss

The following chart shows the training-loss curve over the 100 training steps:

[Figure: training loss curve]

Example Output

Here is a sample generation that demonstrates the model's output:

User Prompt: Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

Compose a new joke in English about actors, entertainment, comedy involving actors, acting. Aim for humor that is clever, absurd, mild and based on clever twist and using ideas like status reversal, unexpected answer, absurd logic. Keep the tone conversational, deadpan, old-fashioned. Keep it self-contained in 4 sentences

Input:

Response:

Model Response: Why do actors always get the best seats at comedy shows?

Because they're the only ones who know how to act.
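The prompt above follows the Alpaca-style instruction template (preamble, then instruction / input / response sections). A small helper that reproduces this format (the bare section labels shown here are an assumption taken from the rendered example; the original template may use header markers such as `### Instruction:`):

```python
PREAMBLE = ("Below is an instruction that describes a task. "
            "Write a response that appropriately completes the request.")

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble an Alpaca-style prompt as shown in the example above.

    The label style (bare "Instruction:" vs. "### Instruction:") is an
    assumption; match whatever template the model was trained on.
    """
    return (f"{PREAMBLE}\n\n"
            f"Instruction:\n{instruction}\n\n"
            f"Input:\n{input_text}\n\n"
            f"Response:\n")

prompt = build_prompt("Compose a new joke in English about actors.")
```

The model's completion would then be generated after the trailing `Response:` line.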


Model fine-tuned by Mathieu-Thomas-JOSSET using Unsloth.

