Model Card for MiniGPT-FR
This is the model card for MiniGPT-FR, a French causal language model trained from scratch.
The model is designed for free-form text generation and completion in French.
Model Details
- Developed by: Independent experimental project
- Funded by: Self-funded
- Model type: Decoder-only Transformer language model
- Language(s): French
- License: Apache License 2.0
MiniGPT-FR is a medium-sized causal language model trained to learn the structure and statistical regularities of the French language.
The model focuses on fluency, syntactic correctness, and stylistic consistency rather than factual accuracy or instruction-following.
It is not instruction-tuned and should be considered a base language model.
Architecture
MiniGPT-FR uses a standard dense Transformer decoder architecture with modern components commonly found in contemporary LLMs.
| Component | Value |
|---|---|
| Architecture | Decoder-only Transformer |
| Number of Parameters | ~60M |
| Context Length | 256 tokens |
| Number of Layers | 20 |
| Embedding Size | 640 |
| FFN Hidden Size | 2560 |
| Number of Attention Heads | 10 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE |
| Weight Sharing | FFN weights shared across layers |
| Layer Normalization | LayerNorm |
| Tied Embeddings | Yes |
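As a rough sanity check, the parameter count can be reproduced from the other rows of the table. The sketch below assumes the camembert-base vocabulary size (32,005 tokens), a three-matrix SwiGLU FFN, and a single FFN shared across all layers (our reading of the weight-sharing row); these assumptions are inferred, not stated elsewhere in this card.

```python
# Rough parameter estimate from the table above (assumptions noted inline).
vocab_size = 32_005  # camembert-base vocabulary size (assumption)
d_model, d_ffn, n_layers = 640, 2560, 20

embeddings = vocab_size * d_model                      # tied, so counted once
attention  = n_layers * 4 * d_model * d_model          # Q, K, V, O projections per layer
ffn        = 3 * d_model * d_ffn                       # gate/up/down, assumed shared across layers
norms      = n_layers * 2 * 2 * d_model + 2 * d_model  # two LayerNorms per layer + final norm

total = embeddings + attention + ffn + norms
print(f"~{total / 1e6:.1f}M parameters")  # ~58.2M, consistent with the table's ~60M
```

Without FFN sharing the same dimensions would give roughly 150M parameters, so the ~60M figure itself supports the shared-FFN reading.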
Pre-training
MiniGPT-FR was trained with a next-token prediction objective on French text corpora.
Training followed a curriculum strategy in which the dataset size was progressively increased, in order to stabilize language acquisition.
Training overview
- Training objective: next-token prediction
- Language: French
- Dataset size (final): ~200k text samples
- Optimizer: AdamW
- Learning rate schedule: Cosine decay with warmup
- Validation metric: Cross-entropy loss
The model was trained from scratch and does not rely on pretrained multilingual checkpoints.
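To make the overview concrete, here is a minimal sketch of a comparable training step: next-token prediction with AdamW and a cosine schedule with warmup. The Llama-style stand-in config only approximates the actual architecture (it uses RMSNorm and per-layer FFNs rather than LayerNorm and a shared FFN), and the learning rate, warmup, and step counts are illustrative guesses, not values reported by this card.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, get_cosine_schedule_with_warmup

# Stand-in config with the dimensions from the table above; Llama-style blocks
# provide RoPE + SwiGLU but are only an approximation of MiniGPT-FR.
config = LlamaConfig(
    vocab_size=32_005,            # camembert-base vocabulary (assumption)
    hidden_size=640,
    intermediate_size=2560,
    num_hidden_layers=20,
    num_attention_heads=10,
    max_position_embeddings=256,
    tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)

# Illustrative hyperparameters -- not values reported by this card.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=50_000
)

def training_step(batch):
    # With labels=input_ids, the model shifts targets internally and
    # returns the next-token cross-entropy loss (the validation metric above).
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```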
Post-training
This model was not post-trained.
There is currently:
- no instruction tuning
- no preference alignment
- no reinforcement learning from human feedback (RLHF)
As a result, the model should not be expected to reliably follow instructions or provide factually accurate answers.
Capabilities
MiniGPT-FR is capable of:
- Generating fluent French text
- Completing prompts and sentences
- Producing descriptive and encyclopedic-style paragraphs
- Simple paraphrasing and reformulation
Known limitations
- No instruction-following behavior
- Susceptible to factual hallucinations
- Limited reasoning capabilities
- Short context window (256 tokens)
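One practical consequence of the 256-token window: prompts longer than the context must be truncated before generation, and room must be left for the new tokens. A minimal sketch using the tokenizer's truncation controls (`long_prompt` is a placeholder for an over-long input):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Houzeric/MiniGPT-FR")
tokenizer.truncation_side = "left"  # keep the end of the prompt, which generation continues from

max_new_tokens = 100
inputs = tokenizer(
    long_prompt,                      # placeholder for an over-long prompt
    return_tensors="pt",
    truncation=True,
    max_length=256 - max_new_tokens,  # leave room for the generated tokens
)
```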
Run the model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Houzeric/MiniGPT-FR"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Encode a French prompt and sample a completion.
prompt = "Il est principalement connu pour"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
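The sampling settings above (`do_sample=True`, `temperature=0.8`, `top_p=0.95`) are a reasonable starting point for a base model of this size; lowering the temperature makes completions more conservative, while greedy decoding (`do_sample=False`) tends to produce repetitive text with small base models.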
Tokenizer
- Tokenizer: `camembert-base` (pretrained, reused as-is)
- Shared vocabulary
- Padding aligned with the EOS token
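In transformers terms, aligning padding with EOS corresponds to the usual pattern below (a sketch; the hosted checkpoint may already ship this configuration):

```python
# Align the padding token with EOS, as described above.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id
```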
License
This model is distributed under the Apache 2.0 License.
Disclaimer
This model is provided for research and experimentation purposes only.
The generated texts may be inaccurate, incomplete, or incoherent.
No guarantees are provided regarding the accuracy of the produced information.
Credits
Model developed and trained independently as an experimental project, with a focus on progressive French language learning and the optimization of medium-scale language models.