Model Card for MiniGPT-FR

This is the model card for MiniGPT-FR, a French causal language model trained from scratch.
The model is designed for free-form text generation and completion in French.


Model Details

Developed by: Independent experimental project
Funded by: Self-funded
Model type: Decoder-only Transformer language model
Language(s): French
License: Apache License 2.0

MiniGPT-FR is a medium-sized causal language model trained to learn the structure and statistical regularities of the French language.
The model focuses on fluency, syntactic correctness, and stylistic consistency rather than factual accuracy or instruction-following.

It is not instruction-tuned and should be considered a base language model.


Architecture

MiniGPT-FR uses a standard dense Transformer decoder architecture with modern components commonly found in contemporary LLMs.

Component                  Value
Architecture               Decoder-only Transformer
Number of parameters       ~60M
Context length             256 tokens
Number of layers           20
Embedding size             640
FFN hidden size            2560
Number of attention heads  10
Activation function        SwiGLU
Position encodings         RoPE
Weight sharing             FFN weight sharing
Layer normalization        LayerNorm
Tied embeddings            Yes
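The ~60M figure can be sanity-checked with a back-of-envelope count from the table above. Two assumptions not stated in the card: the camembert-base vocabulary of ~32k tokens, and "FFN weight sharing" meaning a single FFN block shared across all layers.

```python
# Back-of-envelope parameter count for MiniGPT-FR, from the table above.
# Assumptions (not stated in the card): camembert-base vocabulary of
# ~32k tokens, and one SwiGLU FFN block shared by all 20 layers.

vocab_size = 32_005    # camembert-base vocabulary size (assumed)
d_model    = 640       # embedding size
d_ffn      = 2_560     # FFN hidden size
n_layers   = 20

embeddings = vocab_size * d_model              # tied with the output head
attention  = n_layers * 4 * d_model * d_model  # Q, K, V, O projections
ffn        = 3 * d_model * d_ffn               # SwiGLU: gate, up, down (shared once)

total = embeddings + attention + ffn
print(f"{total / 1e6:.1f}M parameters")  # lands near the ~60M in the table
```

Under these assumptions the count comes out around 58M, consistent with the stated ~60M; biases and LayerNorm gains are negligible at this scale.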

Pre-training

MiniGPT-FR was trained using next-token prediction on French text corpora.

Training followed a curriculum learning strategy, progressively increasing the dataset size to stabilize language acquisition.
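One common way to implement such a size curriculum is to train on progressively larger subsets of the corpus. The card only states the final size (~200k samples); the number of stages and the growth factor below are illustrative placeholders, not the actual schedule.

```python
# Hypothetical size curriculum: train on growing subsets of the corpus.
# Only the final size (~200k samples) comes from the card; the stage
# count and doubling factor are illustrative.

def curriculum_stages(final_size=200_000, n_stages=4, growth=2):
    # Work backwards from the final size, shrinking by `growth` per stage.
    return [final_size // growth**i for i in reversed(range(n_stages))]

print(curriculum_stages())  # [25000, 50000, 100000, 200000]
```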

Training overview

  • Training objective: next-token prediction
  • Language: French
  • Dataset size (final): ~200k text samples
  • Optimizer: AdamW
  • Learning rate schedule: Cosine decay with warmup
  • Validation metric: Cross-entropy loss

The model was trained from scratch and does not rely on pretrained multilingual checkpoints.
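The cosine-decay-with-warmup schedule listed above can be sketched as follows. The step counts and peak learning rate are placeholders, since the card does not publish the actual hyperparameters.

```python
import math

def lr_at(step, total_steps=10_000, warmup_steps=500,
          peak_lr=3e-4, min_lr=0.0):
    # Linear warmup to peak_lr, then cosine decay down to min_lr.
    # All numeric values here are illustrative placeholders.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The schedule starts at zero, reaches `peak_lr` at the end of warmup, and decays smoothly to `min_lr` at the final step.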


Post-training

This model was not post-trained.

There is currently:

  • no instruction tuning
  • no preference alignment
  • no reinforcement learning from human feedback (RLHF)

As a result, the model should not be expected to reliably follow instructions or provide factually accurate answers.


Capabilities

MiniGPT-FR is capable of:

  • Generating fluent French text
  • Completing prompts and sentences
  • Producing descriptive and encyclopedic-style paragraphs
  • Simple paraphrasing and reformulation

Known limitations

  • No instruction-following behavior
  • Susceptible to factual hallucinations
  • Limited reasoning capabilities
  • Short context window (256 tokens)
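Because of the 256-token context window, longer inputs must be clipped before generation. A minimal helper is sketched below; keeping the tail of the prompt (so the completion continues its end) is a suggested policy, not part of the model.

```python
MAX_CONTEXT = 256  # model's context length, from the architecture table

def clip_to_context(token_ids, max_len=MAX_CONTEXT, keep="tail"):
    # Keep the most recent tokens so the completion continues the
    # prompt's end; keep="head" keeps the beginning instead.
    if len(token_ids) <= max_len:
        return token_ids
    return token_ids[-max_len:] if keep == "tail" else token_ids[:max_len]
```

Note that `max_new_tokens` also counts against the window: a 256-token prompt leaves no room for generated tokens.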

Run the model

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Houzeric/MiniGPT-FR"

# trust_remote_code=True allows loading the model's custom architecture code
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True
)

# Sample a French continuation of the prompt
prompt = "Il est principalement connu pour"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Tokenizer

  • Tokenizer used: camembert-base
  • Shared vocabulary
  • Padding aligned with the EOS token


License

This model is distributed under the Apache 2.0 License.


Disclaimer

This model is provided for research and experimentation purposes only.
The generated texts may be inaccurate, incomplete, or incoherent.
No guarantees are provided regarding the accuracy of the produced information.


Credits

Model developed and trained independently as an experimental project, with a focus on progressive French language learning and the optimization of medium-scale language models.
