Model Card for MiniGPT-FR
This is the model card for MiniGPT-FR, a French causal language model trained from scratch.
The model is designed for free-form text generation and completion in French.
Model Details
- Developed by: Independent experimental project
- Funded by: Self-funded
- Model type: Decoder-only Transformer language model
- Language(s): French
- License: Apache License 2.0
MiniGPT-FR is a medium-sized causal language model trained to learn the structure and statistical regularities of the French language.
The model focuses on fluency, syntactic correctness, and stylistic consistency rather than factual accuracy or instruction-following.
It is not instruction-tuned and should be considered a base language model.
Architecture
MiniGPT-FR uses a standard dense Transformer decoder architecture with modern components commonly found in contemporary LLMs.
| Component | Value |
|---|---|
| Architecture | Decoder-only Transformer |
| Number of Parameters | ~60M |
| Context Length | 256 tokens |
| Number of Layers | 20 |
| Embedding Size | 640 |
| FFN Hidden Size | 2560 |
| Number of Attention Heads | 10 |
| Activation Function | SwiGLU |
| Position Encodings | RoPE |
| Weight Sharing | FFN weights shared across layers |
| Layer Normalization | LayerNorm |
| Tied Embeddings | Yes |
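As a rough sanity check, the parameter count can be reproduced from the other rows of the table. The sketch below assumes the camembert-base vocabulary size (32,005 tokens), a three-matrix SwiGLU FFN, and a single FFN shared across all layers (our reading of the weight-sharing row); these assumptions are inferred, not stated elsewhere in this card.

```python
# Rough parameter estimate from the table above (assumptions noted inline).
vocab_size = 32_005  # camembert-base vocabulary size (assumption)
d_model, d_ffn, n_layers = 640, 2560, 20

embeddings = vocab_size * d_model                      # tied, so counted once
attention  = n_layers * 4 * d_model * d_model          # Q, K, V, O projections per layer
ffn        = 3 * d_model * d_ffn                       # gate/up/down, assumed shared across layers
norms      = n_layers * 2 * 2 * d_model + 2 * d_model  # two LayerNorms per layer + final norm

total = embeddings + attention + ffn + norms
print(f"~{total / 1e6:.1f}M parameters")  # ~58.2M, consistent with the table's ~60M
```

Without FFN sharing the same dimensions would give roughly 150M parameters, so the ~60M figure itself supports the shared-FFN reading.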
Pre-training
MiniGPT-FR was trained with a next-token prediction objective on French text corpora.
Training followed a curriculum strategy in which the dataset size was progressively increased, in order to stabilize language acquisition.
Training overview
- Training objective: next-token prediction
- Language: French
- Dataset size (final): ~200k text samples
- Optimizer: AdamW
- Learning rate schedule: Cosine decay with warmup
- Validation metric: Cross-entropy loss
The model was trained from scratch and does not rely on pretrained multilingual checkpoints.
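To make the overview concrete, here is a minimal sketch of a comparable training step: next-token prediction with AdamW and a cosine schedule with warmup. The Llama-style stand-in config only approximates the actual architecture (it uses RMSNorm and per-layer FFNs rather than LayerNorm and a shared FFN), and the learning rate, warmup, and step counts are illustrative guesses, not values reported by this card.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM, get_cosine_schedule_with_warmup

# Stand-in config with the dimensions from the table above; Llama-style blocks
# provide RoPE + SwiGLU but are only an approximation of MiniGPT-FR.
config = LlamaConfig(
    vocab_size=32_005,            # camembert-base vocabulary (assumption)
    hidden_size=640,
    intermediate_size=2560,
    num_hidden_layers=20,
    num_attention_heads=10,
    max_position_embeddings=256,
    tie_word_embeddings=True,
)
model = LlamaForCausalLM(config)

# Illustrative hyperparameters -- not values reported by this card.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=50_000
)

def training_step(batch):
    # With labels=input_ids, the model shifts targets internally and
    # returns the next-token cross-entropy loss (the validation metric above).
    loss = model(input_ids=batch["input_ids"],
                 attention_mask=batch["attention_mask"],
                 labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    return loss.item()
```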
Post-training
This model was not post-trained.
There is currently:
- no instruction tuning
- no preference alignment
- no reinforcement learning from human feedback (RLHF)
As a result, the model should not be expected to reliably follow instructions or provide factually accurate answers.
Capabilities
MiniGPT-FR is capable of:
- Generating fluent French text
- Completing prompts and sentences
- Producing descriptive and encyclopedic-style paragraphs
- Simple paraphrasing and reformulation
Known limitations
- No instruction-following behavior
- Susceptible to factual hallucinations
- Limited reasoning capabilities
- Short context window (256 tokens)
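One practical consequence of the 256-token window: prompts longer than the context must be truncated before generation, and room must be left for the new tokens. A minimal sketch using the tokenizer's truncation controls (`long_prompt` is a placeholder for an over-long input):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Houzeric/MiniGPT-FR")
tokenizer.truncation_side = "left"  # keep the end of the prompt, which generation continues from

max_new_tokens = 100
inputs = tokenizer(
    long_prompt,                      # placeholder for an over-long prompt
    return_tensors="pt",
    truncation=True,
    max_length=256 - max_new_tokens,  # leave room for the generated tokens
)
```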
Run the model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Houzeric/MiniGPT-FR"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Encode a French prompt and sample a completion.
prompt = "Il est principalement connu pour"
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
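The sampling settings above (`do_sample=True`, `temperature=0.8`, `top_p=0.95`) are a reasonable starting point for a base model of this size; lowering the temperature makes completions more conservative, while greedy decoding (`do_sample=False`) tends to produce repetitive text with small base models.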
Tokenizer
- Tokenizer: `camembert-base` (pretrained, reused as-is)
- Shared vocabulary
- Padding aligned with the EOS token
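In transformers terms, aligning padding with EOS corresponds to the usual pattern below (a sketch; the hosted checkpoint may already ship this configuration):

```python
# Align the padding token with EOS, as described above.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.eos_token_id
```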
License
This model is distributed under the Apache 2.0 License.
Disclaimer
This model is provided for research and experimentation purposes only.
The generated texts may be inaccurate, incomplete, or incoherent.
No guarantees are provided regarding the accuracy of the produced information.
Credits
Model developed and trained independently as an experimental project, with a focus on progressive French language learning and the optimization of medium-scale language models.