Chinese-Dragon-GPT1 🐉

Chinese-Dragon-GPT1 is a lightweight, 82.7-million-parameter Transformer model trained from scratch on Chinese news data. It follows the GPT-1 (Post-LayerNorm) architecture and is optimized for efficient training and inference on consumer-grade hardware.

Model Details

  • Architecture: GPT-1 (Custom Small Configuration; see the sketch below)
  • Parameters: 82,659,840
  • Layers (n_layer): 6
  • Hidden Dimension (n_embd): 768
  • Attention Heads (n_head): 12
  • Context Length (n_positions): 256
  • Vocabulary Size: 52,000 (Custom Byte-Level BPE)
  • Training Dataset: CLUE TNews (Chinese news headlines)
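
For reference, the sketch below shows how the numbers above could map onto the standard GPT-1 configuration class in Transformers. The mapping is an assumption (the config.json shipped with the model is authoritative), but the resulting parameter count matches the reported 82,659,840 when the input and output embeddings are tied.

from transformers import OpenAIGPTConfig, OpenAIGPTLMHeadModel

# Assumed mapping of the figures above onto OpenAIGPTConfig;
# the config.json in the repository is the authoritative source.
config = OpenAIGPTConfig(
    vocab_size=52_000,
    n_positions=256,
    n_embd=768,
    n_layer=6,
    n_head=12,
)

model = OpenAIGPTLMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,}")  # 82,659,840 (LM head tied to the token embeddings)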

Training Story ⚙️

This model was trained as part of the MightyDragon-Dev project. Unlike most LLMs, which are trained on large GPU clusters, this model was trained entirely on the CPU of an Intel i5-10210U (ThinkPad X13 laptop); a minimal setup sketch follows the run summary below.

  • Training Time: ~15 hours
  • Total Steps: 12,500
  • Final Training Loss: ~0.13
  • Hardware: 4 cores @ 2.2 GHz boost / 16 GB RAM
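
The exact training script is not published here; the snippet below is only a minimal sketch of how a comparable CPU-only run could be pinned to the four physical cores and given the same step budget. Everything except max_steps (batch size, output directory, logging interval) is an assumption, not a value from this card.

import torch
from transformers import TrainingArguments

torch.set_num_threads(4)  # the i5-10210U exposes 4 physical cores

args = TrainingArguments(
    output_dir="dragon-gpt1-cpu",   # hypothetical output directory
    max_steps=12_500,               # total optimizer steps reported above
    per_device_train_batch_size=8,  # assumption; not stated in this card
    logging_steps=100,              # assumption
    use_cpu=True,                   # force CPU-only training (Transformers >= 4.34)
)

These arguments would then be passed to a Trainer together with a tokenized TNews dataset.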

How to Use

You can use this model directly with the Hugging Face pipeline API:

from transformers import pipeline

generator = pipeline(
    "text-generation", 
    model="MightyDragon-Dev/Chinese-Dragon-GPT1"
)

# Example prompt ("According to reports,")
prompt = "据报道,"
result = generator(prompt, max_length=50, do_sample=True, top_k=50)
print(result[0]["generated_text"])
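
For more control over generation, the model can also be loaded through the Auto classes. This is a minimal sketch; it assumes the repository ships the custom byte-level BPE tokenizer in a format AutoTokenizer can load.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MightyDragon-Dev/Chinese-Dragon-GPT1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "据报道," = "According to reports,"
inputs = tokenizer("据报道,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))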