License

This model is licensed under the MIT License.

ViorikaLM-CHAT

🚧 Experimental, under-trained model (~250M parameters) based on a custom Transformer architecture with 12 layers and 12 attention heads.
Primarily supports English 🇬🇧. This is my first model.

📖 Description

ViorikaLM-CHAT is an experimental generative language model designed for text generation and dialogue tasks.
The main goal of this project is to test the full pipeline, from implementing the architecture and training it from scratch to uploading the model to the Hugging Face Hub.

⚙️ Model Details

Architecture: Custom Transformer Decoder (12 layers, 12 attention heads)
Model size: ~250M parameters
Training Approach: Pre-trained from scratch on WikiText
Languages: Primarily English
License: MIT
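The card gives the layer and head counts but not the hidden size, vocabulary size, or context length, so the ~250M figure cannot be derived from the details above alone. As a rough cross-check, here is a minimal sketch that counts parameters for a GPT-2-style decoder. The block layout it assumes (biased Q/K/V/output projections, a 4×-wide MLP, two LayerNorms per block, learned position embeddings, optional tied output head) is an assumption, not taken from the ViorikaLM code.

```python
def estimate_decoder_params(n_layers: int, d_model: int, vocab_size: int,
                            n_ctx: int, tied_embeddings: bool = True) -> int:
    """Rough parameter count for a GPT-2-style decoder-only Transformer.

    Assumed (hypothetical) layout per block: Q, K, V and output projections
    with biases, a 4x-wide two-layer MLP with biases, and two LayerNorms.
    The number of attention heads does not affect the count, because the
    heads split d_model between them rather than adding weights.
    """
    d = d_model
    attn = 4 * d * d + 4 * d            # Q, K, V, output weights + biases
    mlp = 2 * (4 * d * d) + 4 * d + d   # up/down projection weights + biases
    norms = 2 * (2 * d)                 # two LayerNorms (gain + bias each)
    per_block = attn + mlp + norms

    embeddings = vocab_size * d + n_ctx * d  # token + learned position tables
    final_norm = 2 * d
    head = 0 if tied_embeddings else vocab_size * d  # separate LM head if untied
    return n_layers * per_block + embeddings + final_norm + head


if __name__ == "__main__":
    # Sanity check against GPT-2 small (12 layers, d_model=768, 50,257 tokens, 1,024 positions)
    print(estimate_decoder_params(12, 768, 50_257, 1024))  # 124439808
```

With GPT-2 small's dimensions the function returns 124,439,808, matching that model's published size, so at 12 layers a count near 250M implies a hidden size noticeably wider than 768 (or an untied output head, or both). The card does not say which.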

🏋️ Training Details

Dataset: wikitext-103-raw-v1 (or similar WikiText format)
Hardware: Single NVIDIA GTX 1070 (8GB VRAM)
Training Status: Very early checkpoint (Under-trained)
Epochs: 2
Batch size: 8
Optimizer: Adam, lr = 3e-4
Max sequence length: 128 tokens
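The settings above imply some simple arithmetic about throughput and memory. The sketch below assumes full fp32 training with Adam (mixed precision is not mentioned either way) and non-overlapping 128-token windows. The WikiText-103 token count in the usage lines is the dataset's published word-level training-set size, so the real figure under the model's own tokenizer will differ.

```python
BYTES_FP32 = 4


def tokens_per_step(batch_size: int, seq_len: int) -> int:
    """Tokens processed per optimizer step (one batch of full-length windows)."""
    return batch_size * seq_len


def steps_per_epoch(n_tokens: int, batch_size: int, seq_len: int) -> int:
    """Steps to cover n_tokens once in non-overlapping windows (last batch may be partial)."""
    per_step = tokens_per_step(batch_size, seq_len)
    return -(-n_tokens // per_step)  # ceiling division


def adam_fp32_state_gib(n_params: int) -> float:
    """GiB for fp32 weights + gradients + Adam's two moment buffers (activations excluded)."""
    bytes_per_param = 4 * BYTES_FP32  # weight, gradient, first moment, second moment
    return n_params * bytes_per_param / 2**30


if __name__ == "__main__":
    print(tokens_per_step(8, 128), "tokens per step")
    # Assumption: WikiText-103's word-level training-set size; a subword
    # tokenizer will produce a different token count.
    steps = steps_per_epoch(103_227_021, 8, 128)
    print(steps, "steps per epoch,", 2 * steps, "steps for 2 epochs")
    print(round(adam_fp32_state_gib(250_000_000), 2), "GiB of weight/optimizer state")
```

With a batch of 8 and 128-token sequences, each step sees 1,024 tokens, giving about 100,808 steps per epoch under the word-level count. At ~250M parameters, weights, gradients, and Adam state alone come to roughly 3.73 GiB, about half of the GTX 1070's 8 GB before any activations, which is consistent with the small batch size and short context used here.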

🚀 Usage

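```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ViorikaAI/ViorikaLM-CHAT"

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt")  # PyTorch tensors

# Sample up to 50 new tokens; the EOS token doubles as the padding token
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.9,
)

# The decoded text contains the prompt followed by the generated continuation
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```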
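With `do_sample=True`, `temperature=0.9` rescales the model's logits before sampling: values below 1 sharpen the distribution toward the most likely tokens, and values above 1 flatten it. Here is a minimal pure-Python sketch of that single step. It is an illustration, not the Transformers implementation, and it ignores other filters such as top-k that `generate` may also apply by default.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.9):
    """Convert raw logits to probabilities after dividing by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.9, rng=None):
    """Draw one token index from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # toy scores for a hypothetical 3-token vocabulary
    for t in (0.5, 0.9, 1.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    print(sample_token(logits, 0.9, random.Random(0)))
```

Running the loop shows the top token's probability shrinking as the temperature rises, which is why 0.9 gives slightly more focused output than plain sampling at 1.0.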

