---
language:
  - en
  - ko
library_name: transformers
pipeline_tag: text-generation
tags:
  - base-model
  - llama
  - causal-lm
  - pretrained
  - vllm
  - pytorch
model-index:
  - name: keural-alpha-base
    results: []
---

Keural-Alpha-Base

Model Summary

Keural-Alpha-Base is a base (foundation) large language model trained using a LLaMA-compatible, decoder-only Transformer architecture.
It is comparable in role to models such as GPT-2 (base) or LLaMA base, and is intended to serve as a strong pretrained backbone for downstream fine-tuning.

This model is not instruction-tuned and not chat-aligned.


Model Details

  • Model type: Causal Language Model
  • Architecture: LLaMA-style Transformer
  • Framework: Hugging Face Transformers
  • Tokenizer: SentencePiece (LLaMA-compatible)
  • Vocabulary size: 32,000
  • Max sequence length: 2,048 tokens

Architecture Configuration

Component                  Value
Architecture               LlamaForCausalLM
Hidden size                2048
Intermediate size          8192
Number of layers           24
Attention heads            16
Key-value heads            16
Head dimension             128
Activation                 SiLU
Normalization              RMSNorm (ε = 1e-6)
Dropout                    0.0
Vocabulary size            32,000
Max position embeddings    2048
Positional encoding        RoPE (θ = 10000)
Attention bias             Disabled
Weight tying               Disabled
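The configuration above implies roughly 1.74B parameters (displayed as "2B" once rounded on the hub). A back-of-the-envelope check, assuming the standard LLaMA layout (untied embeddings, SwiGLU MLP, RMSNorm) — a sketch, not an exact accounting of the checkpoint:

```python
# Rough parameter count derived from the configuration table above,
# assuming the standard LLaMA block layout (untied embeddings, SwiGLU MLP).
vocab, hidden, inter, layers = 32_000, 2048, 8192, 24
heads = kv_heads = 16
head_dim = 128  # note: heads * head_dim == hidden

attn = hidden * (heads * head_dim)           # Q projection
attn += 2 * hidden * (kv_heads * head_dim)   # K and V projections
attn += (heads * head_dim) * hidden          # output projection
mlp = 3 * hidden * inter                     # gate, up, and down projections
norms = 2 * hidden                           # input + post-attention RMSNorm

per_layer = attn + mlp + norms
total = layers * per_layer                   # all transformer blocks
total += vocab * hidden                      # input embeddings
total += vocab * hidden                      # LM head (weight tying disabled)
total += hidden                              # final RMSNorm

print(f"{total:,} parameters")               # ≈ 1.74B
print(f"{total * 4 / 2**30:.1f} GiB in F32")
```

In F32 this works out to roughly 6.5 GiB of weights, consistent with a safetensors checkpoint reported as ~2B parameters.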

Tokenizer Details

  • Tokenizer type: SentencePiece
  • BOS token: <s>
  • EOS token: </s>
  • PAD token: Not defined (standard for LLaMA base models)

The absence of a padding token is intentional and follows standard LLaMA base design.
During inference, it is recommended to set pad_token_id = eos_token_id and provide an explicit attention_mask.
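Because the checkpoint ships without a PAD token, batched inference has to reuse the EOS id as padding and supply an explicit mask. A minimal sketch of that bookkeeping in plain Python (the token ids are illustrative placeholders; EOS = 2 follows the usual LLaMA convention for </s>, and the real tokenizer does all of this once tokenizer.pad_token = tokenizer.eos_token is set):

```python
# Left-pad a batch of token-id sequences with the EOS id and build the
# matching attention mask, mimicking what the tokenizer does when
# pad_token = eos_token. Decoder-only models are left-padded so that
# generation continues from real tokens, not from padding.
EOS_ID = 2  # assumed LLaMA convention for </s>

def pad_batch(sequences, pad_id=EOS_ID):
    max_len = max(len(s) for s in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = max_len - len(seq)
        input_ids.append([pad_id] * pad + list(seq))       # left padding
        attention_mask.append([0] * pad + [1] * len(seq))  # 0 = ignore pad
    return input_ids, attention_mask

# Illustrative ids only (1 = BOS); real ids come from the tokenizer.
ids, mask = pad_batch([[1, 15043, 29892], [1, 3186]])
print(ids)   # [[1, 15043, 29892], [2, 1, 3186]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```

With transformers, setting tokenizer.pad_token = tokenizer.eos_token and tokenizer.padding_side = "left" produces the same behavior without manual bookkeeping.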


Intended Use

This model is designed for:

  • Further instruction tuning
  • Chat alignment
  • Domain-specific fine-tuning
  • Research on large language models
  • Serving as a pretrained backbone for NLP tasks

⚠️ Out-of-the-box generations may show repetition or incoherence, which is expected behavior for base models.


Limitations

  • Not instruction-following
  • Not safety-aligned
  • No RLHF applied
  • Requires fine-tuning for chat or production deployment

Supported Hardware

The model has been validated on:

  • NVIDIA H100 / H200
  • NVIDIA A100
  • NVIDIA Spark
  • Dell GB10
  • Modern RTX GPUs (Ampere / Ada / Blackwell)

vLLM Compatibility

Keural-Alpha-Base is fully compatible with vLLM.

Example:

python -m vllm.entrypoints.openai.api_server \
  --model mkd-hossain/keural-alpha-base \
  --served-model-name keural-alpha-base \
  --tensor-parallel-size 2 \
  --dtype bfloat16 \
  --max-model-len 2048 \
  --disable-log-stats



Example request:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "keural-alpha-base",
    "prompt": "Hello, my name is",
    "max_tokens": 60,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.15
  }'
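The same completion request can be issued from Python using only the standard library. A sketch mirroring the curl call above (the vLLM server must already be running locally; endpoint and sampling parameters are taken from the examples above):

```python
import json
from urllib import request

API_URL = "http://localhost:8000/v1/completions"  # vLLM OpenAI-compatible server

def build_payload(prompt, max_tokens=60):
    # Mirrors the curl example above; a repetition penalty helps because
    # base models tend to loop without it.
    return {
        "model": "keural-alpha-base",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "top_p": 0.9,
        "repetition_penalty": 1.15,
    }

def complete(prompt):
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

payload = build_payload("Hello, my name is")
# print(complete("Hello, my name is"))  # requires the server to be running
```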


Usage Example (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mkd-hossain/keural-alpha-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(
    "Hello, I am Hossain from Bangladesh.",
    return_tensors="pt"
).to(model.device)  # move inputs to the model's device (needed with device_map="auto")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        no_repeat_ngram_size=4,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))



Ethical Considerations

As a base model, Keural-Alpha-Base may generate biased, incorrect, or unsafe content.
Users are responsible for applying appropriate alignment, filtering, and safeguards before deployment.



Author

  • Organization: MKD Co LTD.
  • Project: Keural AI Systems
  • Model size: ~2B parameters (safetensors, F32)