---
language:
- en
- ko
library_name: transformers
pipeline_tag: text-generation
tags:
- base-model
- llama
- causal-lm
- pretrained
- vllm
- pytorch
model-index:
- name: keural-alpha-base
  results: []
---
# Keural-Alpha-Base

## Model Summary
Keural-Alpha-Base is a base (foundation) large language model trained using a LLaMA-compatible, decoder-only Transformer architecture.
It is comparable in role to models such as GPT-2 (base) or LLaMA base, and is intended to serve as a strong pretrained backbone for downstream fine-tuning.
This model is not instruction-tuned and not chat-aligned.
## Model Details
- Model type: Causal Language Model
- Architecture: LLaMA-style Transformer
- Framework: Hugging Face Transformers
- Tokenizer: SentencePiece (LLaMA-compatible)
- Vocabulary size: 32,000
- Max sequence length: 2,048 tokens
## Architecture Configuration
| Component | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Hidden size | 2048 |
| Intermediate size | 8192 |
| Number of layers | 24 |
| Attention heads | 16 |
| Key-value heads | 16 |
| Head dimension | 128 |
| Activation | SiLU |
| Normalization | RMSNorm (ε = 1e-6) |
| Dropout | 0.0 |
| Vocabulary size | 32,000 |
| Max position embeddings | 2048 |
| Positional encoding | RoPE (θ = 10000) |
| Attention bias | Disabled |
| Weight tying | Disabled |
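As a rough sanity check (back-of-the-envelope arithmetic, not an official figure), the configuration above implies a parameter count of roughly 1.7B, assuming the standard LLaMA layout (gated SwiGLU MLP, bias-free projections, untied embeddings):

```python
# Approximate parameter count implied by the table above (standard LLaMA
# layout: q/k/v/o attention projections, gate/up/down MLP projections,
# RMSNorm weight vectors only, untied input embedding and LM head).
hidden, intermediate, layers, vocab = 2048, 8192, 24, 32000

attn = 4 * hidden * hidden        # q, k, v, o projections (no bias)
mlp = 3 * hidden * intermediate   # gate, up, down projections
norms = 2 * hidden                # two RMSNorm weight vectors per layer
per_layer = attn + mlp + norms

embeddings = vocab * hidden       # input embedding table
lm_head = vocab * hidden          # output head (weight tying disabled)
final_norm = hidden

total = layers * per_layer + embeddings + lm_head + final_norm
print(f"{total / 1e9:.2f}B parameters")  # -> 1.74B parameters
```

Note that 16 heads × head dimension 128 = 2048 matches the hidden size, so the attention projections are square, as assumed above.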
## Tokenizer Details
- Tokenizer type: SentencePiece
- BOS token: `<s>`
- EOS token: `</s>`
- PAD token: not defined (standard for LLaMA base models)
The absence of a padding token is intentional and follows standard LLaMA base design.
During inference, it is recommended to set `pad_token_id = eos_token_id` and to provide an explicit `attention_mask`.
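For illustration, here is a minimal, dependency-free sketch of what that padding convention means in practice. The helper `pad_batch` and the example token ids are invented for this sketch; only the EOS id (2 in the standard LLaMA vocabulary) is taken from the tokenizer description above:

```python
# Illustrative sketch: right-pad a batch of token-id sequences to a common
# length using the EOS id as the pad id, and build the matching attention
# mask so the model ignores the padding positions.
EOS_ID = 2  # </s> in the standard LLaMA vocabulary

def pad_batch(sequences, pad_id=EOS_ID):
    """Return (input_ids, attention_mask) for a batch of variable-length sequences."""
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    # 1 marks real tokens, 0 marks padding the model should ignore
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask

ids, mask = pad_batch([[1, 5, 9, 4], [1, 7]])
print(ids)   # [[1, 5, 9, 4], [1, 7, 2, 2]]
print(mask)  # [[1, 1, 1, 1], [1, 1, 0, 0]]
```

This is exactly why the explicit `attention_mask` matters: once EOS doubles as the pad token, the mask is the only way for the model to distinguish padding from a genuine end-of-sequence token.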
## Intended Use
This model is designed for:
- Further instruction tuning
- Chat alignment
- Domain-specific fine-tuning
- Research on large language models
- Serving as a pretrained backbone for NLP tasks
⚠️ Out-of-the-box generations may show repetition or incoherence, which is expected behavior for base models.
## Limitations
- Not instruction-following
- Not safety-aligned
- No RLHF applied
- Requires fine-tuning for chat or production deployment
## Supported Hardware
The model has been validated on:
- NVIDIA H100 / H200
- NVIDIA A100
- NVIDIA Spark
- Dell GB10
- Modern RTX GPUs (Ampere / Ada / Blackwell)
## vLLM Compatibility
Keural-Alpha-Base is fully compatible with vLLM.
Example:
python -m vllm.entrypoints.openai.api_server \
--model mkd-hossain/keural-alpha-base \
--served-model-name keural-alpha-base \
--tensor-parallel-size 2 \
--dtype bfloat16 \
--max-model-len 2048 \
--disable-log-stats
Example completion request:

```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "keural-alpha-base",
    "prompt": "Hello, my name is",
    "max_tokens": 60,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.15
  }'
```
## Usage Example (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mkd-hossain/keural-alpha-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base LLaMA checkpoints ship without a PAD token; reuse EOS for padding.
tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(
    "Hello, I am Hossain from Bangladesh.",
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,  # required for temperature/top_p to take effect
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.15,
        no_repeat_ngram_size=4,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
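For readers unfamiliar with the sampling parameters used above, here is a minimal pure-Python sketch of nucleus (top-p) filtering: keep only the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, then renormalize. The helper `top_p_filter` is invented for illustration and is not part of the Transformers API:

```python
import math

def top_p_filter(logits, top_p=0.9):
    """Zero out tokens outside the top-p nucleus and renormalize."""
    # Softmax over the raw logits (shifted by the max for numerical stability)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Walk tokens in descending probability until cumulative mass >= top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Discard everything outside the nucleus, then renormalize
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    s = sum(filtered)
    return [p / s for p in filtered]

# One dominant logit: the nucleus collapses to a single token
print(top_p_filter([5.0, 1.0, 1.0, 1.0], top_p=0.9))  # [1.0, 0.0, 0.0, 0.0]
```

Temperature divides the logits before this step (sharpening below 1.0, flattening above), and `repetition_penalty` down-weights tokens that already appeared, which is why both help tame the repetition that base models tend to exhibit.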
## Ethical Considerations
As a base model, Keural-Alpha-Base may generate biased, incorrect, or unsafe content.
Users are responsible for applying appropriate alignment, filtering, and safeguards before deployment.
## Author

- Organization: MKD Co LTD.
- Project: Keural AI Systems