SLIM-1-base-chkp

This repository contains intermediate checkpoints for the SLIM-1-base-chkp project: a Small Language Model (SLM) trained from scratch on high-quality educational and code data.

Model Details

  • Architecture: Decoder-only Transformer
  • Parameters: 175M.
  • Context Window: 2048 tokens.
  • Training Precision: bfloat16.
  • Target: English (scientific, educational, and code content).
  • Training tokens: about 7 billion.
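
These architecture details can be inspected directly from the checkpoint's configuration. A minimal sketch; the field name max_position_embeddings follows the usual transformers convention and is an assumption, not something this card confirms:

from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the hyperparameters
config = AutoConfig.from_pretrained("lumasik/SLIM-1-base-chkp2500")
print(config)                          # full architecture hyperparameters
print(config.max_position_embeddings)  # expected: 2048, the context window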

Training Progress

  • Hardware: 1x NVIDIA GeForce RTX 3060.
  • Datasets Used:
    • FineWeb-Edu
    • OpenWebMath
    • RefinedWeb
    • CodeSearchNet
    • tiny-codes
    • Cosmopedia

Usage

Since this is a base model, it is designed for text completion rather than instruction following. It is best used as a starting point for further fine-tuning; a fine-tuning sketch follows the example below.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("lumasik/SLIM-1-base-chkp2500")
tokenizer = AutoTokenizer.from_pretrained("lumasik/SLIM-1-base-chkp2500")

# Base models continue text, so phrase the prompt as a sentence to complete
prompt = "The relationship between large language models and tokenization is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
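
For fine-tuning, a minimal sketch using the Trainer API is shown below. The data file (train.txt), batch size, and epoch count are illustrative placeholders, not settings from the SLIM-1 training run:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("lumasik/SLIM-1-base-chkp2500")
tokenizer = AutoTokenizer.from_pretrained("lumasik/SLIM-1-base-chkp2500")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding is needed for batching

# train.txt is a placeholder: one training document per line
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives standard causal language modeling labels
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="slim1-finetuned",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    bf16=True,  # matches the bfloat16 training precision listed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

Adjust the batch size and precision flag to fit your hardware; on smaller GPUs, gradient accumulation may be needed.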