SLIM-1-base-chkp

This repository contains intermediate checkpoints for the SLIM-1-base-chkp project: a Small Language Model (SLM) trained from scratch on high-quality educational and code data.

Model Details

  • Architecture: Decoder-only Transformer
  • Parameters: 175M.
  • Context Window: 2048 tokens.
  • Training Precision: bfloat16.
  • Target: English (scientific, educational, and code content).
  • Training tokens: about 7 billion.
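
These architecture details can be inspected directly from the checkpoint's configuration. A minimal sketch; the field name max_position_embeddings follows the usual transformers convention and is an assumption, not something this card confirms:

from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the hyperparameters
config = AutoConfig.from_pretrained("lumasik/SLIM-1-base-chkp2500")
print(config)                          # full architecture hyperparameters
print(config.max_position_embeddings)  # expected: 2048, the context window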

Training Progress

  • Hardware: 1x NVIDIA GeForce RTX 3060.
  • Datasets Used:
    • FineWeb-Edu
    • OpenWebMath
    • RefinedWeb
    • CodeSearchNet
    • tiny-codes
    • Cosmopedia

Usage

Since this is a base model, it is designed for text completion rather than instruction following. It is best used as a starting point for further fine-tuning; a fine-tuning sketch follows the example below.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("lumasik/SLIM-1-base-chkp2500")
tokenizer = AutoTokenizer.from_pretrained("lumasik/SLIM-1-base-chkp2500")

# Base models continue text, so phrase the prompt as a sentence to complete
prompt = "The relationship between large language models and tokenization is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
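
For fine-tuning, a minimal sketch using the Trainer API is shown below. The data file (train.txt), batch size, and epoch count are illustrative placeholders, not settings from the SLIM-1 training run:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("lumasik/SLIM-1-base-chkp2500")
tokenizer = AutoTokenizer.from_pretrained("lumasik/SLIM-1-base-chkp2500")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding is needed for batching

# train.txt is a placeholder: one training document per line
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives standard causal language modeling labels
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="slim1-finetuned",
    per_device_train_batch_size=2,
    num_train_epochs=1,
    bf16=True,  # matches the bfloat16 training precision listed above
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

Adjust the batch size and precision flag to fit your hardware; on smaller GPUs, gradient accumulation may be needed.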