Glimmer-1-Base

Glimmer-1 is the first model in the Glimmer series: an 11.9K-parameter Llama-style transformer trained on 500K tokens of FineWeb-Edu. It is a small language model (SLM) that explores the lower bound of useful language model scale.

Notice

Glimmer-1-Base is an experimental research model. It has had no supervised fine-tuning (SFT), is prone to incoherence, and is not suitable for production use. SFT and chain-of-thought (CoT) training are planned for future releases.


At a Glance

| Property | Value |
|---|---|
| Parameters | ~11,900 |
| Training Tokens | 500,000 (FineWeb-Edu) |
| Context Window | 512 tokens |
| Hardware | RTX 4070 SUPER |
| Status | Base only, no SFT |

Benchmarks

  • arc_easy (acc): 25.46%
  • wikitext-2 (word_perplexity): 1,765,201
  • wikitext-2 (byte_perplexity): 14.73
  • wikitext-2 (bits_per_byte): 3.8806
  • BLiMP (acc): 52.43%
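
The two byte-level wikitext-2 figures above are the same measurement in different units. A minimal sketch of that relationship, assuming the usual lm-evaluation-harness definitions (bits_per_byte is the base-2 logarithm of byte_perplexity, and word and byte perplexity are computed from one total log-likelihood); the variable names are illustrative:

```python
import math

# Values copied from the benchmark list above.
byte_perplexity = 14.73
word_perplexity = 1_765_201

# bits_per_byte = log2(byte_perplexity); this agrees with the reported 3.8806
# up to rounding of the inputs.
bits_per_byte = math.log2(byte_perplexity)

# Both perplexities normalize the same total log-likelihood, once by bytes and
# once by words, so the ratio of their logs is the average number of bytes per
# word implied by the two reported numbers (an inferred quantity, not a
# reported one).
implied_bytes_per_word = math.log(word_perplexity) / math.log(byte_perplexity)

print(f"bits_per_byte ~ {bits_per_byte:.4f}")
print(f"implied bytes per word ~ {implied_bytes_per_word:.2f}")
```

The very large word perplexity is therefore not a separate finding: it is the byte-level loss compounded over every byte of an average word.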

Architecture

| Parameter | Value |
|---|---|
| Architecture | Transformer Decoder (LlamaForCausalLM) |
| Hidden Dimension | 16 |
| Layers | 2 |
| Attention Heads | 4 |
| KV Heads | 1 (GQA) |
| MLP Intermediate Size | 24 (SiLU activation) |
| Context Length | 512 tokens |
| Vocabulary Size | 512 |
| Normalization | RMSNorm, eps 1e-06 |
| Position Encoding | RoPE (default) |
| Embeddings | Tied input / output |
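
The ~11.9K parameter figure can be checked against the values listed above. The following is a rough sketch, assuming the standard Llama layout with no bias terms and a head dimension of hidden size divided by attention heads; `llama_param_count` is an illustrative helper, not part of the repository:

```python
def llama_param_count(hidden, layers, heads, kv_heads, intermediate, vocab, tied=True):
    """Count weights in a bias-free Llama-style decoder (assumed layout)."""
    head_dim = hidden // heads                      # assumption: hidden / heads
    embed = vocab * hidden                          # token embedding matrix
    attn = (hidden * heads * head_dim               # q_proj
            + 2 * hidden * kv_heads * head_dim      # k_proj and v_proj (GQA)
            + heads * head_dim * hidden)            # o_proj
    mlp = 3 * hidden * intermediate                 # gate, up, and down projections
    norms = 2 * hidden                              # two RMSNorm gains per layer
    total = embed + layers * (attn + mlp + norms) + hidden  # plus final RMSNorm
    if not tied:
        total += vocab * hidden                     # separate output head
    return total


glimmer = llama_param_count(hidden=16, layers=2, heads=4, kv_heads=1,
                            intermediate=24, vocab=512)
print(glimmer)  # 11856
```

Under these assumptions the embedding matrix accounts for 8,192 of the 11,856 weights, so tying input and output embeddings matters a great deal at this scale: untied, the count would be 20,048. With 500,000 training tokens, this works out to roughly 42 tokens per parameter.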

Limitations

  • Context window. The 512-token window severely limits long-range dependencies.
  • World knowledge. The model has almost no factual knowledge due to parameter constraints.
  • Coherence. Topic switching, random spacing, and unusual characters are expected behaviors, not bugs.
  • Reliability. Not suitable for any production application.
  • Purpose. Research, education, and architectural experimentation only.
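
The context-window limit has a practical consequence for callers: the prompt plus the generated tokens should fit in 512 positions. A minimal sketch of one way to enforce this on a list of token ids, keeping the most recent tokens; `fit_to_context` and its parameters are hypothetical and not part of the model's repository:

```python
from typing import List

CONTEXT_LENGTH = 512  # context length stated in the Architecture section


def fit_to_context(token_ids: List[int], max_new_tokens: int,
                   context_length: int = CONTEXT_LENGTH) -> List[int]:
    """Trim a prompt from the left so prompt + new tokens fit the window."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens  # positions left for the prompt
    return token_ids[-budget:]


prompt_ids = list(range(600))                 # stand-in for a long tokenized prompt
trimmed = fit_to_context(prompt_ids, max_new_tokens=32)
print(len(trimmed), trimmed[0])               # 480 120
```

Trimming from the left also drops a leading BOS token once the prompt is too long, so a caller that depends on BOS would need to re-insert it and reduce the budget by one.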

Inference

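Install the dependencies:

pip install torch transformers safetensors accelerate

Then run the script below from a local copy of the repository. It expects config.json, the tokenizer files, and model.safetensors in the working directory.
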
"""
Minimal inference pipeline for Glint-Research/Glimmer-1-Base.
Loads the safetensors weights directly and runs a token-by-token sampling loop.
"""

import os
import json
from typing import Optional

import torch
import torch.nn.functional as F
from safetensors.torch import load_file
from transformers import LlamaConfig, LlamaForCausalLM, AutoTokenizer

class GlimmerInferencePipeline:
    def __init__(self, model_path: str, device: Optional[str] = None):
        """
        Builds the model from config.json and loads its weights
        from the local repository directory.
        """
        if device is None:
            self.device = "cuda" if torch.cuda.is_available() else "cpu"
        else:
            self.device = device
            
        print(f"[*] Initializing Glimmer-1-Base on device: {self.device}")
        
        config_file = os.path.join(model_path, "config.json")
        if not os.path.exists(config_file):
            raise FileNotFoundError(f"Could not locate config.json inside {model_path}")
            
        with open(config_file, "r", encoding="utf-8") as f:
            self.config_data = json.load(f)
            
        self.config = LlamaConfig(**self.config_data)
        
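        print("[*] Loading tokenizer...")
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        
        print("[*] Building model and loading safetensors weights...")
        self.model = LlamaForCausalLM(self.config)
        
        weights_file = os.path.join(model_path, "model.safetensors")
        if not os.path.exists(weights_file):
            raise FileNotFoundError(f"Could not find model.safetensors in {model_path}")

        state_dict = load_file(weights_file, device="cpu")
        # Embeddings are tied, so the checkpoint may store the shared matrix only once.
        # Load non-strictly and accept lm_head.weight as the only missing key.
        result = self.model.load_state_dict(state_dict, strict=False)
        if result.unexpected_keys or set(result.missing_keys) - {"lm_head.weight"}:
            raise RuntimeError(f"Checkpoint mismatch: {result}")
        self.model.tie_weights()  # no-op if the weights are already tied

        self.model.to(self.device)
        self.model.eval()
        print("[+] Model loaded.")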

    @torch.inference_mode()
    def generate(
        self, 
        prompt: str, 
        max_new_tokens: int = 50, 
        temperature: float = 0.7, 
        top_k: int = 50
    ) -> str:
        """
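        Samples up to max_new_tokens tokens autoregressively with temperature and top-k filtering.
        A temperature of 0 or below selects greedy (argmax) decoding.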
        """
        inputs = self.tokenizer(prompt, return_tensors="pt")
        input_ids = inputs["input_ids"].to(self.device)
        
        bos_token_id = self.config_data.get("bos_token_id", 1)
        eos_token_id = self.config_data.get("eos_token_id", 2)
        
        if input_ids.shape[1] == 0 or input_ids[0, 0] != bos_token_id:
            bos_tensor = torch.tensor([[bos_token_id]], dtype=torch.long, device=self.device)
            input_ids = torch.cat([bos_tensor, input_ids], dim=-1)

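        for _ in range(max_new_tokens):
            # Feed at most the last max_position_embeddings tokens (512 for this model)
            outputs = self.model(input_ids[:, -self.config.max_position_embeddings:])
            next_token_logits = outputs.logits[:, -1, :]
            
            if temperature > 0.0:
                next_token_logits = next_token_logits / temperature
                
                if top_k > 0:
                    # Clamp k to the vocabulary size so torch.topk cannot fail
                    k = min(top_k, next_token_logits.size(-1))
                    indices_to_remove = next_token_logits < torch.topk(next_token_logits, k)[0][..., -1, None]
                    next_token_logits[indices_to_remove] = float('-inf')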
                
                probabilities = F.softmax(next_token_logits, dim=-1)
                next_token = torch.multinomial(probabilities, num_samples=1)
            else:
                next_token = torch.argmax(next_token_logits, dim=-1, keepdim=True)
                
            input_ids = torch.cat([input_ids, next_token], dim=-1)
            
            if next_token.item() == eos_token_id:
                break
                
        # Decode the full sequence (prompt plus continuation) back into text
        generated_output = self.tokenizer.decode(input_ids[0], skip_special_tokens=True)
        return generated_output

if __name__ == "__main__":
    # Path to a local copy of the repository (config.json, tokenizer files, model.safetensors).
    # Replace '.' with the snapshot directory if running from another location.
    LOCAL_REPO_DIR = "."
    
    try:
        pipeline = GlimmerInferencePipeline(model_path=LOCAL_REPO_DIR)
        
        sample_prompt = "Deep learning architecture optimization requires"
        print(f"\n[Prompt Input]: {sample_prompt}")
        
        generated_text = pipeline.generate(
            prompt=sample_prompt, 
            max_new_tokens=32, 
            temperature=0.85
        )
        print(f"[Generated Response]: {generated_text}\n")
        
    except Exception as e:
        print(f"[-] Execution failed: {e}")
        print("[!] Ensure config.json, tokenizer.json, and model.safetensors are inside the execution directory.")
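
The sampling step inside `generate` can also be looked at in isolation. The following is a dependency-free sketch of the same idea (temperature scaling, then top-k filtering, then softmax) on a plain list of logits; `top_k_probs` is an illustrative name, and this is not the tensor code used in the script:

```python
import math
from typing import List


def top_k_probs(logits: List[float], temperature: float = 0.7, top_k: int = 50) -> List[float]:
    """Return sampling probabilities after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    if top_k > 0:
        k = min(top_k, len(scaled))
        threshold = sorted(scaled, reverse=True)[k - 1]   # k-th largest logit
        # Logits below the threshold get probability zero; ties at the threshold survive.
        scaled = [x if x >= threshold else float("-inf") for x in scaled]
    peak = max(scaled)                                    # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]        # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return [w / total for w in weights]


probs = top_k_probs([2.0, 1.0, 0.5, -1.0], temperature=1.0, top_k=2)
print([round(p, 3) for p in probs])   # [0.731, 0.269, 0.0, 0.0]
```

To draw a token index from these probabilities, `random.choices(range(len(probs)), weights=probs)` is one option. Lower temperatures sharpen the distribution toward the top logit, and a smaller `top_k` removes more of the tail before sampling.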

Related Models

| Model | Parameters | Notes |
|---|---|---|
| Glint-1.3 | ~1M | Instruction-tuned |
| Shard-1 | 54.5M | Gemma-4 attention |

Citation

@misc{glimmer1base2026,
  author    = {CompactAI},
  title     = {Glimmer-1: An 11.9K-Parameter Llama-Style Transformer},
  year      = {2026},
  publisher = {Glint Research},
  url       = {https://huggingface.co/Glint-Research}
}

Built by CompactAI, trained and made by Enderchefcoder
Small models trying their best since 2026.
