HALO-S 3.5M — WikiText-2 (Character-Level)

Language model pretrained with the HALO-S architecture (Hierarchical Attention with Local Optimization – Sparse), an efficient alternative to the Transformer whose complexity is O(N×K) in the sequence length N rather than O(N²).
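
To make the O(N×K) figure concrete, here is a minimal sketch of one common sparse pattern: each query attends to a causal local window plus a few global positions. This is an assumption for illustration, guessed from the config field names `local_window` and `num_globals` in the loading code further down. The card does not describe how HALO-S actually selects keys, and `sparse_attention_pairs` is a hypothetical helper, not part of the library.

```python
def sparse_attention_pairs(seq_len, local_window, num_globals):
    """Return, for each query position, the sorted key positions it may attend to.

    Illustrative pattern only (an assumption, not taken from the library):
    a causal local window plus the first `num_globals` positions as global keys.
    """
    allowed = []
    for q in range(seq_len):
        # Keys inside the causal window that ends at q.
        local = set(range(max(0, q - local_window + 1), q + 1))
        # Global keys, restricted to positions at or before q (causal).
        global_keys = set(range(min(num_globals, q + 1)))
        allowed.append(sorted(local | global_keys))
    return allowed


seq_len, local_window, num_globals = 16, 4, 2
pairs = sparse_attention_pairs(seq_len, local_window, num_globals)

sparse_total = sum(len(keys) for keys in pairs)
dense_causal_total = seq_len * (seq_len + 1) // 2

# Every query sees at most K = local_window + num_globals keys,
# so total work is bounded by N * K instead of growing with N^2.
assert max(len(keys) for keys in pairs) <= local_window + num_globals
print(sparse_total, dense_causal_total)
```

Under this pattern K is `local_window + num_globals`, so doubling the sequence length doubles the number of query-key pairs instead of quadrupling it.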

Trained for 10 epochs on WikiText-2 at the character level (vocab_size=256).
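
A vocabulary of 256 is what you get when every token is one byte of UTF-8 text. The sketch below assumes that reading. The card does not say how `CharacterTokenizer` maps characters to ids, so `encode_bytes` and `decode_bytes` are hypothetical stand-ins rather than the library's API.

```python
def encode_bytes(text):
    """Map text to a list of integer ids in [0, 255], one per UTF-8 byte."""
    return list(text.encode("utf-8"))


def decode_bytes(ids):
    """Inverse of encode_bytes; invalid byte sequences become U+FFFD."""
    return bytes(ids).decode("utf-8", errors="replace")


ids = encode_bytes("Machine learning")

# Every id fits a 256-entry embedding table, and the mapping round-trips.
assert all(0 <= i < 256 for i in ids)
assert decode_bytes(ids) == "Machine learning"
print(len(ids), ids[:4])
```

With byte-level ids, non-ASCII characters take more than one token (for example, "é" becomes two ids), which matters when comparing sequence lengths against word-level or subword models.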

Results

Metric            HALO-S    Dense Transformer
Parameters        3.54 M    3.28 M
Val loss          1.2463    1.2391
Val perplexity    3.48      3.45
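
The loss and perplexity rows are two views of the same number: if the validation loss is the mean cross-entropy in nats per character (an assumption, though the reported values are consistent with it), perplexity is its exponential. The short check below also converts the loss to bits per character, a figure often quoted for character-level models.

```python
import math

# Validation losses as reported in the results table.
val_loss = {"HALO-S": 1.2463, "Dense Transformer": 1.2391}

# Perplexity is exp(loss) when the loss is measured in nats.
perplexities = {name: math.exp(loss) for name, loss in val_loss.items()}
# Dividing by ln(2) converts nats to bits.
bits_per_char = {name: loss / math.log(2) for name, loss in val_loss.items()}

for name in val_loss:
    print(f"{name}: perplexity={perplexities[name]:.2f}, "
          f"bits/char={bits_per_char[name]:.2f}")
```

Rounded to two decimals, the computed perplexities are 3.48 and 3.45, matching the table.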

The speed and memory benefits become more pronounced at sequence lengths of 2048 tokens or more.
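
Why the gap widens with length follows from the two cost expressions: the ratio N²/(N×K) simplifies to N/K, which grows linearly in N for a fixed K. A small sketch of that arithmetic, where K = 128 is a made-up value chosen for illustration (the card does not state K) and `cost_ratio` is a hypothetical helper:

```python
def cost_ratio(n, k):
    """Ratio of dense O(N^2) pair count to sparse O(N*K) pair count."""
    return (n * n) / (n * k)


K = 128  # hypothetical keys per query; not a value from the card

for n in (512, 2048, 8192):
    print(f"N={n}: dense={n * n}, sparse={n * K}, ratio={cost_ratio(n, K):.0f}x")
```

These are counts of query-key pairs only. Measured speedups depend on kernels, hardware, and constant factors that this arithmetic ignores.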

Quick start

pip install pyhalos

import json

import torch
from halo import HaloConfig, HaloSModel, CharacterTokenizer
from safetensors.torch import load_file

# Load the config
with open("config.json") as f:
    cfg_dict = json.load(f)

config = HaloConfig(
    vocab_size=cfg_dict["vocab_size"],
    hidden_size=cfg_dict["hidden_size"],
    num_layers=cfg_dict["num_layers"],
    num_heads=cfg_dict["num_heads"],
    num_kv_heads=cfg_dict["num_kv_heads"],
    num_globals=cfg_dict["num_globals"],
    local_window=cfg_dict["local_window"],
    max_seq_len=cfg_dict["max_seq_len"],
)

model = HaloSModel(config)
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Generate text
tokenizer = CharacterTokenizer()
input_ids = torch.tensor([tokenizer.encode("Machine learning")]).long()
output = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_k=50)
print(tokenizer.decode(output[0].tolist()))
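
The `temperature` and `top_k` arguments passed to `generate` conventionally control how the next token is drawn from the model's output scores. How the library implements them is not shown in this card, so the following is a generic, dependency-free sketch of the usual technique, with `sample_top_k` as a hypothetical helper.

```python
import math
import random


def sample_top_k(logits, temperature=0.8, top_k=50, rng=random):
    """Sample one token id using top-k filtering and temperature scaling."""
    # Keep only the k highest-scoring token ids.
    k = min(top_k, len(logits))
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top_ids]
    # Numerically stable softmax weights over the surviving logits.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top_ids, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(0)  # seeded so runs are repeatable
samples = [sample_top_k(logits, temperature=0.8, top_k=2, rng=rng)
           for _ in range(1000)]

# With top_k=2, only the two highest-scoring ids can ever be drawn.
assert set(samples) <= {0, 1}
```

Lower temperatures and smaller `top_k` values give more conservative text, while higher values give more varied output at the cost of more errors.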

Framework

This model uses pyhalo, available on PyPI.

Author

BUEORMdalusx64@gmail.com

