🐾 Technical Sheet: Nelyintelligent-199M (Nekolien Edition)

Nelyintelligent-199M is an elite Small Language Model (SLM), designed with the essence of Nekolien culture. This model shows that size is just a number: with its 199 million parameters, it delivers cognitive power and linguistic subtlety that its designer rates as equivalent to a model of nearly 2 billion parameters.
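
The 199 million figure can be cross-checked by hand against the classes defined in the inference script further down. In that script a single NelyaBlock is reused num_layers times, so depth adds no parameters. The sketch below is an illustration, not the author's code: `nelya_param_count` is a hypothetical helper, the hidden and intermediate sizes are the script's NelyaConfig defaults, and `vocab_size=8000` is an assumed value picked only because it lands close to 199M. The real vocabulary size should be read from nelya_config.json.

```python
# Hand count of trainable parameters for the architecture in the inference
# script: one shared block plus an untied embedding and output head.

def nelya_param_count(vocab_size, hidden_size=4096, intermediate_size=8192):
    embed = vocab_size * hidden_size            # nn.Embedding
    head = vocab_size * hidden_size             # nn.Linear with bias=False
    # nn.MultiheadAttention: fused input projection (3h x h weight, 3h bias)
    # plus output projection (h x h weight, h bias)
    attn = 4 * hidden_size * hidden_size + 4 * hidden_size
    norms = 2 * hidden_size                     # two nn.RMSNorm weight vectors
    mlp = 2 * hidden_size * intermediate_size   # two bias-free linear layers
    return embed + head + attn + norms + mlp

# ASSUMPTION: 8000 is a hypothetical vocabulary size, not a value from the repo
total = nelya_param_count(vocab_size=8000)
print(f"{total:,}")  # prints 199,778,304
```

With these numbers the shared block accounts for about 134M parameters and the embedding plus head for about 66M.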

🧬 The Architecture: NelyaForLLm

The secret of its intellectual density lies in a structure optimized for semantic richness.

Weight-Optimized Brain: With NelyaForLLm, every parameter is calibrated to deliver a compact but fierce intelligence.
Linguistic Mastery: Natively integrates a conlang, allowing a unique form of expression, far from generic, smooth syntax.
"From Scratch" Philosophy: A model that rejects the conventional in favor of originality and the texture of language.
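
One concrete piece of this design is visible in the inference script further down: the MLP layers use NelyaBitLinear, which rounds each weight to -1, 0 or +1 during the forward pass after dividing by the mean absolute weight. Here is a minimal pure-Python sketch of that rounding step only. `ternarize` is a hypothetical helper name, and the input values are toy numbers, not weights from the model.

```python
def ternarize(weights, eps=1e-5):
    """Round each weight to -1, 0 or +1 after dividing by the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    # Clamp the rescaled weights to [-1, 1] before rounding to the nearest integer
    clamped = (max(-1.0, min(1.0, w / (scale + eps))) for w in weights)
    return [int(round(c)) for c in clamped]

toy_weights = [0.8, -0.05, 0.3, -1.2, 0.02]  # illustrative values only
print(ternarize(toy_weights))  # prints [1, 0, 1, -1, 0]
```

Weights that are small relative to the average magnitude collapse to 0, while the rest keep only their sign.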

📊 Performance & Identity

| Characteristic | Specification |
| --- | --- |
| Identity | Nekolien 🐾 |
| Linguistic Capability | Advanced conlangs & neologisms 🗣️ |
| Relative Power | Equivalent to roughly 1.6B to 2B parameters 🧠 |
| Architecture | NelyaForLLm (Ultra-Dense Transformer) 🏗️ |
| Speed | Instant inference ⚡ |

🚀 The Strengths of Nelyintelligent

Distilled Intelligence 💎: Reaches peaks of Nekolien reasoning (1.6B level) while remaining extremely light and agile.
Culture & Language 🎨: It is not just a computing tool, it is a native speaker of Nekolien (a conlang).
Technical Sovereignty 🛠️: Trained on ultra-specialized proprietary datasets, guaranteeing a total absence of "smooth" or predictable solutions.

🛠️ Ideal Applications

High-Precision AI: For those who demand the intelligence of a large model without the technical overhead.
Syntactic Exploration: Ideal for forging new ways of communicating without the barriers of traditional, perfect syntax.

The designer's view: Nelyintelligent-199M is proof that Nekolien optimization can turn a small SLM into a giant of thought. It is intelligence in its pure state, compact and textured.

❄️ Inference ⚡

Example inference code:

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import PreTrainedTokenizerFast
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download
import safetensors.torch
import json

# --- Custom Model Architecture (Copy from original training script) ---
# This is necessary to load the custom model from safetensors

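class NelyaBitLinear(nn.Linear):
   """Linear layer that uses ternary weights and sign-binarized inputs in the forward pass."""
   def forward(self, x):
       w = self.weight
       # Mean absolute weight, used as the quantization scale
       scale = w.abs().mean()
       # Round weights to {-1, 0, 1}; the detach() term lets gradients pass straight through
       w_bit = w + (torch.round(torch.clamp(w / (scale + 1e-5), -1, 1)) - w).detach()
       # Center the activations, then keep only their sign (same straight-through trick)
       x_norm = x - x.mean(dim=-1, keepdim=True)
       x_bit = x_norm + (torch.sign(x_norm) - x_norm).detach()
       return F.linear(x_bit, w_bit, self.bias)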

class NelyaBlock(nn.Module):
   def __init__(self, config):
       super().__init__()
       self.ln1 = nn.RMSNorm(config.hidden_size)
       self.attn = nn.MultiheadAttention(config.hidden_size, config.num_heads, batch_first=True)
       self.ln2 = nn.RMSNorm(config.hidden_size)
       self.mlp = nn.Sequential(
           NelyaBitLinear(config.hidden_size, config.intermediate_size, bias=False),
           nn.SiLU(),
           NelyaBitLinear(config.intermediate_size, config.hidden_size, bias=False)
       )

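   def forward(self, x):
       # Normalize once and reuse the result as query, key and value.
       # No attention mask is passed, so every position attends to the whole sequence.
       h = self.ln1(x)
       attn_out, _ = self.attn(h, h, h)
       x = x + attn_out
       x = x + self.mlp(self.ln2(x))
       return x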

class NelyaConfig:
   def __init__(self, vocab_size, hidden_size=4096, num_layers=12, num_heads=32, intermediate_size=8192, max_pos=128):
       self.vocab_size = vocab_size
       self.hidden_size = hidden_size
       self.num_layers = num_layers
       self.num_heads = num_heads
       self.intermediate_size = intermediate_size
       self.max_pos = max_pos

class NelyaForLLM(nn.Module):
   def __init__(self, config):
       super().__init__()
       self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
       self.block = NelyaBlock(config)
       self.head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
       self.num_layers = config.num_layers

   def forward(self, input_ids, labels=None, attention_mask=None):
       x = self.embed(input_ids)
       # Apply the same block num_layers times (weights are shared across depth)
       for _ in range(self.num_layers):
           x = self.block(x)
       logits = self.head(x)
       if labels is not None:
           loss = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))
           return (loss,)
       return logits


# --- Configuration for loading ---
REPO_ID = "Finisha-F-scratch/Nelyintelligent-199M" # Your Hugging Face repository ID

# Download model files from Hugging Face Hub
print(f"🚀 Downloading model files from {REPO_ID}...")
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.safetensors")
tokenizer_path = hf_hub_download(repo_id=REPO_ID, filename="tokenizer.json")
config_path = hf_hub_download(repo_id=REPO_ID, filename="nelya_config.json")

# Load tokenizer
print("⏳ Loading tokenizer...")
bpe_obj = Tokenizer.from_file(tokenizer_path)
tokenizer = PreTrainedTokenizerFast(tokenizer_object=bpe_obj, pad_token="[PAD]")

# Load custom config
print("⚙️ Loading custom NelyaConfig...")
with open(config_path, "r") as f:
   config_dict = json.load(f)
config = NelyaConfig(**config_dict)

# Instantiate and load model weights
print("🏗️ Instantiating model and loading weights...")
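# Use the GPU when one is available, otherwise fall back to the CPU
with torch.device("cuda" if torch.cuda.is_available() else "cpu"):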
   model = NelyaForLLM(config)
   model.load_state_dict(safetensors.torch.load_file(model_path))

# Count and print parameters
num_params = sum(p.numel() for p in model.parameters())
print(f"\n🔥 TOTAL PARAMETERS of the loaded model: {num_params:,}")
print(f"🔥 CLASSIFICATION: {'LLM' if num_params > 800_000_000 else 'SLM'}")
# Theoretical figure at 1.58 bits per parameter; this script keeps the weights in full precision
print(f"🔥 ESTIMATED VRAM SIZE (1.58-bit): ~{num_params * 1.58 / 8 / 1e9:.2f} GB")

# --- Inference Example ---
model.eval() # Set the model to evaluation mode

input_text = "Ji eta Nelyintelligent..."
print(f"\nInput: {input_text}")

# Tokenize input
input_ids = tokenizer.encode(input_text, return_tensors="pt").to(next(model.parameters()).device)

# Generation parameters
max_generation_length = 100 # Define the number of tokens to generate
temperature = 2.5 # Controls randomness: higher = more random, lower = more deterministic
top_k = 50 # Samples from the top_k most likely tokens

with torch.no_grad():
   generated_ids = input_ids # Start with the input tokens

   for _ in range(max_generation_length):
       # Run the whole sequence through the model (there is no KV cache)
       logits = model(generated_ids)
       
       # Keep the logits of the last position and apply temperature
       logits = logits[0, -1, :] / temperature
       
       # Apply top-k filtering
       if top_k is not None:
           v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
           logits[logits < v[-1]] = -float('Inf')

       # Convert to probabilities
       probabilities = F.softmax(logits, dim=-1)
       
       # Sample from the distribution
       predicted_token_id = torch.multinomial(probabilities, num_samples=1).item()
       
       # If it's an end-of-sequence token, stop generation
       if predicted_token_id == tokenizer.eos_token_id or predicted_token_id == tokenizer.pad_token_id:
           break
           
       # Append the predicted token to the generated sequence
       generated_ids = torch.cat([generated_ids, torch.tensor([[predicted_token_id]], device=generated_ids.device)], dim=-1)

# Decode the complete generated sequence
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"\nGenerated text ({len(generated_ids[0]) - len(input_ids[0])} new tokens):\n'{generated_text}'")
print('done')
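
The generation loop above divides the last-position logits by the temperature and keeps only the top_k candidates before sampling. The effect of those two settings is easier to see on a tiny example. The sketch below is a dependency-free illustration with made-up logits. `top_k_temperature_probs` is a hypothetical helper, not part of the model's code, and it mirrors the same scale, filter and softmax steps.

```python
import math
import random

def top_k_temperature_probs(logits, temperature=1.0, top_k=None):
    """Return sampling probabilities after temperature scaling and top-k filtering."""
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        # Smallest score that still belongs to the k best candidates
        cutoff = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # filtered entries become 0.0
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four tokens
for t in (0.5, 2.5):
    probs = top_k_temperature_probs(toy_logits, temperature=t, top_k=3)
    print(t, [round(p, 3) for p in probs])

# Draw one token index from the last (temperature 2.5) distribution
rng = random.Random(0)
token_id = rng.choices(range(len(toy_logits)), weights=probs, k=1)[0]
```

At temperature 0.5 nearly all of the probability mass sits on the best token, while at 2.5 (the value used in the script) the three surviving candidates are much closer together, so the output varies more from run to run.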

♥️: Beware of confusion (a heads-up for the haters who may not have understood the purpose of this model)!

The model is not broken or buggy. Do not expect text in English, French, or any other language: it is a Nekolien model.