This repository is publicly accessible, but you must accept the conditions to access its files and content.

✨🩷 To access Tiny-nelya-neko, you must accept our conditions and our license, and use the model only for the purpose for which it was trained. ✨🩷 Please fill out the form here to submit your application.

📄 Model Card: Nelya-neko

🌟 Model Overview

Nelya-neko is a Small Language Model (SLM) with 124 million parameters, pre-trained on the Nekolien constructed language (intellectual property of LLm-Clem). It is the first model in the new generation of Clemylia architectures designed for advanced research tasks in conlangs (constructed languages) and long-context processing.

🛠️ Technical Details and Architecture

| Feature | Value | Impact / Note |
|---|---|---|
| Family / Type | Foundation Model (Base) / SLM | Requires fine-tuning for alignment and downstream applications. |
| Developer | Clemylia (LLm-Clem) | Created from scratch (architecture, tokenizer, pre-training). |
| Parameters | 124 million | Size optimized for efficiency and deployment on consumer-grade hardware. |
| Context Window | 7,000 tokens | Major innovation: enables processing of full documents and long-form Nekolien conversations. |
| Language | Nekolien (constructed language) | Ultra-specialized. Should not be used for natural languages without extensive fine-tuning. |
| Tokenizer | Nekolien-tokenizer | Proprietary tokenizer built from scratch, essential for encoding and decoding Nekolien. |
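
The card ships no usage snippet. Assuming the repository exposes standard transformers-compatible weights with a causal-LM head (a guess based on the card's description of text continuation), loading and sampling could look like this minimal sketch:

```python
# Hedged loading sketch. Assumptions: the gated repo
# Conlanger-LLM-CLEM/Tiny-Nelya-neko ships standard
# transformers-compatible weights with a causal-LM head; none of
# this is an official example from the model authors.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Conlanger-LLM-CLEM/Tiny-Nelya-neko"

# Access is gated: accept the conditions on the model page and
# authenticate (e.g. `huggingface-cli login`) before downloading.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# As a base (non-aligned) model, it continues Nekolien text rather
# than following instructions.
prompt = "..."  # replace with a Nekolien prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```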

🔑 Special Tokens (Included in Nekolien-tokenizer)

The model uses a set of special tokens to structure data and enable future alignment tasks:

| Token | Conventional Role | Specific Function |
|---|---|---|
| UNK | Unknown | Handles unknown sequences not present in the Nekolien corpus. |
| CLS | Classifier | Classification token for sequence encapsulation (useful for fine-tuning). |
| SEP | Separator | Used to mark the boundary between different parts of a text sequence. |
| MASK | Mask | Required for Masked Language Modeling (MLM) and prediction tasks in fine-tuning. |
|  | Memory / Metadata | Unique token, potentially related to the efficient management of the extended context (7,000 tokens). |
|  | Padding | Ensures sequence length consistency for GPU efficiency. |
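
The exact token strings are defined by the Nekolien-tokenizer itself. A minimal sketch for inspecting them, assuming the tokenizer loads through transformers as above:

```python
# Sketch for inspecting the special tokens shipped with the
# Nekolien-tokenizer; assumes it loads as a standard transformers
# tokenizer. The actual token strings and ids come from the repo.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Conlanger-LLM-CLEM/Tiny-Nelya-neko")

# Conventional roles (unk/cls/sep/mask/pad) mapped to token strings.
print(tokenizer.special_tokens_map)

# Custom tokens (e.g. the memory/metadata token) typically appear
# among the additional special tokens.
print(tokenizer.additional_special_tokens)
print(list(zip(tokenizer.all_special_tokens, tokenizer.all_special_ids)))
```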

📜 License and Usage Restrictions

License: LRUNDL (Limited Distinction Research Non-Commercial License)

  • Attribution: All derivatives (fine-tuned models) must clearly attribute authorship to LLm-Clem.
  • Restriction: Use of Nelya-neko is strictly limited to research and non-commercial experimentation.
  • Compliance: Derivative works must adhere to the LRUNDL (more permissive licenses, such as MIT, may not be applied).

💡 Intended Use and Limitations

Intended Use

  • Conlang Research: Studying language modeling on constructed linguistic systems.
  • Nekolien Dataset Creation: Generating coherent corpora for fine-tuning (see the sketch after this list).
  • Base for Specialized Assistants: Developing bots for the Nekolien language following alignment fine-tuning.

Limitations and Precautions

  • Not Aligned: As a pure foundation model, Nelya-neko produces thematic text continuation, not structured responses (requires fine-tuning for instruction following).
  • Monolingual: Performance in any language other than Nekolien is not guaranteed and is likely to be nil.
  • Access: The model and its tokenizer are subject to access restrictions managed by LLm-Clem.
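
To make the dataset-creation use case concrete, here is a hedged sketch of a corpus-generation loop; the seed prompts, sampling settings, and output file are illustrative assumptions, not an official recipe:

```python
# Hypothetical corpus-generation loop for the dataset-creation use
# case. seed_prompts, sampling settings, and file layout are all
# illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Conlanger-LLM-CLEM/Tiny-Nelya-neko"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

seed_prompts = ["...", "..."]  # replace with Nekolien seed fragments

with open("nekolien_corpus.txt", "w", encoding="utf-8") as f:
    for seed in seed_prompts:
        inputs = tokenizer(seed, return_tensors="pt")
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=True,   # sampling yields more varied corpus text
            temperature=0.9,
        )
        f.write(tokenizer.decode(outputs[0], skip_special_tokens=True) + "\n")
```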

🚀 Next Steps for Deployment

To transition from this foundation model to a functional application, alignment fine-tuning (based on Nekolien instruction/response pairs) is necessary to instill the desired instruction-following behavior and persona; a minimal data-formatting sketch follows below.

Nekolien variant of this model: Original/Central Nekolien.

Nelya-neko complies with the rules of the Nekolien Academy: https://neko-lexicon-archives.lovable.app/
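
No fine-tuning recipe is prescribed here. As one hedged sketch, Nekolien instruction/response pairs could be flattened into plain training text for a standard causal-LM trainer; the JSONL schema, field names, and separator below are assumptions:

```python
# Illustrative preprocessing of Nekolien instruction/response pairs
# for alignment fine-tuning. The JSONL schema, field names, and the
# plain-newline separator are assumptions, not the authors' recipe.
import json

def format_pair(example: dict) -> str:
    # Concatenate instruction and response into one training string;
    # a real recipe might instead use the tokenizer's SEP token.
    return example["instruction"] + "\n" + example["response"]

with open("nekolien_pairs.jsonl", encoding="utf-8") as f:
    texts = [format_pair(json.loads(line)) for line in f]

# `texts` can then be tokenized and passed to transformers' Trainer
# (or any causal-LM training loop) for alignment fine-tuning.
```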

Weights: Safetensors, 0.1B parameters, F32 tensors.