Model by: Candra Alpin Gunawan

LCTLM1

LCTLM (Latent Connection Transformers Language Model) is a Transformers model with a special architecture. In the original Transformer, every block follows the pattern (LN -> attention -> LN -> FFN), as implemented in older LLM designs. LCTLM introduces a new architecture that replaces the FFN with an LCM Block. Why replace the FFN with an LCM Block? The LCM Block looks at the attention output through two perception steps (step1 and step2), each with its own GELU activation, and mixes both through a residual connection.

The main idea of LCTLM is to increase representation capacity without increasing parameter size aggressively. Instead of using a single large Feed-Forward Network, the LCM Block creates two parallel perception paths (step1 and step2) that process the normalized input independently. Each path applies a GELU activation, captures a different latent feature, and then both paths are combined and projected through a magnitude layer.

This mechanism creates a "latent mixing effect", allowing the model to generate richer internal representations compared to traditional FFN-based transformers with the same number of parameters.
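As a rough illustration, the sketch below shows one way the LCM Block described above could be written in PyTorch. The layer names (`step1`, `step2`, `magnitude`), the latent width, and the additive mixing are assumptions made for this sketch; only the two-path GELU structure, the mixing step, and the magnitude projection follow the description in this card.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LCMBlock(nn.Module):
        # Hypothetical sketch of the Latent Connected Module (LCM) described above.
        # d_model is the transformer hidden size, d_latent the width of each
        # perception path; both names and sizes are illustrative assumptions.
        def __init__(self, d_model, d_latent):
            super().__init__()
            self.step1 = nn.Linear(d_model, d_latent)       # first perception path
            self.step2 = nn.Linear(d_model, d_latent)       # second perception path
            self.magnitude = nn.Linear(d_latent, d_model)   # projects the mixed latent back to d_model

        def forward(self, x):
            # Each path processes the (already normalized) input independently
            # and applies its own GELU activation.
            p1 = F.gelu(self.step1(x))
            p2 = F.gelu(self.step2(x))
            # Mix the two latent views, then project through the magnitude layer.
            return self.magnitude(p1 + p2)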

LCTLM also preserves the classic residual structure, so stability during training is still maintained. Even without a standard FFN, the model shows strong convergence and smooth loss behavior during training.

One of the surprising results is that LCTLM performs above expectations for its size. Although the model only has ~42M parameters, its language generation quality behaves more like that of a 100M-class model, especially in semantic consistency and long-context coherence.

During training, LCTLM showed stable gradients and no exploding activation issues, even when trained on sequences up to 500 tokens. The dual-path LCM structure seems to distribute the representation load more evenly, similar to the capacity behavior found in mixture-of-experts (MoE) architectures, despite LCTLM not being an MoE model.

This gives LCTLM an interesting property: a parameter-efficient model with "MoE-like" representational behavior, but without the routing complexity of real MoE architectures.

Each LCT layer consists of:

  1. LayerNorm + Multihead Causal Attention + Residual
  2. LCM Block + Residual

Unlike standard transformers, there is no classical FFN (MLP) after the attention.
The LCM Block replaces it entirely.
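For illustration, a minimal sketch of one LCT layer is given below, reusing the LCMBlock sketch from earlier in this card. The module names and the placement of the second LayerNorm are assumptions; the overall structure (pre-norm causal attention with a residual, then the LCM Block with a residual, and no FFN) follows the list above.

    import torch
    import torch.nn as nn

    class LCTLayer(nn.Module):
        # Illustrative sketch of a single LCT layer:
        #   LayerNorm -> causal multi-head attention -> residual,
        #   LayerNorm -> LCM Block -> residual (no classical FFN/MLP).
        def __init__(self, d_model, n_heads, d_latent):
            super().__init__()
            self.ln_attn = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln_lcm = nn.LayerNorm(d_model)
            self.lcm = LCMBlock(d_model, d_latent)  # replaces the usual FFN

        def forward(self, x):
            # Boolean causal mask: True entries are positions a token may not attend to.
            seq_len = x.size(1)
            mask = torch.triu(
                torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
                diagonal=1,
            )
            h = self.ln_attn(x)
            attn_out, _ = self.attn(h, h, h, attn_mask=mask)
            x = x + attn_out                      # attention residual
            x = x + self.lcm(self.ln_lcm(x))      # LCM Block residual
            return x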

LCTLM is designed as a research experiment to explore whether a transformer can maintain strong language modeling capability without relying on classical FFN layers. The goal is to build a lightweight LM architecture that scales well and can be extended to larger models in the future (100M – 1B parameters) while keeping computational cost manageable.

In summary, LCTLM proposes a simple but effective modification to the transformer architecture. By replacing the FFN with the Latent Connected Module, the model achieves a strong balance between parameter efficiency, stability, and expressive power.

This project is still experimental, and further improvements such as RMSNorm, FlashAttention, or larger-scale training are planned in future versions.

About LCM: https://zenodo.org/records/17501400?token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6Ijk2ZmJmNDg3LWI3MTYtNDVlNy05OWEzLTRiOTZkNGFhOTkzMyIsImRhdGEiOnt9LCJyYW5kb20iOiJiZTNhZjBmMGJmN2NmN2EyNWYyMzRiZWI3MjJkMjcwZCJ9.1Tcsiz_aRDDHbR2MmUdf2MkcUPbyKsI88dRGsv1O3MpA-dxBMk7B4JiSvfwk0RKG9SBzV7WGHY3mnth_iEwhTg

LCT AI Paper: https://zenodo.org/records/17963099?preview=1&token=eyJhbGciOiJIUzUxMiJ9.eyJpZCI6IjRhYWFjYzdlLWE3ZGQtNGI2Yy05NmQ3LTc1MGU2OWZlNGNlNyIsImRhdGEiOnt9LCJyYW5kb20iOiIyYzhkZGEzMTZkMWFiMmFmZTA4MTcxYzJiNWY0NjdiMSJ9.Opf4Nw1YGTzEROClOUL2c7XuSSVXpUK4lhpk1Y3MpTD4DmA6aaY-XQ1Qwnwm7X8Q_PRXDM3IeGVa1MaOmmw5aQ

Model Detail

  • Name: LCTLM (Latent Connection Transformers Language Model)
  • Size: 166 MB
  • Parameter count: 42-50M parameters
  • Type: Decoder-only NLP Transformer
  • Library: torch

How to use the Model


    import torch
    from lctlm1 import *
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_file("tokenizer.json")
    model = LMLCT1()
    model.load_state_dict(torch.load("lctlm1.pth"))

    # input_texts is any prompt string you want the model to continue, e.g.:
    input_texts = "i love when world looking at me"
    print(generate_text(model, input_texts, tokenizer))

How to use the Fine-tuned Model

    import torch
    from tokenizers import Tokenizer
    from lctlm1 import *

    tokenizer = Tokenizer.from_file("tokenizer.json")
    model = LMLCT1()
    model.load_state_dict(torch.load("lctlm1_finetuned.pth"))

    if __name__ == "__main__":
        # Simple interactive loop: generate a response for each prompt
        # until the user answers "y" to the stop question.
        while True:
            users = input("type input here : ")
            response = generate_tesk(model=model, texts=users, tokenizer=tokenizer, temperature=1.0)
            print(f"response : \n {response}")
            stop = input("stop ? ")
            if stop.lower() == "y":
                break

Model loss benchmark:

(Training loss curve images: loss1 and loss2)

output sample:

input: "i love when world looking at me, i hope i can live my cherish ones "

outputs: i love when world looking at me, i hope i can live my cherish ones home now we all sleep home aing be growing many are to about community ; to their with sister and s be outside school around simply it can terrible i always our online . can life smart just trought and life hurting connection I learn without ! on own but is extra to an person to about because human , still dis or they that what us mental is by advanced that of speci s improved . , taken courses meant be ists once belive we ' over us a called sister ; would see you : pm To , i i , nothing do make life easier do we and that are who ited . mus at ess may doing services and wanting be end being care , , much ! ur cracy students like this We see family such communication , ors give photo , , ine and more what teacher or ers be , how retain that may contained a for application T . of work a lesson have many effects medical . you lazy sets wall your away helping you how it eat . , can in problem your shape you to and carry alone school theres ways walk talk human . mean im ins be people are 911 time probably because are P how offer not ending extra ric students . ens together you is of big in people alone as go school cer any of to with and , , if school teach give weak , try activities , can you like helps make its most stopping wi your life issues school There be half friends All this it you like on the that taking notes them This is part you many lessons school trying make life easy How you the and whats for are there People listen person may yourself In can out school and end those are functional . created , missing d helped a , life and , that can threat me that . causes thing te of from things and used often and time O mpi ll thirty nine more . find school have sport is point as as ti cam , also oking old That an in future Tech meet friend An world us up others they a so they we if stick what want tell will e problems as , it what be , will down the ial cy E oking ba ing new and future So yes the can you in with , a th s make afraid some it that solely ! you is about gradu , who to agreement it then future amazing your ions be you do P MAKE your was and ley your ? bye just your quite mighty animals sun how just out the here earth not final is ses what say Third it animals expected bing children and eg don t . - e

Let's bring LCTLM to the next level with your support: https://ko-fi.com/alpin92578
