---
license: mit
datasets:
- crownelius/Opus-4.6-Reasoning-3300x
base_model:
- microsoft/phi-2
- venkycs/phi-2-instruct
pipeline_tag: text-generation
---

**LBNET-2.7B-BASE model card**

We introduce LBNET-2.7B-BASE, the first logic/reasoning-based transformer model built on Phi-2. In February 2026, we created an experimental architecture called LBNets, an attempt to inject reasoning-like layers directly into a model's architecture; for this model, we applied it to Phi-2.

**Here is the logic behind LBNET-2.7B-BASE:**

- Split the base model into two halves: pre-reasoning and post-reasoning layers.
- Between these halves, insert:
  - learnable latent "reasoning tokens"
  - reasoning blocks (cross-attention, where latent tokens attend to the main hidden states, the "context"; self-attention, where latent tokens attend to each other; and an MLP)
  - a reasoning injector that writes the result back into the main stream

To make generation workable:

- During prefill (the initial prompt, `past_length == 0` and `seq_len > 1`), the model runs the reasoning loop once.
- During token-by-token generation with the KV cache (`seq_len == 1`), the model skips the reasoning loop (otherwise generation becomes slow and unstable).

LBNET-2.7B-BASE achieves benchmark scores well above average for its size compared to other models:

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|-------------|------:|------|-----:|--------|---|-----:|---|-----:|
|arc_challenge| 1|none | 0|acc |↑ |0.5324|± |0.0146|
| | |none | 0|acc_norm|↑ |0.5478|± |0.0145|
|arc_easy | 1|none | 0|acc |↑ |0.8047|± |0.0081|
| | |none | 0|acc_norm|↑ |0.7862|± |0.0084|
|boolq | 2|none | 0|acc |↑ |0.8346|± |0.0065|
|openbookqa | 1|none | 0|acc |↑ |0.4040|± |0.0220|
| | |none | 0|acc_norm|↑ |0.5160|± |0.0224|
|piqa | 1|none | 0|acc |↑ |0.7889|± |0.0095|
| | |none | 0|acc_norm|↑ |0.7949|± |0.0094|
|winogrande | 1|none | 0|acc |↑ |0.7577|± |0.0120|

We recommend running this model on at least an RTX 3050 with 8 GB of VRAM.
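The prefill-only reasoning loop described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual LBNet implementation: the module names (`ReasoningBlock`, `ReasoningInjector`, `maybe_reason`) and layer choices are ours. Latent tokens cross-attend to the hidden states, self-attend, pass through an MLP, and a gated injector writes the result back into the main stream, but only during prefill:

```python
import torch
import torch.nn as nn


class ReasoningBlock(nn.Module):
    """One reasoning step: latents cross-attend to context, self-attend, then MLP."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, latents, context):
        # cross-attention: latent tokens attend to the main hidden states
        x, _ = self.cross_attn(self.norm1(latents), context, context)
        latents = latents + x
        # self-attention: latent tokens attend to each other
        x, _ = self.self_attn(self.norm2(latents), latents, latents)
        latents = latents + x
        return latents + self.mlp(self.norm3(latents))


class ReasoningInjector(nn.Module):
    """Gated write-back of the latent summary into the main stream."""

    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate_scale = nn.Parameter(torch.zeros(1))  # gate starts closed

    def forward(self, hidden, latents):
        x, _ = self.attn(hidden, latents, latents)  # hidden states query the latents
        return hidden + torch.tanh(self.gate_scale) * x


def maybe_reason(hidden, latent_tokens, blocks, injector, past_length: int):
    """Run the reasoning loop once during prefill; skip it during KV-cached decode."""
    seq_len = hidden.shape[1]
    if past_length == 0 and seq_len > 1:  # prefill
        latents = latent_tokens.expand(hidden.shape[0], -1, -1)
        for blk in blocks:
            latents = blk(latents, hidden)
        hidden = injector(hidden, latents)
    return hidden  # decode steps (seq_len == 1) pass through unchanged
```

The zero-initialized gate means the injector is a no-op at the start of training, so the pre-trained Phi-2 behavior is preserved until the reasoning path learns something useful.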
FOR FULL MODEL FUNCTIONALITY, YOU MUST USE THE CHAT SCRIPT BELOW. The script is ROCm-friendly and may need tweaking for CUDA setups.

```python
import os
import argparse

import torch
from transformers import AutoTokenizer

from configuration import PhiReasoningConfig
from modeling import PhiForLogicalReasoning

# ROCm allocator hint (helps fragmentation on AMD ROCm)
os.environ.setdefault("PYTORCH_HIP_ALLOC_CONF", "expandable_segments:True,max_split_size_mb:64")

DEFAULT_SYSTEM_PROMPT = "You are LBNets, a helpful assistant."


def format_prompt(system_prompt: str, user_text: str, history, max_turns: int = 6) -> str:
    """
    Build a single instruction that includes recent chat history.
    This keeps compatibility with your training template.

    history: list of (user, assistant) tuples
    """
    system_prompt = (system_prompt or "").strip()
    user_text = (user_text or "").strip()

    convo = ""
    for u, a in history[-max_turns:]:
        convo += f"User: {u}\nAssistant: {a}\n"

    instruction = ""
    if convo:
        instruction += "Conversation so far:\n" + convo + "\n"
    instruction += "Current user message:\n" + user_text

    return (
        f"### System:\n{system_prompt}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n"
    )


@torch.inference_mode()
def generate_text(model, tok, prompt_text: str, device: str, max_new_tokens: int = 256) -> str:
    inputs = tok(
        prompt_text,
        return_tensors="pt",
        add_special_tokens=False,
        truncation=True,
        max_length=768,  # history makes prompts longer; keep sane
    ).to(device)

    in_len = inputs["input_ids"].shape[1]

    out_ids = model.generate(
        **inputs,
        do_sample=False,  # greedy
        use_cache=True,   # KV cache (fast)
        max_new_tokens=max_new_tokens,
        min_new_tokens=1,
        # general anti-loop controls (not per-problem patching)
        repetition_penalty=1.10,
        no_repeat_ngram_size=3,
        pad_token_id=tok.pad_token_id,
        eos_token_id=tok.eos_token_id,
    )

    new_ids = out_ids[0][in_len:]
    text = tok.decode(new_ids, skip_special_tokens=True)
    # Avoid "blank" replies from leading newline spam
    return text.lstrip("\n").rstrip()


def load_model(model_path: str, device: str):
    cfg = PhiReasoningConfig.from_pretrained(model_path)
    cfg.attn_implementation = "eager"
    cfg.use_cache = True

    tok = AutoTokenizer.from_pretrained(model_path)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
        tok.pad_token_id = tok.eos_token_id

    model = PhiForLogicalReasoning.from_pretrained(
        model_path,
        config=cfg,
        torch_dtype=torch.float16,  # often faster/more compatible on ROCm than bf16
        low_cpu_mem_usage=True,
    ).to(device)
    model.eval()

    gate = model.model.reasoning_injector.gate_scale.detach().float().cpu().numpy()
    total_params = sum(p.numel() for p in model.parameters())
    print(f"Loaded: {model_path}")
    print(f"Parameters: {total_params:,}")
    print(f"Gate scale: {gate}")
    print(f"Device: {device}")
    return model, tok


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--model_path", default="Aclevo/LBNET-2.7B-BASE")
    ap.add_argument("--device", default="cuda:0")
    ap.add_argument("--system_prompt", default=DEFAULT_SYSTEM_PROMPT)
    ap.add_argument("--max_new_tokens", type=int, default=256)
    ap.add_argument("--history_turns", type=int, default=6)
    args = ap.parse_args()

    model, tok = load_model(args.model_path, args.device)

    history = []

    print("\n============================================================")
    print("LBNets Chat Ready!")
    print("Commands: 'quit' to exit | 'reset' to clear conversation")
    print("============================================================\n")

    while True:
        user = input("User: ").strip()
        if not user:
            continue
        if user.lower() in ("quit", "exit", "q"):
            break
        if user.lower() in ("reset", "/reset"):
            history.clear()
            print("AI: Conversation reset.\n")
            continue

        prompt = format_prompt(args.system_prompt, user, history, max_turns=args.history_turns)
        resp = generate_text(model, tok, prompt, args.device, max_new_tokens=args.max_new_tokens)
        print(f"AI: {resp}\n")

        # store turn
        history.append((user, resp))


if __name__ == "__main__":
    main()
```

If you like our work and services, give us a star on GitHub:
https://github.com/Aclevo, or give us a mention in your work! -Aclevo Team