Lancero 1.7B
Lancero is an entity-conditioned language model that augments a standard causal LM with a structured entity stream inspired by category theory. It is the first model from beag-research.
This repository contains adapter weights (LoRA + entity decoder + entity
projection). The base model HuggingFaceTB/SmolLM2-1.7B-Instruct is required
at load time and is not included here.
Architecture
Lancero extends a pretrained transformer with a parallel entity stream:
| Component | Role |
|---|---|
| Entity hash encoder | Maps surface tokens to 17-bit structured entity IDs (type, class, scope, arity, role, morphism) |
| Entity decoder | Learned embedding of the hash space (131,072 entities -> 896d) |
| Entity projection | Linear projection from entity space into LM hidden space (896d -> 2048d) |
| LoRA adapter | Rank-64 adaptation of all Q/K/V/O/gate/up/down projections in the base LM |
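To make the 17-bit entity ID concrete, here is a minimal sketch of packing the six named fields into one integer and unpacking them again. The field names and the 17-bit total come from the table above; the per-field bit widths and the helper names `pack_entity_id` and `unpack_entity_id` are assumptions chosen only so the widths sum to 17. They are not the layout used by ent.

```python
# Hypothetical bit widths: only the field names and the 17-bit total are
# stated in the model card. The real encoder may split the bits differently.
FIELDS = (
    ("type", 4),
    ("class", 4),
    ("scope", 3),
    ("arity", 2),
    ("role", 2),
    ("morphism", 2),
)
TOTAL_BITS = sum(width for _, width in FIELDS)  # 17 with the widths above


def pack_entity_id(values: dict) -> int:
    """Pack field values into one integer, first field in the highest bits."""
    entity_id = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        entity_id = (entity_id << width) | value
    return entity_id


def unpack_entity_id(entity_id: int) -> dict:
    """Recover the field values from a packed entity ID."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = entity_id & ((1 << width) - 1)
        entity_id >>= width
    return fields


example = {"type": 3, "class": 9, "scope": 5, "arity": 2, "role": 1, "morphism": 0}
packed = pack_entity_id(example)
print(packed, unpack_entity_id(packed))
```

Any packed ID falls in the range 0 to 131,071, which is why the entity decoder needs exactly 131,072 rows.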
During generation, token embeddings and projected entity embeddings are summed before being passed through the transformer. The entity stream provides the model with explicit identity, type, and structural information that raw subword tokens cannot express.
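The summation step can be sketched in plain Python with toy dimensions. This is an illustration of the data flow only (look up the entity embedding, project it into the hidden space, add it to the token embedding), not the code in ent. The sizes are shrunk so the example runs instantly, the weights are random stand-ins, and the names `entity_table`, `proj_weight`, `project`, and `condition` are hypothetical.

```python
import random

random.seed(0)

# Toy sizes. The model card's real sizes are 131,072 entities,
# a 896-d entity space, and a 2048-d LM hidden space.
NUM_ENTITIES, ENTITY_DIM, HIDDEN_DIM = 8, 4, 6


def rand_matrix(rows: int, cols: int) -> list:
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


entity_table = rand_matrix(NUM_ENTITIES, ENTITY_DIM)  # stand-in for the entity decoder
proj_weight = rand_matrix(HIDDEN_DIM, ENTITY_DIM)     # stand-in for the entity projection


def project(entity_vec: list) -> list:
    """Linear map from entity space to hidden space (bias omitted)."""
    return [sum(w * x for w, x in zip(row, entity_vec)) for row in proj_weight]


def condition(token_embeds: list, entity_ids: list) -> list:
    """Add the projected entity embedding to each token embedding."""
    conditioned = []
    for token_vec, entity_id in zip(token_embeds, entity_ids):
        entity_vec = project(entity_table[entity_id])
        conditioned.append([t + e for t, e in zip(token_vec, entity_vec)])
    return conditioned


token_embeds = rand_matrix(3, HIDDEN_DIM)  # three tokens
entity_ids = [5, 0, 2]                     # one entity ID per token
conditioned = condition(token_embeds, entity_ids)
print(len(conditioned), len(conditioned[0]))
```

The output keeps the token stream's shape, so the transformer layers see inputs of the usual width.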
The broader inference engine (ent) layers on:
- Abstraction hierarchies (type-aware is-a/specializes-to graphs)
- Graph-based path reasoning between entity pairs
- Working memory with confidence-scored active filtering
- Durable semantic and procedural memory across calls
- Symbolic program execution for arithmetic, comparison, and deduction
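As a rough illustration of graph-based path reasoning between entity pairs, the sketch below runs a breadth-first search over a small is-a graph. The graph contents and the function name `find_path` are hypothetical, and the ent engine may use a different search strategy and scoring.

```python
from collections import deque

# Hypothetical is-a edges, child -> parents. Not taken from ent.
IS_A = {
    "beagle": ["dog"],
    "dog": ["mammal"],
    "cat": ["mammal"],
    "mammal": ["animal"],
}


def find_path(graph: dict, start: str, goal: str):
    """Return the shortest edge path from start to goal, or None if none exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path(IS_A, "beagle", "animal"))
```

Because the edges are directed, a query from "animal" to "beagle" finds no path in this toy graph.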
When the Neural Model Runs
Not every query needs the full 1.7B transformer. The engine routes dynamically:
- Symbolic path: Entity resolution, type queries, graph traversal, and arithmetic run entirely on hashes and category-theoretic operations with zero neural inference.
- Neural path: Open-ended generation, explanation, and creative tasks fall through to the entity-conditioned language model.
This means Lancero is not a general-purpose chatbot. It is an inference architecture where the language model serves as a reasoning-augmented generator, not the sole decision-maker.
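The routing rule described above can be sketched as a simple dispatch on query kind. The kind labels and the `route` function are hypothetical; the model card does not say how ent classifies a query, only which categories go to which path.

```python
# Query kinds listed in the model card for the symbolic path.
# The string labels themselves are assumptions for this sketch.
SYMBOLIC_KINDS = {
    "entity_resolution",
    "type_query",
    "graph_traversal",
    "arithmetic",
}


def route(query_kind: str) -> str:
    """Send known symbolic kinds to the symbolic path; everything else falls through."""
    if query_kind in SYMBOLIC_KINDS:
        return "symbolic"  # no neural inference needed
    return "neural"        # entity-conditioned language model


for kind in ("arithmetic", "explanation"):
    print(kind, "->", route(kind))
```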
Files
| File | Description |
|---|---|
| entity_decoder.safetensors | Entity embedding table + type/class/scope classification heads |
| entity_proj.safetensors | Learned linear projection from entity space to LM hidden space |
| lora/adapter_model.safetensors | LoRA weights for SmolLM2 attention and MLP projections |
| lora/adapter_config.json | PEFT adapter configuration (rank=64, alpha=128) |
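For orientation, the sketch below parses a PEFT-style adapter configuration and derives the LoRA scaling factor. The JSON here is an assumed stand-in, not the contents of the shipped lora/adapter_config.json: only rank=64 and alpha=128 are stated in this card, and the `target_modules` names are the usual Llama-style projection names, which should be checked against the real file.

```python
import json

# Assumed example contents. Inspect the real lora/adapter_config.json
# before relying on any field other than r and lora_alpha.
ADAPTER_CONFIG_JSON = """
{
  "peft_type": "LORA",
  "r": 64,
  "lora_alpha": 128,
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
  ]
}
"""

config = json.loads(ADAPTER_CONFIG_JSON)

# Standard LoRA scales the low-rank update by lora_alpha / r.
scaling = config["lora_alpha"] / config["r"]
print(f"rank={config['r']} alpha={config['lora_alpha']} scaling={scaling}")
```

With rank 64 and alpha 128, the low-rank update is scaled by 2.0 before it is added to the frozen base weights.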
Loading
Requires the ent package:
from transformers import AutoTokenizer
from ent.training.train import EntitySmolWrapper

# The base model is not bundled with the adapter weights and is loaded by name.
base_model = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

model = EntitySmolWrapper.from_pretrained(
    path="beaglabs/lancero-1.7B",
    base_model_name=base_model,
    device="cuda",
    tokenizer=tokenizer,
)
Evaluation
Lancero is evaluated on its native reasoning tasks, not on general-purpose language benchmarks. Testing a 1.7B model against MMLU or HumanEval measures how well it mimics a larger model, not how well it performs its intended function.
| Task | Score |
|---|---|
| Entity resolution (type + scope + hash) | 100% |
| Abstraction (hierarchical labeling) | 100% |
| Graph reasoning (path finding + relation + score) | 100% |
| Working memory (merge, support, reject, filter, summarize) | 100% |
| Durable memory (threshold, confirmation, recall) | 100% |
| Program execution (arithmetic, comparison, counting, deduction, ordering, elimination) | 100% |
| Retrieval reranking (iterative, memory-backed, abstraction-conditioned) | 100% |
| Code structure (symbol resolution, dependency tracing, call graph, edit target) | 100% |
These tasks test entity identity, structural reasoning, and memory: the capabilities the architecture was designed for.
Training
- Base model: SmolLM2-1.7B-Instruct, with the entity decoder kept frozen
- Trainable: entity projection layer + LoRA adapters (rank 64, alpha 128)
- LoRA targets: all Q, K, V, O projections + all gate, up, down MLP projections
- Entity dropout: 5%
- Training code: ent/training/train.py, ent/training/modal_train.py
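A back-of-the-envelope count of the trainable parameters implied by this setup is sketched below. The rank, the 896-d entity space, and the 2048-d hidden space come from this card. The intermediate size (8192), the layer count (24), and the assumption that K and V projections are as wide as Q are taken as typical values for the base model and should be verified against its config. Biases and any extra trainable tensors are ignored.

```python
RANK = 64            # from the card
HIDDEN = 2048        # from the card (LM hidden space)
ENTITY_DIM = 896     # from the card (entity space)
INTERMEDIATE = 8192  # assumption: check the base model's config
NUM_LAYERS = 24      # assumption: check the base model's config


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """A LoRA pair adds an (in x rank) and a (rank x out) matrix."""
    return rank * (in_features + out_features)


# Q, K, V, O: assumes all four map HIDDEN -> HIDDEN (no grouped-query attention).
attention = 4 * lora_params(HIDDEN, HIDDEN)
# gate and up map HIDDEN -> INTERMEDIATE, down maps INTERMEDIATE -> HIDDEN.
mlp = 2 * lora_params(HIDDEN, INTERMEDIATE) + lora_params(INTERMEDIATE, HIDDEN)

lora_total = NUM_LAYERS * (attention + mlp)
entity_proj = ENTITY_DIM * HIDDEN  # weight matrix only

print(f"LoRA adapters:     {lora_total:,}")
print(f"Entity projection: {entity_proj:,}")
print(f"Trainable total:   {lora_total + entity_proj:,}")
```

Under these assumptions the LoRA adapters dominate the trainable budget, while the entity projection is comparatively small.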
Limitations
- Not a drop-in chat model. Requires the ent inference engine for full capability.
- The 1.7B parameter count limits generative fluency compared to larger models.
- The entity hash space (131,072 entries) is a fixed vocabulary. Novel surface forms map to heuristic type/class/scope assignments.
- Code execution sandboxing in the eval framework is unreliable on macOS; program execution evals pass on Linux.
License
Apache 2.0