---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- reasoning
- mathematics
- programming
- creative-writing
- chain-of-thought
- interpretability
- fairness
- security
- deployment
- sustainability
- monitoring
- plugin
---

# Brello Thinking

## Model Description

**Brello Thinking** is an advanced language model created by **Epic Systems** as part of the **Brello AI Family**. Built on the Tencent Hunyuan base model, it specializes in deep reasoning, mathematical problem-solving, coding, and creative thinking, with enhanced chain-of-thought capabilities.

### Key Features

- **Advanced Reasoning**: Enhanced chain-of-thought with both fast and slow thinking modes
- **Mathematical Excellence**: Strong at math and symbolic computation
- **Programming Prowess**: Strong coding abilities across Python, JavaScript, C++, SQL, and more
- **Long Context Understanding**: Handles up to 256K tokens, enough for long documents and codebases
- **Creative Problem Solving**: Generates new solutions and approaches
- **Multi-language Support**: Fluent in English and Chinese, with robust cross-lingual transfer

---

## 1. Executive Summary

**Brello Thinking v1.1.0** (2025-08-07) is a 1.8B-parameter causal language model engineered for complex reasoning, mathematics, and creative tasks. It combines ultra-long context, dual “fast”/“deep” thinking modes, and a plugin SDK for live tool integration. It is designed for safe, sustainable, and fair production deployments.

#### Highlights in this Release

- **Mixed-precision quantization** (BF16 & INT8)
- **Plugin SDK** (JSON-RPC, HMAC auth, dynamic tool routing)
- **Monitoring** (Prometheus, Grafana, carbon tracking)
- **Sustainability Dashboard** (gCO₂eq/token metrics, CodeCarbon SDK)

---

## 2. Model Architecture

| Component           | Specification                                                              |
|---------------------|----------------------------------------------------------------------------|
| **Base Model**      | Tencent Hunyuan / EpicBrelloV1ForCausalLM                                  |
| **Parameters**      | 1.8B (BF16/INT8 quantization; LoRA adapters optional)                      |
| **Context Window**  | 256,000 tokens (rotary cache, sliding window, eviction logic)              |
| **Attention**       | Grouped-Query + Multi-Head FlashAttention (16 heads, 4 KV heads)           |
| **Feed-Forward**    | Two-stage (SiLU → Linear → SiLU) with RMSNorm, hidden size 6144            |
| **Depth**           | 32 transformer blocks + 4 “Safety Adapter” blocks                          |
| **Adapters**        | LoRA for math, code, creative, and domain fine-tuning (10–18M params each) |
| **Inference Modes** | Autoregressive sampling (top-k, top-p), beam, contrastive decoding         |
| **Sharding**        | ZeRO-3 / tensor-parallel / model-parallel combinations                     |

---

## 3. Training & Tuning

### 3.1 Pretraining Corpus

- **Web General**: 400B tokens (CommonCrawl, CC-100, curated news)
- **Science/Technical**: 50B tokens (arXiv, PubMed, patents)
- **Code**: 20B tokens (public GitHub, CodeSearchNet, MBPP)
- **Multilingual**: 30B tokens (Chinese, Spanish, German, Arabic)
- **Augmentations**: 15% span corruption, zh–en back-translation, dynamic masking

### 3.2 Optimization

- **Optimizer**: AdamW (β₁=0.9, β₂=0.95, weight_decay=0.01)
- **LR Schedule**: Linear warmup (10K steps), cosine decay (500K steps)
- **Batch**: 2M tokens/step, grad accumulation ×8
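
The learning-rate schedule listed above can be sketched as a pure function of the step count. This is a minimal illustration rather than the training code: `peak_lr` and `min_lr` are assumed values (the card does not state them), and the cosine phase is assumed to begin right after warmup and to span the 500K decay steps.

```python
import math

WARMUP_STEPS = 10_000   # from the card
DECAY_STEPS = 500_000   # from the card


def lr_at(step: int, peak_lr: float = 3e-4, min_lr: float = 0.0) -> float:
    """Linear warmup followed by cosine decay.

    peak_lr and min_lr are illustrative assumptions, not published values.
    """
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 to peak_lr over the warmup phase.
        return peak_lr * step / WARMUP_STEPS
    # Fraction of the decay phase completed, capped at 1.0 after it ends.
    progress = min((step - WARMUP_STEPS) / DECAY_STEPS, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate reaches `peak_lr` at step 10,000, falls to half of it midway through the decay phase, and stays at `min_lr` from step 510,000 onward.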

### 3.3 Instruction/RLHF Tuning

- **Instruction Pairs**: 1.2M human-annotated QA/reasoning
- **Reward Model**: Dual human-preference ranking (5K raters, Elo)
- **Algorithm**: PPO w/ KL penalty (target KL=0.1), reward clipping
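
One common way to combine reward clipping with a KL penalty is sketched below. It is an assumption about how the pieces fit together, not the actual tuning code: `beta`, `clip_range`, `horizon`, and `batch_size` are hypothetical values, the KL term is estimated from per-token log-probability differences between the policy and a frozen reference model, and the adaptive controller is the proportional scheme used in several open PPO-RLHF implementations.

```python
def shaped_reward(reward, logp_policy, logp_ref, beta=0.05, clip_range=5.0):
    """Clip the reward-model score, then subtract a KL penalty.

    Returns (shaped reward, KL estimate). beta and clip_range are assumptions.
    """
    clipped = max(-clip_range, min(clip_range, reward))
    # Sequence-level KL estimate: sum of per-token log-prob differences.
    kl = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return clipped - beta * kl, kl


def update_beta(beta, observed_kl, target_kl=0.1, horizon=10_000, batch_size=64):
    """Adaptive KL controller: raise beta when KL overshoots the target."""
    # Relative error is clamped so a single batch cannot swing beta too far.
    error = max(-0.2, min(0.2, observed_kl / target_kl - 1.0))
    return beta * (1.0 + error * batch_size / horizon)
```

The controller nudges `beta` up when the observed KL exceeds the 0.1 target and down when it falls below, which keeps the tuned policy close to the reference model.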

---

## 4. Specialized Modules

| Adapter Name     | Data Source                   | Params (M) | Use Case                       |
|------------------|-------------------------------|------------|--------------------------------|
| math-adapter     | GSM8K, MATH, AIME datasets    | 12         | Math proof, step-by-step logic |
| code-adapter     | MBPP, MultiPL-E, GitHub repos | 18         | Coding, debugging, codegen     |
| creative-adapter | Gutenberg, story corpora      | 10         | Narrative, dialogue, ideation  |

---

## 5. Plugin & Tooling SDK

- **Interface**: JSON-RPC (Unix socket or REST), HMAC-SHA256 auth
- **Plugins**:
  - DB connectors: PostgreSQL, MySQL, Snowflake
  - HTTP client: retry/backoff
  - Vector DB: FAISS, Pinecone
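
As a rough illustration of the interface above, the following sketch signs a JSON-RPC 2.0 request with HMAC-SHA256 using only the Python standard library. The envelope layout (`body` plus `signature`), the canonical JSON encoding, and the function names are assumptions made for this example; the SDK's actual wire format is not documented here.

```python
import hashlib
import hmac
import json


def sign_request(method: str, params: dict, request_id: int, key: bytes) -> dict:
    """Build a JSON-RPC 2.0 request and an HMAC-SHA256 signature over its body."""
    # Sorted keys and compact separators give a deterministic byte string to sign.
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    signature = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_request(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, envelope["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```

Signing the serialized body rather than the parsed object means the receiver verifies exactly the bytes it was sent, and `hmac.compare_digest` avoids leaking signature prefixes through timing.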

#### Tool Call Example

1. Model emits:

   ```json
   {"tool_call": {"name": "weather_fetch", "args": {"location":"Mumbai"}}}
   ```

2. Host executes plugin, returns:

   ```json
   {"tool_result": {"forecast":"Sunny, 32°C"}}
   ```

3. Model resumes reasoning with tool result in context.
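
The three steps above can be sketched as a small host-side dispatcher. This is a hypothetical illustration, not the SDK's implementation: the `TOOLS` registry, the stubbed `weather_fetch` handler, and the shape of the error result are all assumptions.

```python
import json
from typing import Optional

# Hypothetical registry mapping tool names to host-side callables.
TOOLS = {
    # Stub that returns a fixed forecast instead of calling a weather service.
    "weather_fetch": lambda args: {"forecast": "Sunny, 32°C"},
}


def handle_model_output(raw: str) -> Optional[str]:
    """Return a JSON tool_result message if `raw` is a tool call, else None."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None  # Ordinary generated text, not a tool call
    call = message.get("tool_call") if isinstance(message, dict) else None
    if call is None:
        return None
    tool = TOOLS.get(call["name"])
    if tool is None:
        # Assumed error shape; lets the model see that the call failed.
        return json.dumps({"tool_result": {"error": f"unknown tool: {call['name']}"}})
    return json.dumps({"tool_result": tool(call.get("args", {}))}, ensure_ascii=False)
```

The host would append the returned string to the conversation and call the model again, so that generation resumes with the tool result in context.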

---

## 6. Inference, Monitoring & Scaling

### 6.1 Endpoint Performance

| Mode       | Batch | Seq Len | Throughput (tok/s) | Latency (p50) |
|------------|-------|---------|--------------------|---------------|
| Fast-Think | 8     | 4,096   | 250,000            | 15 ms         |
| Deep-Think | 1     | 256,000 | 18,000             | 120 ms        |
| INT8 Quant | 16    | 2,048   | 320,000            | 12 ms         |

### 6.2 Observability

- **Prometheus Metrics**:
  - `brello_inference_latency_seconds`
  - `brello_generated_tokens_total`
  - `brello_cache_evictions_total`
- **Grafana**:
  - Token latency histograms, CO₂ per generation

---

## 7. Sustainability & Carbon Tracking

- **Data Center PUE**: 1.2
- **Carbon Emission**: ~0.0008 gCO₂eq/token (tracked with CodeCarbon)
- **Offset**: Epic Systems funds VER 2.0 credits
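
To put the per-token figure in context, the short calculation below converts a token count into estimated emissions. It assumes the ~0.0008 gCO₂eq/token value already includes data-center overhead (PUE); the card does not say whether it does.

```python
G_CO2_PER_TOKEN = 0.0008  # gCO2eq per generated token, from the card


def emissions_kg(tokens: int, g_per_token: float = G_CO2_PER_TOKEN) -> float:
    """Estimated emissions in kilograms of CO2-equivalent for `tokens` tokens."""
    return tokens * g_per_token / 1000.0  # grams to kilograms
```

At this rate, one million generated tokens correspond to roughly 0.8 kg CO₂eq.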

---

## 8. Robustness, Safety & Fairness

- **Adapters**: Real-time adversarial input filtering, personal data redaction, toxicity classifier (fine-tuned BERT-tox)
- **Bias Audits**:
  - Toxicity variation <1.8% (12 demographic axes)
  - Gender parity ±2%
  - Dialect coverage 98% (EN & ZH)

---

## 9. Interpretability

- **Chain-of-Thought logs**: Token-level reasoning trace
- **Integrated Gradients**: Span attribution
- **Attention Rollouts**: Layer-wise visualization (custom plugin)

---

## 10. Hyperparameters

| Parameter       | Value  |
|-----------------|--------|
| num_layers      | 32     |
| d_model         | 2048   |
| d_hidden        | 6144   |
| num_heads       | 16     |
| kv_heads        | 4      |
| rotary_pct      | 0.25   |
| lr_warmup_steps | 10,000 |
| weight_decay    | 0.01   |
| batch_size      | 2M     |
| dropout_rate    | 0.1    |
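
The values in the table above allow a back-of-the-envelope estimate of key-value cache size, which shows why grouped-query attention matters at 256K tokens. The sketch assumes the head dimension is `d_model / num_heads`, BF16 storage (2 bytes per value), only the 32 main blocks holding a cache, and no sliding-window eviction; none of these details are confirmed by the card.

```python
def kv_cache_bytes(
    seq_len: int,
    num_layers: int = 32,
    kv_heads: int = 4,
    d_model: int = 2048,
    num_heads: int = 16,
    bytes_per_value: int = 2,  # BF16
) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    head_dim = d_model // num_heads  # 128 under the even-split assumption
    # Factor of 2 accounts for storing both keys and values.
    return 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_value


full_context = kv_cache_bytes(256_000)
print(f"{full_context / 2**30:.3f} GiB")  # 15.625 GiB with 4 KV heads
```

With all 16 heads cached (plain multi-head attention) the same sequence would need four times as much, about 62.5 GiB, so sharing keys and values across 4 KV heads accounts for most of the saving.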

---

## 11. Evaluation & Error Analysis

- **Benchmarks**: GSM8K, MBPP, BBH, LongBench, MATH
- **Analysis**: Math/logic confusion matrix, hallucination drift cluster analysis

---

## 12. Roadmap

| Version | Highlights                                  | ETA      |
|---------|---------------------------------------------|----------|
| v1.1.0  | Plugins, carbon tracking, INT8 quantization | Released |
| v1.2.0  | Vision-language, adapter expansion          | Nov 2025 |
| v1.3.0  | Audio, multilingual tuning                  | Feb 2026 |
| v2.0    | Federated RAG, continuous learning          | Q4 2026  |

---

## 13. Licensing & Compliance

- **License**: Proprietary, Epic Systems
- **Privacy**: GDPR, CCPA compliant
- **Certifications**: ISO 27001, SOC 2 Type II, HIPAA (BAA on request)
- **Restrictions**: No redistribution or large-scale rehosting

---

## 14. Usage Example

```python
import os
import time

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel  # For LoRA adapters
from brello_sdk import BrelloPluginManager  # Hypothetical SDK
from codecarbon import EmissionsTracker
from prometheus_client import CollectorRegistry, Counter, Histogram


def setup_model(
    model_id: str = "BrelloES/brello-thinking",
    use_bf16: bool = True,
    load_int8: bool = True,
):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # INT8 weights are requested through a quantization config (requires bitsandbytes).
    quant_config = BitsAndBytesConfig(load_in_8bit=True) if load_int8 else None
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.bfloat16 if use_bf16 else torch.float32,
        quantization_config=quant_config,
    )
    # Attach LoRA adapters: wrap the base model once, then load others by name.
    model = PeftModel.from_pretrained(model, "adapters/math-adapter", adapter_name="math")
    model.load_adapter("adapters/code-adapter", adapter_name="code")
    model.set_adapter("math")  # One adapter is active at a time
    return tokenizer, model


def setup_plugins():
    pm = BrelloPluginManager()
    pm.register(
        name="weather_fetch",
        path="/opt/brello/plugins/weather_plugin.so",
        auth_key=os.getenv("WEATHER_PLUGIN_KEY", "CHANGE_ME"),
    )
    pm.register(
        name="db_query",
        path="/opt/brello/plugins/db_query_plugin.so",
        auth_key=os.getenv("DB_PLUGIN_KEY", "CHANGE_ME"),
    )
    return pm


def setup_metrics():
    registry = CollectorRegistry()
    latency = Histogram(
        "brello_inference_latency_seconds",
        "Inference latency (seconds) per request",
        registry=registry,
        buckets=(0.01, 0.05, 0.1, 0.2, 0.5, 1.0),
    )
    tokens = Counter(
        "brello_generated_tokens_total",
        "Total number of tokens generated by Brello",
        registry=registry,
    )
    # Return the metric objects so callers can update them.
    return {"registry": registry, "latency": latency, "tokens": tokens}


def generate_response(tokenizer, model, plugin_mgr, metrics, messages, mode: str = "deep"):
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",  # generate() needs tensors, not a list of ids
        return_dict=True,
        enable_thinking=(mode == "deep"),
    ).to(model.device)
    prompt_len = inputs["input_ids"].shape[-1]

    os.makedirs("carbon_logs", exist_ok=True)
    tracker = EmissionsTracker(project_name="brello_inference", output_dir="carbon_logs")
    tracker.start()
    start = time.perf_counter()
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,  # top_p and temperature only take effect when sampling
        top_p=0.9,
        temperature=0.6,
        plugin_manager=plugin_mgr,  # Hypothetical kwarg; stock generate() rejects unknown arguments
        return_dict_in_generate=True,
    )
    metrics["latency"].observe(time.perf_counter() - start)
    emissions_kg = tracker.stop()

    # Decode only the newly generated tokens, not the prompt.
    new_tokens = outputs.sequences[0, prompt_len:]
    metrics["tokens"].inc(new_tokens.shape[-1])
    text = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return text, emissions_kg


def main():
    tokenizer, model = setup_model()
    plugin_mgr = setup_plugins()
    metrics = setup_metrics()
    messages = [
        {"role": "system", "content": "You are Brello Thinking in Deep-Think mode."},
        {"role": "user", "content": "Explain why prime factorization is unique."},
    ]
    response, co2 = generate_response(tokenizer, model, plugin_mgr, metrics, messages, mode="deep")
    print("=== Deep-Think Output ===\n", response)
    print(f"CO₂ Emitted: {co2:.6f} kg")

    # Fast-Think comparison
    messages[0]["content"] = "You are Brello Thinking in Fast-Think mode."
    response_fast, co2_fast = generate_response(tokenizer, model, plugin_mgr, metrics, messages, mode="fast")
    print("\n=== Fast-Think Output ===\n", response_fast)
    print(f"CO₂ Emitted: {co2_fast:.6f} kg")


if __name__ == "__main__":
    main()
```

---

## Credits

- **Creator**: Epic Systems
- **Engineer**: Rehan Temkar
- **Model**: Brello Thinking v1.1.0

---

*Brello Thinking - Advanced AI Reasoning by Epic Systems*

---