---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- reasoning
- mathematics
- programming
- creative-writing
- chain-of-thought
- interpretability
- fairness
- security
- deployment
- sustainability
- monitoring
- plugin
---
# Brello Thinking
## Model Description
**Brello Thinking** is an advanced language model created by **Epic Systems** as part of the **Brello AI Family**. Built on the Tencent Hunyuan base model, Brello Thinking specializes in deep reasoning, mathematical problem-solving, coding, and creative thinking with enhanced chain-of-thought capabilities.
### Key Features
- **Advanced Reasoning**: Enhanced chain-of-thought with both fast and slow thinking modes
- **Mathematical Excellence**: Strong performance on mathematical reasoning and symbolic computation
- **Programming Prowess**: Strong coding abilities across Python, JS, C++, SQL, and more
- **Long Context Understanding**: Handles up to 256K tokens, long docs, and codebases
- **Creative Problem Solving**: Generates new solutions and approaches
- **Multi-language Support**: Fluent in English and Chinese, with robust cross-lingual transfer
---
## 1. Executive Summary
**Brello Thinking v1.1.0** (2025-08-07) is a 1.8B-parameter causal language model engineered for complex reasoning, mathematics, and creative tasks. It combines ultra-long context, dual “fast”/“deep” thinking modes, and a plugin SDK for live tool integration. It is designed for safe, sustainable, and fair production deployments.
#### Highlights in this Release
- **Mixed-precision quantization** (BF16 & INT8)
- **Plugin SDK** (JSON-RPC, HMAC auth, dynamic tool routing)
- **Monitoring** (Prometheus, Grafana, carbon tracking)
- **Sustainability Dashboard** (gCO₂eq/token metrics, CodeCarbon SDK)
---
## 2. Model Architecture
| Component | Specification |
|----------------------------|-----------------------------------------------------------------------------------------------------|
| **Base Model** | Tencent Hunyuan / EpicBrelloV1ForCausalLM |
| **Parameters** | 1.8B (BF16/INT8 quantization; LoRA adapters optional) |
| **Context Window** | 256,000 tokens (rotary cache, sliding window, eviction logic) |
| **Attention** | Grouped-Query + Multi-Head FlashAttention (16 heads, 4 KV heads) |
| **Feed-Forward** | Two-stage (SiLU → Linear → SiLU) with RMSNorm, hidden size 6144 |
| **Depth** | 32 transformer blocks + 4 “Safety Adapter” blocks |
| **Adapters** | LoRA for math, code, creative, and domain fine-tuning (10–18M params each) |
| **Inference Modes** | Autoregressive sampling (top-k, top-p), beam, contrastive decoding |
| **Sharding** | ZeRO-3 / tensor-parallel / model-parallel combinations |
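
To make the attention layout above concrete, here is a shape-only sketch of grouped-query attention with 16 query heads sharing 4 KV heads over a 2048-dimensional model. It is purely illustrative: the head dimension, grouping order, and einsum formulation are assumptions, not the model's actual FlashAttention kernel.
```python
import torch

# Shape-only sketch of grouped-query attention: 16 query heads share 4 KV heads.
d_model, n_heads, n_kv_heads = 2048, 16, 4
head_dim = d_model // n_heads           # 128
group_size = n_heads // n_kv_heads      # 4 query heads per KV head

q = torch.randn(1, 8, n_heads, head_dim)     # 16 query heads, seq_len = 8
k = torch.randn(1, 8, n_kv_heads, head_dim)  # only 4 key heads
v = torch.randn(1, 8, n_kv_heads, head_dim)  # only 4 value heads

# Each KV head is shared by a group of 4 consecutive query heads (illustrative grouping).
k = k.repeat_interleave(group_size, dim=2)   # -> (1, 8, 16, head_dim)
v = v.repeat_interleave(group_size, dim=2)

attn = torch.einsum("bqhd,bkhd->bhqk", q, k) / head_dim ** 0.5
out = torch.einsum("bhqk,bkhd->bqhd", attn.softmax(-1), v).reshape(1, 8, d_model)
print(out.shape)  # torch.Size([1, 8, 2048])
```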
---
## 3. Training & Tuning
### 3.1 Pretraining Corpus
- **Web General**: 400B tokens (CommonCrawl, CC-100, curated news)
- **Science/Technical**: 50B tokens (arXiv, PubMed, patents)
- **Code**: 20B tokens (public GitHub, CodeSearchNet, MBPP)
- **Multilingual**: 30B tokens (Chinese, Spanish, German, Arabic)
- **Augmentations**: 15% span corruption, zh–en back-translation, dynamic masking
### 3.2 Optimization
- **Optimizer**: AdamW (β₁=0.9, β₂=0.95, weight_decay=0.01)
- **LR Schedule**: Linear warmup (10K steps), cosine decay (500K steps)
- **Batch**: 2M tokens/step, grad accumulation ×8
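
A minimal sketch of the schedule above (10K-step linear warmup into a 500K-step cosine decay with AdamW) using vanilla PyTorch; the peak learning rate is an assumption, since the card does not state it.
```python
import math
import torch

# Hypothetical peak LR; the card gives warmup/decay lengths but not the peak value.
peak_lr, warmup_steps, decay_steps = 3e-4, 10_000, 500_000

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for the model parameters
optimizer = torch.optim.AdamW(params, lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.01)

def lr_lambda(step: int) -> float:
    if step < warmup_steps:                       # linear warmup
        return step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay toward zero

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```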
### 3.3 Instruction/RLHF Tuning
- **Instruction Pairs**: 1.2M human-annotated QA/reasoning
- **Reward Model**: Dual human-preference ranking (5K raters, Elo)
- **Algorithm**: PPO w/ KL penalty (target KL=0.1), reward clipping
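
The KL-penalized, clipped reward shaping behind the PPO step can be summarized in a few lines. This sketch shows only the shaping term, not the full PPO loop, and uses a fixed penalty coefficient as a stand-in for the adaptive controller that tracks the target KL of 0.1.
```python
import torch

def shaped_reward(reward, logprob_policy, logprob_ref, kl_coef=0.1, clip=5.0):
    """Reward shaping for KL-penalized PPO: r - beta * KL(policy || ref),
    estimated per token from the log-prob difference, then clipped.
    kl_coef and clip are illustrative values, not the published settings."""
    kl = logprob_policy - logprob_ref              # per-token KL estimate
    return torch.clamp(reward - kl_coef * kl, -clip, clip)
```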
---
## 4. Specialized Modules
| Adapter Name | Data Source | Params (M) | Use Case |
|-------------------|-----------------------------------|------------|----------------------------------|
| math-adapter | GSM8K, MATH, AIME datasets | 12 | Math proof, step-by-step logic |
| code-adapter | MBPP, MultiPL-E, GitHub repos | 18 | Coding, debugging, codegen |
| creative-adapter | Gutenberg, story corpora | 10 | Narrative, dialogue, ideation |
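
For reference, attaching one of the published adapters above is a two-line operation with PEFT (the adapter path mirrors the usage example in section 14). The rank and target modules shown for training a new adapter are assumptions, not the published configuration.
```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

base = AutoModelForCausalLM.from_pretrained("BrelloES/brello-thinking")

# Load a published adapter (path as in the usage example below).
math_model = PeftModel.from_pretrained(base, "adapters/math-adapter")

# Or define a new adapter for domain fine-tuning; rank/targets here are illustrative only.
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
custom_model = get_peft_model(base, config)
custom_model.print_trainable_parameters()
```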
---
## 5. Plugin & Tooling SDK
- **Interface**: JSON-RPC (Unix socket or REST), HMAC-SHA256 auth
- **Plugins**:
- DB connectors: PostgreSQL, MySQL, Snowflake
- HTTP client: retry/backoff
- Vector DB: FAISS, Pinecone
#### Tool Call Example
1. Model emits:
```json
{"tool_call": {"name": "weather_fetch", "args": {"location":"Mumbai"}}}
```
2. Host executes plugin, returns:
```json
{"tool_result": {"forecast":"Sunny, 32°C"}}
```
3. Model resumes reasoning with tool result in context.
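
Below is a minimal sketch of how a host might sign a model-emitted tool call with HMAC-SHA256 before dispatching it over the JSON-RPC interface. The request envelope and key handling are assumptions, since the SDK's wire format is not documented here.
```python
import hashlib
import hmac
import json
import os

def sign_tool_call(tool_call: dict, key: bytes) -> dict:
    """Wrap a model-emitted tool call in a JSON-RPC request with an HMAC-SHA256 signature.
    The envelope layout is illustrative; the Brello SDK may use a different format."""
    request = {"jsonrpc": "2.0", "id": 1, "method": tool_call["name"], "params": tool_call["args"]}
    body = json.dumps(request, sort_keys=True).encode()
    signature = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"body": request, "signature": signature}  # signature travels alongside the request

signed = sign_tool_call(
    {"name": "weather_fetch", "args": {"location": "Mumbai"}},
    key=os.environ.get("WEATHER_PLUGIN_KEY", "CHANGE_ME").encode(),
)
```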
---
## 6. Inference, Monitoring & Scaling
### 6.1 Endpoint Performance
| Mode | Batch | Seq Len | Throughput (tok/s) | Latency (p50) |
|--------------|-------|----------|--------------------|---------------|
| Fast-Think | 8 | 4,096 | 250,000 | 15 ms |
| Deep-Think | 1 | 256,000 | 18,000 | 120 ms |
| INT8 Quant | 16 | 2,048 | 320,000 | 12 ms |
### 6.2 Observability
- **Prometheus Metrics**:
- `brello_inference_latency_seconds`
- `brello_generated_tokens_total`
- `brello_cache_evictions_total`
- **Grafana**:
- Token latency histograms, CO₂ per generation
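
Given the metric names above, a Grafana panel for p50 latency and token throughput could be driven by queries along these lines; the Prometheus address and 5-minute window are assumptions.
```python
import requests  # assumes a Prometheus server is scraping the Brello endpoint

PROM_URL = "http://localhost:9090/api/v1/query"  # hypothetical Prometheus address

# p50 inference latency over the last 5 minutes, from the histogram buckets above.
latency_query = "histogram_quantile(0.5, rate(brello_inference_latency_seconds_bucket[5m]))"
# Token throughput (tokens/second) over the last 5 minutes.
throughput_query = "rate(brello_generated_tokens_total[5m])"

for name, query in [("p50 latency", latency_query), ("throughput", throughput_query)]:
    result = requests.get(PROM_URL, params={"query": query}, timeout=5).json()
    print(name, result["data"]["result"])
```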
---
## 7. Sustainability & Carbon Tracking
- **Data Center PUE**: 1.2
- **Carbon Emission**: ~0.0008 gCO₂eq/token (tracked with CodeCarbon)
- **Offset**: Epic Systems funds VER 2.0 credits
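
As a back-of-the-envelope check on the per-token figure above, the following converts ~0.0008 gCO₂eq/token into per-request and per-day totals; the traffic and response-length numbers are purely illustrative.
```python
# Back-of-the-envelope carbon math from the ~0.0008 gCO2eq/token figure above.
g_per_token = 0.0008

tokens_per_request = 512       # illustrative response length
requests_per_day = 1_000_000   # illustrative traffic volume, not a measured value

g_per_request = g_per_token * tokens_per_request                          # ~0.41 gCO2eq/request
kg_per_day = g_per_token * tokens_per_request * requests_per_day / 1000   # ~410 kgCO2eq/day

print(f"{g_per_request:.3f} gCO2eq/request, {kg_per_day:.1f} kgCO2eq/day")
```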
---
## 8. Robustness, Safety & Fairness
- **Adapters**: Real-time adversarial input filtering, personal data redaction, toxicity classifier (fine-tuned BERT-tox)
- **Bias Audits**:
- Toxicity variation <1.8% (12 demographic axes)
- Gender parity ±2%
- Dialect coverage 98% (EN & ZH)
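
A minimal sketch of how the toxicity gate described above could sit in front of generation, assuming the fine-tuned BERT-tox classifier is exposed as a standard text-classification checkpoint; the checkpoint name, label string, and threshold are placeholders.
```python
from transformers import pipeline

# Placeholder checkpoint name; the actual BERT-tox weights are not published in this card.
toxicity = pipeline("text-classification", model="epic-systems/bert-tox")

def passes_safety_gate(prompt: str, threshold: float = 0.5) -> bool:
    """Reject prompts the toxicity classifier flags above the threshold.
    The 'toxic' label name is an assumption about the classifier's output schema."""
    result = toxicity(prompt)[0]
    return not (result["label"].lower() == "toxic" and result["score"] >= threshold)
```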
---
## 9. Interpretability
- **Chain-of-Thought logs**: Token-level reasoning trace
- **Integrated Gradients**: Span attribution
- **Attention Rollouts**: Layer-wise visualization (custom plugin)
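
A compact sketch of the attention-rollout computation referenced above (composing residual-adjusted, head-averaged attention maps layer by layer). This reproduces the standard rollout recipe rather than the custom plugin itself.
```python
import torch

def attention_rollout(attentions):
    """attentions: list of (heads, seq, seq) maps, one per layer
    (e.g. from a forward pass with output_attentions=True).
    Returns a (seq, seq) matrix of cumulative token-to-token attribution."""
    rollout = None
    for layer_attn in attentions:
        attn = layer_attn.mean(dim=0)                          # average over heads
        attn = attn + torch.eye(attn.size(-1))                 # account for the residual connection
        attn = attn / attn.sum(dim=-1, keepdim=True)           # re-normalize rows
        rollout = attn if rollout is None else attn @ rollout  # compose with earlier layers
    return rollout
```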
---
## 10. Hyperparameters
| Parameter | Value |
|-------------------|----------|
| num_layers | 32 |
| d_model | 2048 |
| d_hidden | 6144 |
| num_heads | 16 |
| kv_heads | 4 |
| rotary_pct | 0.25 |
| lr_warmup_steps | 10,000 |
| weight_decay | 0.01 |
| batch_size        | 2M tokens/step |
| dropout_rate | 0.1 |
---
## 11. Evaluation & Error Analysis
- **Benchmarks**: GSM8K, MBPP, BBH, LongBench, MATH
- **Analysis**: Math/logic confusion matrix, hallucination drift cluster analysis
---
## 12. Roadmap
| Version | Highlights | ETA |
|-----------|----------------------------------------------|----------|
| v1.1.0 | Plugins, carbon tracking, INT8 quantization | Released |
| v1.2.0 | Vision-language, adapter expansion | Nov 2025 |
| v1.3.0 | Audio, multilingual tuning | Feb 2026 |
| v2.0 | Federated RAG, continuous learning | Q4 2026 |
---
## 13. Licensing & Compliance
- **License**: Proprietary, Epic Systems
- **Privacy**: GDPR, CCPA compliant
- **Certifications**: ISO 27001, SOC 2 Type II, HIPAA (BAA on request)
- **Restrictions**: No redistribution or large-scale rehosting
---
## 14. Usage Example
```python
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel # For LoRA adapters
from brello_sdk import BrelloPluginManager # Hypothetical SDK
from codecarbon import EmissionsTracker
from prometheus_client import CollectorRegistry, Counter, Histogram, push_to_gateway
def setup_model(
    model_id: str = "BrelloES/brello-thinking",
    use_bf16: bool = True,
    load_int8: bool = True,
):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.bfloat16 if use_bf16 else torch.float32,
        load_in_8bit=load_int8,  # 8-bit loading takes precedence over the dtype when enabled
    )
    # Attach LoRA adapters: load the first with from_pretrained, then add the
    # second to the same PeftModel instead of wrapping it twice.
    model = PeftModel.from_pretrained(model, "adapters/math-adapter", adapter_name="math")
    model.load_adapter("adapters/code-adapter", adapter_name="code")
    return tokenizer, model


def setup_plugins():
    pm = BrelloPluginManager()
    pm.register(
        name="weather_fetch",
        path="/opt/brello/plugins/weather_plugin.so",
        auth_key=os.getenv("WEATHER_PLUGIN_KEY", "CHANGE_ME"),
    )
    pm.register(
        name="db_query",
        path="/opt/brello/plugins/db_query_plugin.so",
        auth_key=os.getenv("DB_PLUGIN_KEY", "CHANGE_ME"),
    )
    return pm


def setup_metrics():
    registry = CollectorRegistry()
    Histogram(
        "brello_inference_latency_seconds",
        "Inference latency (seconds) per request",
        registry=registry,
        buckets=(0.01, 0.05, 0.1, 0.2, 0.5, 1.0),
    )
    Counter(
        "brello_generated_tokens_total",
        "Total number of tokens generated by Brello",
        registry=registry,
    )
    return registry


def generate_response(tokenizer, model, plugin_mgr, registry, messages, mode: str = "deep"):
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",          # return a tensor so it can be moved to the model device
        enable_thinking=(mode == "deep"),
    ).to(model.device)
    tracker = EmissionsTracker(project_name="brello_inference", output_dir="carbon_logs")
    tracker.start()
    # (Metrics update simplified for clarity)
    outputs = model.generate(
        inputs,
        max_new_tokens=512,
        do_sample=True,               # sampling must be enabled for top_p/temperature to take effect
        top_p=0.9,
        temperature=0.6,
        plugin_manager=plugin_mgr,    # handled by the Brello SDK, not vanilla transformers
        return_dict_in_generate=True,
        output_scores=True,
    )
    emissions_kg = tracker.stop()
    # Decode only the newly generated tokens, not the prompt.
    text = tokenizer.decode(outputs.sequences[0][inputs.shape[-1]:], skip_special_tokens=True)
    return text, emissions_kg


def main():
    tokenizer, model = setup_model()
    plugin_mgr = setup_plugins()
    registry = setup_metrics()
    messages = [
        {"role": "system", "content": "You are Brello Thinking in Deep-Think mode."},
        {"role": "user", "content": "Explain why prime factorization is unique."},
    ]
    response, co2 = generate_response(tokenizer, model, plugin_mgr, registry, messages, mode="deep")
    print("=== Deep-Think Output ===\n", response)
    print(f"CO₂ Emitted: {co2:.6f} kg")

    # Fast-Think comparison
    messages[0]["content"] = "You are Brello Thinking in Fast-Think mode."
    response_fast, co2_fast = generate_response(tokenizer, model, plugin_mgr, registry, messages, mode="fast")
    print("\n=== Fast-Think Output ===\n", response_fast)
    print(f"CO₂ Emitted: {co2_fast:.6f} kg")


if __name__ == "__main__":
    main()
```
---
## Credits
- **Creator**: Epic Systems
- **Engineer**: Rehan Temkar
- **Model**: Brello Thinking v1.1.0
---
*Brello Thinking - Advanced AI Reasoning by Epic Systems*
---