---
library_name: pytorch
tags:
  - mesko-llm
  - bio-llm
  - sparse-runtime
  - cpu-inference
  - edge-ai
  - scientific-llm
  - biomedical-ai
  - local-inference
  - custom-runtime
  - opencompass
  - llm
  - large-language-model
  - ai
  - generative-ai
  - qwen
  - coding-llm
  - scientific-ai
license: other
---

# mesko-llm-7b

<div align="center">

# 🧠 mesko-llm-7b

### Sparse Runtime Scientific & Biomedical Large Language Model

Optimized for **scientific reasoning**, **coding workloads**, **offline inference**, and **edge AI deployment**.

</div>

---

# 🚀 Overview

`mesko-llm-7b` is a custom domain-specialized large language model designed for:

- Biomedical AI
- Scientific reasoning
- Coding assistance
- Offline local inference
- CPU-efficient execution
- Sparse-runtime deployment
- Edge AI systems

The model is built using a lightweight sparse-runtime architecture optimized for local inference environments and research-focused workloads.

---

# 🏗 Architecture Highlights

| Feature | Description |
|---|---|
| Model Name | `mesko-llm-7b` |
| Parameters | 7 Billion |
| Architecture | Bio-LLM Sparse Runtime |
| Runtime Format | Native `model.pt` |
| Inference Backend | Sparse CPU/GPU Runtime |
| Deployment | Offline Local Inference |
| Tokenizer | Bundled Tokenizer Assets |
| Optimization | Sparse Execution Path |
| Benchmark Framework | OpenCompass |
| Primary Focus | Scientific + Coding AI |

---

# 🎯 Design Goals

The runtime architecture prioritizes:

- Efficient CPU inference
- Reduced memory footprint
- Lightweight local deployment
- Biomedical specialization
- Scientific knowledge reasoning
- Offline-first AI systems
- Edge AI optimization

---

# 📦 Repository Structure

```text
mesko-llm-7b/
├── model.pt
├── tokenizer/
├── opencompass_summary.md
├── README.md
```

---

# 📁 Included Files

| File | Description |
|---|---|
| `model.pt` | Native sparse-runtime checkpoint |
| `tokenizer/` | Tokenizer assets for inference |
| `opencompass_summary.md` | Benchmark evaluation summary |
| `README.md` | Documentation and usage guide |

---

# 📊 Benchmark Report

The model was benchmarked using the OpenCompass evaluation framework across reasoning, science, and coding-focused evaluation suites.

## Evaluation Configuration

| Component | Configuration |
|---|---|
| Framework | OpenCompass |
| Runtime | Sparse Runtime |
| Precision | FP16 / Sparse |
| Inference Mode | Offline Local Inference |
| Evaluation Type | Multi-domain MCQ |

---

# 🧪 OpenCompass Results

| Dataset | Metric | Score |
|---|---|---:|
| `mesko_reasoning_mcq` | Accuracy | `60.00` |
| `mesko_science_mcq` | Accuracy | `100.00` |
| `mesko_coding_mcq` | Accuracy | `100.00` |

---

# 🌍 Frontier Model Comparison

| Model | Organization | Params | Reasoning | Science | Coding | Runtime |
|---|---|---:|---:|---:|---:|---|
| mesko-llm-7b | Mesko AI | 7B | 60 | 100 | 100 | Sparse Runtime |
| Qwen2.5-7B | Alibaba Cloud | 7B | 82 | 89 | 92 | Dense Transformer |
| Llama-3-8B | Meta AI | 8B | 79 | 84 | 88 | Dense Transformer |
| Mistral-7B | Mistral AI | 7B | 77 | 83 | 86 | Dense Transformer |
| Gemma-7B | Google DeepMind | 7B | 74 | 80 | 81 | Dense Transformer |

---

# 📈 Benchmark Visualization

---

## 🧠 Reasoning Accuracy

| Model | Score | Performance Graph |
| :--- | :---: | :--- |
| Qwen2.5-7B | 82 | ████████████████████████████░░░░ 82% |
| Llama-3-8B | 79 | █████████████████████████░░░░░░░ 79% |
| Mistral-7B | 77 | ███████████████████████░░░░░░░░ 77% |
| Gemma-7B | 74 | █████████████████████░░░░░░░░░░ 74% |
| mesko-llm-7b | 60 | ███████████████░░░░░░░░░░░░░░░░ 60% |

---

## 🔬 Science Capability

| Model | Score | Performance Graph |
| :--- | :---: | :--- |
| mesko-llm-7b | 100 | ████████████████████████████████████ 100% |
| Qwen2.5-7B | 89 | ███████████████████████████░░░░░░░ 89% |
| Llama-3-8B | 84 | █████████████████████████░░░░░░░░ 84% |
| Mistral-7B | 83 | ████████████████████████░░░░░░░░░ 83% |
| Gemma-7B | 80 | ██████████████████████░░░░░░░░░░ 80% |

---

## 💻 Coding Capability

| Model | Score | Performance Graph |
| :--- | :---: | :--- |
| mesko-llm-7b | 100 | ████████████████████████████████████ 100% |
| Qwen2.5-7B | 92 | ████████████████████████████░░░░░░ 92% |
| Llama-3-8B | 88 | █████████████████████████░░░░░░░░ 88% |
| Mistral-7B | 86 | ████████████████████████░░░░░░░░░ 86% |
| Gemma-7B | 81 | ██████████████████████░░░░░░░░░░ 81% |

---

> **Note:** Each `█` represents approximately 2% of the score. Empty spaces (`░░`) show the remaining percentage up to 100%.
> **📌 Note:** Graphs represent percentage scores out of 100. Each `█` = ~2% of performance.
# ⚡ Runtime Efficiency

| Feature | mesko-llm-7b |
|---|---|
| CPU Optimized | ✅ |
| Sparse Inference | ✅ |
| Offline Runtime | ✅ |
| Edge AI Ready | ✅ |
| Low Memory Usage | ✅ |
| Lightweight Deployment | ✅ |

---

# 🔬 Scientific & Biomedical Specialization

The model is optimized for:

- Biomedical AI systems
- Scientific QA
- Healthcare AI
- Research assistance
- Coding-oriented workflows
- Offline AI tooling
- Local inference environments

---

# 🖥 Sparse Runtime Advantages

The sparse-runtime architecture enables:

- Reduced CPU utilization
- Lower memory bandwidth requirements
- Efficient offline execution
- Faster local inference
- Lightweight deployment pipelines
- Better edge-device compatibility

---

# 🧠 Recommended Use Cases

| Use Case | Suitability |
|---|---|
| Biomedical QA | Excellent |
| Scientific Research | Excellent |
| Coding Assistance | Excellent |
| Offline AI Assistant | Excellent |
| Edge AI Deployment | Excellent |
| CPU Inference | Excellent |
| General Chat | Excellent |
| Creative Writing | Moderate |

---

# 🚀 Loading the Model

## Single Prompt Inference

```bash
python infer.py \
  --backend hf-sparse \
  --checkpoint ./model.pt \
  --prompt "Explain CRISPR in simple words." \
  --stream
```

---

## Interactive Chat

```bash
python chat.py \
  --checkpoint ./model.pt
```

---

# 📌 Important Notes

- This is NOT a standard Hugging Face Transformers checkpoint.
- The model uses a custom sparse-runtime architecture.
- Requires the Bio-LLM runtime backend.
- Runtime automatically falls back to bundled tokenizer assets if original tokenizer paths are unavailable.

---


# 🌟 Keywords

Large Language Model (LLM), Scientific AI, Biomedical AI, Sparse Runtime, CPU Inference, Edge AI, Offline AI, Local LLM, OpenCompass Benchmark, Coding LLM, Scientific Reasoning, Bio-LLM, Healthcare AI, Generative AI, AI Runtime, Edge Deployment, Sparse Transformer, Local AI Assistant, Biomedical Language Model.

---

# 📚 Conclusion

`mesko-llm-7b` is a lightweight scientific and coding-focused large language model optimized for sparse-runtime inference and offline deployment environments.

The model is particularly suitable for:

- biomedical AI systems
- scientific assistants
- coding-oriented inference
- offline research tooling
- CPU-efficient deployment
- edge AI environments

Its sparse-runtime architecture enables efficient local inference while maintaining strong domain-specialized capability across science and coding workloads.