mesko-llm-7b

🧠 mesko-llm-7b

Sparse Runtime Scientific & Biomedical Large Language Model

Optimized for scientific reasoning, coding workloads, offline inference, and edge AI deployment.

🚀 Overview

mesko-llm-7b is a custom domain-specialized large language model designed for:

Biomedical AI
Scientific reasoning
Coding assistance
Offline local inference
CPU-efficient execution
Sparse-runtime deployment
Edge AI systems

The model is built using a lightweight sparse-runtime architecture optimized for local inference environments and research-focused workloads.

🏗 Architecture Highlights

Feature	Description
Model Name	`mesko-llm-7b`
Parameters	7 Billion
Architecture	Bio-LLM Sparse Runtime
Runtime Format	Native `model.pt`
Inference Backend	Sparse CPU/GPU Runtime
Deployment	Offline Local Inference
Tokenizer	Bundled Tokenizer Assets
Optimization	Sparse Execution Path
Benchmark Framework	OpenCompass
Primary Focus	Scientific + Coding AI

🎯 Design Goals

The runtime architecture prioritizes:

Efficient CPU inference
Reduced memory footprint
Lightweight local deployment
Biomedical specialization
Scientific knowledge reasoning
Offline-first AI systems
Edge AI optimization

📦 Repository Structure

mesko-llm-7b/
├── model.pt
├── tokenizer/
├── opencompass_summary.md
├── README.md

📁 Included Files

File	Description
`model.pt`	Native sparse-runtime checkpoint
`tokenizer/`	Tokenizer assets for inference
`opencompass_summary.md`	Benchmark evaluation summary
`README.md`	Documentation and usage guide

📊 Benchmark Report

The model was benchmarked using the OpenCompass evaluation framework across reasoning, science, and coding-focused evaluation suites.

Evaluation Configuration

Component	Configuration
Framework	OpenCompass
Runtime	Sparse Runtime
Precision	FP16 / Sparse
Inference Mode	Offline Local Inference
Evaluation Type	Multi-domain MCQ

🧪 OpenCompass Results

Dataset	Metric	Score
`mesko_reasoning_mcq`	Accuracy	`60.00`
`mesko_science_mcq`	Accuracy	`100.00`
`mesko_coding_mcq`	Accuracy	`100.00`

🌍 Frontier Model Comparison

Model	Organization	Params	Reasoning	Science	Coding	Runtime
mesko-llm-7b	Mesko AI	7B	60	100	100	Sparse Runtime
Qwen2.5-7B	Alibaba Cloud	7B	82	89	92	Dense Transformer
Llama-3-8B	Meta AI	8B	79	84	88	Dense Transformer
Mistral-7B	Mistral AI	7B	77	83	86	Dense Transformer
Gemma-7B	Google DeepMind	7B	74	80	81	Dense Transformer

📈 Benchmark Visualization

🧠 Reasoning Accuracy

Model	Score	Performance Graph
Qwen2.5-7B	82	████████████████████████████░░░░ 82%
Llama-3-8B	79	█████████████████████████░░░░░░░ 79%
Mistral-7B	77	███████████████████████░░░░░░░░ 77%
Gemma-7B	74	█████████████████████░░░░░░░░░░ 74%
mesko-llm-7b	60	███████████████░░░░░░░░░░░░░░░░ 60%

🔬 Science Capability

Model	Score	Performance Graph
mesko-llm-7b	100	████████████████████████████████████ 100%
Qwen2.5-7B	89	███████████████████████████░░░░░░░ 89%
Llama-3-8B	84	█████████████████████████░░░░░░░░ 84%
Mistral-7B	83	████████████████████████░░░░░░░░░ 83%
Gemma-7B	80	██████████████████████░░░░░░░░░░ 80%

💻 Coding Capability

Model	Score	Performance Graph
mesko-llm-7b	100	████████████████████████████████████ 100%
Qwen2.5-7B	92	████████████████████████████░░░░░░ 92%
Llama-3-8B	88	█████████████████████████░░░░░░░░ 88%
Mistral-7B	86	████████████████████████░░░░░░░░░ 86%
Gemma-7B	81	██████████████████████░░░░░░░░░░ 81%

Note: Each █ represents approximately 2% of the score. Empty spaces (░░) show the remaining percentage up to 100%. 📌 Note: Graphs represent percentage scores out of 100. Each █ = ~2% of performance.

⚡ Runtime Efficiency

Feature	mesko-llm-7b
CPU Optimized	✅
Sparse Inference	✅
Offline Runtime	✅
Edge AI Ready	✅
Low Memory Usage	✅
Lightweight Deployment	✅

🔬 Scientific & Biomedical Specialization

The model is optimized for:

Biomedical AI systems
Scientific QA
Healthcare AI
Research assistance
Coding-oriented workflows
Offline AI tooling
Local inference environments

🖥 Sparse Runtime Advantages

The sparse-runtime architecture enables:

Reduced CPU utilization
Lower memory bandwidth requirements
Efficient offline execution
Faster local inference
Lightweight deployment pipelines
Better edge-device compatibility

🧠 Recommended Use Cases

Use Case	Suitability
Biomedical QA	Excellent
Scientific Research	Excellent
Coding Assistance	Excellent
Offline AI Assistant	Excellent
Edge AI Deployment	Excellent
CPU Inference	Excellent
General Chat	Excellent
Creative Writing	Moderate

🚀 Loading the Model

Single Prompt Inference

python infer.py \
  --backend hf-sparse \
  --checkpoint ./model.pt \
  --prompt "Explain CRISPR in simple words." \
  --stream

Interactive Chat

python chat.py \
  --checkpoint ./model.pt

📌 Important Notes

This is NOT a standard Hugging Face Transformers checkpoint.
The model uses a custom sparse-runtime architecture.
Requires the Bio-LLM runtime backend.
Runtime automatically falls back to bundled tokenizer assets if original tokenizer paths are unavailable.

🌟 Keywords

Large Language Model (LLM), Scientific AI, Biomedical AI, Sparse Runtime, CPU Inference, Edge AI, Offline AI, Local LLM, OpenCompass Benchmark, Coding LLM, Scientific Reasoning, Bio-LLM, Healthcare AI, Generative AI, AI Runtime, Edge Deployment, Sparse Transformer, Local AI Assistant, Biomedical Language Model.

📚 Conclusion

mesko-llm-7b is a lightweight scientific and coding-focused large language model optimized for sparse-runtime inference and offline deployment environments.

The model is particularly suitable for:

biomedical AI systems
scientific assistants
coding-oriented inference
offline research tooling
CPU-efficient deployment
edge AI environments

Its sparse-runtime architecture enables efficient local inference while maintaining strong domain-specialized capability across science and coding workloads.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support