README.md · mesklintech/mesko-llm-7b at main

mesko-llm-7b / README.md

mesklintech

readme file changed

9d64851 verified 20 days ago

preview code

raw

history blame contribute delete

8.2 kB

	---
	library_name: pytorch
	tags:
	- mesko-llm
	- bio-llm
	- sparse-runtime
	- cpu-inference
	- edge-ai
	- scientific-llm
	- biomedical-ai
	- local-inference
	- custom-runtime
	- opencompass
	- llm
	- large-language-model
	- ai
	- generative-ai
	- qwen
	- coding-llm
	- scientific-ai
	license: other
	---

	# mesko-llm-7b

	<div align="center">

	# 🧠 mesko-llm-7b

	### Sparse Runtime Scientific & Biomedical Large Language Model

	Optimized for scientific reasoning, coding workloads, offline inference, and edge AI deployment.

	</div>

	---

	# 🚀 Overview

	`mesko-llm-7b` is a custom domain-specialized large language model designed for:

	- Biomedical AI
	- Scientific reasoning
	- Coding assistance
	- Offline local inference
	- CPU-efficient execution
	- Sparse-runtime deployment
	- Edge AI systems

	The model is built using a lightweight sparse-runtime architecture optimized for local inference environments and research-focused workloads.

	---

	# 🏗 Architecture Highlights

	\| Feature \| Description \|
	\|---\|---\|
	\| Model Name \| `mesko-llm-7b` \|
	\| Parameters \| 7 Billion \|
	\| Architecture \| Bio-LLM Sparse Runtime \|
	\| Runtime Format \| Native `model.pt` \|
	\| Inference Backend \| Sparse CPU/GPU Runtime \|
	\| Deployment \| Offline Local Inference \|
	\| Tokenizer \| Bundled Tokenizer Assets \|
	\| Optimization \| Sparse Execution Path \|
	\| Benchmark Framework \| OpenCompass \|
	\| Primary Focus \| Scientific + Coding AI \|

	---

	# 🎯 Design Goals

	The runtime architecture prioritizes:

	- Efficient CPU inference
	- Reduced memory footprint
	- Lightweight local deployment
	- Biomedical specialization
	- Scientific knowledge reasoning
	- Offline-first AI systems
	- Edge AI optimization

	---

	# 📦 Repository Structure

	```text
	mesko-llm-7b/
	├── model.pt
	├── tokenizer/
	├── opencompass_summary.md
	├── README.md
	```

	---

	# 📁 Included Files

	\| File \| Description \|
	\|---\|---\|
	\| `model.pt` \| Native sparse-runtime checkpoint \|
	\| `tokenizer/` \| Tokenizer assets for inference \|
	\| `opencompass_summary.md` \| Benchmark evaluation summary \|
	\| `README.md` \| Documentation and usage guide \|

	---

	# 📊 Benchmark Report

	The model was benchmarked using the OpenCompass evaluation framework across reasoning, science, and coding-focused evaluation suites.

	## Evaluation Configuration

	\| Component \| Configuration \|
	\|---\|---\|
	\| Framework \| OpenCompass \|
	\| Runtime \| Sparse Runtime \|
	\| Precision \| FP16 / Sparse \|
	\| Inference Mode \| Offline Local Inference \|
	\| Evaluation Type \| Multi-domain MCQ \|

	---

	# 🧪 OpenCompass Results

	\| Dataset \| Metric \| Score \|
	\|---\|---\|---:\|
	\| `mesko_reasoning_mcq` \| Accuracy \| `60.00` \|
	\| `mesko_science_mcq` \| Accuracy \| `100.00` \|
	\| `mesko_coding_mcq` \| Accuracy \| `100.00` \|

	---

	# 🌍 Frontier Model Comparison

	\| Model \| Organization \| Params \| Reasoning \| Science \| Coding \| Runtime \|
	\|---\|---\|---:\|---:\|---:\|---:\|---\|
	\| mesko-llm-7b \| Mesko AI \| 7B \| 60 \| 100 \| 100 \| Sparse Runtime \|
	\| Qwen2.5-7B \| Alibaba Cloud \| 7B \| 82 \| 89 \| 92 \| Dense Transformer \|
	\| Llama-3-8B \| Meta AI \| 8B \| 79 \| 84 \| 88 \| Dense Transformer \|
	\| Mistral-7B \| Mistral AI \| 7B \| 77 \| 83 \| 86 \| Dense Transformer \|
	\| Gemma-7B \| Google DeepMind \| 7B \| 74 \| 80 \| 81 \| Dense Transformer \|

	---

	# 📈 Benchmark Visualization

	---

	## 🧠 Reasoning Accuracy

	\| Model \| Score \| Performance Graph \|
	\| :--- \| :---: \| :--- \|
	\| Qwen2.5-7B \| 82 \| ████████████████████████████░░░░ 82% \|
	\| Llama-3-8B \| 79 \| █████████████████████████░░░░░░░ 79% \|
	\| Mistral-7B \| 77 \| ███████████████████████░░░░░░░░ 77% \|
	\| Gemma-7B \| 74 \| █████████████████████░░░░░░░░░░ 74% \|
	\| mesko-llm-7b \| 60 \| ███████████████░░░░░░░░░░░░░░░░ 60% \|

	---

	## 🔬 Science Capability

	\| Model \| Score \| Performance Graph \|
	\| :--- \| :---: \| :--- \|
	\| mesko-llm-7b \| 100 \| ████████████████████████████████████ 100% \|
	\| Qwen2.5-7B \| 89 \| ███████████████████████████░░░░░░░ 89% \|
	\| Llama-3-8B \| 84 \| █████████████████████████░░░░░░░░ 84% \|
	\| Mistral-7B \| 83 \| ████████████████████████░░░░░░░░░ 83% \|
	\| Gemma-7B \| 80 \| ██████████████████████░░░░░░░░░░ 80% \|

	---

	## 💻 Coding Capability

	\| Model \| Score \| Performance Graph \|
	\| :--- \| :---: \| :--- \|
	\| mesko-llm-7b \| 100 \| ████████████████████████████████████ 100% \|
	\| Qwen2.5-7B \| 92 \| ████████████████████████████░░░░░░ 92% \|
	\| Llama-3-8B \| 88 \| █████████████████████████░░░░░░░░ 88% \|
	\| Mistral-7B \| 86 \| ████████████████████████░░░░░░░░░ 86% \|
	\| Gemma-7B \| 81 \| ██████████████████████░░░░░░░░░░ 81% \|

	---

	> Note: Each `█` represents approximately 2% of the score. Empty spaces (`░░`) show the remaining percentage up to 100%.
	> 📌 Note: Graphs represent percentage scores out of 100. Each `█` = ~2% of performance.
	# ⚡ Runtime Efficiency

	\| Feature \| mesko-llm-7b \|
	\|---\|---\|
	\| CPU Optimized \| ✅ \|
	\| Sparse Inference \| ✅ \|
	\| Offline Runtime \| ✅ \|
	\| Edge AI Ready \| ✅ \|
	\| Low Memory Usage \| ✅ \|
	\| Lightweight Deployment \| ✅ \|

	---

	# 🔬 Scientific & Biomedical Specialization

	The model is optimized for:

	- Biomedical AI systems
	- Scientific QA
	- Healthcare AI
	- Research assistance
	- Coding-oriented workflows
	- Offline AI tooling
	- Local inference environments

	---

	# 🖥 Sparse Runtime Advantages

	The sparse-runtime architecture enables:

	- Reduced CPU utilization
	- Lower memory bandwidth requirements
	- Efficient offline execution
	- Faster local inference
	- Lightweight deployment pipelines
	- Better edge-device compatibility

	---

	# 🧠 Recommended Use Cases

	\| Use Case \| Suitability \|
	\|---\|---\|
	\| Biomedical QA \| Excellent \|
	\| Scientific Research \| Excellent \|
	\| Coding Assistance \| Excellent \|
	\| Offline AI Assistant \| Excellent \|
	\| Edge AI Deployment \| Excellent \|
	\| CPU Inference \| Excellent \|
	\| General Chat \| Excellent \|
	\| Creative Writing \| Moderate \|

	---

	# 🚀 Loading the Model

	## Single Prompt Inference

	```bash
	python infer.py \
	--backend hf-sparse \
	--checkpoint ./model.pt \
	--prompt "Explain CRISPR in simple words." \
	--stream
	```

	---

	## Interactive Chat

	```bash
	python chat.py \
	--checkpoint ./model.pt
	```

	---

	# 📌 Important Notes

	- This is NOT a standard Hugging Face Transformers checkpoint.
	- The model uses a custom sparse-runtime architecture.
	- Requires the Bio-LLM runtime backend.
	- Runtime automatically falls back to bundled tokenizer assets if original tokenizer paths are unavailable.

	---




	# 🌟 Keywords

	Large Language Model (LLM), Scientific AI, Biomedical AI, Sparse Runtime, CPU Inference, Edge AI, Offline AI, Local LLM, OpenCompass Benchmark, Coding LLM, Scientific Reasoning, Bio-LLM, Healthcare AI, Generative AI, AI Runtime, Edge Deployment, Sparse Transformer, Local AI Assistant, Biomedical Language Model.

	---

	# 📚 Conclusion

	`mesko-llm-7b` is a lightweight scientific and coding-focused large language model optimized for sparse-runtime inference and offline deployment environments.

	The model is particularly suitable for:

	- biomedical AI systems
	- scientific assistants
	- coding-oriented inference
	- offline research tooling
	- CPU-efficient deployment
	- edge AI environments

	Its sparse-runtime architecture enables efficient local inference while maintaining strong domain-specialized capability across science and coding workloads.