---
library_name: pytorch
tags:
- mesko-llm
- bio-llm
- sparse-runtime
- cpu-inference
- edge-ai
- scientific-llm
- biomedical-ai
- local-inference
- custom-runtime
- opencompass
- llm
- large-language-model
- ai
- generative-ai
- qwen
- coding-llm
- scientific-ai
license: other
---

# mesko-llm-7b

<div align="center">

### Sparse Runtime Scientific & Biomedical Large Language Model

Optimized for **scientific reasoning**, **coding workloads**, **offline inference**, and **edge AI deployment**.

</div>

---

# Overview

`mesko-llm-7b` is a custom domain-specialized large language model designed for:

- Biomedical AI
- Scientific reasoning
- Coding assistance
- Offline local inference
- CPU-efficient execution
- Sparse-runtime deployment
- Edge AI systems

The model is built using a lightweight sparse-runtime architecture optimized for local inference environments and research-focused workloads.

---

# Architecture Highlights

| Feature | Description |
|---|---|
| Model Name | `mesko-llm-7b` |
| Parameters | 7 Billion |
| Architecture | Bio-LLM Sparse Runtime |
| Runtime Format | Native `model.pt` |
| Inference Backend | Sparse CPU/GPU Runtime |
| Deployment | Offline Local Inference |
| Tokenizer | Bundled Tokenizer Assets |
| Optimization | Sparse Execution Path |
| Benchmark Framework | OpenCompass |
| Primary Focus | Scientific + Coding AI |

---

# Design Goals

The runtime architecture prioritizes:

- Efficient CPU inference
- Reduced memory footprint
- Lightweight local deployment
- Biomedical specialization
- Scientific knowledge reasoning
- Offline-first AI systems
- Edge AI optimization

---

# Repository Structure

```text
mesko-llm-7b/
├── model.pt
├── tokenizer/
├── opencompass_summary.md
└── README.md
```

---

# Included Files

| File | Description |
|---|---|
| `model.pt` | Native sparse-runtime checkpoint |
| `tokenizer/` | Tokenizer assets for inference |
| `opencompass_summary.md` | Benchmark evaluation summary |
| `README.md` | Documentation and usage guide |

---

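Before running anything, it can help to confirm that a local copy of the repository contains the files listed above. The following is a minimal sketch, assuming the repository has been downloaded to a local directory; the helper name `missing_files` and its `repo_dir` argument are illustrative and not part of the Bio-LLM runtime.

```python
from pathlib import Path

# Entries taken from the "Included Files" table; names ending in "/" are directories.
EXPECTED_ENTRIES = ["model.pt", "tokenizer/", "opencompass_summary.md", "README.md"]


def missing_files(repo_dir):
    """Return the expected entries that are absent from repo_dir (hypothetical helper)."""
    root = Path(repo_dir)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, everything else as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


# "." is an assumed download location; point it at the actual directory.
print(missing_files("."))
```

An empty list means every listed entry was found.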
# Benchmark Report

The model was benchmarked using the OpenCompass evaluation framework across reasoning, science, and coding-focused evaluation suites.

## Evaluation Configuration

| Component | Configuration |
|---|---|
| Framework | OpenCompass |
| Runtime | Sparse Runtime |
| Precision | FP16 / Sparse |
| Inference Mode | Offline Local Inference |
| Evaluation Type | Multi-domain MCQ |

---

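As a rough sanity check on the memory-footprint goal, the weight storage of a dense 7-billion-parameter model at FP16 can be estimated as parameters times bytes per parameter. The sketch below is illustrative arithmetic only: it assumes 2 bytes per FP16 value and ignores activations, the KV cache, and any savings from the sparse format, whose storage layout is not described in this document. The helper name `dense_weight_gib` is hypothetical.

```python
def dense_weight_gib(num_params, bytes_per_param):
    """Estimate weight storage in GiB for a dense checkpoint."""
    return num_params * bytes_per_param / 2**30


# 7 billion parameters at FP16 (2 bytes each): about 13.04 GiB of weights.
fp16_gib = dense_weight_gib(7_000_000_000, 2)
print(f"{fp16_gib:.2f} GiB")
```

This gives a dense upper-end reference point; the actual `model.pt` size may differ.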
# OpenCompass Results

| Dataset | Metric | Score |
|---|---|---:|
| `mesko_reasoning_mcq` | Accuracy | `60.00` |
| `mesko_science_mcq` | Accuracy | `100.00` |
| `mesko_coding_mcq` | Accuracy | `100.00` |

---

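For multiple-choice suites, an accuracy score is the percentage of questions answered correctly. The sketch below shows that computation in isolation; the answer lists are made-up placeholders, not the contents of the `mesko_*` datasets, and the helper name `mcq_accuracy` is hypothetical rather than an OpenCompass API.

```python
def mcq_accuracy(predictions, references):
    """Percentage of predictions that match the reference answers."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one question is required")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Placeholder answers: 3 of 4 match, which gives 75.00.
predicted = ["A", "C", "B", "D"]
expected = ["A", "C", "B", "A"]
print(f"{mcq_accuracy(predicted, expected):.2f}")
```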
# Frontier Model Comparison

| Model | Organization | Params | Reasoning (%) | Science (%) | Coding (%) | Runtime |
|---|---|---:|---:|---:|---:|---|
| mesko-llm-7b | Mesko AI | 7B | 60 | 100 | 100 | Sparse Runtime |
| Qwen2.5-7B | Alibaba Cloud | 7B | 82 | 89 | 92 | Dense Transformer |
| Llama-3-8B | Meta AI | 8B | 79 | 84 | 88 | Dense Transformer |
| Mistral-7B | Mistral AI | 7B | 77 | 83 | 86 | Dense Transformer |
| Gemma-7B | Google DeepMind | 7B | 74 | 80 | 81 | Dense Transformer |

---

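To compare models on a single column, the rows of the comparison table can be sorted per category. This is a small sketch that copies the scores from the table as-is; the `SCORES` dictionary and the `rank_by` helper are names introduced here for illustration.

```python
# Scores copied from the comparison table: (reasoning, science, coding).
SCORES = {
    "mesko-llm-7b": (60, 100, 100),
    "Qwen2.5-7B": (82, 89, 92),
    "Llama-3-8B": (79, 84, 88),
    "Mistral-7B": (77, 83, 86),
    "Gemma-7B": (74, 80, 81),
}
CATEGORIES = ("reasoning", "science", "coding")


def rank_by(category):
    """Return (model, score) pairs sorted from highest to lowest score."""
    index = CATEGORIES.index(category)
    pairs = [(model, scores[index]) for model, scores in SCORES.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for category in CATEGORIES:
    print(category, rank_by(category))
```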
# Benchmark Visualization

---

## Reasoning Accuracy

| Model | Score | Performance Graph |
| :--- | :---: | :--- |
| Qwen2.5-7B | 82 | █████████████████████████████████████████░░░░░░░░░ 82% |
| Llama-3-8B | 79 | ███████████████████████████████████████░░░░░░░░░░░ 79% |
| Mistral-7B | 77 | ██████████████████████████████████████░░░░░░░░░░░░ 77% |
| Gemma-7B | 74 | █████████████████████████████████████░░░░░░░░░░░░░ 74% |
| mesko-llm-7b | 60 | ██████████████████████████████░░░░░░░░░░░░░░░░░░░░ 60% |

---

## Science Capability

| Model | Score | Performance Graph |
| :--- | :---: | :--- |
| mesko-llm-7b | 100 | ██████████████████████████████████████████████████ 100% |
| Qwen2.5-7B | 89 | ████████████████████████████████████████████░░░░░░ 89% |
| Llama-3-8B | 84 | ██████████████████████████████████████████░░░░░░░░ 84% |
| Mistral-7B | 83 | █████████████████████████████████████████░░░░░░░░░ 83% |
| Gemma-7B | 80 | ████████████████████████████████████████░░░░░░░░░░ 80% |

---

## Coding Capability

| Model | Score | Performance Graph |
| :--- | :---: | :--- |
| mesko-llm-7b | 100 | ██████████████████████████████████████████████████ 100% |
| Qwen2.5-7B | 92 | ██████████████████████████████████████████████░░░░ 92% |
| Llama-3-8B | 88 | ████████████████████████████████████████████░░░░░░ 88% |
| Mistral-7B | 86 | ███████████████████████████████████████████░░░░░░░ 86% |
| Gemma-7B | 81 | ████████████████████████████████████████░░░░░░░░░░ 81% |

---

> **Note:** Graphs represent percentage scores out of 100. Each `█` represents approximately 2% of the score; `░` cells show the remaining percentage up to 100%.

---

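The bar graphs in this section can be regenerated from the scores. The following sketch assumes the scale given in the note (one filled cell per 2 points on a 50-cell bar), rounds down, and pads the remainder with a light-shade character; the function name `render_bar` and the rounding choice are illustrative assumptions.

```python
FILLED = "\u2588"  # full block character
EMPTY = "\u2591"   # light shade character
CELLS = 50         # 50 cells x 2 points per cell = 100


def render_bar(score):
    """Render a 0-100 score as a fixed-width text bar followed by the percentage."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    filled = int(score // 2)  # one filled cell per 2 points, rounded down
    return FILLED * filled + EMPTY * (CELLS - filled) + f" {score}%"


print(render_bar(82))
```

Rounding down means two scores that differ by one point can share the same bar length, which is why each bar also carries its numeric label.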
# Runtime Efficiency

| Feature | mesko-llm-7b |
|---|---|
| CPU Optimized | ✅ |
| Sparse Inference | ✅ |
| Offline Runtime | ✅ |
| Edge AI Ready | ✅ |
| Low Memory Usage | ✅ |
| Lightweight Deployment | ✅ |

---

# Scientific & Biomedical Specialization

The model is optimized for:

- Biomedical AI systems
- Scientific QA
- Healthcare AI
- Research assistance
- Coding-oriented workflows
- Offline AI tooling
- Local inference environments

---

# Sparse Runtime Advantages

The sparse-runtime architecture enables:

- Reduced CPU utilization
- Lower memory bandwidth requirements
- Efficient offline execution
- Faster local inference
- Lightweight deployment pipelines
- Better edge-device compatibility

---

# Recommended Use Cases

| Use Case | Suitability |
|---|---|
| Biomedical QA | Excellent |
| Scientific Research | Excellent |
| Coding Assistance | Excellent |
| Offline AI Assistant | Excellent |
| Edge AI Deployment | Excellent |
| CPU Inference | Excellent |
| General Chat | Excellent |
| Creative Writing | Moderate |

---

# Loading the Model

## Single Prompt Inference

```bash
python infer.py \
  --backend hf-sparse \
  --checkpoint ./model.pt \
  --prompt "Explain CRISPR in simple words." \
  --stream
```

---

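The single-prompt invocation can also be scripted from Python. This is a minimal sketch that only assembles the command line shown in the example and hands it to `subprocess`; it assumes `infer.py` is in the current working directory, and the helper names `build_infer_command` and `run_inference` are hypothetical. No flags beyond those in the example are used.

```python
import subprocess
import sys


def build_infer_command(prompt, checkpoint="./model.pt", stream=True):
    """Assemble the argument list for the single-prompt example."""
    command = [
        sys.executable, "infer.py",
        "--backend", "hf-sparse",
        "--checkpoint", checkpoint,
        "--prompt", prompt,
    ]
    if stream:
        command.append("--stream")
    return command


def run_inference(prompt):
    """Run infer.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_infer_command(prompt), check=True)


# Print the command instead of running it, so the sketch works without the runtime.
print(build_infer_command("Explain CRISPR in simple words."))
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.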
## Interactive Chat

```bash
python chat.py \
  --checkpoint ./model.pt
```

---

# Important Notes

- This is NOT a standard Hugging Face Transformers checkpoint.
- The model uses a custom sparse-runtime architecture.
- It requires the Bio-LLM runtime backend.
- The runtime automatically falls back to the bundled tokenizer assets if the original tokenizer paths are unavailable.

---

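The tokenizer fallback described in the last note can be pictured as a simple path check. The sketch below is an assumption about that behaviour, not the runtime's actual code: `resolve_tokenizer_dir`, its arguments, and the default `tokenizer` location are illustrative names based on the repository layout.

```python
from pathlib import Path


def resolve_tokenizer_dir(original_path, bundled_path="tokenizer"):
    """Prefer the original tokenizer directory; fall back to the bundled assets."""
    if original_path is not None and Path(original_path).is_dir():
        return Path(original_path)
    bundled = Path(bundled_path)
    if bundled.is_dir():
        return bundled
    raise FileNotFoundError(
        f"no tokenizer found at {original_path!r} or {bundled_path!r}"
    )


# Example call (paths are placeholders):
# tokenizer_dir = resolve_tokenizer_dir("/path/to/original/tokenizer")
```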
# Keywords

Large Language Model (LLM), Scientific AI, Biomedical AI, Sparse Runtime, CPU Inference, Edge AI, Offline AI, Local LLM, OpenCompass Benchmark, Coding LLM, Scientific Reasoning, Bio-LLM, Healthcare AI, Generative AI, AI Runtime, Edge Deployment, Sparse Transformer, Local AI Assistant, Biomedical Language Model.

---

# Conclusion

`mesko-llm-7b` is a lightweight scientific and coding-focused large language model optimized for sparse-runtime inference and offline deployment environments.

The model is particularly suitable for:

- biomedical AI systems
- scientific assistants
- coding-oriented inference
- offline research tooling
- CPU-efficient deployment
- edge AI environments

Its sparse-runtime architecture enables efficient local inference while maintaining strong domain-specialized capability across science and coding workloads.