# MicroFlow: A Pretrained Mixture of Experts Model for Bacterial/Metagenomic Sequence Analysis
## Model Description
MicroFlow is a pretrained language model built on the Mixtral Mixture-of-Experts (MoE) architecture and optimized for analyzing bacterial and metagenomic sequences. Trained on large-scale tokenized metagenomic datasets, it uses a custom bidirectional (non-causal) attention mechanism to capture semantic dependencies in both directions of a microbial sequence, and serves as a foundation model for downstream metagenomic analysis tasks (e.g., sequence classification, taxonomic annotation, and microbial community profiling).
## Key Features
### 1. Architecture Design
- Base Architecture: Mixture of Experts (MoE) pretrained model based on Mixtral
- Parameter Scale: Configurable parameter scale aligned with Mixtral MoE variants (adjustable via MixtralConfig)
- Attention Mechanism: Bidirectional attention mechanism (non-causal) implemented via custom SDPA (Scaled Dot Product Attention) and FlashAttention-2 with GQA (Grouped Query Attention) support
- Tokenization: Custom BPE (Byte-Pair Encoding) tokenizer extended with microbial-specific special tokens (`<abu>`, `<name>`), with a vocabulary size consistent with the pretrained base tokenizer
- Position Encoding: RoPE (Rotary Position Embedding) with configurable theta (default: 10000)
- Expert System: Inherits Mixtral’s MoE expert configuration (8 local experts, 2 experts activated per token)
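The top-2 expert routing inherited from Mixtral can be sketched in plain PyTorch. This is an illustrative sketch, not the repository's actual code; the function name, shapes, and router weight are made up for the example:

```python
import torch
import torch.nn.functional as F

def top2_route(hidden, router_weight, top_k=2):
    """Mixtral-style routing sketch: per token, softmax over the expert
    logits, keep the top-2 experts, and renormalize their weights so
    each token's two expert contributions sum to 1."""
    logits = hidden @ router_weight                        # (tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    weights, experts = torch.topk(probs, top_k, dim=-1)    # top-2 per token
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
    return weights, experts

# Toy example: 5 tokens, hidden size 64, 8 local experts as in Mixtral
tokens = torch.randn(5, 64)
w_router = torch.randn(64, 8)
weights, experts = top2_route(tokens, w_router)  # weights/experts: (5, 2)
```

Each token's output is then the weighted sum of the two selected experts' FFN outputs; the router logits also feed the auxiliary load-balancing loss mentioned under the pretraining objectives below.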
### 2. Pretraining Strategy
- Pretraining Data: 3,264,597 metagenomic token sequences in plain text format (grouped by token length: ≤160, 160<len≤320, 320<len≤2048), with sequences tagged with `<abu>`/`<name>` based on structural features
- Sequence Processing: Token sequences truncated/padded to the target length (160/320/2048) with the `<pad>` token, without truncating semantic boundaries
- Training Objectives:
- Masked Language Modeling (MLM, 15% masking probability, optional)
- BERT-style pretraining with bidirectional attention (non-causal)
- Multi-stage progressive pretraining (160→320→2048 tokens) to stabilize long-sequence training
- MoE router auxiliary loss (scaled by configurable coefficient) to optimize expert selection
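The MLM objective above can be sketched with the standard BERT recipe (15% of tokens selected; of those, 80% replaced with a mask token, 10% with a random token, 10% left unchanged). The 80/10/10 split and the token IDs below are assumptions for illustration, not MicroFlow's actual vocabulary:

```python
import torch

MASK_ID, PAD_ID, VOCAB_SIZE = 4, 0, 32000  # hypothetical IDs; the real tokenizer defines these

def mask_tokens(input_ids, mlm_prob=0.15):
    """BERT-style MLM masking: pick ~15% of non-pad tokens as prediction
    targets; of those, 80% -> <mask>, 10% -> random token, 10% unchanged."""
    input_ids = input_ids.clone()
    labels = input_ids.clone()

    select_prob = torch.full(input_ids.shape, mlm_prob)
    select_prob[input_ids == PAD_ID] = 0.0          # never mask padding
    selected = torch.bernoulli(select_prob).bool()
    labels[~selected] = -100                        # ignored by cross-entropy

    # 80% of selected positions are replaced with the <mask> token
    masked = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    input_ids[masked] = MASK_ID

    # half of the remaining 20% get a random (non-special) token
    randomized = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                  & selected & ~masked)
    random_ids = torch.randint(5, VOCAB_SIZE, input_ids.shape)
    input_ids[randomized] = random_ids[randomized]
    # the remaining selected positions keep their original token
    return input_ids, labels

# Toy batch at the first curriculum stage length (160 tokens)
ids = torch.randint(5, 100, (2, 160))
out_ids, labels = mask_tokens(ids)
```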
**Important:**
This model requires the custom bidirectional attention mechanism to be set up before loading. Follow the setup steps in order:
1) Define the custom bidirectional attention functions (SDPA/FlashAttention-2),
2) Register the custom attention functions in `ALL_ATTENTION_FUNCTIONS`,
3) Configure the model with `attn_implementation="bidirectional_flash"` (FlashAttention-2) or `attn_implementation="bidirectional"` (SDPA),
4) Load the model weights and tokenizer (extending the tokenizer with the `<abu>`/`<name>` special tokens).
The extracted embeddings capture deep semantic features of metagenomic sequences and can be used directly for downstream analysis tasks (e.g., taxonomic classification) without additional fine-tuning.
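One common way to turn per-token hidden states into a single sequence embedding is masked mean pooling. A minimal sketch on a toy tensor, assuming that in practice `last_hidden_state` comes from the model's forward pass and `attention_mask` from the tokenizer (the pooling choice itself is illustrative, not prescribed by the model card):

```python
import torch

def mean_pool(last_hidden_state, attention_mask):
    """Average token embeddings over the sequence, ignoring <pad> positions.
    last_hidden_state: (batch, seq, hidden); attention_mask: (batch, seq)."""
    mask = attention_mask.unsqueeze(-1).float()       # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)           # avoid division by zero
    return summed / counts

# Toy batch: 2 sequences of length 8, hidden size 32; second sequence
# has 3 trailing <pad> tokens excluded from the average.
hidden = torch.randn(2, 8, 32)
mask = torch.tensor([[1] * 8, [1] * 5 + [0] * 3])
emb = mean_pool(hidden, mask)  # (2, 32) sequence embeddings
```

The resulting vectors can be fed directly to a lightweight classifier (e.g., logistic regression) for taxonomic classification.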
## Citation
If you use this pretrained model in your research, please cite:
```bibtex
@software{microflow_metagenomic2025,
  title  = {MicroFlow: A Pretrained Mixture of Experts Model for Bacterial/Metagenomic Sequence Analysis},
  author = {Zhang, Chao},
  year   = {2025},
  url    = {https://github.com/zhangchao162/microflow},
  note   = {Pretrained MoE model with bidirectional SDPA/FlashAttention and custom BPE tokenization for metagenomic sequence analysis}
}
```
## Contact
For questions about model usage, pretraining pipeline, or fine-tuning guidance for downstream metagenomic tasks, please contact 1623804006@qq.com.