tags:
- modernbert
- m-bert
---
# **MBERT Context Specifier**

*MBERT Context Specifier* is a 150M-parameter context labeler (text classifier) built on a modernized bidirectional encoder-only Transformer (BERT-style). The base model is pre-trained on 2 trillion tokens of English and code data and has a native context length of up to 8,192 tokens. It incorporates the following features:

1. **Rotary Positional Embeddings (RoPE):** Enables long-context support.
2. **Local-Global Alternating Attention:** Enhances efficiency when processing long inputs.
3. **Unpadding and Flash Attention:** Enables efficient inference.

ModernBERT’s native long context makes it ideal for tasks that require processing lengthy documents, such as retrieval, classification, and semantic search within large corpora. Trained on a vast dataset of text and code, the model is suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
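The Flash Attention support can be opted into at load time. A minimal sketch, assuming the `flash-attn` package is installed and that this checkpoint accepts transformers' standard `attn_implementation` flag:

```python
from transformers import AutoModelForSequenceClassification

# opt into Flash Attention kernels; omit attn_implementation to use the default
model = AutoModelForSequenceClassification.from_pretrained(
    "prithivMLmods/MBERT-Context-Specifier",
    attn_implementation="flash_attention_2",
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)
```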

# **Run inference**

```python
from transformers import pipeline

# load the model from huggingface.co/models using the repository id
classifier = pipeline(
    task="text-classification",
    model="prithivMLmods/MBERT-Context-Specifier",
    device=0,  # GPU 0; use device=-1 to run on CPU
)

sample = "The global market for sustainable technologies has seen rapid growth over the past decade as businesses increasingly prioritize environmental sustainability."

print(classifier(sample))
```
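The pipeline returns a list of `{'label': ..., 'score': ...}` dictionaries. For inputs that may exceed the model's 8,192-token context, here is a minimal sketch using the lower-level API with truncation enabled (`long_document` is a placeholder, not a real sample):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "prithivMLmods/MBERT-Context-Specifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

long_document = "..."  # placeholder for a lengthy input

# truncate anything beyond the model's native 8,192-token context
inputs = tokenizer(long_document, truncation=True, max_length=8192, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(model.config.id2label[int(logits.argmax(dim=-1))])
```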
# **Intended Use**

The MBERT Context Specifier is designed for the following purposes:

1. **Text and Code Classification:**
   - Assigning contextual labels to large text or code inputs.
   - Suitable for tasks requiring semantic understanding of both text and code.

2. **Long-Document Processing:**
   - Ideal for tasks like document retrieval, summarization, and classification within lengthy documents (up to 8,192 tokens).

3. **Semantic Search:**
   - Enables semantic understanding and hybrid (text + code) search across large corpora (see the embedding sketch after this list).
   - Applicable in industries requiring domain-specific retrieval (e.g., legal, healthcare, and finance).

4. **Code Retrieval and Documentation:**
   - Retrieving relevant code snippets and understanding context in large codebases and technical documentation.

5. **Language Understanding and Analysis:**
   - General-purpose tasks like question answering, summarization, and sentiment analysis over large text inputs.

6. **Efficient Inference with Long Contexts:**
   - Optimized for scenarios requiring efficient processing of large inputs with minimal computational overhead, thanks to Flash Attention and RoPE.
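
For the semantic-search use case, one minimal sketch: load the underlying encoder with `AutoModel` (dropping the classification head) and mean-pool its hidden states into embeddings. This is an illustrative recipe, not an official API of this checkpoint; the pooling choice and the `embed` helper are assumptions:

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "prithivMLmods/MBERT-Context-Specifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)  # encoder only, no classification head

def embed(texts):
    # mean-pool the last hidden state over non-padding tokens
    batch = tokenizer(texts, padding=True, truncation=True, max_length=8192, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

corpus = [
    "def binary_search(arr, target): ...",         # code
    "Quarterly revenue grew 12% year over year.",  # text
]
query = embed(["find a sorted-array search function"])
scores = torch.nn.functional.cosine_similarity(query, embed(corpus))
print(scores)  # higher = semantically closer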
```
# **Limitations**

1. **Domain-Specific Performance:**
   - While pre-trained on a large corpus of text and code, MBERT may require fine-tuning for niche or highly specialized domains to achieve optimal performance.

2. **Tokenization Constraints:**
   - Inputs exceeding the 8,192-token limit need truncation or smarter preprocessing, such as chunking (see the sketch after this list), to avoid losing critical information.

3. **Bias in Training Data:**
   - The pre-training data (text + code) may carry biases from the source corpora, which can surface as biased classifications or retrievals in certain contexts.

4. **Code-Specific Challenges:**
   - While MBERT supports code understanding, it may struggle with niche programming languages or highly domain-specific coding standards without fine-tuning.

5. **Inference Costs on Resource-Constrained Devices:**
   - Processing long-context inputs can be computationally expensive, making MBERT less suitable for edge devices or environments with limited compute.

6. **Limited Multilingual Support:**
   - Optimized for English and code, MBERT may perform sub-optimally on other languages unless fine-tuned on multilingual datasets.
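
To handle over-length inputs without silently dropping the tail, one hedged sketch splits a document into overlapping 8,192-token windows and averages the per-window logits (this assumes a fast tokenizer, which `return_overflowing_tokens` requires; the `classify_long` helper and the averaging strategy are illustrative, not part of the model's API):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "prithivMLmods/MBERT-Context-Specifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

def classify_long(text: str) -> str:
    # split into overlapping windows that each fit the native context length
    enc = tokenizer(
        text,
        truncation=True,
        max_length=8192,
        stride=512,                      # tokens shared by consecutive windows
        return_overflowing_tokens=True,
        padding=True,                    # pad the last window so all stack into one batch
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"]).logits
    # average per-window logits into one document-level prediction
    return model.config.id2label[int(logits.mean(dim=0).argmax())]

print(classify_long("..."))  # placeholder input
```

Averaging logits is a simple baseline; max-pooling per-window scores or weighting windows by length are common alternatives.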