Saif10 committed

Commit df42854 · verified · 1 parent: 1eb97c5

Create README.md

@misc{StructBERT2025,
  author       = {Saif},
  title        = {StructBERT Encoder for DSA},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Saif10/StructBERT-encoder}}
}

Files changed (1): README.md (added, +45 -0)

---
language: en
tags:
- bert
- masked-language-model
- structbert
- dsa
---

# StructBERT Encoder

This model is a **StructBERT variant** fine-tuned on a custom Data Structures and Algorithms (DSA) corpus.

## Model Details

- **Architecture:** BERT (masked language modeling)
- **Tokenizer:** BERT tokenizer
- **Training data:** Merged DSA corpus (~32k lines)
- **Framework:** Hugging Face Transformers

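The card names masked language modeling as the objective but does not say how training inputs were masked. As a minimal sketch, assuming the standard BERT recipe (select about 15% of positions; of those, 80% become `[MASK]`, 10% a random token, 10% stay unchanged), the masking step could look like this. The function name `mask_tokens`, the `-100` ignore label, and the example ids (taken from `bert-base-uncased`, assumed here) are illustrative, not this model's actual training code.

```python
import random

IGNORE_INDEX = -100  # label value skipped by the loss (Hugging Face / PyTorch convention)


def mask_tokens(token_ids, vocab_size, mask_id, mlm_probability=0.15,
                skip_ids=(), seed=None):
    """Apply BERT-style 80/10/10 masking to a list of token ids.

    Returns (inputs, labels). Each position whose token is not in skip_ids
    is selected with probability mlm_probability; a selected position becomes
    mask_id 80% of the time, a random id 10% of the time, and stays unchanged
    10% of the time. labels keep the original id at selected positions and
    IGNORE_INDEX everywhere else, so the loss only covers selected positions.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if tok in skip_ids or rng.random() >= mlm_probability:
            continue  # not a prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id                    # 80%: [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token
    return inputs, labels


# Hypothetical ids; 101/102/103 are [CLS]/[SEP]/[MASK] in bert-base-uncased (assumed here)
ids = [101, 2004, 3945, 2573, 2011, 102]
inputs, labels = mask_tokens(ids, vocab_size=30522, mask_id=103,
                             mlm_probability=0.5, skip_ids={101, 102}, seed=0)
print(inputs)
print(labels)
```

In practice, Transformers' `DataCollatorForLanguageModeling` (with `mlm=True`, `mlm_probability=0.15`) applies the same 80/10/10 scheme, so a real training script would more likely use that than a hand-written loop.
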
## Intended Use

- Predict missing (masked) tokens in DSA-related text
- Research, education, and NLP experimentation

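Predicting a missing token comes down to turning the model's scores at the `[MASK]` position into a ranking over the vocabulary. A minimal, dependency-free sketch of that step, assuming you already have one row of logits for the masked position (with Transformers, the row of `outputs.logits` at the `[MASK]` index); the function name `top_k_predictions`, the toy vocabulary, and the scores are hypothetical.

```python
import math


def top_k_predictions(mask_logits, vocab, k=5):
    """Rank vocabulary entries at a single [MASK] position.

    mask_logits: one raw score per vocabulary entry at the masked position.
    vocab: list mapping token id -> token string.
    Returns up to k (token, probability) pairs, most likely first.
    """
    # Numerically stable softmax: subtract the largest logit before exp()
    top = max(mask_logits)
    exps = [math.exp(x - top) for x in mask_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(vocab[i], probs[i]) for i in ranked[:k]]


# Toy vocabulary and scores, invented purely for illustration
vocab = ["array", "tree", "graph", "string"]
logits = [3.0, 1.0, 0.5, 0.2]
for token, p in top_k_predictions(logits, vocab, k=2):
    print(f"{token}: {p:.3f}")
```

The Transformers `fill-mask` pipeline performs this same kind of ranking and returns the top candidates together with their scores, which is usually more convenient than post-processing logits by hand.
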
## Limitations

- Trained on a small corpus (~32k lines), so it may not generalize beyond DSA content
- Token predictions may be biased toward patterns seen in the training examples
- Not intended for production-grade applications

## Example Usage

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("Saif10/StructBERT-encoder")
model = BertForMaskedLM.from_pretrained("Saif10/StructBERT-encoder")
model.eval()  # disable dropout for inference

text = "Binary search works by dividing the [MASK] into two halves."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Read the prediction only at the [MASK] position (argmax over the whole
# sequence would decode a guess for every token, including [CLS] and [SEP])
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_token_id = outputs.logits[0, mask_index].argmax(-1)
predicted_token = tokenizer.decode(predicted_token_id)
print(predicted_token)
```