---
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- semantic-search
- feature-extraction
- sentence-similarity
- cybersecurity
pipeline_tag: sentence-similarity
---
# MrZaper/LiteModel
**MrZaper/LiteModel** is a lightweight [sentence-transformers](https://www.SBERT.net) model fine-tuned for **semantic search and retrieval of academic articles in cybersecurity**.
It maps queries and article texts into a 384-dimensional dense vector space for similarity search, clustering, and semantic matching.
The model is trained specifically on articles from the journal **Cybersecurity: Education, Science, Technology**.
Website: [https://csecurity.kubg.edu.ua](https://csecurity.kubg.edu.ua/index.php/journal)
# What does it do?
Given a query in **English, Ukrainian, or any other language**, the retrieval pipeline:
- Translates the query to English (using Google Translate).
- Encodes the query into a dense embedding using Sentence-BERT.
- Computes cosine similarity between the query embedding and **precomputed article embeddings**.
- Returns the top **unique article codes** with highest similarity scores.
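The similarity step in the list above can be sketched with plain NumPy (illustrative only; the real pipeline below uses `sklearn.metrics.pairwise.cosine_similarity` against the precomputed embeddings shipped with this repo):

```python
import numpy as np

def cosine_sim(query_vec, article_matrix):
    # Normalize the query and each article row, then take dot products:
    # the result is the cosine similarity of the query to every article.
    q = query_vec / np.linalg.norm(query_vec)
    m = article_matrix / np.linalg.norm(article_matrix, axis=1, keepdims=True)
    return m @ q

query = np.array([1.0, 0.0, 1.0])
articles = np.array([[1.0, 0.0, 1.0],   # same direction as the query -> similarity 1.0
                     [0.0, 1.0, 0.0]])  # orthogonal to the query -> similarity 0.0
print(cosine_sim(query, articles))
```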
Returned article codes can be viewed at:
```
https://csecurity.kubg.edu.ua/index.php/journal/article/view/{CODE}
```
For example:
`560` → [https://csecurity.kubg.edu.ua/index.php/journal/article/view/560](https://csecurity.kubg.edu.ua/index.php/journal/article/view/560)
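Returned codes map to article pages by simple string formatting; a hypothetical helper (not part of the repo) might look like:

```python
# Hypothetical helper: build the public article URL from a returned code.
ARTICLE_URL = "https://csecurity.kubg.edu.ua/index.php/journal/article/view/{code}"

def article_url(code):
    return ARTICLE_URL.format(code=code)

print(article_url(560))
# https://csecurity.kubg.edu.ua/index.php/journal/article/view/560
```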
---
# Model Files
The repository includes:
- `LiteModel` – SBERT-based semantic encoder
- `sbert_embeddings.npy` – precomputed embeddings for articles
- `sbert_labels.pkl` – corresponding article codes (e.g., `560`, `532`)
---
# Usage (Sentence-Transformers)
Install the required package:
```bash
pip install -U sentence-transformers deep-translator huggingface-hub scikit-learn
```
Example usage:
```python
from sentence_transformers import SentenceTransformer
import numpy as np
import pickle
from huggingface_hub import snapshot_download
from deep_translator import GoogleTranslator
import os
from sklearn.metrics.pairwise import cosine_similarity
# Load model and data from Hugging Face
model_name = 'MrZaper/LiteModel'
model_dir = snapshot_download(repo_id=model_name)
# Load SBERT model
sbert_model = SentenceTransformer(model_dir)
# Load precomputed article embeddings
embeddings = np.load(os.path.join(model_dir, "sbert_embeddings.npy"))
# Load article codes (labels)
with open(os.path.join(model_dir, "sbert_labels.pkl"), "rb") as f:
    labels = pickle.load(f)

def preprocess_query(query: str) -> str:
    """Translate the query to English using Google Translate."""
    try:
        return GoogleTranslator(source="auto", target="en").translate(query)
    except Exception as e:
        print(f"Translation error: {e}")
        return query

def predict_semantic(query, model, embeddings, labels, top_n=5):
    """Find the top-N most semantically similar unique article codes."""
    query_emb = model.encode([preprocess_query(query)])
    similarities = cosine_similarity(query_emb, embeddings)[0]

    seen_keys = set()
    results = []

    # Sort candidates by similarity (descending)
    sorted_indices = np.argsort(similarities)[::-1]
    for idx in sorted_indices:
        label = labels[idx]
        sim = similarities[idx]
        if label not in seen_keys:
            seen_keys.add(label)
            results.append({
                "article_code": label,
                "similarity": float(sim)
            })
            print(f"Article {label} -> similarity: {sim * 100:.2f}%")
            if len(results) >= top_n:
                break
    return results

# Example query
query = "sql injection in websites"
results = predict_semantic(query, sbert_model, embeddings, labels)

print("\nTop article codes:")
for res in results:
    print(f"Article {res['article_code']} -> similarity: {res['similarity']*100:.2f}%")
```
# Example Output
```
Article 560 -> similarity: 92.15%
Article 532 -> similarity: 89.34%
Article 475 -> similarity: 85.22%
```
Corresponding links:
```
https://csecurity.kubg.edu.ua/index.php/journal/article/view/560
https://csecurity.kubg.edu.ua/index.php/journal/article/view/532
https://csecurity.kubg.edu.ua/index.php/journal/article/view/475
```