# Cloud Log Classifier using CodeBERT
This project implements a cloud platform log classifier using a fine-tuned CodeBERT model. It can classify logs from AWS, Azure, and GCP with high accuracy.
The model was fine-tuned on a dataset of simulated cloud logs using microsoft/codebert-base-mlm as the base model.
## Project Structure

```
.
├── Cloud_Classifier_using_codebert.ipynb   # Jupyter notebook with training and evaluation code
├── cloud-log-classifier-final/             # Saved model directory (generated after training)
│   ├── config.json
│   ├── pytorch_model.bin
│   ├── tokenizer_config.json
│   └── ...
├── cloud-log-classifier-final.zip          # Zipped model for distribution
└── README.md                               # This file
```
## Usage

You can use the fine-tuned model in your own projects via the `CloudLogClassifier` class below.

### Prerequisites

```bash
pip install torch transformers scikit-learn numpy
```

### Python Inference Code

Save the following code as `classifier.py` or use it directly in your Python scripts. Make sure the unzipped `cloud-log-classifier-final` folder is in your working directory.
```python
import torch
import json
import numpy as np
from transformers import RobertaForSequenceClassification, RobertaTokenizer


class CloudLogClassifier:
    """
    Reusable classifier for cloud platform detection from logs.
    """

    def __init__(self, model_path):
        """
        Load the fine-tuned model and tokenizer.

        Args:
            model_path (str): Path to the directory containing the saved model files
        """
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        try:
            self.model = RobertaForSequenceClassification.from_pretrained(model_path)
            self.tokenizer = RobertaTokenizer.from_pretrained(model_path)
            self.model.to(self.device)
            self.model.eval()

            # Load label mapping
            with open(f'{model_path}/label_mapping.json', 'r') as f:
                mappings = json.load(f)

            # Convert keys back to integers for the dictionary
            self.id2label = {int(k): v for k, v in mappings['id2label'].items()}
        except Exception as e:
            raise RuntimeError(f"Failed to load model from {model_path}: {str(e)}")

    def predict(self, log_text):
        """
        Predict cloud platform from log text.

        Args:
            log_text (str): Log text to classify

        Returns:
            dict: Prediction results with label and confidence
        """
        # Tokenize input
        inputs = self.tokenizer(
            log_text,
            return_tensors='pt',
            truncation=True,
            padding='max_length',
            max_length=128
        ).to(self.device)

        # Get prediction
        with torch.no_grad():
            outputs = self.model(**inputs)
            logits = outputs.logits
            probabilities = torch.nn.functional.softmax(logits, dim=-1)
            predicted_class = torch.argmax(probabilities, dim=-1).item()
            confidence = probabilities[0][predicted_class].item()

        return {
            'platform': self.id2label[predicted_class],
            'confidence': confidence,
            'all_probabilities': {
                self.id2label[i]: prob.item()
                for i, prob in enumerate(probabilities[0])
            }
        }


# Usage Example
if __name__ == "__main__":
    # Path to your unzipped model folder
    model_path = './cloud-log-classifier-final'

    try:
        classifier = CloudLogClassifier(model_path)

        test_logs = [
            "[ 3.6936] ena 0000:00:05.0: Elastic Network Adapter (ENA)",
            "AzureLinuxAgent: INFO Starting Azure Linux Agent",
            "google_guest_agent INFO GCE Agent running"
        ]

        print("Predictions:")
        for log in test_logs:
            result = classifier.predict(log)
            print(f"\nLog: {log}")
            print(f"Predicted Platform: {result['platform'].upper()}")
            print(f"Confidence: {result['confidence']:.2%}")
    except Exception as e:
        print(f"Error: {e}")
```
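The loader above expects a `label_mapping.json` file alongside the saved model weights. The repo does not show its exact contents; a plausible shape is sketched below, with the platform names and ordering being assumptions rather than values taken from the actual file:

```python
import json

# Hypothetical contents of label_mapping.json; the platform names and
# their order are assumptions, not taken from the actual repo.
raw = json.loads('{"id2label": {"0": "aws", "1": "azure", "2": "gcp"}}')

# JSON object keys are always strings, so convert them back to ints,
# mirroring what CloudLogClassifier.__init__ does after json.load()
id2label = {int(k): v for k, v in raw["id2label"].items()}
print(id2label)
```

The integer-key conversion matters because `predict()` indexes `self.id2label` with `int` class ids returned by `argmax`, while `json.load` always yields string keys.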
## Model Performance
The model achieves high accuracy on the test set, effectively distinguishing between different cloud provider log formats (AWS, Azure, GCP).
| Metric | Score |
|---|---|
| Accuracy | ~98% |
| Precision | >95% |
| Recall | >95% |
| F1-Score | >95% |
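For reference, the metrics in the table can be computed without scikit-learn. The sketch below is a minimal, stdlib-only implementation of accuracy and macro-averaged precision/recall/F1 on toy labels (the toy data is illustrative, not from the project's test set):

```python
def classification_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example: one gcp log misclassified as aws
y_true = ["aws", "aws", "azure", "gcp", "gcp", "azure"]
y_pred = ["aws", "aws", "azure", "gcp", "aws", "azure"]
metrics = classification_metrics(y_true, y_pred, ["aws", "azure", "gcp"])
print(metrics)
```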
## Training

The model was trained with the `Trainer` API from Hugging Face Transformers.
- Base Model: microsoft/codebert-base-mlm
- Epochs: 5
- Batch Size: 16
- Learning Rate: Default (5e-5)
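The full training pipeline lives in the notebook. As a rough illustration of the preprocessing side, the stdlib-only sketch below shows one way the labeled logs could be mapped to the integer class ids that `label_mapping.json` encodes (the sample logs and platform names are illustrative assumptions):

```python
# Stdlib-only sketch of label encoding for training data; the real
# preprocessing lives in the notebook, and these sample logs are illustrative.
raw_logs = [
    ("[ 3.6936] ena 0000:00:05.0: Elastic Network Adapter (ENA)", "aws"),
    ("AzureLinuxAgent: INFO Starting Azure Linux Agent", "azure"),
    ("google_guest_agent INFO GCE Agent running", "gcp"),
]

# Deterministic label ids: sort the platform names alphabetically
labels = sorted({platform for _, platform in raw_logs})
label2id = {name: i for i, name in enumerate(labels)}
id2label = {i: name for name, i in label2id.items()}

# (text, integer label) pairs, ready to be tokenized for the Trainer
encoded = [(text, label2id[platform]) for text, platform in raw_logs]
print(encoded[0])
```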