Commit d0b1040 (verified), committed by kaixkhazaki
1 parent: 130cb5b

Update README.md

Files changed (1)
  1. README.md +36 -1
README.md CHANGED
@@ -62,7 +62,7 @@ This model is a fine-tuned version of intfloat/multilingual-e5-large for documen
  'patents': 4,
  'scientific_articles': 5
  }
-
+ ```
  ## Training procedure
  
  Trained on single gpu for 2 epochs for apx. 20 minutes.
@@ -82,3 +82,38 @@ hyperparameters:
  
  ## Evaluation results
  Test Loss: 0.5192, Test Acc: 0.9719
+
+
+ ## Usage:
+
+ ```python
+
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load model and tokenizer
+ tokenizer = AutoTokenizer.from_pretrained("kaixkhazaki/multilingual-e5-doclaynet")
+ model = AutoModelForSequenceClassification.from_pretrained("kaixkhazaki/multilingual-e5-doclaynet")
+
+ # Prepare text (note the "passage: " prefix required for E5 models)
+ text = "passage: " + your_document_text
+
+ # Tokenize and predict
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+ outputs = model(**inputs)
+ predictions = outputs.logits.softmax(dim=-1)
+
+ # Get predicted class
+ predicted_class = predictions.argmax().item()
+
+ # Map to label (assuming you've loaded the label mapping)
+ label_mapping = {
+ 0: 'financial_reports',
+ 1: 'government_tenders',
+ 2: 'laws_and_regulations',
+ 3: 'manuals',
+ 4: 'patents',
+ 5: 'scientific_articles'
+ }
+ predicted_label = label_mapping[predicted_class]
+
+ ```
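
Review note: the usage snippet added in this commit leaves `your_document_text` undefined and runs the forward pass with gradient tracking on. The sketch below is one possible self-contained variant, not the author's code. The names `MODEL_ID`, `ID2LABEL`, `add_passage_prefix`, and `classify` are hypothetical and introduced only for illustration. The label ids are copied from the mapping in the diff, and the `"passage: "` prefix follows the comment in the added snippet.

```python
# Label ids copied from the label mapping shown in the README diff.
ID2LABEL = {
    0: "financial_reports",
    1: "government_tenders",
    2: "laws_and_regulations",
    3: "manuals",
    4: "patents",
    5: "scientific_articles",
}

# Model id taken from the usage snippet in the diff.
MODEL_ID = "kaixkhazaki/multilingual-e5-doclaynet"


def add_passage_prefix(text: str) -> str:
    """Prepend the "passage: " prefix unless the text already starts with it.

    Hypothetical helper; the README only states that the prefix is required.
    """
    text = text.strip()
    return text if text.startswith("passage: ") else "passage: " + text


def classify(text: str) -> tuple[str, float]:
    """Return (label, probability) for a single document.

    Loads the tokenizer and model on every call, which keeps the sketch short;
    a real caller would probably load them once and reuse them.
    """
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(
        add_passage_prefix(text),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    with torch.no_grad():  # no gradients needed for prediction
        logits = model(**inputs).logits

    probs = logits.softmax(dim=-1)[0]  # batch of one, take the first row
    class_id = int(probs.argmax())
    return ID2LABEL[class_id], float(probs[class_id])
```

Calling `classify("some document text")` would download the model on first use and return the predicted label with its softmax probability. Whether the hosted model config also exposes this mapping as `model.config.id2label` is not confirmed by the diff, so the sketch keeps the explicit dictionary.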