Commit d13e88c · Parent: 9ea8fbd · Update README.md
---
license: mit
---
Imagine you have a BERT model – the superhero of natural language understanding – but this one speaks German! We took the powerful "bert-base-german-cased" model and gave it a special mission: classifying German text. After intense training, it's ready to help you tackle tasks like sentiment analysis, content categorization, or even semantic search in the German language.

It was trained on 160K article summaries and performs well on semantic search and text classification. I plan to further fine-tune this model on a much larger dataset of approximately 511K article summaries.

## How to Use It

Let's see how you can unleash this German-speaking superhero on your data:

### Install Hugging Face Transformers

First, make sure you have the Hugging Face Transformers library installed. You can do this with pip:

```bash
pip install transformers
```

### Load the Fine-Tuned Model

To use this fine-tuned BERT model, load it with the Transformers library:

```python
from transformers import TFBertForSequenceClassification, BertTokenizer

# Load the fine-tuned model and the German BERT tokenizer
model = TFBertForSequenceClassification.from_pretrained("path/to/your/model/directory")
tokenizer = BertTokenizer.from_pretrained("bert-base-german-cased")
```

### Prepare Your Text

You can perform text classification with this model. Tokenize your text using the tokenizer:

```python
text = "Deine Textnachricht hier"  # Your German text
inputs = tokenizer(text, padding='max_length', truncation=True, max_length=128, return_tensors='tf', return_attention_mask=True)
```

### Get Predictions

Predict the label or class for your text:

```python
import tensorflow as tf

with tf.device('/GPU:0'):
    outputs = model(inputs)
    predicted_class = tf.argmax(outputs.logits, axis=1).numpy()[0]
```

The `predicted_class` will give you the predicted label for your text.
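If you also want class probabilities rather than just the winning label, you can apply a softmax to the logits. A minimal sketch with hypothetical logit values (in practice these come from `outputs.logits` above):

```python
import numpy as np

# Hypothetical logits for one input across three classes
logits = np.array([2.0, 0.5, -1.0])

# Numerically stable softmax: subtract the max before exponentiating
probs = np.exp(logits - logits.max())
probs /= probs.sum()

predicted_class = int(np.argmax(probs))  # same label as argmax over the raw logits
```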

### Semantic Search

For semantic search, you can create embeddings for a list of texts and calculate the similarity between your query text and the documents. The model can help you find similar content with ease.
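To make the idea concrete, here is a minimal sketch of the similarity step, assuming you have already turned your query and documents into embedding vectors (for example, by mean-pooling the model's hidden states). The small 4-dimensional vectors below are purely illustrative:

```python
import numpy as np

def cosine_similarity(query_vec, doc_matrix):
    # Normalize the query and each document row, then take dot products
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q

# Hypothetical embeddings: three documents and one query
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])

scores = cosine_similarity(query, docs)
ranking = np.argsort(scores)[::-1]  # document indices, most similar first
```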

That's it! Your fine-tuned BERT model is now your ally for handling various text-based tasks in the German language. Whether it's text classification or semantic search, this model is ready to assist you on your NLP adventures.

Feel free to reach out if you have questions or need assistance in using this model to accomplish your German language processing tasks. Viel Erfolg! (Good luck!)