---
datasets:
- rajpurkar/squad
language:
- en
pipeline_tag: question-answering
tags:
- qa
- question
- answering
- small
- tiny
- open-source
- bert
- Excerp
---

# Welcome to Excerp v1
This is our first question answering model for English texts.
It is based on the BERT architecture, but trained from scratch on the SQuAD dataset.

## Score
Excerp reaches an F1 score of **~20%** on SQuAD, with an exact match of ~10% (`{'exact_match': 10.387890255439924, 'f1': 19.81830726643602}`).
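
The `exact_match` and `f1` keys above have the same shape as the output of the standard SQuAD v1.1 metric (also available as the `squad` metric in Hugging Face `evaluate`). The following is a rough sketch of how those two numbers are computed. It is our own re-implementation of the SQuAD v1.1 logic, not necessarily what `train.py` calls. Answers are normalized first. Exact match then checks string equality, and F1 measures token overlap. Each question receives the best score over all of its gold answers.

```python
import re
import string
from collections import Counter
from typing import Dict, List

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_score(predictions: List[str], references: List[List[str]]) -> Dict[str, float]:
    """Average the best EM/F1 over each question's gold answers, scaled to 0-100."""
    if not predictions:
        raise ValueError("need at least one prediction")
    em_total = f1_total = 0.0
    for pred, golds in zip(predictions, references):
        em_total += max(exact_match(pred, g) for g in golds)
        f1_total += max(f1(pred, g) for g in golds)
    n = len(predictions)
    return {"exact_match": 100 * em_total / n, "f1": 100 * f1_total / n}


print(squad_score(["60%", "1889"], [["60%"], ["1887 to 1889"]]))
```

In this toy call (the gold answers are made up for illustration), the first prediction is an exact match. The second prediction only gets partial F1 credit (0.5) for overlapping on `1889`. This is why F1 is always at least as high as exact match.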
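
## Benchmark
This **~20%** is far below the astounding 87% that BERT reaches on SQuAD. Note that BERT is pre-trained on large text corpora before being fine-tuned on SQuAD, whereas Excerp is trained from scratch on SQuAD alone.

## Training code
The full training code is available as `train.py` in this repo :-)
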
## Testing the final model
We tested the final model on various prompts:

### Example 1: Simple Q&A
Input:
```plaintext
The Amazon rainforest, also known as Amazonia, is a moist broadleaf
tropical rainforest in the Amazon biome that covers most of the Amazon
basin of South America. This basin encompasses 7,000,000 km² of which
5,500,000 km² are covered by the rainforest. The majority of the forest
is contained within Brazil, with 60% of the rainforest.
```
Output:<br>
❓ Question: How much of the Amazon rainforest is in Brazil?<br>
💬 Answer : 60%<br>
📊 Score : 11.6933<br>
📍 Position: Char 319–322
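
The 📊 Score and 📍 Position lines come from an extractive span prediction. A common recipe for BERT-style QA models works as follows: the model predicts a start logit and an end logit for each token, picks the span that maximizes their sum, and maps that span back to character offsets. We assume Excerp follows this recipe, but we have not checked it against `use.py`. The helpers `best_span` and `span_to_chars` below are hypothetical, and the setup is simplified: it ignores question and special tokens.

```python
from typing import List, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start_logits[s] + end_logits[e].

    Only spans with s <= e < s + max_answer_len are considered. Indices are
    token positions and `end` is inclusive.
    """
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            total = s_logit + end_logits[e]
            if total > best[2]:
                best = (s, e, total)
    return best


def span_to_chars(
    offsets: List[Tuple[int, int]], start: int, end: int
) -> Tuple[int, int]:
    """Map an inclusive token span to a [char_start, char_end) range via token offsets."""
    return offsets[start][0], offsets[end][1]


# Toy example with made-up logits for the context "built in 1889".
context = "built in 1889"
offsets = [(0, 5), (6, 8), (9, 13)]  # character range of each token
start_logits = [0.1, 0.2, 3.0]
end_logits = [0.0, 0.5, 2.5]

s, e, score = best_span(start_logits, end_logits)
char_start, char_end = span_to_chars(offsets, s, e)
print(context[char_start:char_end], score, (char_start, char_end))
# 1889 5.5 (9, 13)
```

Capping the answer length at `max_answer_len` has two effects. It bounds the search to about `len(start_logits) * max_answer_len` pairs instead of all quadratic pairs, and it rules out implausibly long answers. The value 30 is an assumed default, not a documented Excerp setting.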

### Example 2: Date
Input:
```plaintext
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars
in Paris, France. It was constructed from 1887 to 1889 as the centerpiece
of the 1889 World's Fair. The tower is 330 metres tall and is the tallest
structure in Paris.
```
Output:<br>
❓ Question: When was the Eiffel Tower built?<br>
💬 Answer : 1889<br>
📊 Score : 10.1579<br>
📍 Position: Char 66–196

### Example 3: Large context
Input:
```plaintext
Python is a high-level, general-purpose programming language. Its design
philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically typed and garbage-collected. It supports multiple
programming paradigms, including structured, object-oriented and functional
programming. It was created by Guido van Rossum and first released in 1991.
Python consistently ranks as one of the most popular programming languages.
It is widely used in data science, machine learning, web development, and
automation. The Python Package Index (PyPI) hosts hundreds of thousands of
third-party modules. The standard library is very extensive, offering tools
suited to many tasks.
```
*The context above was repeated 3 times!*<br>
Output:<br>
❓ Question: When was Python first released?<br>
💬 Answer : 1991<br>
📊 Score : 12.9267<br>
📍 Position: Char 375–379
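
Long contexts like this one can exceed the maximum input length of a BERT-style model. How Excerp handles this is not covered here. One common approach has three steps: split the context into overlapping windows, run the model on each window, and keep the highest-scoring span. The sketch below covers only the windowing step. `sliding_windows` is a hypothetical helper. Its defaults `max_len=384` and `overlap=128` are borrowed from common SQuAD fine-tuning setups and are not Excerp's settings.

```python
from typing import List, Tuple


def sliding_windows(
    num_tokens: int, max_len: int = 384, overlap: int = 128
) -> List[Tuple[int, int]]:
    """Split token positions [0, num_tokens) into overlapping [start, end) windows.

    `max_len` is the context token budget per window (in practice the question
    and special tokens also use part of the model's input). Consecutive windows
    share `overlap` tokens, so an answer shorter than `overlap` that is cut at
    one window's edge appears whole in the next window.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    windows = []
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        windows.append((start, end))
        if end == num_tokens:
            break
        # Step forward, re-using the last `overlap` tokens of this window.
        start = end - overlap
    return windows


print(sliding_windows(1000))
# [(0, 384), (256, 640), (512, 896), (768, 1000)]
```

A larger `overlap` makes it less likely that an answer is split across windows. The cost is more windows, and therefore more forward passes, per context.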

## How to use
To use the model, download `model.zip` and `use.py`, then run `use.py` to see what the model does :D

## Everything is open-source!