Create README.md
README.md
ADDED
|
@@ -0,0 +1,86 @@
|
| 1 |
+
---
|
| 2 |
+
datasets:
|
| 3 |
+
- rajpurkar/squad
|
| 4 |
+
language:
|
| 5 |
+
- en
|
| 6 |
+
pipeline_tag: question-answering
|
| 7 |
+
tags:
|
| 8 |
+
- qa
|
| 9 |
+
- question
|
| 10 |
+
- answering
|
| 11 |
+
- small
|
| 12 |
+
- tiny
|
| 13 |
+
- open-source
|
| 14 |
+
- bert
|
| 15 |
+
- distill
|
| 16 |
+
---
|
| 17 |
+
|
| 18 |
+
# Welcome to Distill v1
|
| 19 |
+
This is our first question-answering model for English text.
|
| 20 |
+
It is based on a BERT architecture, but trained from scratch on the SQuAD dataset.
|
| 21 |
+
|
| 22 |
+
## Score
|
| 23 |
+
The score is **~20%** F1 and ~10% exact match. (`{'exact_match': 10.387890255439924, 'f1': 19.81830726643602}`)
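
For readers unfamiliar with the two numbers, here is a minimal sketch of how SQuAD-style `exact_match` and `f1` are usually computed for one prediction/reference pair. It assumes the reported values come from the standard SQuAD metric (averaged over all questions and multiplied by 100); this README does not say which evaluation script was used, and the function names below are illustrative only.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match, one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A partially correct answer gets F1 credit but no exact-match credit, which is why F1 is the higher of the two numbers.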
|
| 24 |
+
|
| 25 |
+
## Training code
|
| 26 |
+
You can find the full training code as `train.py` in this repo :-)
|
| 27 |
+
|
| 28 |
+
## Testing the final model
|
| 29 |
+
We tested the final model on various prompts:
|
| 30 |
+
### Example 1: Simple Q&A
|
| 31 |
+
Input:
|
| 32 |
+
```plaintext
|
| 33 |
+
The Amazon rainforest, also known as Amazonia, is a moist broadleaf
|
| 34 |
+
tropical rainforest in the Amazon biome that covers most of the Amazon
|
| 35 |
+
basin of South America. This basin encompasses 7,000,000 km² of which
|
| 36 |
+
5,500,000 km² are covered by the rainforest. The majority of the forest
|
| 37 |
+
is contained within Brazil, with 60% of the rainforest.
|
| 38 |
+
```
|
| 39 |
+
<br>
|
| 40 |
+
Output:
|
| 41 |
+
❓ Question: How much of the Amazon rainforest is in Brazil?
|
| 42 |
+
💬 Answer : 60%
|
| 43 |
+
📊 Score : 11.6933
|
| 44 |
+
📍 Position: Char 319–322
|
| 45 |
+
|
| 46 |
+
### Example 2: Date
|
| 47 |
+
Input:
|
| 48 |
+
```plaintext
|
| 49 |
+
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars
|
| 50 |
+
in Paris, France. It was constructed from 1887 to 1889 as the centerpiece
|
| 51 |
+
of the 1889 World's Fair. The tower is 330 metres tall and is the tallest
|
| 52 |
+
structure in Paris.
|
| 53 |
+
```
|
| 54 |
+
<br>
|
| 55 |
+
Output:
|
| 56 |
+
❓ Question: When was the Eiffel Tower built?
|
| 57 |
+
💬 Answer : 1889
|
| 58 |
+
📊 Score : 10.1579
|
| 59 |
+
📍 Position: Char 66–196
|
| 60 |
+
|
| 61 |
+
### Example 3: Large context
|
| 62 |
+
Input:
|
| 63 |
+
```plaintext
|
| 64 |
+
Python is a high-level, general-purpose programming language. Its design
|
| 65 |
+
philosophy emphasizes code readability with the use of significant indentation.
|
| 66 |
+
Python is dynamically typed and garbage-collected. It supports multiple
|
| 67 |
+
programming paradigms, including structured, object-oriented and functional
|
| 68 |
+
programming. It was created by Guido van Rossum and first released in 1991.
|
| 69 |
+
Python consistently ranks as one of the most popular programming languages.
|
| 70 |
+
It is widely used in data science, machine learning, web development, and
|
| 71 |
+
automation. The Python Package Index (PyPI) hosts hundreds of thousands of
|
| 72 |
+
third-party modules. The standard library is very extensive, offering tools
|
| 73 |
+
suited to many tasks.
|
| 74 |
+
```
|
| 75 |
+
*The passage above was repeated 3 times to build a larger context!*
|
| 76 |
+
<br>
|
| 77 |
+
Output:
|
| 78 |
+
❓ Question: When was Python first released?
|
| 79 |
+
💬 Answer : 1991
|
| 80 |
+
📊 Score : 12.9267
|
| 81 |
+
📍 Position: Char 375–379
|
| 82 |
+
|
| 83 |
+
## How to use
|
| 84 |
+
You can use the model by downloading `model.zip` and `use.py`. Then, run `use.py` to see what the model does :D
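
If you prefer to unpack the archive from Python, here is a minimal sketch using only the standard library. Whether `use.py` expects `model.zip` to be unpacked, and into which folder, is not stated in this README, so the `model` target directory and the `extract_model` helper are assumptions; check `use.py` for the path it actually loads.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path: str = "model.zip", target_dir: str = "model") -> list:
    """Unpack the archive into target_dir and return the archived file names.

    target_dir is a hypothetical location, not something this repo prescribes.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Calling `extract_model()` next to the downloaded files unpacks everything into `./model`; after that, `python use.py` runs the demo.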
|
| 85 |
+
|
| 86 |
+
## Everything is open-source!
|