| --- |
| datasets: |
| - jerteh/cc100-sr-jerteh |
| - jerteh/SrpWiki |
| - jerteh/SrpELTeC |
| - srwac |
| - procesaur/znanje |
| language: |
| - sr |
| tags: |
| - Srpski |
| - Serbian |
| - GPT2 |
| - generisanje |
| license: cc-by-sa-4.0 |
| pipeline_tag: text-generation |
| widget: |
| - text: Kada bi čovek znao gde će pasti, |
| - text: Jednom davno, |
| - text: Srbija je |
| - text: Najbolji lek za stres je |
| --- |
| |
|
|
| <h4><!--<i class="highlight-container"><b class="highlight">sr-gpt2-large</b></i> is now --> |


| <i class="highlight-container"><b class="highlight">gpt2-orao</b></i>: |
| The largest generative model for the Serbian language.</h4> |
|
|
| <img src="cover.png" class="cover"> |
| <div id="zastava"> |
| <div class="grb"> |
| <img src="https://www.ai.gov.rs/img/logo_60x120-2.png" style="position:relative; left:30px; z-index:10; height:85px"> |
| </div> |
| <table width=100% style="border:0px"> |
| <tr style="background-color:#C6363C;width:100%;border:0px;height:30px"><td style="width:100vw"></td></tr> |
| <tr style="background-color:#0C4076;width:100%;border:0px;height:30px"><td></td></tr> |
| <tr style="background-color:#ffffff;width:100%;border:0px;height:30px"><td></td></tr> |
| </table> |
| </div> |
| |
|
|
| <ul style="font-weight:bold"> |
| <li>Generates new text, or continues a text prompt you provide</li> |
| <li>Based on the GPT2-large architecture, with 810 million parameters</li> |
| <li>Trained on a Serbian-language corpus of 4 billion tokens</li> |
| <li>Equal support for input in both Cyrillic and Latin script!</li> |
| </ul> |
| |
| ## Usage |
|
|
| ```python |
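| >>> from transformers import pipeline, set_seed |
| >>> # Load the model into a text-generation pipeline |
| >>> generator = pipeline('text-generation', model='jerteh/gpt2-orao') |
| >>> set_seed(23)  # fix the random seed before sampling |
| >>> # An empty prompt lets the model generate text from scratch |
| >>> generator("", max_length=30, num_return_sequences=5) |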
| ``` |
|
|
| ``` |
| [{'generated_text': 'Ja, međutim, ne idem na Adu - kaže Miodrag.'}, |
| {'generated_text': 'Domaćinstvo se nalazilo na mestu zvanom Kulina (ranije Kulina Vakuf) i bilo je jedno od najvećih i naj'}, |
| {'generated_text': 'Regionalne razlike se uglavnom odnose na geografski položaj, geografsko-geografski položaj i ekonomsku razvijenost.'}, |
| {'generated_text': 'Od tada do danas Srbija ne stoji na nogama'}, |
| {'generated_text': 'Iz tog razloga, na ovaj način se postiže bolja efikasnost rada, odnosno smanjuje se vreme potrebno za sprovođenje simulacije.'}] |
| ``` |
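
The call above starts from an empty prompt. To continue a given text instead, the prompt is passed as the first argument, in either Latin or Cyrillic script. The following is a minimal sketch rather than the author's reference code: the helper names `load_generator`, `continue_prompts` and `demo`, the sampling settings, and the Cyrillic spelling of the prompt are illustrative assumptions; only the model id and the Latin prompt `Srbija je` are taken from this card.

```python
from typing import Callable, Dict, Iterable


def load_generator(model_id: str = "jerteh/gpt2-orao"):
    """Build a text-generation pipeline (downloads the model on first use)."""
    # Imported lazily so continue_prompts() can be used without transformers.
    from transformers import pipeline
    return pipeline("text-generation", model=model_id)


def continue_prompts(generator: Callable, prompts: Iterable[str],
                     max_new_tokens: int = 20) -> Dict[str, str]:
    """Return one sampled continuation per prompt, keyed by the prompt."""
    results = {}
    for prompt in prompts:
        outputs = generator(prompt, max_new_tokens=max_new_tokens,
                            do_sample=True, num_return_sequences=1)
        # The pipeline returns a list of dicts with a 'generated_text' key;
        # by default that text starts with the prompt itself.
        results[prompt] = outputs[0]["generated_text"]
    return results


def demo() -> None:
    """Hypothetical demo: the same prompt in Latin and in Cyrillic script."""
    from transformers import set_seed
    set_seed(23)
    prompts = ["Srbija je", "Србија је"]
    for prompt, text in continue_prompts(load_generator(), prompts).items():
        print(repr(prompt), "->", repr(text))
```

Calling `demo()` downloads the model and prints one continuation per script. The outputs depend on the sampling settings and the seed, so none are shown here.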
|
|
| In addition to the datasets listed above, the model was also trained on other corpora of the [Society for Language Resources and Technologies](https://jerteh.rs), |
| including the corpora of contemporary Serbian, SrpKor2013 and SrpKor2021, |
| as well as the [PDRS 1.0](https://www.clarin.si/repository/xmlui/handle/11356/1752) corpus developed by the Institute for the Serbian Language of SANU. |
|
|
| <h4>If you need a smaller model, see <a href="https://huggingface.co/jerteh/gpt2-vrabac" class="highlight-container"> |
| <b class="highlight">gpt2-vrabac</b></a>, a smaller model trained on the same corpus.</h4> |
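
When choosing between the two checkpoints, it can help to check their sizes directly. The sketch below is an illustration under stated assumptions, not part of the original card: the helper names `parameter_count_millions` and `compare_models` are hypothetical, and the code relies on `num_parameters()`, which `transformers` provides on its pretrained model classes. Running `compare_models()` downloads both models, so it needs network access and enough memory.

```python
def parameter_count_millions(model) -> float:
    """Total number of parameters of a loaded model, in millions."""
    # num_parameters() is inherited from transformers' PreTrainedModel.
    return model.num_parameters() / 1e6


def compare_models(model_ids=("jerteh/gpt2-orao", "jerteh/gpt2-vrabac")):
    """Load each checkpoint and return its size in millions of parameters."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM
    sizes = {}
    for model_id in model_ids:
        model = AutoModelForCausalLM.from_pretrained(model_id)
        sizes[model_id] = parameter_count_millions(model)
    return sizes
```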
|
|
| <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:40px"> |
| <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Author</div> |
| <a href="https://huggingface.co/procesaur"> |
| <div class="flex"> |
| <div |
| style="display:block; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%; |
| background-size: cover; background-image: url('https://cdn-uploads.huggingface.co/production/uploads/1673534533167-63bc254fb8c61b8aa496a39b.jpeg?w=200&h=200&f=face')"> |
| </div> |
| </div> |
| </a> |
| <div style="text-align: center; font-size: 16px; font-weight: 800">Mihailo Škorić</div> |
| <div> |
| <a href="https://huggingface.co/procesaur"> |
| <div style="text-align: center; font-size: 14px;">@procesaur</div> |
| </a> |
| </div> |
| </div> |
| </div> |
| |
| <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:40px"> |
| <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Computation</div> |
| <a href="https://www.ai.gov.rs/"> |
| <div class="flex"> |
| <div |
| style="display:block; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%; |
| background-size: contain; background-image: url(https://www.ai.gov.rs/img/logo_60x120-2.png);background-repeat: no-repeat; |
| background-position: center;"> |
| </div> |
| </div> |
| </a> |
| <div style="text-align: center; font-size: 16px; font-weight: 800" title="NVIDIA DGX-based system">National AI Platform</div> |
| <div> |
| <a href="https://www.ai.gov.rs/"> |
| <div style="text-align: center; font-size: 14px;">ai.gov.rs</div> |
| </a> |
| </div> |
| </div> |
| </div> |
| |
| <div class="inline-flex flex-col" style="line-height: 1.5;padding-right:40px"> |
| <div style="text-align: center; margin-top: 3px; font-size: 16px; font-weight: 800">Data</div> |
| <a href="https://jerteh.rs/"> |
| <div class="flex"> |
| <div |
| style="display:block; margin-left: auto; margin-right: auto; width: 92px; height:92px; border-radius: 50%; |
| background-size: contain; background-image: url(https://cdn-avatars.huggingface.co/v1/production/uploads/1673601491672-63bc254fb8c61b8aa496a39b.png?w=200&h=200&f=face);background-repeat: no-repeat; |
| background-position: center;"> |
| </div> |
| </div> |
| </a> |
| <div style="text-align: center; font-size: 16px; font-weight: 800" title="Society for Language Resources and Technologies">JeRTeh</div> |
| <div> |
| <a href="https://huggingface.co/jerteh"> |
| <div style="text-align: center; font-size: 14px;">@jerteh</div> |
| </a> |
| </div> |
| </div> |
| </div> |
| |
| ## Citation |
|
|
| ```bibtex |
| @article{skoric24modeli, |
| author = {Mihailo \v{S}kori\'{c}}, |
| title = {Novi jezi\v{c}ki modeli za srpski jezik}, |
| journal = {Infoteka}, |
| volume = {24}, |
| number = {1}, |
| year = {2024}, |
| publisher = {Zajednica biblioteka univerziteta u Srbiji, Beograd}, |
| url = {https://arxiv.org/abs/2402.14379} |
| } |
| ``` |
|
|
| <style> |
| .ffeat { |
| color:red |
| } |
| |
| .cover { |
| width: 100%; |
| margin-bottom: 5pt |
| } |
| |
| .highlight-container, .highlight { |
| position: relative; |
| text-decoration:none |
| } |
| |
| .highlight-container { |
| display: inline-block; |
| |
| } |
|
|
| .highlight{ |
| color:white; |
| text-transform:uppercase; |
| font-size: 16pt; |
| } |
|
|
| .highlight-container{ |
| padding:5px 10px |
| } |
| |
| .highlight-container:before { |
| content: " "; |
| display: block; |
| height: 100%; |
| width: 100%; |
| margin-left: 0px; |
| margin-right: 0px; |
| position: absolute; |
| background: #e80909; |
| transform: rotate(2deg); |
| top: -1px; |
| left: -1px; |
| border-radius: 20% 25% 20% 24%; |
| padding: 10px 18px 18px 10px; |
| } |
|
|
| div.grb, #zastava>table { |
| position:absolute; |
| top:0px; |
| left: 0px; |
| margin:0px |
| } |
|
|
| div.grb>img, #zastava>table{ |
| margin:0px |
| } |
| |
| #zastava { |
| position: relative; |
| margin-bottom:120px |
| } |
| </style> |