---
title: README
emoji: 🦀
colorFrom: red
colorTo: red
sdk: static
pinned: false
---
## Description

Our goal was to create a Proof of Concept (PoC) solution for matching messages from Telegram marketplaces.

We developed two models:

- **RoSBERTa-hermes-ru**: Trained for **location recognition**, **category labeling**, and **inside-outside location classification**.
- **rubert-tiny-separater**: Trained for **supply and demand** classification.
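This README does not say how the two models' outputs are combined into matches. As a purely illustrative sketch, the snippet below assumes a hypothetical `Message` record holding the separater's label, a location (kept only when the allocator head marks it as the user's own), and the category tags, and it pairs each demand with every supply that shares the location and at least one tag. The `Message` type, the `match` function, and this matching rule are all assumptions, not the project's actual pipeline.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Message:
    """Hypothetical record combining both models' outputs for one message."""
    text: str
    kind: str                        # "supply", "demand" or "noise" (rubert-tiny-separater)
    location: Optional[str] = None   # kept only if the allocator head marks it as the user's own
    tags: Set[str] = field(default_factory=set)  # categories from the tags head


def match(messages: List[Message]) -> List[Tuple[Message, Message]]:
    """Pair each demand with every supply sharing its location and at least one tag.

    Illustrative rule only; noise messages are ignored.
    """
    supplies = [m for m in messages if m.kind == "supply"]
    demands = [m for m in messages if m.kind == "demand"]
    pairs = []
    for d in demands:
        for s in supplies:
            same_place = d.location is not None and d.location == s.location
            if same_place and d.tags & s.tags:
                pairs.append((d, s))
    return pairs


msgs = [
    Message("Selling a drill", "supply", "Moscow", {"tools"}),
    Message("Buying a drill", "demand", "Moscow", {"tools"}),
    Message("Looking for a winter jacket", "demand", "Moscow", {"clothing"}),
    Message("Hi everyone", "noise"),
]
print([(d.text, s.text) for d, s in match(msgs)])
# [('Buying a drill', 'Selling a drill')]
```

An exact-overlap rule like this is easy to inspect; a real system might instead compare message embeddings, which this sketch does not attempt.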
## Architecture and Pretraining

### [RoSBERTa-hermes-ru](https://huggingface.co/poc-embeddings/RoSBERTa-hermes-ru)

RoSBERTa-hermes-ru is based on [ai-forever/ru-en-RoSBERTa](https://huggingface.co/ai-forever/ru-en-RoSBERTa), with multiple heads for downstream tasks:

- **Backbone**: Fully unfrozen, with the **NER head** fine-tuned for location recognition.
- **Allocator head**: Trained to determine whether a message contains the user's actual location.
- **Tags head with a single adapter layer**: Trained to label messages with categories describing their context, such as tools, medicine, and clothing.
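Each head produces raw scores that must be decoded into usable values. The following is a minimal sketch under several assumptions: the NER head emits word-level BIO tags (`B-LOC`, `I-LOC`, `O`), the allocator head emits a single logit, and the tags head is multi-label with one independent logit per category, all thresholded at 0.5 after a sigmoid. The label names, `TAG_NAMES`, the thresholds, and every function name here are hypothetical; the published checkpoint may use a different label scheme.

```python
import math
from typing import List, Set

TAG_NAMES = ["tools", "medicine", "clothing"]  # hypothetical subset of the real category set


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_locations(tokens: List[str], bio: List[str]) -> List[str]:
    """NER head: join B-LOC/I-LOC runs into location strings (word-level BIO assumed).

    An I-LOC with no preceding B-LOC is dropped.
    """
    spans: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, bio):
        if tag == "B-LOC":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-LOC" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def is_own_location(allocator_logit: float, threshold: float = 0.5) -> bool:
    """Allocator head: does the message state the user's actual location?"""
    return sigmoid(allocator_logit) >= threshold


def decode_tags(tag_logits: List[float], threshold: float = 0.5) -> Set[str]:
    """Tags head (multi-label assumed): every category whose probability clears the threshold."""
    return {name for name, z in zip(TAG_NAMES, tag_logits) if sigmoid(z) >= threshold}


tokens = ["Selling", "a", "drill", "in", "Nizhny", "Novgorod"]
bio = ["O", "O", "O", "O", "B-LOC", "I-LOC"]
print(decode_locations(tokens, bio))   # ['Nizhny Novgorod']
print(is_own_location(2.1))            # True
print(decode_tags([3.0, -2.0, -1.5]))  # {'tools'}
```

Real tokenizers split words into subword pieces, so a production decoder would first merge subwords back into words before applying this logic.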
### [rubert-tiny-separater](https://huggingface.co/poc-embeddings/rubert-tiny-separater)

rubert-tiny-separater is based on [sergeyzh/rubert-tiny-turbo](https://huggingface.co/sergeyzh/rubert-tiny-turbo) with a linear layer on top. The whole model was trained to classify the types of messages posted in Telegram marketplaces.

**Labels**:

- **Supply**: Someone is willing to sell something or provide a service.
- **Demand**: Someone wants to buy something or hire someone.
- **Noise**: Messages unrelated to the marketplace's topic.
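A linear layer on top of a sentence embedding produces one logit per label, and a softmax turns those logits into probabilities. This sketch assumes the input is an already pooled embedding vector and that the label order is `supply`, `demand`, `noise`; both are assumptions (the real order would be in the model's configuration), and the toy weights are invented purely for illustration.

```python
import math
from typing import List, Tuple

# Assumed label order; check the model's config for the real id2label mapping.
LABELS = ["supply", "demand", "noise"]


def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """One linear layer: logits[k] = sum_j weight[k][j] * x[j] + bias[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding: List[float], weight: List[List[float]], bias: List[float]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(linear(embedding, weight, bias))
    k = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[k], probs[k]


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # toy 3x2 weights, not the trained ones
b = [0.0, 0.0, 0.0]
label, p = classify([2.0, 0.5], W, b)
print(label, round(p, 3))  # supply 0.81
```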
## Supported Languages

Russian, with English also included.