---
title: README
emoji: 🦀
colorFrom: red
colorTo: red
sdk: static
pinned: false
---

## Description

Our goal was to create a Proof of Concept (PoC) solution for matching messages from Telegram marketplaces. We developed two models:

- **RoSBERTa-hermes-ru**: Trained for **location recognition**, **category labeling**, and **inside-outside location classification**.
- **rubert-tiny-separater**: Trained for **supply and demand** classification.

## Architecture and Pretraining

### [RoSBERTa-hermes-ru](https://huggingface.co/poc-embeddings/RoSBERTa-hermes-ru)

RoSBERTa-hermes-ru is based on [ai-forever/ru-en-RoSBERTa](https://huggingface.co/ai-forever/ru-en-RoSBERTa) with multiple heads for downstream tasks:

- **Backbone**: Fully unfrozen, with the **NER head** fine-tuned for location recognition.
- **Allocator head**: Trained to determine whether a message contains the actual location of the user.
- **Tags head with a one-layer adapter**: Trained to label messages with categories describing the message's context, such as tools, medicine, clothing, and more.

### [rubert-tiny-separater](https://huggingface.co/poc-embeddings/rubert-tiny-separater)

rubert-tiny-separater is based on [sergeyzh/rubert-tiny-turbo](https://huggingface.co/sergeyzh/rubert-tiny-turbo) with a linear layer on top. The whole model was trained to classify message types from Telegram marketplaces.

**Labels**:

- **Supply**: Somebody is willing to sell something or provide a service.
- **Demand**: Somebody wants to buy something or hire someone.
- **Noise**: Messages unrelated to the topic.

## Supported Languages

Russian, with English included.
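
## Matching Sketch

The outputs of the two models described above could be combined to pair demand messages with supply messages. The following is a minimal sketch of one possible approach, not the project's actual matching logic. The `Message` class, the `match` function, the rule "same location and at least one shared category", and the sample messages are all assumptions made for illustration. The model outputs are filled in by hand here instead of being produced by the models.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Message:
    """A marketplace message with hypothetical, hand-filled model outputs."""
    text: str
    kind: str  # "supply", "demand", or "noise" (rubert-tiny-separater labels)
    tags: Set[str] = field(default_factory=set)  # categories from the tags head
    location: Optional[str] = None  # location found by the NER head, if any


def match(messages: List[Message]) -> List[Tuple[Message, Message]]:
    """Pair each demand with every supply in the same location that shares a tag."""
    supplies = [m for m in messages if m.kind == "supply"]
    demands = [m for m in messages if m.kind == "demand"]
    pairs = []
    for demand in demands:
        for supply in supplies:
            same_place = (
                demand.location is not None
                and demand.location == supply.location
            )
            # Assumed rule: same location and at least one shared category.
            if same_place and demand.tags & supply.tags:
                pairs.append((demand, supply))
    return pairs


# Invented sample messages; noise messages are ignored by match().
messages = [
    Message("Selling a drill, Tbilisi", "supply", {"tools"}, "Tbilisi"),
    Message("Looking for a drill in Tbilisi", "demand", {"tools"}, "Tbilisi"),
    Message("Good morning everyone", "noise"),
    Message("Need a winter jacket, Batumi", "demand", {"clothing"}, "Batumi"),
]

pairs = match(messages)
for demand, supply in pairs:
    print(f"{demand.text!r} <-> {supply.text!r}")
```

Exact string equality on locations is the simplest possible rule. A real pipeline would likely need to normalize location names and could also use the allocator head's output to decide whether a detected location is the user's actual one.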