---
base_model: pawan2411/address_net
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4008
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Orchard Road 313, Singapore 238895
  sentences:
  - Orchard Rd 313, Singapore 238895
  - 15 Rue de la Paix/75002/France
  - NY, 5th Avenue and 57th Street
- source_sentence: 1 Raffles Place, One Raffles Place, Singapore 048616
  sentences:
  - 1 Raffles Place, Singapore 048616
  - Madrid 28001 Spain Calle Serrano 30
  - Kurfürstendamm 185/10707 Berlin/Germany
- source_sentence: Kurfürstendamm 207-208, 10719 Berlin, Germany
  sentences:
  - Argentina CABA C1073ABA 1925 Avenida 9 de Julio
  - Kurfürstendamm ๒๐๗-๒๐๘, ๑๐๗๑๙ Berlin, Germany
  - 123 Main St, Anytown, AB T1A 1A1
- source_sentence: Via Tornabuoni, 50123 Firenze FI, Italy
  sentences:
  - Hamngatan 18-20, Stockholm, Sweden
  - 1 Florida, Argentina
  - Tornabuoni St, 50123 Italy
- source_sentence: Nanjing Road Pedestrian Street, Huangpu, Shanghai 200001, China
  sentences:
  - Nanjing Rd Ped St, Huangpu Dist, Shanghai, China
  - 5 Rue du Faubourg Saint-Honoré, Paris, France
  - 6 Place d'Italie, Paris
---
## Address Embedding Model

This model generates embeddings for addresses, designed to facilitate address matching, deduplication, and standardization tasks.
## Model description

The Address Embedding Model creates vector representations of addresses that capture semantic similarity, making it easier to match and deduplicate addresses written in different formats and styles.

- **Model Type:** Sentence Transformer
- **Base model:** [pawan2411/address_net](https://huggingface.co/pawan2411/address_net) <!-- at revision 59a25ad94c91cf025ae8d44f21e404c387065b4b -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
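
Because the model is trained for cosine similarity, near-duplicate addresses can be flagged by thresholding pairwise similarity between embeddings. Below is a minimal sketch in pure NumPy; the toy 2-D vectors stand in for real 768-dimensional outputs of `model.encode`, and the 0.9 threshold is an illustrative assumption rather than a tuned value:

```python
import numpy as np

def find_duplicates(embeddings: np.ndarray, threshold: float = 0.9):
    """Return index pairs (i, j), i < j, whose cosine similarity meets the threshold."""
    # L2-normalize so the dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    return [(i, j)
            for i in range(len(sims))
            for j in range(i + 1, len(sims))
            if sims[i, j] >= threshold]

# Toy vectors standing in for real address embeddings
emb = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(find_duplicates(emb))  # [(0, 1)]
```

In practice the threshold should be calibrated on a labeled sample of duplicate and non-duplicate address pairs.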
### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
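
The `Pooling` module in the architecture above uses mean pooling (`pooling_mode_mean_tokens`): token embeddings are averaged over non-padding positions to produce one sentence vector. A hedged NumPy sketch of that step, with made-up token embeddings standing in for real `MPNetModel` outputs:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over non-padding positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against empty masks
    return summed / counts

# One sequence of 3 tokens (dim=2); the last token is padding and is ignored
tok = np.array([[[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(tok, mask))  # [[2. 3.]]
```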
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then load the model and run inference:
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pawan2411/address-emnet")

# Run inference
sentences = [
    '60 Ratchadaphisek Rd, Khwaeng Khlong Toei, Khet Khlong Toei, Krung Thep Maha Nakhon 10110',
    '60 Ratchadaphisek Road, Krung Thep Maha Nakhon, Thailand',
    '61 Ratchadaphisek Road, Krung Thep Maha Nakhon, Thailand',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
```
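
A common follow-on task is matching a free-form query address against a reference list: encode everything, then take the argmax of cosine similarity. The sketch below uses toy 2-D vectors in place of real `model.encode` outputs so it stays self-contained:

```python
import numpy as np

def best_match(query_emb: np.ndarray, reference_embs: np.ndarray):
    """Return (index, cosine similarity) of the reference embedding closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    refs = reference_embs / np.linalg.norm(reference_embs, axis=1, keepdims=True)
    sims = refs @ q
    idx = int(np.argmax(sims))
    return idx, float(sims[idx])

# Toy vectors standing in for encoded reference addresses and a query
refs = np.array([[0.0, 1.0], [1.0, 0.0]])
query = np.array([0.9, 0.1])
idx, score = best_match(query, refs)
print(idx)  # 1
```

With real embeddings, `refs` would come from encoding a canonical address list and `query` from encoding the incoming address string.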
## Training Details

### Training Dataset

* Size: 4,008 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1,000 samples:
  |         | sentence_0                                                                          | sentence_1                                                                        |
  |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                            |
  | details | <ul><li>min: 10 tokens</li><li>mean: 16.73 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 11.4 tokens</li><li>max: 27 tokens</li></ul> |
* Samples:
  | sentence_0                                                                          | sentence_1                                       |
  |:------------------------------------------------------------------------------------|:---------------------------------------------------|
  | <code>1-7-1 Konan, Minato City, Tokyo 108-0075, Japan</code>                        | <code>1-7-1 Konan, Tokyo 108-0075, Japan</code>  |
  | <code>Avenida Paulista, 1000 - Bela Vista, São Paulo - SP, 01310-100, Brazil</code> | <code>Bela Vista 01310-100</code>                |
  | <code>Strada Lipscani 25, București 030031, Romania</code>                          | <code>Strada Lipscani București</code>           |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```