metadata
base_model: BAAI/bge-small-en-v1.5
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@5
  - cosine_ndcg@10
  - cosine_ndcg@100
  - cosine_mrr@5
  - cosine_mrr@10
  - cosine_mrr@100
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@5
  - dot_ndcg@10
  - dot_ndcg@100
  - dot_mrr@5
  - dot_mrr@10
  - dot_mrr@100
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:7033
  - loss:GISTEmbedLoss
widget:
  - source_sentence: Are Producer Companies required to maintain a general reserve?
    sentences:
      - >-
        'DONATIONS OR SUBSCRIPTION BY PRODUCER COMPANY A Producer Company may,
        by special resolution, make donation or subscription to any institution
        or individual for the purposes of -  (a) promoting the social and
        economic welfare of Producer Members or producers or general public; or
        (b) *promoting the mutual assistance principles:* Provided that the
        aggregate amount of all such donation and subscription in any financial
        year shall not exceed three per cent of the net profit of the Producer
        Company in the financial year immediately preceding the financial year
        in which the donation or subscription was made: Provided further that no
        Producer Company shall make directly or indirectly to any political
        party or for any political purpose to any person any contribution or
        subscription or make available any facilities including personnel or
        material.    581ZI. GENERAL AND OTHER RESERVES  (1) Every Producer
        Company shall maintain a general reserve in every financial year, in
        addition to any reserve maintained by it as may be specified in
        articles.  (2) In a case where the Producer Company does not have
        sufficient funds in any financial year for transfer to maintain the
        reserves as may be specified in articles, the contribution to the
        reserve shall be shared amongst the Members in proportion to their
        patronage in the business of that company in that year.    581ZJ. ISSUE
        OF BONUS SHARES Any Producer Company may, upon recommendation of the
        Board and passing of resolution in the general meeting, issue bonus
        shares by capitalisation of amounts from general reserves referred to in
        section 581ZI in proportion to the shares held by the Members on the
        date of the issue of such shares.'
      - >-
        '    POPI will be required to submit a Utilization Certificate as per
        Annexure II, in respect of  funds released earlier, for processing of
        release proposal from second instalment onwards.  xi.     POPI will
        maintain detailed account of expenditure of all approved items in
        respect of each  FPO separately and retain all original vouchers and
        receipts for verification by NABARD and RSA.  xii.     POPI shall submit
        monthly progress report to NABARD Regional Office before 5th of the 
        succeeding month as per **Annexure III**  xiii.     POPI shall
        constitute a \'Project Monitoring Committee (PMC) consisting of 
        representative  of POPI, RSA, DDM **of** NABARD, Lead District Manager,
        ATMA, Agriculture department  and a Board member of FPO(to be promoted).
        The PMC shall meet quarterly to review the progress, guide the project
        execution and make recommendation for release of grant to POPI/FPO.  
        xiv.     POPI will submit all such information and data as required for
        the periodic monitoring of  the project by NABARD/its representatives.
        POPI shall not publish the reports/research findings/results without a
        written permission from NABARD. Further, NABARD shall have the right to
        use the same for its internal use, training, publicity, etc., after duly
        acknowledging the source(s).  xv.     POPI may undertake to document its
        experience during the course of implementation of  the project and
        submit to NABARD Regional Office for information/record.  xvi.     The
        assistance of NABARD shall be duly acknowledged by displaying suitable
        sign board  containing **\'Project supported under NABARD assistance\'**
        at the FPO Office and also while organising training programmes and
        printing of publicity/documentation material in respect of the project. 
        xvii.     POPI shall not sub-contract the work assigned to it to any
        other institution/entity.  xviii.     In the event of POPI availing
        assistance from any other agency for any activity of the same  project,
        NABARD's assistance will be reduced to that extent.'
      - >-
        '6.3.1   Aadhaar has been made mandatory for availing Crop insurance
        from Kharif 2017 season onwards.    Therefore, all banks are advised to
        mandatorily obtain Aadhaar number of their farmers and the same   
        applies  for  non-loanee  farmers  enrolled  through  banks/Insurance 
        companies/insurance    intermediaries.  6.3.2   Farmers not having
        Aadhaar ID may also enrol under PMFBY subject to their enrolment for   
        Aadhaar and submission of proof of such enrolment as per notification
        No. 334.dated 8th February,    2017 issued by GOI under Section 7 of
        Aadhaar Act 2016(Targeted Delivery of Financial and other    Subsidies,
        Benefits and Services). Copy of the notification may be perused on
        www.pmfby.gov.in. This    may be  subject to further directions issued
        by Govt. from time to time.  6.3.3    All banks have to compulsorily
        take Aadhaar/Aadhaar enrolment number as per notification under  Aadhaar
        Act before sanction of crop loan/KCC under Interest Subvention Scheme.
        Hence the coverage    of loanee farmers without Aadhaar does not arise
        and such accounts need to be reviewed by the    concerned bank branch
        regularly.'
  - source_sentence: Where is the shoot borer widely distributed?
    sentences:
      - >-
        'Sugarcane is an important commercial crop in India . It is cultivated
        under diverse  agro-climatic conditions . The crop is damaged by 5
        important moth borers . Among these borers the shoot borer, Chilo
        infuscatellus is an important one and is widely distributed in all cane
        growing areas in India. The infestation reduces cane production,
        Parthasarathy et al (1953) observed a loss in weight of the infested
        clumps varying from 15.8 to 41.7 % A decrease in yield by 10 t /ha has
        been calculated by Ramachandrachari (1959) Avasthy (1968) correlated 
        the incidence of shoot borer with cane yield and found 3.5 % loss in
        yield at every 5 % increase in borer infestation. High temperature , low
        humidity and scanty rainfall and poor irrigation facilitate high
        incidence of shoot borer.'
      - >-
        '3. Whitefly , Bemisia tabaci , Aleyrodidae, Hemiptera Symptom of
        damage: Yellowing of leaves, plant vitality reduced, development of
        sooty mould, plant dies in case of severe attack. Nature of damage: 
        Nymphs and adults suck the plant sap and also transmits yellow mosaic
        virus (YMV). Egg: Stalked, sub-elliptical, light yellow at first, and
        turning brown later on. Eggs laid singly on adaxial (lower) side of
        leaves. Nymph: Elliptical on emergence, soon they fix their mouthparts
        into the plant tissues and feed on the cell sap. Greenish yellow, oval
        on undersurface of leaves. Adult:  Small with yellow body covered with
        white waxy bloom.'
      - >-
        'Kharif / Kharif Kharif / Rabi food grains To get a higher yield of
        wheat, it is necessary to pay attention to the following points: -. For
        field preparation, plough first with a cultivator and then use a
        rotavator 'harrow'. Organic fertilizers must be used. As much as
        possible, half of the nutrients should be provided by organic
        fertilizers. The species should be selected according to regional
        compatibility and seasonality. Pure and certified seeds should be sown
        after seed treatment. Balanced amounts of fertilizers should be used at
        the right time and in the right manner based on soil testing. Irrigation
        at critical stages (crown root stage and flowering stage) should be done
        in a timely manner and in adequate quantity. Outbreaks of wheatgrass
        (Phalaris minor) and wild oats should be controlled in time. & 4S HA #
        (4? (A) Other activities should be completed on time based on the
        recommendation |0. Seeds must be replaced after the third year..
        Gerotillage and raised bed method should be used. 2. Special care should
        be taken to prevent pests and diseases. Intensive methods: In case of
        irrigated sowing: About 97% of the total wheat area in the state is
        irrigated but assured or assured irrigation is available in a small
        area. Hence, the sowing of wheat is often delayed. We have to decide in
        advance which variety of paddy to choose in kharif and which variety of
        wheat to sow in rabi. To get a good yield of wheat, it is necessary to
        sow paddy in time, so that the field is empty for wheat in October.
        Another thing to be noted is that puddling or leva in paddy causes the
        soil to harden. In heavy soils, it is advisable to sow wheat by first
        ploughing with a soil-reversing plough and then ploughing the soil twice
        with a disc harrow. Paddy stalks are cut into small pieces using disc
        harrows. To decompose them quickly, 45-20 kg. Nitrogen (as urea) per se.
        When preparing the field, it must be given at the first ploughing. The
        field is fully prepared in a single ploughing by a tractor-driven
        rotavator. |बुवाई: Wheat must be sown on time and at sufficient
        moisture. Late-maturing varieties must be sown on time, otherwise the
        yield decreases. As sowing is delayed, the rate of decline in wheat
        yields increases. Wheat yields increase from 3 to 4 kg / ha when sown
        from December onwards. And 4 to 5 k.g. / ha when sown in January. The
        rate per week decreases. Sowing wheat with a seed drill can save
        fertilizer and seed. 4'
  - source_sentence: >-
      Why is the development of Best Practices, Pilot Projects, and Success
      Stories important for FPOs?
    sentences:
      - >-
        'III. SAP FEEDERS 8. Shoot bug : Peregrinus maidis : Delphacidae: 
        Hemiptera Symptom of attack: The leaves turn yellow due to sucking;
        plants become weak and the yield goes down. The mid rib of the leaves
        become red due to egg laying and may dry up subsequently. Nature of
        damage: Both adults and nymphs suck the plant sap from the leaves and
        cause the shoot to dry. They feed gregariously within the leaf sheaths.
        It is not a serious pest, but sometimes causes appreciable damage. Life
        stages: It is a small active, grayish brown bug. Colonies of this bug
        (both adults and nymphs) live within the whorl of the central leaf or in
        the root region. This pest is very common in Coimbatore during summer.
        The large black ant attends these insects.'
      - >-
        'a.  Through a survey; or  b.  Through Focused Group Discussion 
        Determine key indicators for the monitoring process- Develop formats
        Secondary Data - The returns submitted by the PO, data available from
        the Government Departments and also published data from other projects.
        10.16  What are the methods of sampling? There are 3 sampling
        techniques: random sampling, stratified sampling and cluster sampling
        Random sampling:  Sampling of households on random basis Stratified
        sampling:  The producers are categorized into different strata like big,
        medium and small.  Data are collected from each strata in a specified
        proportion  i.e., say,  every fifth producer's household data from the
        big producers, every third producer's house hold data from small
        producers  every second house hold data  from the very small producers'
        category Cluster sampling: In this case, data of only those producers
        households will be collected who are in the cluster for a specified
        period 10.17 How to analyze the data? Analysis is the process of turning
        the detailed data into an understanding of patterns, trends and
        interpretations. The step by step process involved in monitoring
        analysis is enumerated below:'
      - >-
        'i.  Identification of potential FPOs among successful Watershed
        Development projects, Wadi Projects and their Federations.  ii. 
        Identification of natural clusters of farmers groups to facilitate
        formation of FPOs  iii.  Close involvement of stakeholders such as NGOs,
        Banks, Govt. line departments, commodity Boards, Corporations,
        Corporate, functional Universities, cooperatives, Federations, Trade
        bodies, etc. for identification, promotion, nurturing, development,
        capacity building, evaluation etc. of FPOs  iv.  Development of Best
        Practices, Pilot Projects and Success Stories for wider publicity and
        field level replication  v.  Adoption of mission mode with periodic
        qualitative and quantitative milestones with  timelines  vi.  Wide
        publicity to the FPO Scheme through print, electronic media and 
        adopting other Mass Communication Strategies  vii. 
        Conventional/non-conventional publicity and awareness creation methods 
        viii.  Launching of pilot projects, action research projects,
        experimental projects, field trials etc. to learn and understand various
        models of FPOs and successful strategies for wider replication'
  - source_sentence: >-
      Apart from nutrients and protein, what role does Moong have in pulses
      crops?
    sentences:
      - >-
        'DONATIONS OR SUBSCRIPTION BY PRODUCER COMPANY A Producer Company may,
        by special resolution, make donation or subscription to any institution
        or individual for the purposes of -  (a) promoting the social and
        economic welfare of Producer Members or producers or general public; or
        (b) *promoting the mutual assistance principles:* Provided that the
        aggregate amount of all such donation and subscription in any financial
        year shall not exceed three per cent of the net profit of the Producer
        Company in the financial year immediately preceding the financial year
        in which the donation or subscription was made: Provided further that no
        Producer Company shall make directly or indirectly to any political
        party or for any political purpose to any person any contribution or
        subscription or make available any facilities including personnel or
        material.    581ZI. GENERAL AND OTHER RESERVES  (1) Every Producer
        Company shall maintain a general reserve in every financial year, in
        addition to any reserve maintained by it as may be specified in
        articles.  (2) In a case where the Producer Company does not have
        sufficient funds in any financial year for transfer to maintain the
        reserves as may be specified in articles, the contribution to the
        reserve shall be shared amongst the Members in proportion to their
        patronage in the business of that company in that year.    581ZJ. ISSUE
        OF BONUS SHARES Any Producer Company may, upon recommendation of the
        Board and passing of resolution in the general meeting, issue bonus
        shares by capitalisation of amounts from general reserves referred to in
        section 581ZI in proportion to the shares held by the Members on the
        date of the issue of such shares.'
      - >-
        'Larval rearing : It is to be done in GI round basins (28 cm dia ) at
        250 larvae /basin covered with khada cloth . The eggs of Corcyra
        cephlonica are  given as feeding material for the larvae in the
        laboratory. For rearing 500 Chrysoperla larvae the total quantity of
        Corcyra eggs required is 25 CC at the rate of 5.0 CC / feeding for 5
        feedings in alternate days. The Chrysoperla larvae pupated into round
        white coloured silken cocoon  in 10 days. The cocoons are  collected
        with fine brush and transferred into a one litre plastic containers with
        wire mesh window for emergence of adults. From the cocoons, pale green
        colored adults with transparent lace like wings emerge in 9-10 days.'
      - >-
        'Advanced cultivation of Kharif / Kharif Rabi / Rabi pulses is the major
        crop of Moong Zaid. Moong has a multifaceted role in pulses crops. Apart
        from providing nutrients and protein, it also replenishes green manure
        by replanting crops after plucking the pods. Etawaligarh, Deoria,
        Etawah, Farrukhabad, Mathura, Lalitpur, Kanpur Dehat, Hardoi, and
        Ghazipur districts of the state have emerged as major groundnut
        producing districts. Other districts also have potential. Good yield can
        be obtained in Zaid by considering the following factors - Field
        preparation: Loam land is suitable for mung bean cultivation. Ploughing
        two tillers makes the field ready. If there is a shortage of seeds, they
        should be replanted and sown. Farm preparation can be done quickly with
        tractors, power tillers, rotovators, or other modern agricultural
        machinery. Recommended varieties: The following varieties with short
        maturation are suitable for good yield: - Species Notification
        Speciality Ripe Produce Kuntala Pest Disease Preference Suitable Area
        Year Period (days) Per Hectare Utilization 2 3 4 5 6 74. Narendra Moong
        - 992. Dana Dhumil. . 65-70 4 - 3 yellow mosaic whole U.P. 2. Malviya
        2000 green grain. 65-70 2 - 5 Tolerant, Tadeva Sampoorna U.P. Jagrati
        (H.P. UM-2) 3. Emperor 2004 Green Shining. 60-65 9 - 0 Yellow Magic
        Whole U.P. PDM-39) Avrodhi4. Malviya Janapriya. .200] - 60-65 2 - 5
        Tadaiv Sampoorn Uttar Pradesh (HUM-6) 5. Azad Moong -] 2020 Bright
        green. 62-65 0 - 2 MYMV, Whole U.P. (K, M-2342) Colour Medium CLS,
        Ansharqunose, Bold Grain Leaf Crinkle and Web Blight Resistant and
        Height Fly, Jasid and Shrips Resistant |6. IPM 32-20 2020 Green and
        Medium 65-85 6 MYMV, Whole U.P. Large Grain Powdery Mildew, Resistant to
        Sarcosporalife Spots and Resistant to Whitefly and Shrips. < 84 >'
  - source_sentence: What is the purpose of heating the fresh fruit tissue in alcohol or HCl?
    sentences:
      - >-
        'a. Business Processes: Aggregation, segregation and logistics b.
        Productivity: Man, material, money, input and output c. Warehousing:
        Space, costs and logistics d. Processing : Own  vs. out-source     e.
        Products: Whole foods to  processed foods and to derivatives f. Risk
        mitigation    7.4 What is a business plan? Business plan is a succinct
        document that specifies the components of a strategy with regard to the
        business mission, external and internal environments and problems
        identified in earlier analysis. A business plan is not written each time
        a modification to a strategy is made. It should be written when a new
        venture is developed or a major new initiative is launched. Sincere
        contemplation is needed about the business concept, the business
        opportunity, the competitive landscape, the essential elements for
        success, and the people who will be involved. The exercise will often
        lead to more questions, and these new questions must be properly
        researched to gain deep insight into the issues and challenges that lie
        ahead. In short, the business plan must contain answers to the
        questions  \'Who/What/Where/When/Why/How/How Much\'. 7.5 What is
        business planning? The business planning process starts with Generation
        of Business Ideas, followed by Opportunities & Threats Analysis leading
        to Identification of suitable Business Opportunities. Once Business
        Opportunity is identified, a Marketing Plan is prepared. The final part
        of the process deals with the Financial Plan.'
      - >-
        'The fresh fruit tissue or separated parts, including the peel and core
        are heated in 95% alcohol or 0.05N HCl (pH 2.0) for 10-20 min at 70 o C
        to inactivate pectic enzymes. After the pretreatment, the materials is
        ground in an electric blender and placed in water. Versene or Na-EDTA is
        added at 2.0%. The pH is adjusted to 6.0. The mixture is heated for
        about an hour at 90-95 o C. The slurry formed is rapidly filtered and
        the pectin is precipitated from the solution using acidified alcohol.
        The precipitate is centrifuged and repeatedly washed with 70% alcohol.
        Acetone is used for dehydration and the pectin produced is vacuum-dried.
        It may also be dried in a hot-air oven at 50 o C for 4 h.'
      - >-
        'Advanced cultivation of Kharif / Kharif Kharif / Rabi foodgrains Paddy
        is the major crop of the state in Kharif. It is the largest area sown /
        sown and has great potential to increase productivity |यह. To achieve
        higher rice yields, the following factors must be taken into
        consideration: |2. Select the recommended varieties of paddy according
        to local conditions such as regional climate, soil, irrigation
        facilities, water logging, and suitability for sowing and transplanting.
        |मृदा Sow pure, certified and researched seeds |मृदा On a trial basis,
        timely and recommended quantities of balanced fertilizers, organic
        manure, and green manure. Make good use of the available irrigation
        potential by timely sowing / transplanting. The number of plants per
        unit area should be ensured. |कीट Disease and weed control should be
        done. |कम The ratio of fertilizers should be kept 2: 4: 4 even in the
        case of fertilizer availability. |4 Preparation of the field should be
        done by ploughing 2 - 3 after ploughing the land. At the same time, the
        farm should be made strong so that rainwater can be stored in the field
        for a long time. If green manure is being taken then phosphorus should
        be used along with its sowing. Irrigate the field a week before sowing /
        transplanting paddy so that weeds grow. Volume per hectare 60-75 kg. Mix
        rotten cow dung manure, sprinkle with lukewarm water and leave it in
        shade for 8-0 days, then add it to the fields at the time of last mowing
        to protect them from pests such as termites, white weeds, nematodes,
        root bugs, cutworms, etc. Volume per hectare 60-75 kg. After sprinkling
        light water mixed with cow dung manure and keeping it in shade for 8-40
        days, the land should be tilled at the last ploughing before sowing.
        P749AF #े 3. Rice cultivation in the region is done by direct sowing and
        transplanting in non-irrigated and irrigated conditions. The recommended
        varieties of paddy for different climatic zones and conditions of the
        state are mentioned in Table-4. The qualities and characteristics of the
        main varieties are also listed in Table-2. 04'
model-index:
  - name: SentenceTransformer based on BAAI/bge-small-en-v1.5
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: val evaluator
          type: val_evaluator
        metrics:
          - type: cosine_accuracy@1
            value: 0.45012787723785164
            name: Cosine Accuracy@1
          - type: cosine_accuracy@5
            value: 0.8580562659846548
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9207161125319693
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.45012787723785164
            name: Cosine Precision@1
          - type: cosine_precision@5
            value: 0.17161125319693094
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09207161125319692
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.45012787723785164
            name: Cosine Recall@1
          - type: cosine_recall@5
            value: 0.8580562659846548
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9207161125319693
            name: Cosine Recall@10
          - type: cosine_ndcg@5
            value: 0.6776887809935845
            name: Cosine Ndcg@5
          - type: cosine_ndcg@10
            value: 0.6982045153363013
            name: Cosine Ndcg@10
          - type: cosine_ndcg@100
            value: 0.7149326391576375
            name: Cosine Ndcg@100
          - type: cosine_mrr@5
            value: 0.6166240409207153
            name: Cosine Mrr@5
          - type: cosine_mrr@10
            value: 0.6252430682417891
            name: Cosine Mrr@10
          - type: cosine_mrr@100
            value: 0.6289243546015818
            name: Cosine Mrr@100
          - type: cosine_map@100
            value: 0.6289243546015826
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.4514066496163683
            name: Dot Accuracy@1
          - type: dot_accuracy@5
            value: 0.8580562659846548
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.9207161125319693
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.4514066496163683
            name: Dot Precision@1
          - type: dot_precision@5
            value: 0.17161125319693094
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.09207161125319692
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.4514066496163683
            name: Dot Recall@1
          - type: dot_recall@5
            value: 0.8580562659846548
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.9207161125319693
            name: Dot Recall@10
          - type: dot_ndcg@5
            value: 0.6781607378304497
            name: Dot Ndcg@5
          - type: dot_ndcg@10
            value: 0.6986764721731665
            name: Dot Ndcg@10
          - type: dot_ndcg@100
            value: 0.7154045959945029
            name: Dot Ndcg@100
          - type: dot_mrr@5
            value: 0.6172634271099737
            name: Dot Mrr@5
          - type: dot_mrr@10
            value: 0.6258824544310474
            name: Dot Mrr@10
          - type: dot_mrr@100
            value: 0.6295637407908401
            name: Dot Mrr@100
          - type: dot_map@100
            value: 0.6295637407908409
            name: Dot Map@100

SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5 on 7,033 training samples with GISTEmbedLoss. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
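As a rough illustration of what "cosine similarity" computes here (not the library's own implementation), the sketch below builds a pairwise cosine-similarity matrix for a batch of 384-dimensional vectors with NumPy. The function name `cosine_similarity_matrix` is hypothetical, and the random vectors are placeholders for real model outputs.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    # Scale every row to unit L2 norm; the small floor guards against zero vectors.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    # The dot product of unit vectors is the cosine of the angle between them.
    return a_unit @ b_unit.T

# Placeholder embeddings: 3 random 384-dimensional vectors instead of real model output.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 384))
scores = cosine_similarity_matrix(embeddings, embeddings)
print(scores.shape)  # (3, 3)
```

Each entry lies in [-1, 1], and the diagonal is 1 because every vector is compared with itself.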

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
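Reading the architecture above: module (1) keeps only the [CLS] token vector (`pooling_mode_cls_token: True`) and module (2) rescales it to unit length. The NumPy sketch below mimics those two steps, assuming a random array stands in for the BertModel token outputs; `cls_pool_and_normalize` is a hypothetical helper, not part of sentence-transformers. Because the outputs have unit norm, dot product and cosine similarity coincide, which is consistent with the nearly identical cosine and dot metrics reported under Evaluation.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """token_embeddings: (batch, seq_len, hidden) -> (batch, hidden) unit vectors."""
    # CLS pooling: take the hidden state of the first token in each sequence.
    cls_vectors = token_embeddings[:, 0, :]
    # Normalize(): divide each vector by its L2 norm.
    norms = np.linalg.norm(cls_vectors, axis=1, keepdims=True)
    return cls_vectors / np.maximum(norms, 1e-12)

# Stand-in for transformer output: 2 sequences, 16 tokens each, hidden size 384.
rng = np.random.default_rng(42)
token_embeddings = rng.normal(size=(2, 16, 384))
sentence_embeddings = cls_pool_and_normalize(token_embeddings)

# For unit vectors, the dot product already equals the cosine similarity.
dot = float(sentence_embeddings[0] @ sentence_embeddings[1])
cos = dot / (np.linalg.norm(sentence_embeddings[0]) * np.linalg.norm(sentence_embeddings[1]))
print(sentence_embeddings.shape)  # (2, 384)
```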

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("SamagraDataGov/embedding_finetuned")
# Run inference
sentences = [
    'What is the purpose of heating the fresh fruit tissue in alcohol or HCl?',
    "'The fresh fruit tissue or separated parts, including the peel and core are heated in 95% alcohol or 0.05N HCl (pH 2.0) for 10-20 min at 70 o C to inactivate pectic enzymes. After the pretreatment, the materials is ground in an electric blender and placed in water. Versene or Na-EDTA is added at 2.0%. The pH is adjusted to 6.0. The mixture is heated for about an hour at 90-95 o C. The slurry formed is rapidly filtered and the pectin is precipitated from the solution using acidified alcohol. The precipitate is centrifuged and repeatedly washed with 70% alcohol. Acetone is used for dehydration and the pectin produced is vacuum-dried. It may also be dried in a hot-air oven at 50 o C for 4 h.'",
    "'Advanced cultivation of Kharif / Kharif Kharif / Rabi foodgrains Paddy is the major crop of the state in Kharif. It is the largest area sown / sown and has great potential to increase productivity |यह. To achieve higher rice yields, the following factors must be taken into consideration: |2. Select the recommended varieties of paddy according to local conditions such as regional climate, soil, irrigation facilities, water logging, and suitability for sowing and transplanting. |मृदा Sow pure, certified and researched seeds |मृदा On a trial basis, timely and recommended quantities of balanced fertilizers, organic manure, and green manure. Make good use of the available irrigation potential by timely sowing / transplanting. The number of plants per unit area should be ensured. |कीट Disease and weed control should be done. |कम The ratio of fertilizers should be kept 2: 4: 4 even in the case of fertilizer availability. |4 Preparation of the field should be done by ploughing 2 - 3 after ploughing the land. At the same time, the farm should be made strong so that rainwater can be stored in the field for a long time. If green manure is being taken then phosphorus should be used along with its sowing. Irrigate the field a week before sowing / transplanting paddy so that weeds grow. Volume per hectare 60-75 kg. Mix rotten cow dung manure, sprinkle with lukewarm water and leave it in shade for 8-0 days, then add it to the fields at the time of last mowing to protect them from pests such as termites, white weeds, nematodes, root bugs, cutworms, etc. Volume per hectare 60-75 kg. After sprinkling light water mixed with cow dung manure and keeping it in shade for 8-40 days, the land should be tilled at the last ploughing before sowing. P749AF #े 3. Rice cultivation in the region is done by direct sowing and transplanting in non-irrigated and irrigated conditions. The recommended varieties of paddy for different climatic zones and conditions of the state are mentioned in Table-4. The qualities and characteristics of the main varieties are also listed in Table-2. 04'",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
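As a rough illustration of what the similarity call computes, the sketch below builds a pairwise cosine-similarity matrix with NumPy. It assumes the model's similarity function is cosine (the sentence-transformers default) and uses random vectors as a stand-in for the 384-dimensional output of model.encode; the helper cosine_similarity_matrix is hypothetical and not part of the library.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise cosine similarity between rows of a and rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


# Random stand-in for model.encode(sentences): 3 embeddings of dimension 384.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 384)).astype(np.float32)

similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)

# After L2 normalization, a plain dot product yields the same scores as cosine.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
dot_scores = unit @ unit.T
print(np.allclose(dot_scores, similarities))  # True
```

If the model's output embeddings are already L2-normalized (not verified here), dot-product and cosine rankings coincide, which would be consistent with the near-identical cosine_* and dot_* rows in the evaluation table.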

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.4501
cosine_accuracy@5 0.8581
cosine_accuracy@10 0.9207
cosine_precision@1 0.4501
cosine_precision@5 0.1716
cosine_precision@10 0.0921
cosine_recall@1 0.4501
cosine_recall@5 0.8581
cosine_recall@10 0.9207
cosine_ndcg@5 0.6777
cosine_ndcg@10 0.6982
cosine_ndcg@100 0.7149
cosine_mrr@5 0.6166
cosine_mrr@10 0.6252
cosine_mrr@100 0.6289
cosine_map@100 0.6289
dot_accuracy@1 0.4514
dot_accuracy@5 0.8581
dot_accuracy@10 0.9207
dot_precision@1 0.4514
dot_precision@5 0.1716
dot_precision@10 0.0921
dot_recall@1 0.4514
dot_recall@5 0.8581
dot_recall@10 0.9207
dot_ndcg@5 0.6782
dot_ndcg@10 0.6987
dot_ndcg@100 0.7154
dot_mrr@5 0.6173
dot_mrr@10 0.6259
dot_mrr@100 0.6296
dot_map@100 0.6296
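The reported numbers behave as if every query has exactly one relevant passage: accuracy@k equals recall@k, precision@k is accuracy@k divided by k (0.8581 / 5 ≈ 0.1716), and map@100 equals mrr@100. Under that single-relevant-document assumption, which is an inference from the table rather than something the card states, each metric reduces to a function of the rank at which the relevant passage was retrieved. The sketch below computes them that way; single_relevant_metrics is a hypothetical helper, not the sentence-transformers evaluator itself.

```python
import math
from typing import Dict, List, Optional


def single_relevant_metrics(ranks: List[Optional[int]], k: int) -> Dict[str, float]:
    """Hypothetical helper: IR metrics at cutoff k when each query has one relevant document.

    ranks[i] is the 1-based rank of query i's relevant document, or None if it
    was not retrieved at all.
    """
    n = len(ranks)
    found = [r for r in ranks if r is not None and r <= k]
    accuracy = len(found) / n
    mrr = sum(1.0 / r for r in found) / n
    # Binary relevance: DCG = 1 / log2(rank + 1) and the ideal DCG is 1.
    ndcg = sum(1.0 / math.log2(r + 1) for r in found) / n
    return {
        "accuracy": accuracy,
        "recall": accuracy,         # the single relevant doc is either in the top k or not
        "precision": accuracy / k,  # at most one of the k retrieved slots is relevant
        "mrr": mrr,
        "map": mrr,                 # average precision collapses to 1 / rank
        "ndcg": ndcg,
    }


# Toy example: four queries whose relevant documents landed at ranks 1, 3, (missed), 2.
metrics = single_relevant_metrics([1, 3, None, 2], k=5)
print({name: round(value, 4) for name, value in metrics.items()})

# Consistency check against the reported cosine metrics (values rounded to 4 decimals).
for accuracy_at_k, precision_at_k, k in [(0.8581, 0.1716, 5), (0.9207, 0.0921, 10)]:
    assert abs(accuracy_at_k / k - precision_at_k) < 1e-4
```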

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • gradient_accumulation_steps: 4
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 1.0
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
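With lr_scheduler_type: linear (listed under All Hyperparameters), learning_rate: 1e-05 and warmup_ratio: 0.1, the learning rate ramps up linearly and then decays linearly to zero. The sketch below reproduces that shape in plain Python, assuming 220 total optimizer steps (the last step in the Training Logs) and that warmup steps are derived as ceil(total_steps * warmup_ratio), which is how the transformers Trainer converts a ratio when warmup_steps is 0. The function linear_lr is illustrative, not the transformers scheduler itself.

```python
import math


def linear_lr(step: int, total_steps: int, base_lr: float, warmup_ratio: float) -> float:
    """Illustrative learning rate at scheduler step `step`: linear warmup, then linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


TOTAL_STEPS = 220    # assumption: final step in the Training Logs table
BASE_LR = 1e-05      # learning_rate
WARMUP_RATIO = 0.1   # warmup_ratio -> ceil(220 * 0.1) = 22 warmup steps

for step in (0, 11, 22, 121, 220):
    print(step, f"{linear_lr(step, TOTAL_STEPS, BASE_LR, WARMUP_RATIO):.2e}")
```

The peak rate of 1e-05 is reached at step 22, and the rate is back to half of that by step 121, halfway through the 198 decay steps.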

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1.0
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
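The step counts in the Training Logs appear to follow from the batch settings above. Assuming training ran on a single device and that the dataset_size:7033 tag in the card metadata is the number of training pairs, 7,033 pairs in batches of 8 give 880 batches (dataloader_drop_last: False keeps the final partial batch), and accumulating 4 batches per optimizer step gives 220 steps per epoch, matching the final logged step. The sketch below restates that arithmetic; NUM_DEVICES is an assumption, not a value from the card.

```python
import math

DATASET_SIZE = 7033   # from the card's "dataset_size:7033" tag (assumed to be the training split)
PER_DEVICE_BATCH = 8  # per_device_train_batch_size
GRAD_ACCUM = 4        # gradient_accumulation_steps
NUM_DEVICES = 1       # assumption: single-device training

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
# dataloader_drop_last: False, so the final partial batch is kept.
batches_per_epoch = math.ceil(DATASET_SIZE / (PER_DEVICE_BATCH * NUM_DEVICES))
steps_per_epoch = batches_per_epoch // GRAD_ACCUM

print(effective_batch, batches_per_epoch, steps_per_epoch)  # 32 880 220
# The Epoch column in the logs is step / steps_per_epoch, e.g. step 15:
print(round(15 / steps_per_epoch, 4))  # 0.0682
```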

Training Logs

Epoch Step Training Loss Validation Loss val_evaluator_dot_map@100
0.0682 15 0.5269 0.3693 0.6033
0.1364 30 0.2825 0.2129 0.6057
0.2045 45 0.3093 0.1710 0.6080
0.2727 60 0.1677 0.1486 0.6196
0.3409 75 0.2368 0.1256 0.6199
0.4091 90 0.161 0.1113 0.6255
0.4773 105 0.1452 0.1006 0.6256
0.5455 120 0.1323 0.1008 0.6266
0.6136 135 0.1138 0.0986 0.6270
0.6818 150 0.1129 0.0954 0.6289
0.75 165 0.1322 0.0914 0.6290
0.8182 180 0.2063 0.0898 0.6307
0.8864 195 0.1055 0.0891 0.6300
0.9545 210 0.0931 0.0888 0.6296
1.0 220 - 0.0888 0.6296
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.43.4
  • PyTorch: 2.4.1+cu121
  • Accelerate: 0.33.0
  • Datasets: 2.21.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, 
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}