| | --- |
| | tags: |
| | - unsloth |
| | - sentence-transformers |
| | - sentence-similarity |
| | - feature-extraction |
| | - dense |
| | - generated_from_trainer |
| | - dataset_size:4927 |
| | - loss:TripletLoss |
| | base_model: unsloth/embeddinggemma-300m |
| | widget: |
| | - source_sentence: organization id |
| | sentences: |
| | - 'Primary reference table for classifying Payers into broader financial or business |
| | categories. This table groups Payers into segments such as ''Insurance Private'', |
| | ''Insurance Government (BPJS)'', ''Corporate'', and ''Related Parties''. Use this |
| | table to aggregate revenue reporting by payer channel, analyze market segmentation |
| | (e.g., Private Insurance vs. Government Scheme), or apply high-level billing policies |
| | to groups of payers. Note: This serves as a categorization layer above the individual |
| | ''Payer'' table.' |
| | - 'Operational transaction table recording unstructured free-text medical notes |
| | and preliminary clinical remarks associated with a patient admission. It captures |
| | initial diagnosis impressions, symptoms, or observation notes (e.g., ''Asthma'', |
| | ''Observation Febris'') entered during the admission process. Use this table to |
| | retrieve qualitative clinical context for a visit or search for specific medical |
| | conditions mentioned in preliminary notes. Note: This table contains raw free-text |
| | descriptions, NOT structured ICD-10 diagnosis codes used for billing.' |
| | - 'Operational transaction table recording every patient registration and visit |
| | event at the hospital. This table consolidates patient demographics, visit types |
| | (Inpatient, Outpatient, Emergency), primary and referral doctors, payer/insurance |
| | eligibility, and critical timelines (Admission and Discharge dates). Use this |
| | table to calculate patient census, Average Length of Stay (ALOS), track patient |
| | flow, or analyze admission volume by doctor or department. Note: This table focuses |
| | on administrative registration and billing initiation; it does not contain detailed |
| | clinical notes, specific lab results, or medication prescriptions. When analyzing |
| | patient administrative inflow and outflow data, this table is the primary and |
| | essential source for all patient visit metrics.' |
| | - source_sentence: What is the total count of admissions for each patient payment |
| | category (e.g., 'Private', 'Payer') as defined in the PatientType master, grouped |
| | by the AdmissionType from the Admission table, for the year 2024? |
| | sentences: |
| | - 'Operational transaction table recording unstructured free-text medical notes |
| | and preliminary clinical remarks associated with a patient admission. It captures |
| | initial diagnosis impressions, symptoms, or observation notes (e.g., ''Asthma'', |
| | ''Observation Febris'') entered during the admission process. Use this table to |
| | retrieve qualitative clinical context for a visit or search for specific medical |
| | conditions mentioned in preliminary notes. Note: This table contains raw free-text |
| | descriptions, NOT structured ICD-10 diagnosis codes used for billing.' |
| | - 'Operational transaction table recording every patient registration and visit |
| | event at the hospital. This table consolidates patient demographics, visit types |
| | (Inpatient, Outpatient, Emergency), primary and referral doctors, payer/insurance |
| | eligibility, and critical timelines (Admission and Discharge dates). Use this |
| | table to calculate patient census, Average Length of Stay (ALOS), track patient |
| | flow, or analyze admission volume by doctor or department. Note: This table focuses |
| | on administrative registration and billing initiation; it does not contain detailed |
| | clinical notes, specific lab results, or medication prescriptions. When analyzing |
| | patient administrative inflow and outflow data, this table is the primary and |
| | essential source for all patient visit metrics.' |
| | - 'Core reference table that links a central patient profile to their local record |
| | at a specific hospital branch. It connects the central `PatientId` to a local |
| | Medical Record Number (`MrNo`) at a specific hospital (`OrganizationId`). The |
| | table also includes the patient''s registration date at that particular location |
| | and the status of their medical record file (e.g., Active, Merged). **Use this |
| | table to** find a patient''s local MR Number for a specific hospital, determine |
| | when a patient first registered at a site, or check the administrative status |
| | of a patient''s file at a given location. **Note: This table defines the relationship |
| | and local record number, not the patient''s demographic details (found in the |
| | `Patient` table) or their visit history (found in `Admission` or `Encounter` tables).**' |
| | - source_sentence: List the patient names, their primary payer's name, and the invoice |
| | numbers for all invoices issued in the last 90 days to male patients whose payer |
| | is a 'Corporate' type. |
| | sentences: |
| | - Operational transaction table that records the movement of inventory items from |
| | one storage location (store) to another within the hospital network. It captures |
| | the header-level details of each transfer, including the transaction number, date, |
| | the originating store, and the receiving store. **Use this table to** track the |
| | flow of goods, monitor stock levels across different warehouses or departments, |
| | and audit inventory movements for logistics and supply chain management. **Note:** |
| | This table contains only the header information for the transfer event; it does |
| | NOT list the specific items or quantities transferred. Join with the Transfer |
| | Detail table for item-level information. |
| | - 'Operational transaction table recording the official event of a patient leaving |
| | the hospital (Discharge). It captures the precise discharge timestamp, the patient''s |
| | condition upon exit (e.g., Recovered, Improved), and the type of discharge (e.g., |
| | Medical Consent, Transfer) linked to their Admission. **Use this table to** calculate |
| | Length of Stay (LOS), analyze clinical outcomes, or track bed turnover rates. |
| | **Note: This table signifies the physical or administrative end of a visit; it |
| | does NOT contain the final invoice amount, though it triggers the billing closure |
| | process.**' |
| | - 'Primary reference table containing the master list of all external organizations |
| | responsible for patient payment guarantees. This includes Insurance Companies, |
| | Corporate Clients/Employers, and Government Health Schemes (e.g., BPJS, Jamkesda). |
| | The table stores Payer details such as Legal Name, Address, Contact Information, |
| | and specific Payer Group classifications. Use this table to link patient visits |
| | to their financial guarantors, generate invoices for corporate clients, or analyze |
| | revenue contribution by payer. Note: This table defines the ''Who Pays'' entity; |
| | specific policy terms or benefit limits are typically stored in separate configuration |
| | tables.' |
| | - source_sentence: Identify `OrganizationId`s that have more than 100 `Admission` |
| | records currently in 'Active' `AdmissionStatus` where an `ArInvoice` exists, and |
| | the `InvoiceDate` is more than 7 days after the `AdmissionDate`. |
| | sentences: |
| | - 'Primary reference table listing the specific bank accounts owned or utilized |
| | by various Siloam Hospital units (Organizations). It stores detailed Account Numbers, |
| | Account Names, and operational notes (e.g., Receipt or Payment accounts), linking |
| | them to the parent Bank entity. **Use this table to** identify the destination |
| | account for financial settlements, reconcile deposits, or manage treasury master |
| | data. **Note: This defines static master data for the hospital''s bank accounts, |
| | NOT a transaction log of transfers or balances.**' |
| | - 'Operational transaction table recording every patient registration and visit |
| | event at the hospital. This table consolidates patient demographics, visit types |
| | (Inpatient, Outpatient, Emergency), primary and referral doctors, payer/insurance |
| | eligibility, and critical timelines (Admission and Discharge dates). Use this |
| | table to calculate patient census, Average Length of Stay (ALOS), track patient |
| | flow, or analyze admission volume by doctor or department. Note: This table focuses |
| | on administrative registration and billing initiation; it does not contain detailed |
| | clinical notes, specific lab results, or medication prescriptions. When analyzing |
| | patient administrative inflow and outflow data, this table is the primary and |
| | essential source for all patient visit metrics.' |
| | - 'Operational transaction table (Financial Log) recording the header-level details |
| | of patient invoices and billing events. This table captures the financial breakdown |
| | of a visit, distinguishing between Patient responsibility (Out-of-pocket) and |
| | Payer responsibility (Insurance/Corporate Coverage), including Gross Amounts, |
| | Discounts, Taxes, and Net Payable values. Use this table to analyze hospital revenue |
| | streams, track Accounts Receivable (AR), monitor billing cancellations, or calculate |
| | the financial yield per admission. Note: This is the Invoice HEADER table containing |
| | total values; it does not typically list the specific individual line items (drugs, |
| | labs, services) charged within the bill. For any financial analysis related to |
| | hospital revenue, Payments, Accounts Receivable (AR), billing breakdowns, or insurance |
| | claims, this invoice header table is the definitive starting point.' |
| | - source_sentence: master data payer group |
| | sentences: |
| | - 'Strategic reference table linking Payers to specific Hospital Organizations (Units/Branches). |
| | This table manages the contractual relationships between insurance providers/corporate |
| | clients and individual hospital sites. It stores Contract Numbers, Validity Periods |
| | (Start/End Dates), Contract Status, and site-specific contact details. Use this |
| | table to validate insurance acceptance at a specific hospital branch, track contract |
| | expiration dates, or manage site-specific payer agreements. Note: This table enables |
| | the many-to-many relationship between Payers (Global) and Organizations (Local |
| | Sites).' |
| | - 'Primary reference table containing the master list of all external organizations |
| | responsible for patient payment guarantees. This includes Insurance Companies, |
| | Corporate Clients/Employers, and Government Health Schemes (e.g., BPJS, Jamkesda). |
| | The table stores Payer details such as Legal Name, Address, Contact Information, |
| | and specific Payer Group classifications. Use this table to link patient visits |
| | to their financial guarantors, generate invoices for corporate clients, or analyze |
| | revenue contribution by payer. Note: This table defines the ''Who Pays'' entity; |
| | specific policy terms or benefit limits are typically stored in separate configuration |
| | tables.' |
| | - 'Operational transaction table recording the official event of a patient leaving |
| | the hospital (Discharge). It captures the precise discharge timestamp, the patient''s |
| | condition upon exit (e.g., Recovered, Improved), and the type of discharge (e.g., |
| | Medical Consent, Transfer) linked to their Admission. **Use this table to** calculate |
| | Length of Stay (LOS), analyze clinical outcomes, or track bed turnover rates. |
| | **Note: This table signifies the physical or administrative end of a visit; it |
| | does NOT contain the final invoice amount, though it triggers the billing closure |
| | process.**' |
| | pipeline_tag: sentence-similarity |
| | library_name: sentence-transformers |
| | metrics: |
| | - cosine_accuracy@1 |
| | - cosine_accuracy@3 |
| | - cosine_accuracy@5 |
| | - cosine_accuracy@10 |
| | - cosine_precision@1 |
| | - cosine_precision@3 |
| | - cosine_precision@5 |
| | - cosine_precision@10 |
| | - cosine_recall@1 |
| | - cosine_recall@3 |
| | - cosine_recall@5 |
| | - cosine_recall@10 |
| | - cosine_ndcg@10 |
| | - cosine_mrr@10 |
| | - cosine_map@100 |
| | model-index: |
| | - name: SentenceTransformer based on unsloth/embeddinggemma-300m |
| | results: |
| | - task: |
| | type: information-retrieval |
| | name: Information Retrieval |
| | dataset: |
| | name: his retrieval eval |
| | type: his-retrieval-eval |
| | metrics: |
| | - type: cosine_accuracy@1 |
| | value: 0.0016233766233766235 |
| | name: Cosine Accuracy@1 |
| | - type: cosine_accuracy@3 |
| | value: 0.00487012987012987 |
| | name: Cosine Accuracy@3 |
| | - type: cosine_accuracy@5 |
| | value: 0.007305194805194805 |
| | name: Cosine Accuracy@5 |
| | - type: cosine_accuracy@10 |
| | value: 0.012175324675324676 |
| | name: Cosine Accuracy@10 |
| | - type: cosine_precision@1 |
| | value: 0.0016233766233766235 |
| | name: Cosine Precision@1 |
| | - type: cosine_precision@3 |
| | value: 0.0016233766233766235 |
| | name: Cosine Precision@3 |
| | - type: cosine_precision@5 |
| | value: 0.001461038961038961 |
| | name: Cosine Precision@5 |
| | - type: cosine_precision@10 |
| | value: 0.0012175324675324677 |
| | name: Cosine Precision@10 |
| | - type: cosine_recall@1 |
| | value: 0.0016233766233766235 |
| | name: Cosine Recall@1 |
| | - type: cosine_recall@3 |
| | value: 0.00487012987012987 |
| | name: Cosine Recall@3 |
| | - type: cosine_recall@5 |
| | value: 0.007305194805194805 |
| | name: Cosine Recall@5 |
| | - type: cosine_recall@10 |
| | value: 0.012175324675324676 |
| | name: Cosine Recall@10 |
| | - type: cosine_ndcg@10 |
| | value: 0.005969101397650474 |
| | name: Cosine Ndcg@10 |
| | - type: cosine_mrr@10 |
| | value: 0.004109977324263038 |
| | name: Cosine Mrr@10 |
| | - type: cosine_map@100 |
| | value: 0.005835580133437121 |
| | name: Cosine Map@100 |
| | --- |
| | |
| | # SentenceTransformer based on unsloth/embeddinggemma-300m |
| |

| | This model was finetuned with [Unsloth](https://github.com/unslothai/unsloth). |
| |

| | [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
| |
|
| | This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [unsloth/embeddinggemma-300m](https://huggingface.co/unsloth/embeddinggemma-300m) on the `json` dataset of 4,927 anchor/positive/negative triplets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| | - **Model Type:** Sentence Transformer |
| | - **Base model:** [unsloth/embeddinggemma-300m](https://huggingface.co/unsloth/embeddinggemma-300m) <!-- at revision bfa3c846ac738e62aa61806ef9112d34acb1dc5a --> |
| | - **Maximum Sequence Length:** 768 tokens |
| | - **Output Dimensionality:** 768 dimensions |
| | - **Similarity Function:** Cosine Similarity |
| | - **Training Dataset:** |
| | - json |
| | <!-- - **Language:** Unknown --> |
| | <!-- - **License:** Unknown --> |
| |
|
| | ### Model Sources |
| |
|
| | - **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
| | - **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers) |
| | - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
| |
|
| | ### Full Model Architecture |
| |
|
| | ``` |
| | SentenceTransformer( |
| | (0): Transformer({'max_seq_length': 768, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'}) |
| | (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
| | (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'}) |
| | (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'}) |
| | (4): Normalize() |
| | ) |
| | ``` |
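The stages after the transformer can be sketched in plain Python to show how token vectors become one unit-length embedding. This is a minimal illustration, not the library's implementation: the toy sizes `HIDDEN = 4` and `WIDE = 8` stand in for 768 and 3072, the weights are random, and the attention mask that real mean pooling applies to padding tokens is omitted.

```python
import math
import random


def mean_pool(token_vectors):
    """Average token vectors into one sentence vector (pooling_mode_mean_tokens)."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]


def dense(vector, weights):
    """Bias-free linear layer with identity activation; weights is out_dim x in_dim."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def l2_normalize(vector):
    """Scale the vector to unit length, as the final Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def embed(token_vectors, w_up, w_down):
    """Pooling -> Dense (up) -> Dense (down) -> Normalize."""
    pooled = mean_pool(token_vectors)
    return l2_normalize(dense(dense(pooled, w_up), w_down))


random.seed(0)
HIDDEN, WIDE = 4, 8  # toy stand-ins for 768 and 3072 (assumption for illustration)
w_up = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(WIDE)]
w_down = [[random.uniform(-1, 1) for _ in range(WIDE)] for _ in range(HIDDEN)]
tokens = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(5)]

embedding = embed(tokens, w_up, w_down)
print(len(embedding))
```

Because the last stage normalizes to unit length, the cosine similarity of two embeddings equals their dot product.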
| |
|
| | ## Usage |
| |
|
| | ### Direct Usage (Sentence Transformers) |
| |
|
| | First install the Sentence Transformers library: |
| |
|
| | ```bash |
| | pip install -U sentence-transformers |
| | ``` |
| |
|
| | Then you can load this model and run inference. |
| | ```python |
| | from sentence_transformers import SentenceTransformer |
| | |
| | # Download from the 🤗 Hub |
| | model = SentenceTransformer("sentence_transformers_model_id") |
| | # Run inference |
| | sentences = [ |
| | 'master data payer group', |
| | "Primary reference table containing the master list of all external organizations responsible for patient payment guarantees. This includes Insurance Companies, Corporate Clients/Employers, and Government Health Schemes (e.g., BPJS, Jamkesda). The table stores Payer details such as Legal Name, Address, Contact Information, and specific Payer Group classifications. Use this table to link patient visits to their financial guarantors, generate invoices for corporate clients, or analyze revenue contribution by payer. Note: This table defines the 'Who Pays' entity; specific policy terms or benefit limits are typically stored in separate configuration tables.", |
| | 'Strategic reference table linking Payers to specific Hospital Organizations (Units/Branches). This table manages the contractual relationships between insurance providers/corporate clients and individual hospital sites. It stores Contract Numbers, Validity Periods (Start/End Dates), Contract Status, and site-specific contact details. Use this table to validate insurance acceptance at a specific hospital branch, track contract expiration dates, or manage site-specific payer agreements. Note: This table enables the many-to-many relationship between Payers (Global) and Organizations (Local Sites).', |
| | ] |
| | embeddings = model.encode(sentences) |
| | print(embeddings.shape) |
| | # (3, 768) |
| | |
| | # Get the similarity scores for the embeddings |
| | similarities = model.similarity(embeddings, embeddings) |
| | print(similarities) |
| | # tensor([[ 1.0000, 0.9426, -0.8527], |
| | # [ 0.9426, 1.0000, -0.8639], |
| | # [-0.8527, -0.8639, 1.0000]]) |
| | ``` |
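For retrieval, the same similarity scores can be used to rank candidate table descriptions against a query. The sketch below uses small hand-written vectors in place of `model.encode` output so it runs without downloading the model; the vectors and the short table labels are illustrative assumptions, not values produced by this model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_tables(query_vec, table_vecs):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in table_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings of a query and three table descriptions.
query_vec = [0.9, 0.1, 0.0]
table_vecs = {
    "Payer": [0.8, 0.2, 0.1],
    "Admission": [0.1, 0.9, 0.2],
    "Discharge": [0.0, 0.1, 0.9],
}

ranking = rank_tables(query_vec, table_vecs)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

With real embeddings, `query_vec` and the values of `table_vecs` would come from `model.encode`, and the top-ranked label would be the table proposed for the query.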
| |
|
| |
|
| | ## Evaluation |
| |
|
| | ### Metrics |
| |
|
| | #### Information Retrieval |
| |
|
| | * Dataset: `his-retrieval-eval` |
| | * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
| |
|
| | | Metric | Value | |
| | |:--------------------|:----------| |
| | | cosine_accuracy@1 | 0.0016 | |
| | | cosine_accuracy@3 | 0.0049 | |
| | | cosine_accuracy@5 | 0.0073 | |
| | | cosine_accuracy@10 | 0.0122 | |
| | | cosine_precision@1 | 0.0016 | |
| | | cosine_precision@3 | 0.0016 | |
| | | cosine_precision@5 | 0.0015 | |
| | | cosine_precision@10 | 0.0012 | |
| | | cosine_recall@1 | 0.0016 | |
| | | cosine_recall@3 | 0.0049 | |
| | | cosine_recall@5 | 0.0073 | |
| | | cosine_recall@10 | 0.0122 | |
| | | **cosine_ndcg@10** | **0.006** | |
| | | cosine_mrr@10 | 0.0041 | |
| | | cosine_map@100 | 0.0058 | |
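
In the table above, accuracy@k equals recall@k and precision@k is accuracy@k divided by k, which is what one would expect when each query has exactly one relevant document. The sketch below shows how these quantities are computed under that assumption; the function names and the three toy queries are hypothetical and are not the internals of `InformationRetrievalEvaluator`.

```python
def accuracy_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def precision_at_k(ranked_ids, relevant_id, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc_id in ranked_ids[:k] if doc_id == relevant_id) / k


def reciprocal_rank_at_k(ranked_ids, relevant_id, k):
    """1 / rank of the relevant document within the top k, or 0.0 if absent."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy results: (ranked document ids, the single relevant id) for three queries.
results = [
    (["t3", "t1", "t7"], "t1"),  # relevant at rank 2
    (["t2", "t5", "t9"], "t2"),  # relevant at rank 1
    (["t4", "t6", "t8"], "t0"),  # relevant document not retrieved
]

K = 3
acc = mean(accuracy_at_k(r, rel, K) for r, rel in results)
prec = mean(precision_at_k(r, rel, K) for r, rel in results)
mrr = mean(reciprocal_rank_at_k(r, rel, K) for r, rel in results)
print(acc, prec, mrr)
```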
| | |
| | |
| | ## Training Details |
| | |
| | ### Training Dataset |
| | |
| | #### json |
| | |
| | * Dataset: json |
| | * Size: 4,927 training samples |
| | * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code> |
| | * Approximate statistics based on the first 1000 samples: |
| | | | anchor | positive | negative | |
| | |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------| |
| | | type | string | string | string | |
| | | details | <ul><li>min: 3 tokens</li><li>mean: 12.32 tokens</li><li>max: 76 tokens</li></ul> | <ul><li>min: 63 tokens</li><li>mean: 128.13 tokens</li><li>max: 171 tokens</li></ul> | <ul><li>min: 63 tokens</li><li>mean: 108.19 tokens</li><li>max: 171 tokens</li></ul> | |
| | * Samples: |
| | | anchor | positive | negative | |
| | |:------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| | | <code>master patient</code> | <code>Primary reference table defining the lifecycle stages of a patient admission event. It categorizes visits into states such as 'Active' (currently in hospital), 'Discharged' (left hospital), 'Invoiced' (bill generated), or 'Cancelled'. Use this table to interpret AdmissionStatusId in transaction tables to filter visits by their current operational state (e.g., calculating current census vs. historical discharges). Note: This is a static lookup table for status definitions, NOT a transaction log of patient movements.</code> | <code>Operational transaction table recording unstructured free-text medical notes and preliminary clinical remarks associated with a patient admission. It captures initial diagnosis impressions, symptoms, or observation notes (e.g., 'Asthma', 'Observation Febris') entered during the admission process. Use this table to retrieve qualitative clinical context for a visit or search for specific medical conditions mentioned in preliminary notes. Note: This table contains raw free-text descriptions, NOT structured ICD-10 diagnosis codes used for billing.</code> | |
| | | <code>transaction ar invoice</code> | <code>Primary reference table containing the master list of all external organizations responsible for patient payment guarantees. This includes Insurance Companies, Corporate Clients/Employers, and Government Health Schemes (e.g., BPJS, Jamkesda). The table stores Payer details such as Legal Name, Address, Contact Information, and specific Payer Group classifications. Use this table to link patient visits to their financial guarantors, generate invoices for corporate clients, or analyze revenue contribution by payer. Note: This table defines the 'Who Pays' entity; specific policy terms or benefit limits are typically stored in separate configuration tables.</code> | <code>Operational transaction table recording unstructured free-text medical notes and preliminary clinical remarks associated with a patient admission. It captures initial diagnosis impressions, symptoms, or observation notes (e.g., 'Asthma', 'Observation Febris') entered during the admission process. Use this table to retrieve qualitative clinical context for a visit or search for specific medical conditions mentioned in preliminary notes. Note: This table contains raw free-text descriptions, NOT structured ICD-10 diagnosis codes used for billing.</code> | |
| | | <code>admission date</code> | <code>Primary reference table defining the high-level classification of patient visits and hospital service lines. Contains standard categories including Inpatient (Hospitalization), Outpatient (Clinical visits), Emergency (ER), and Health Checkups (MCU). Use this table to group patient volume by service type, filter admission logs, or analyze revenue streams by visit category. Note: This is a static lookup list defining the 'Types' of visits; it does not contain actual patient visit transaction records.</code> | <code>Operational transaction table recording unstructured free-text medical notes and preliminary clinical remarks associated with a patient admission. It captures initial diagnosis impressions, symptoms, or observation notes (e.g., 'Asthma', 'Observation Febris') entered during the admission process. Use this table to retrieve qualitative clinical context for a visit or search for specific medical conditions mentioned in preliminary notes. Note: This table contains raw free-text descriptions, NOT structured ICD-10 diagnosis codes used for billing.</code> | |
| | * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters: |
| | ```json |
| | { |
| | "distance_metric": "TripletDistanceMetric.COSINE", |
| | "triplet_margin": 0.5 |
| | } |
| | ``` |
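
With these parameters, the loss for one triplet is `max(d(anchor, positive) - d(anchor, negative) + 0.5, 0)`, where `d` is cosine distance (one minus cosine similarity). The following is a minimal pure-Python sketch of that formula on toy vectors; it illustrates the arithmetic only and is not the library's batched implementation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(gap + margin, 0.0)


# Well-separated triplet: the positive matches the anchor, the negative is orthogonal.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# Inverted triplet: the negative matches the anchor, the positive is orthogonal.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])

print(easy, hard)
```

The easy triplet contributes no gradient, while the inverted one is penalized by the full distance gap plus the margin.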
| | |
| | ### Evaluation Dataset |
| | |
| | #### json |
| | |
| | * Dataset: json |
| | * Size: 1,232 evaluation samples |
| | * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code> |
| | * Approximate statistics based on the first 1000 samples: |
| | | | anchor | positive | negative | |
| | |:--------|:----------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| |
| | | type | string | string | string | |
| | | details | <ul><li>min: 3 tokens</li><li>mean: 13.51 tokens</li><li>max: 76 tokens</li></ul> | <ul><li>min: 63 tokens</li><li>mean: 127.79 tokens</li><li>max: 171 tokens</li></ul> | <ul><li>min: 63 tokens</li><li>mean: 107.3 tokens</li><li>max: 171 tokens</li></ul> | |
| | * Samples: |
| | | anchor | positive | negative | |
| | |:-----------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
| | | <code>transaction ar item</code> | <code>Operational transaction table recording every patient registration and visit event at the hospital. This table consolidates patient demographics, visit types (Inpatient, Outpatient, Emergency), primary and referral doctors, payer/insurance eligibility, and critical timelines (Admission and Discharge dates). Use this table to calculate patient census, Average Length of Stay (ALOS), track patient flow, or analyze admission volume by doctor or department. Note: This table focuses on administrative registration and billing initiation; it does not contain detailed clinical notes, specific lab results, or medication prescriptions. When analyzing patient administrative inflow and outflow data, this table is the primary and essential source for all patient visit metrics.</code> | <code>Operational transaction table recording individual line items within patient invoices (Accounts Receivable). It captures granular billing details including specific items sold (drugs, services), quantities, unit prices, discounts, tax calculations, and the financial split between Patient and Payer (Insurance/Guarantor). It also tracks revenue allocation (Hospital vs. Doctor portion). **Use this table to** generate detailed patient bills, audit revenue streams per item, calculate doctor performance fees, or analyze discount utilization. **Note: This table contains financial billing data per item, NOT the clinical medical results or the master list of available services.**</code> | |
| | | <code>patient demographic country</code> | <code>Operational transaction table (Financial Log) recording the header-level details of patient invoices and billing events. This table captures the financial breakdown of a visit, distinguishing between Patient responsibility (Out-of-pocket) and Payer responsibility (Insurance/Corporate Coverage), including Gross Amounts, Discounts, Taxes, and Net Payable values. Use this table to analyze hospital revenue streams, track Accounts Receivable (AR), monitor billing cancellations, or calculate the financial yield per admission. Note: This is the Invoice HEADER table containing total values; it does not typically list the specific individual line items (drugs, labs, services) charged within the bill. For any financial analysis related to hospital revenue, Payments, Accounts Receivable (AR), billing breakdowns, or insurance claims, this invoice header table is the definitive starting point.</code> | <code>Operational transaction table recording individual line items within patient invoices (Accounts Receivable). It captures granular billing details including specific items sold (drugs, services), quantities, unit prices, discounts, tax calculations, and the financial split between Patient and Payer (Insurance/Guarantor). It also tracks revenue allocation (Hospital vs. Doctor portion). **Use this table to** generate detailed patient bills, audit revenue streams per item, calculate doctor performance fees, or analyze discount utilization. **Note: This table contains financial billing data per item, NOT the clinical medical results or the master list of available services.**</code> | |
| | | <code>BPJS Kesehatan</code> | <code>Primary reference table that classifies financial guarantors into high-level categories such as 'Government' programs, 'Corporate' accounts, 'Insurance' companies, and Third Party Administrators ('TPA'). **Use this table to** group and analyze patient revenue streams by the type of financial coverage or to interpret the `PayerTypeId` in the main `Payer` master data table. **Note: This table defines the broad categories of payers only, not the specific insurance companies or corporate entities themselves (which are listed in the `Payer` table).**</code> | <code>Operational transaction table that records the movement of inventory items from one storage location (store) to another within the hospital network. It captures the header-level details of each transfer, including the transaction number, date, the originating store, and the receiving store. **Use this table to** track the flow of goods, monitor stock levels across different warehouses or departments, and audit inventory movements for logistics and supply chain management. **Note:** This table contains only the header information for the transfer event; it does NOT list the specific items or quantities transferred. Join with the Transfer Detail table for item-level information.</code> | |
| | * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters: |
| | ```json |
| | { |
| | "distance_metric": "TripletDistanceMetric.COSINE", |
| | "triplet_margin": 0.5 |
| | } |
| | ``` |
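To make the two parameters above concrete, here is a minimal, dependency-free sketch of a triplet objective with a cosine distance and a margin of 0.5. It assumes the cosine metric is defined as one minus cosine similarity and that the per-triplet loss is `max(0, d(anchor, positive) - d(anchor, negative) + margin)`; the 2-d vectors are made-up toy embeddings, not outputs of this model.

```python
import math

def cosine_distance(u, v):
    """Cosine distance, 1 - cos(u, v), for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge on the gap between the positive and negative distances."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings (hypothetical values, for illustration only).
anchor = [1.0, 0.0]
positive = [1.0, 0.0]   # identical direction: distance 0
negative = [0.0, 1.0]   # orthogonal: distance 1

# Negative is already more than the margin farther away: no loss.
easy = triplet_loss(anchor, positive, negative)
# Swapping the roles puts the "positive" farther away: loss = 1 - 0 + 0.5.
hard = triplet_loss(anchor, negative, positive)
print(easy, hard)
```

With a margin of 0.5, a triplet stops contributing to the loss once the negative is at least 0.5 farther from the anchor (in cosine distance) than the positive.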
| | |
| | ### Training Hyperparameters |
| | #### Non-Default Hyperparameters |
| | |
| | - `eval_strategy`: steps |
| | - `per_device_train_batch_size`: 64 |
| | - `per_device_eval_batch_size`: 64 |
| | - `gradient_accumulation_steps`: 2 |
| | - `learning_rate`: 2e-05 |
| | - `lr_scheduler_type`: cosine |
| | - `warmup_ratio`: 0.1 |
| | - `bf16`: True |
| | - `prompts`: {'anchor': ' ', 'positive': '', 'negative': ''} |
| | - `batch_sampler`: no_duplicates |
| | |
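The batch size, gradient accumulation, warmup ratio and cosine schedule above interact, so a small arithmetic sketch may help. It rests on several assumptions that the card does not state outright: the dataset size of 4927 is taken from the card's `dataset_size` tag, training runs on a single device, the batch sampler yields `ceil(4927 / 64)` batches per epoch, a trailing partial accumulation group still triggers an optimizer step, and the schedule is the usual linear warmup followed by a half-cosine decay to zero. `lr_at` is a hypothetical helper, not a library function.

```python
import math

DATASET_SIZE = 4927      # from the card's dataset_size tag
BATCH_SIZE = 64          # per_device_train_batch_size
GRAD_ACCUM = 2           # gradient_accumulation_steps
EPOCHS = 3               # num_train_epochs
WARMUP_RATIO = 0.1       # warmup_ratio
PEAK_LR = 2e-5           # learning_rate

# Samples contributing to one optimizer step (single device assumed).
effective_batch = BATCH_SIZE * GRAD_ACCUM

# Assumption: the last, smaller batch is kept (dataloader_drop_last is False).
batches_per_epoch = math.ceil(DATASET_SIZE / BATCH_SIZE)
# Assumption: a trailing partial accumulation group still counts as a step.
steps_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the run has an effective batch of 128 triplets and reaches the peak learning rate of 2e-05 after roughly the first tenth of training.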
| | #### All Hyperparameters |
| | <details><summary>Click to expand</summary> |
| | |
| | - `overwrite_output_dir`: False |
| | - `do_predict`: False |
| | - `eval_strategy`: steps |
| | - `prediction_loss_only`: True |
| | - `per_device_train_batch_size`: 64 |
| | - `per_device_eval_batch_size`: 64 |
| | - `per_gpu_train_batch_size`: None |
| | - `per_gpu_eval_batch_size`: None |
| | - `gradient_accumulation_steps`: 2 |
| | - `eval_accumulation_steps`: None |
| | - `torch_empty_cache_steps`: None |
| | - `learning_rate`: 2e-05 |
| | - `weight_decay`: 0.0 |
| | - `adam_beta1`: 0.9 |
| | - `adam_beta2`: 0.999 |
| | - `adam_epsilon`: 1e-08 |
| | - `max_grad_norm`: 1.0 |
| | - `num_train_epochs`: 3 |
| | - `max_steps`: -1 |
| | - `lr_scheduler_type`: cosine |
| | - `lr_scheduler_kwargs`: {} |
| | - `warmup_ratio`: 0.1 |
| | - `warmup_steps`: 0 |
| | - `log_level`: passive |
| | - `log_level_replica`: warning |
| | - `log_on_each_node`: True |
| | - `logging_nan_inf_filter`: True |
| | - `save_safetensors`: True |
| | - `save_on_each_node`: False |
| | - `save_only_model`: False |
| | - `restore_callback_states_from_checkpoint`: False |
| | - `no_cuda`: False |
| | - `use_cpu`: False |
| | - `use_mps_device`: False |
| | - `seed`: 42 |
| | - `data_seed`: None |
| | - `jit_mode_eval`: False |
| | - `use_ipex`: False |
| | - `bf16`: True |
| | - `fp16`: False |
| | - `fp16_opt_level`: O1 |
| | - `half_precision_backend`: auto |
| | - `bf16_full_eval`: False |
| | - `fp16_full_eval`: False |
| | - `tf32`: None |
| | - `local_rank`: 0 |
| | - `ddp_backend`: None |
| | - `tpu_num_cores`: None |
| | - `tpu_metrics_debug`: False |
| | - `debug`: [] |
| | - `dataloader_drop_last`: False |
| | - `dataloader_num_workers`: 0 |
| | - `dataloader_prefetch_factor`: None |
| | - `past_index`: -1 |
| | - `disable_tqdm`: False |
| | - `remove_unused_columns`: True |
| | - `label_names`: None |
| | - `load_best_model_at_end`: False |
| | - `ignore_data_skip`: False |
| | - `fsdp`: [] |
| | - `fsdp_min_num_params`: 0 |
| | - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
| | - `fsdp_transformer_layer_cls_to_wrap`: None |
| | - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
| | - `parallelism_config`: None |
| | - `deepspeed`: None |
| | - `label_smoothing_factor`: 0.0 |
| | - `optim`: adamw_torch_fused |
| | - `optim_args`: None |
| | - `adafactor`: False |
| | - `group_by_length`: False |
| | - `length_column_name`: length |
| | - `ddp_find_unused_parameters`: None |
| | - `ddp_bucket_cap_mb`: None |
| | - `ddp_broadcast_buffers`: False |
| | - `dataloader_pin_memory`: True |
| | - `dataloader_persistent_workers`: False |
| | - `skip_memory_metrics`: True |
| | - `use_legacy_prediction_loop`: False |
| | - `push_to_hub`: False |
| | - `resume_from_checkpoint`: None |
| | - `hub_model_id`: None |
| | - `hub_strategy`: every_save |
| | - `hub_private_repo`: None |
| | - `hub_always_push`: False |
| | - `hub_revision`: None |
| | - `gradient_checkpointing`: False |
| | - `gradient_checkpointing_kwargs`: None |
| | - `include_inputs_for_metrics`: False |
| | - `include_for_metrics`: [] |
| | - `eval_do_concat_batches`: True |
| | - `fp16_backend`: auto |
| | - `push_to_hub_model_id`: None |
| | - `push_to_hub_organization`: None |
| | - `mp_parameters`: |
| | - `auto_find_batch_size`: False |
| | - `full_determinism`: False |
| | - `torchdynamo`: None |
| | - `ray_scope`: last |
| | - `ddp_timeout`: 1800 |
| | - `torch_compile`: False |
| | - `torch_compile_backend`: None |
| | - `torch_compile_mode`: None |
| | - `include_tokens_per_second`: False |
| | - `include_num_input_tokens_seen`: False |
| | - `neftune_noise_alpha`: None |
| | - `optim_target_modules`: None |
| | - `batch_eval_metrics`: False |
| | - `eval_on_start`: False |
| | - `use_liger_kernel`: False |
| | - `liger_kernel_config`: None |
| | - `eval_use_gather_object`: False |
| | - `average_tokens_across_devices`: False |
| | - `prompts`: {'anchor': ' ', 'positive': '', 'negative': ''} |
| | - `batch_sampler`: no_duplicates |
| | - `multi_dataset_batch_sampler`: proportional |
| | - `router_mapping`: {} |
| | - `learning_rate_mapping`: {} |
| | |
| | </details> |
| | |
| | ### Training Logs |
| | | Epoch | Step | Training Loss | Validation Loss | his-retrieval-eval_cosine_ndcg@10 | |
| | |:------:|:----:|:-------------:|:---------------:|:---------------------------------:| |
| | | -1 | -1 | - | - | 0.0034 | |
| | | 0.1299 | 5 | 0.4772 | - | - | |
| | | 0.2597 | 10 | 0.2436 | 0.0812 | 0.0021 | |
| | | 0.3896 | 15 | 0.1075 | - | - | |
| | | 0.5195 | 20 | 0.0794 | 0.0613 | 0.0036 | |
| | | 0.6494 | 25 | 0.089 | - | - | |
| | | 0.7792 | 30 | 0.1045 | 0.0248 | 0.0030 | |
| | | 0.9091 | 35 | 0.0629 | - | - | |
| | | 1.0260 | 40 | 0.0552 | 0.0298 | 0.0025 | |
| | | 1.1558 | 45 | 0.0597 | - | - | |
| | | 1.2857 | 50 | 0.0684 | 0.0236 | 0.0017 | |
| | | 1.4156 | 55 | 0.0629 | - | - | |
| | | 1.5455 | 60 | 0.0438 | 0.0213 | 0.0040 | |
| | | 1.6753 | 65 | 0.0504 | - | - | |
| | | 1.8052 | 70 | 0.0501 | 0.0237 | 0.0010 | |
| | | 1.9351 | 75 | 0.0443 | - | - | |
| | | 2.0519 | 80 | 0.0202 | 0.0232 | 0.0052 | |
| | | 2.1818 | 85 | 0.0414 | - | - | |
| | | 2.3117 | 90 | 0.0497 | 0.0233 | 0.0033 | |
| | | 2.4416 | 95 | 0.0367 | - | - | |
| | | 2.5714 | 100 | 0.0491 | 0.0232 | 0.0054 | |
| | | 2.7013 | 105 | 0.0262 | - | - | |
| | | 2.8312 | 110 | 0.0222 | 0.0232 | 0.0045 | |
| | | 2.9610 | 115 | 0.0225 | - | - | |
| | | -1 | -1 | - | - | 0.0060 | |
| | |
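The fractional values in the Epoch column can be reproduced from the step number. The sketch below assumes 77 batches per epoch (`ceil(4927 / 64)`, using the `dataset_size` tag), two batches per optimizer step, and 39 optimizer steps per epoch with the final step of each epoch covering a single leftover batch. `epoch_at` is a hypothetical helper written for this illustration.

```python
import math

BATCHES_PER_EPOCH = math.ceil(4927 / 64)                      # 77
GRAD_ACCUM = 2
STEPS_PER_EPOCH = math.ceil(BATCHES_PER_EPOCH / GRAD_ACCUM)   # 39

def epoch_at(step):
    """Fractional epoch reached after a given (1-based) optimizer step."""
    full_epochs, step_in_epoch = divmod(step, STEPS_PER_EPOCH)
    batches_seen = min(step_in_epoch * GRAD_ACCUM, BATCHES_PER_EPOCH)
    return full_epochs + batches_seen / BATCHES_PER_EPOCH

# Compare against a few rows of the table above.
for step in (5, 40, 80, 115):
    print(step, round(epoch_at(step), 4))
```

The small jump at the epoch boundaries (for example, step 40 maps to 1.0260 rather than 1.0390) comes from the assumed one-batch final step of each epoch.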
| | |
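The last column of the training log reports NDCG@10 under cosine similarity. As a rough guide to reading it, here is a minimal sketch of NDCG@k with binary relevance. The evaluator configuration is not shown in this card, so binary relevance is an assumption, and the table identifiers (`t2`, `t7`, `t9`) are hypothetical.

```python
import math

def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary relevance: 1 if a retrieved id is relevant, else 0."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # Ideal ranking: all relevant items placed at the top of the list.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

relevant = {"t2"}
at_rank_1 = ndcg_at_k(["t2", "t7", "t9"], relevant)   # relevant item first
at_rank_3 = ndcg_at_k(["t7", "t9", "t2"], relevant)   # relevant item third
missing = ndcg_at_k(["t7", "t9"], relevant)           # relevant item not retrieved
print(at_rank_1, at_rank_3, missing)
```

Under this definition a query scores 1.0 when its relevant item is ranked first and 0.0 when no relevant item appears in the top 10; the reported metric is the average over all evaluation queries.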
| | ### Framework Versions |
| | - Python: 3.11.6 |
| | - Sentence Transformers: 5.2.2 |
| | - Transformers: 4.56.2 |
| | - PyTorch: 2.10.0+cu128 |
| | - Accelerate: 1.12.0 |
| | - Datasets: 4.3.0 |
| | - Tokenizers: 0.22.2 |
| | |
| | ## Citation |
| | |
| | ### BibTeX |
| | |
| | #### Sentence Transformers |
| | ```bibtex |
| | @inproceedings{reimers-2019-sentence-bert, |
| | title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
| | author = "Reimers, Nils and Gurevych, Iryna", |
| | booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
| | month = "11", |
| | year = "2019", |
| | publisher = "Association for Computational Linguistics", |
| | url = "https://arxiv.org/abs/1908.10084", |
| | } |
| | ``` |
| | |
| | #### TripletLoss |
| | ```bibtex |
| | @misc{hermans2017defense, |
| | title={In Defense of the Triplet Loss for Person Re-Identification}, |
| | author={Alexander Hermans and Lucas Beyer and Bastian Leibe}, |
| | year={2017}, |
| | eprint={1703.07737}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CV} |
| | } |
| | ``` |
| | |