Instructions for using Pravallika6/scibert-cross-domain-embeddings with libraries, inference providers, notebooks, and local apps.

Libraries
How to use Pravallika6/scibert-cross-domain-embeddings with sentence-transformers:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Pravallika6/scibert-cross-domain-embeddings")

sentences = [
    "An Imbalanced Dataset with Multiple Feature Representations for Studying Quality Control of Next-Generation Sequencing [SEP] Next-generation sequencing (NGS) is a key technique for studying the DNA and RNA of organisms. However, identifying quality problems in NGS data across different experimental settings remains challenging. To develop automated quality-control tools, researchers require datasets with features that capture the characteristics of quality problems. Existing NGS repositories, however, offer only a limited number of quality-related features. To address this gap, we propose a dataset derived from 37.491 NGS samples with two types of quality-related feature representations. The first type consists of 34 features derived from quality control tools (QC-34 features). The second type has a variable number of features ranging from eight to 1.183. These features were derived from read counts in problematic genomic regions identified by the ENCODE blocklist (BL features). All features describe the same human and mouse samples from five genomic assays, allowing direct comparison of feature representations. The proposed dataset includes a binary quality label, derived from automated quality control and domain experts. Among all samples, $3.2\\%$ are of low quality. Supervised machine learning algorithms accurately predicted quality labels from the features, confirming the relevance of the provided feature representations. The proposed feature representations enable researchers to study how different feature types (QC-34 vs. BL features) and granularities (varying number of BL features) affect the detection of quality problems.",
    "Experimentally Resolving Gravity-Capillary Wave Evolution in Vessels of Unknown Boundary Conditions [SEP] The geometries of surface wave modes are determined by the highly nontrivial interplay of capillarity and wetting effects at the boundaries of their domain. Aside from idealised scenarios, this commonly leads to unknown boundary conditions, thereby hindering theoretical formulation and experimental analysis. To address this problem, we introduce Extracted Mode Tracking (EMT), a data-analysis framework to obtain instantaneous amplitude and phase content of axisymmetric surface-wave modes from spatio-temporal measurements. This approach uses unsupervised machine learning techniques to extract a basis of wave modes directly from collected data; the spatial profiles require no prior theoretical modelling, and so the issue of unknown boundary conditions is circumvented. Time-resolved mode amplitudes are reconstructed by geometric fitting at each recorded time-step, and the success is evaluated by a spectral signal-to-noise quantifier. Capabilities and limitations of EMT are systematically benchmarked on synthetic datasets, finding strong resilience against noise, improved accuracy over alternative methodologies, and the ability to operate with restricted domains which poses significant merit for use in experimental systems with limited measurement field-of-view. Finally, we conduct a Faraday-wave experiment in a regime highly sensitive to boundary effects in order to further validate the method, and demonstrate the observational access to nonlinear wave-dynamics enabled by EMT. These results establish EMT as a general tool for analysing wave mode dynamics of axially-symmetric fluid interface systems, and open pathways for quantitative studies of nonlinear mode-interactions, stability, and turbulence.",
    "AFSS: Artifact-Focused Self-Synthesis for Mitigating Bias in Audio Deepfake Detection [SEP] The rapid advancement of generative models has enabled highly realistic audio deepfakes, yet current detectors suffer from a critical bias problem, leading to poor generalization across unseen datasets. This paper proposes Artifact-Focused Self-Synthesis (AFSS), a method designed to mitigate this bias by generating pseudo-fake samples from real audio via two mechanisms: self-conversion and self-reconstruction. The core insight of AFSS lies in enforcing same-speaker constraints, ensuring that real and pseudo-fake samples share identical speaker identity and semantic content. This forces the detector to focus exclusively on generation artifacts rather than irrelevant confounding factors. Furthermore, we introduce a learnable reweighting loss to dynamically emphasize synthetic samples during training. Extensive experiments across 7 datasets demonstrate that AFSS achieves state-of-the-art performance with an average EER of 5.45\\%, including a significant reduction to 1.23\\% on WaveFake and 2.70\\% on In-the-Wild, all while eliminating the dependency on pre-collected fake datasets. Our code is publicly available at https://github.com/NguyenLeHaiSonGit/AFSS.",
    "Assessing 3D tree model quality and species classification using imbalance indices [SEP] We investigate the use of additional 3D and phylogenetic non-3D tree balance indices for analyzing and monitoring forests using an exemplary \"virtual forest\" dataset from the Wytham Woods, Oxford, UK. This study assesses 3D model quality, species classification performance, and the relevance of these indices. Our study shows that indices stemming from the study of ancestry trees of species can be successfully applied to 3D models of organic trees and, accompanied with recently introduced 3D imbalance indices, offer a complementary perspective on 3D tree models and improve the detection of deviations. Their computational efficiency combined with the simple and reproducible workflow presented in this manuscript form a computationally feasible quality control step in the 3D model construction. Species classification models reached an estimated accuracy of up to 81.8% and allowed to make confident species predictions for a large portion of the unlabeled trees in the dataset. While conventional tree metrics can already provide strong predictive performance, the addition of filtered 3D and non-3D statistics improved results consistently, particularly for minority species classes. Alongside this manuscript, we provide updated functionality in the R package treeDbalance to include the necessary functionalities and release the derived index datasets and species predictions."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([4, 4])
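To make the 4 x 4 result above easier to read, here is a minimal sketch of what a pairwise similarity matrix contains. It assumes the model uses cosine similarity, which is the sentence-transformers default when the model configuration does not specify another function. The helper `cosine_similarity_matrix` and the `toy_embeddings` values are hypothetical and for illustration only; they are not output from this model.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine-similarity matrix for a list of vectors."""
    # Euclidean norm of each vector
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    # Entry [i][j] is dot(vectors[i], vectors[j]) / (norm_i * norm_j)
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (nu * nv)
            for v, nv in zip(vectors, norms)
        ]
        for u, nu in zip(vectors, norms)
    ]


# Placeholder 3-dimensional vectors (assumption: NOT real model embeddings)
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 0.0],
]

sims = cosine_similarity_matrix(toy_embeddings)

# One row and one column per input, mirroring the 4 x 4 shape above
print(len(sims), len(sims[0]))

# Vectors pointing in the same direction score 1.0 regardless of length
print(sims[0][3])
```

Row i, column j holds the similarity between input i and input j, so the diagonal compares each input with itself and the matrix is symmetric. With real embeddings, the largest off-diagonal entry in a row identifies the input most similar to that row's input.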