sentence-transformers

How to use lufercho/AxvBert-Sentente-Transformer with sentence-transformers:
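If the sentence-transformers library is not installed yet, it can be installed from PyPI (note that model.similarity requires a reasonably recent release, v3.0 or later):

pip install -U sentence-transformers

Then the model can be loaded and the example sentences embedded: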
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("lufercho/AxvBert-Sentente-Transformer")
sentences = [
"A Comprehensive Approach to Universal Piecewise Nonlinear Regression\n Based on Trees",
" In sparse recovery we are given a matrix $A$ (the dictionary) and a vector of\nthe form $A X$ where $X$ is sparse, and the goal is to recover $X$. This is a\ncentral notion in signal processing, statistics and machine learning. But in\napplications such as sparse coding, edge detection, compression and super\nresolution, the dictionary $A$ is unknown and has to be learned from random\nexamples of the form $Y = AX$ where $X$ is drawn from an appropriate\ndistribution --- this is the dictionary learning problem. In most settings, $A$\nis overcomplete: it has more columns than rows. This paper presents a\npolynomial-time algorithm for learning overcomplete dictionaries; the only\npreviously known algorithm with provable guarantees is the recent work of\nSpielman, Wang and Wright who gave an algorithm for the full-rank case, which\nis rarely the case in applications. Our algorithm applies to incoherent\ndictionaries which have been a central object of study since they were\nintroduced in seminal work of Donoho and Huo. In particular, a dictionary is\n$\\mu$-incoherent if each pair of columns has inner product at most $\\mu /\n\\sqrt{n}$.\n The algorithm makes natural stochastic assumptions about the unknown sparse\nvector $X$, which can contain $k \\leq c \\min(\\sqrt{n}/\\mu \\log n, m^{1/2\n-\\eta})$ non-zero entries (for any $\\eta > 0$). This is close to the best $k$\nallowable by the best sparse recovery algorithms even if one knows the\ndictionary $A$ exactly. Moreover, both the running time and sample complexity\ndepend on $\\log 1/\\epsilon$, where $\\epsilon$ is the target accuracy, and so\nour algorithms converge very quickly to the true dictionary. Our algorithm can\nalso tolerate substantial amounts of noise provided it is incoherent with\nrespect to the dictionary (e.g., Gaussian). In the noisy setting, our running\ntime and sample complexity depend polynomially on $1/\\epsilon$, and this is\nnecessary.\n",
" In this paper, we investigate adaptive nonlinear regression and introduce\ntree based piecewise linear regression algorithms that are highly efficient and\nprovide significantly improved performance with guaranteed upper bounds in an\nindividual sequence manner. We use a tree notion in order to partition the\nspace of regressors in a nested structure. The introduced algorithms adapt not\nonly their regression functions but also the complete tree structure while\nachieving the performance of the \"best\" linear mixture of a doubly exponential\nnumber of partitions, with a computational complexity only polynomial in the\nnumber of nodes of the tree. While constructing these algorithms, we also avoid\nusing any artificial \"weighting\" of models (with highly data dependent\nparameters) and, instead, directly minimize the final regression error, which\nis the ultimate performance goal. The introduced methods are generic such that\nthey can readily incorporate different tree construction methods such as random\ntrees in their framework and can use different regressor or partitioning\nfunctions as demonstrated in the paper.\n",
" In this paper we propose a multi-task linear classifier learning problem\ncalled D-SVM (Dictionary SVM). D-SVM uses a dictionary of parameter covariance\nshared by all tasks to do multi-task knowledge transfer among different tasks.\nWe formally define the learning problem of D-SVM and show two interpretations\nof this problem, from both the probabilistic and kernel perspectives. From the\nprobabilistic perspective, we show that our learning formulation is actually a\nMAP estimation on all optimization variables. We also show its equivalence to a\nmultiple kernel learning problem in which one is trying to find a re-weighting\nkernel for features from a dictionary of basis (despite the fact that only\nlinear classifiers are learned). Finally, we describe an alternative\noptimization scheme to minimize the objective function and present empirical\nstudies to valid our algorithm.\n"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
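As a small follow-up sketch (not part of the original card), the similarity matrix can be used to find which abstract is closest to the first entry, which is a paper title:

# Row 0 of the similarity matrix holds the scores between the title and every entry.
# Skip column 0 (the title compared with itself) and pick the highest-scoring abstract.
scores = similarities[0, 1:]
best = int(scores.argmax())
print(f"Closest abstract: sentences[{best + 1}] (score {scores[best].item():.4f})")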