
SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

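This is a sentence-transformers model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. It was fine-tuned on pairs of scientific article titles and abstracts, each prefixed with "An article on behavioral reinforcement learning:", so inputs formatted the same way match its training data most closely.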

Model Details

Model Description

Model Type: Sentence Transformer
Base model: sentence-transformers/all-MiniLM-L6-v2
Maximum Sequence Length: 256 tokens
Output Dimensionality: 384 dimensions
Similarity Function: Cosine Similarity
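The card lists cosine similarity as the similarity function, which is what model.similarity applies by default. As a hedged NumPy sketch (not the library's own code), pairwise cosine similarity between two sets of embeddings can be computed as below; the random vectors are stand-ins with the card's 384 dimensions, not real model output.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)  # scale rows to length 1
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # shape (n, m), values in [-1, 1]

# Hypothetical stand-in embeddings with the card's output dimensionality (384).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 384))
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)
```

The diagonal is 1 because every vector is perfectly similar to itself, and the matrix is symmetric, matching the square output of model.similarity(embeddings, embeddings).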

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
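Modules (1) and (2) above, mean pooling over token embeddings followed by Normalize(), can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the token vectors are random stand-ins for BertModel output, and padding positions are excluded through the attention mask, as mean pooling (pooling_mode_mean_tokens) does in sentence-transformers. Because each output row ends up with unit length, a dot product between two embeddings equals their cosine similarity.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding (mask == 0).

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len)
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    return summed / counts

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length, mirroring the Normalize() step."""
    return x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)

# Hypothetical BERT output: 2 sequences, 5 token positions, 384 dims.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 384))
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 1, 0, 0]])  # second sequence has two padding tokens
sentence_embeddings = l2_normalize(mean_pool(tokens, mask))
print(sentence_embeddings.shape)  # (2, 384)
```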

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dwulff/minilm-brl")
# Run inference
sentences = [
    'An article on behavioral reinforcement learning:\n\nTitle: Confidence and the description–experience distinction.\nAbstract: In this paper, we extend the literature on the description–experience gap in risky choices by focusing on how the mode of learning—through description or experience—affects confidence. Specifically, we explore how learning through description or experience affects confidence in (1) the information gathered to make a decision and (2) the resulting choice. In two preregistered experiments we tested whether there was a description–experience gap in both dimensions of confidence. Learning from description was associated with higher confidence—both in the information gathered and in the choice made—than was learning from experience. In a third preregistered experiment, we examined the effect of sample size on confidence in decisions from experience. Contrary to the normative view that larger samples foster confidence in statistical inference, we observed that more experience led to less confidence. This observation is reminiscent of recent theories of deliberate ignorance, which highlight the adaptive benefits of deliberately limiting information search.',
    "An article on behavioral reinforcement learning:\n\nTitle: How (in)variant are subjective representations of described and experienced risk and rewards?.\nAbstract: Decisions under risk have been shown to differ depending on whether information on outcomes and probabilities is gleaned from symbolic descriptions or gathered through experience. To some extent, this description–experience gap is due to sampling error in experience-based choice. Analyses with cumulative prospect theory (CPT), investigating to what extent the gap is also driven by differences in people's subjective representations of outcome and probability information (taking into account sampling error), have produced mixed results. We improve on previous analyses of description-based and experience-based choices by taking advantage of both a within-subjects design and a hierarchical Bayesian implementation of CPT. This approach allows us to capture both the differences and the within-person stability of individuals’ subjective representations across the two modes of learning about choice options. Relative to decisions from description, decisions from experience showed reduced sensitivity to probabilities and increased sensitivity to outcomes. For some CPT parameters, individual differences were relatively stable across modes of learning. Our results suggest that outcome and probability information translate into systematically different subjective representations in description- versus experience-based choice. At the same time, both types of decisions seem to tap into the same individual-level regularities.",
    "An article on behavioral reinforcement learning:\n\nTitle: Do narcissists make better decisions? An investigation of narcissism and dynamic decision-making performance.\nAbstract: We investigated whether narcissism affected dynamic decision-making performance in the presence and absence of misleading information. Performance was examined in a two-choice dynamic decision-making task where the optimal strategy was to forego an option providing larger immediate rewards in favor of an option that led to larger delayed rewards. Information regarding foregone rewards from the alternate option was presented or withheld to bias participants toward the sub-optimal choice. The results demonstrated that individuals high in narcissistic traits performed comparably to low narcissism individuals when foregone reward information was absent, but high narcissism individuals outperformed individuals low in narcissistic traits when misleading information was presented. The advantage for participants high in narcissistic traits was strongest within males, and, overall, males outperformed females when foregone rewards were present. While prior research emphasizes narcissists' decision-making deficits, our findings provide evidence that individuals high in narcissistic traits excel at decision-making tasks that involve disregarding ambiguous information and focusing on the long-term utility of each option. Their superior ability at filtering out misleading information may reflect an effort to maintain their self-view or avoid ego threat.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
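With embeddings in hand, semantic search amounts to ranking corpus rows by cosine similarity to a query embedding. The NumPy sketch below uses tiny hypothetical 4-dimensional vectors in place of real 384-dimensional output, and the helper name top_k_by_cosine is illustrative; for real embeddings, sentence-transformers also provides util.semantic_search for the same task.

```python
import numpy as np

def top_k_by_cosine(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (corpus_index, score) pairs for the k corpus rows most similar to query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                   # cosine similarity of each corpus row to the query
    order = np.argsort(-scores)[:k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical stand-ins; real embeddings from model.encode have 384 dims.
corpus = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
query = np.array([1.0, 0.1, 0.0, 0.0])
results = top_k_by_cosine(query, corpus, k=2)
print(results)
```

Since this model's embeddings are already unit-length, the explicit normalization is redundant for its output but keeps the helper correct for arbitrary vectors.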

Training Details

Training Dataset

Unnamed Dataset

Size: 50,000 training samples
Columns: sentence_0, sentence_1, and label

Approximate statistics based on the first 1000 samples:

	sentence_0	sentence_1	label
type	string	string	float
min	102 tokens	61 tokens	0.0
mean	237.66 tokens	227.84 tokens	0.17
max	256 tokens	256 tokens	0.9

Samples:

sentence_0	sentence_1	label
An article on behavioral reinforcement learning: Title: Working memory and response selection: A computational account of interactions among cortico-basalganglio-thalamic loops. Abstract: Cortico-basalganglio-thalamic loops are involved in both cognitive processes and motor control. We present a biologically meaningful computational model of how these loops contribute to the organization of working memory and the development of response behavior. Via reinforcement learning in basal ganglia, the model develops flexible control of working memory within prefrontal loops and achieves selection of appropriate responses based on working memory content and visual stimulation within a motor loop. We show that both working memory control and response selection can evolve within parallel and interacting cortico-basalganglio-thalamic loops by Hebbian and three-factor learning rules. Furthermore, the model gives a coherent explanation for how complex strategies of working memory control and respo...	An article on behavioral reinforcement learning: Title: The role of basal ganglia in reinforcement learning and imprinting in domestic chicks. Abstract: Effects of bilateral kainate lesions of telencephalic basal ganglia (lobus parolfactorius, LPO) were examined in domestic chicks. In the imprinting paradigm, where chicks learned to selectively approach a moving object without any explicitly associated reward, both the pre- and post-training lesions were without effects. On the other hand, in the water-reinforced pecking task, pre-training lesions of LPO severely impaired immediate reinforcement as well as formation of the association memory. However, post-training LPO lesions did not cause amnesia, and chicks selectively pecked at the reinforced color. The LPO could thus be involved specifically in the evaluation of present rewards and the instantaneous reinforcement of pecking, but not in the execution of selective behavior based on a memorized color cue.	0.5
An article on behavioral reinforcement learning: Title: Exploration Disrupts Choice-Predictive Signals and Alters Dynamics in Prefrontal Cortex. Abstract: In uncertain environments, decision-makers must balance two goals: they must “exploit” rewarding options but also “explore” in order to discover rewarding alternatives. Exploring and exploiting necessarily change how the brain responds to identical stimuli, but little is known about how these states, and transitions between them, change how the brain transforms sensory information into action. To address this question, we recorded neural activity in a prefrontal sensorimotor area while monkeys naturally switched between exploring and exploiting rewarding options. We found that exploration profoundly reduced spatially selective, choice-predictive activity in single neurons and delayed choice-predictive population dynamics. At the same time, reward learning was increased in brain and behavior. These results indicate that exploration i...	An article on behavioral reinforcement learning: Title: Counterfactual choice and learning in a Neural Network centered on human lateral frontopolar cortex. Abstract: Decision making and learning in a real-world context require organisms to track not only the choices they make and the outcomes that follow but also other untaken, or counterfactual, choices and their outcomes. Although the neural system responsible for tracking the value of choices actually taken is increasingly well understood, whether a neural system tracks counterfactual information is currently unclear. Using a three-alternative decision-making task, a Bayesian reinforcement-learning algorithm, and fMRI, we investigated the coding of counterfactual choices and prediction errors in the human brain. Rather than representing evidence favoring multiple counterfactual choices, lateral frontal polar cortex (lFPC), dorsomedial frontal cortex (DMFC), and posteromedial cortex (PMC) encode the reward-based evidence favoring t...	0.5
An article on behavioral reinforcement learning: Title: Electrophysiological signatures of visual statistical learning in 3-month-old infants at familial and low risk for autism spectrum disorder. Abstract: Visual statistical learning (VSL) refers to the ability to extract associations and conditional probabilities within the visual environment. It may serve as a precursor to cognitive and social communication development. Quantifying VSL in infants at familial risk (FR) for Autism Spectrum Disorder (ASD) provides opportunities to understand how genetic predisposition can influence early learning processes which may, in turn, lay a foundation for cognitive and social communication delays. We examined electroencephalography (EEG) signatures of VSL in 3-month-old infants, examining whether EEG correlates of VSL differentiated FR from low-risk (LR) infants. In an exploratory analysis, we then examined whether EEG correlates of VSL at 3 months relate to cognitive function and ASD symptoms...	An article on behavioral reinforcement learning: Title: Reduced nucleus accumbens reactivity and adolescent depression following early-life stress. Abstract: Depression is a common outcome for those having experienced early-life stress (ELS). For those individuals, depression typically increases during adolescence and appears to endure into adulthood, suggesting alterations in the development of brain systems involved in depression. Developmentally, the nucleus accumbens (NAcc), a limbic structure associated with reward learning and motivation, typically undergoes dramatic functional change during adolescence; therefore, age-related changes in NAcc function may underlie increases in depression in adolescence following ELS. The current study examined the effects of ELS in 38 previously institutionalized children and adolescents in comparison to a group of 31 youths without a history of ELS. Consistent with previous research, the findings showed that depression was higher in adolescents...	0.0
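Every training sample above shares one text layout, which the usage example writes out with explicit line breaks ("\n\nTitle: ...\nAbstract: ..."). A small helper for producing inputs in that layout might look like this; the helper name format_article is hypothetical, the layout is inferred from the card's example sentences (which append a period after the title even when it already ends in "?"), and the card does not document how much deviating from it affects embedding quality.

```python
# Prefix taken verbatim from the card's example sentences.
PREFIX = "An article on behavioral reinforcement learning:"

def format_article(title: str, abstract: str) -> str:
    """Build an input string in the layout used by the example sentences.

    Hypothetical helper: the layout is inferred from the examples, not
    from any documented API of this model.
    """
    return f"{PREFIX}\n\nTitle: {title}.\nAbstract: {abstract}"

text = format_article(
    "Confidence and the description–experience distinction",
    "<abstract text>",  # placeholder for the real abstract
)
print(text)
```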

Loss: CosineSimilarityLoss with these parameters:

{
    "loss_fct": "torch.nn.modules.loss.MSELoss"
}
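CosineSimilarityLoss embeds both sentences of a pair, takes the cosine similarity of the two embeddings, and regresses it onto the float label with the configured loss_fct, here MSELoss. The NumPy sketch below mirrors that computation for illustration only; the library applies it to PyTorch tensors with gradients, and the 2-dimensional embeddings and labels are hypothetical values chosen within the dataset's 0.0 to 0.9 label range.

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """MSE between the cosine similarity of paired embeddings and the float labels.

    u, v: (batch, dim) embeddings of sentence_0 and sentence_1; labels: (batch,)
    """
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean((cos - labels) ** 2))  # mean reduction, as MSELoss uses by default

# Hypothetical embeddings for three pairs: identical, orthogonal, and 45 degrees apart.
u = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
v = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.array([0.9, 0.0, 0.5])
print(round(cosine_similarity_loss(u, v, labels), 4))  # 0.0176
```

The loss only penalizes the gap between predicted and target similarity, so pairs labeled 0.0 push embeddings toward orthogonality rather than toward opposite directions.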

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 64
per_device_eval_batch_size: 64
num_train_epochs: 5
multi_dataset_batch_sampler: round_robin

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: no
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
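With lr_scheduler_type: linear, warmup_steps: 0 and learning_rate: 5e-05, the learning rate decays linearly from 5e-05 to zero over the run. A minimal sketch of that schedule follows. It assumes training ran on a single device, so that 50,000 samples at batch size 64 give 782 steps per epoch and 3,910 steps in total; that assumption is consistent with the Training Logs but is not stated in the card.

```python
import math

def linear_lr(step: int, base_lr: float = 5e-5, warmup_steps: int = 0,
              total_steps: int = 3910) -> float:
    """Learning rate under a linear warmup-then-decay schedule."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # warmup phase (unused when 0)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Assumes one device: ceil(50,000 / 64) = 782 steps per epoch, times 5 epochs.
total = math.ceil(50_000 / 64) * 5  # 3910
for step in (0, 500, 1955, 3500, total):
    print(step, f"{linear_lr(step, total_steps=total):.3e}")
```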

Training Logs

Epoch	Step	Training Loss
0.6394	500	0.0179
1.2788	1000	0.0124
1.9182	1500	0.0107
2.5575	2000	0.0092
3.1969	2500	0.0086
3.8363	3000	0.0078
4.4757	3500	0.0073
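The Epoch column follows from the step count. Assuming a single device (so the effective batch is the per-device batch of 64) and that the final partial batch is kept (dataloader_drop_last: False), one epoch is ceil(50,000 / 64) = 782 steps, and dividing each logged step by 782 reproduces the Epoch values above. The check below is a sketch under those assumptions.

```python
import math

SAMPLES = 50_000  # "Size: 50,000 training samples"
BATCH_SIZE = 64   # per_device_train_batch_size, assuming a single device
EPOCHS = 5        # num_train_epochs

steps_per_epoch = math.ceil(SAMPLES / BATCH_SIZE)  # last partial batch kept
total_steps = steps_per_epoch * EPOCHS

# Loss was logged every 500 steps; the run ends at step 3910, before a 4000th step.
logged = {step: round(step / steps_per_epoch, 4) for step in range(500, total_steps + 1, 500)}
print(steps_per_epoch, total_steps)  # 782 3910
print(logged)
```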

Framework Versions

Python: 3.13.2
Sentence Transformers: 4.0.2
Transformers: 4.50.0.dev0
PyTorch: 2.6.0
Accelerate: 1.5.2
Datasets: 3.5.0
Tokenizers: 0.21.1
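To compare a local environment against these versions, the standard-library importlib.metadata module can report installed package versions. A small sketch; the PyPI distribution names below (for example torch for PyTorch) are assumed mappings of the listed frameworks, and a version mismatch does not by itself mean results will differ.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions listed under "Framework Versions", keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "sentence-transformers": "4.0.2",
    "transformers": "4.50.0.dev0",
    "torch": "2.6.0",
    "accelerate": "1.5.2",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}

def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package to (card version, installed version or None if missing)."""
    report = {}
    for package, card_version in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        report[package] = (card_version, installed)
    return report

for package, (card, installed) in compare_versions(CARD_VERSIONS).items():
    flag = "" if installed == card else "  <- differs"
    print(f"{package:22} card={card:12} installed={installed}{flag}")
```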

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}