zc277584121 committed on
Commit
5144c84
·
verified ·
1 Parent(s): 6f2fd2c

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,126 @@
+ # Semantic Highlight Bilingual Model (Preview)
+
+ ## What is Semantic Highlight?
+
+ Traditional search highlighting works by matching keywords. When you search for "iPhone performance" on an e-commerce site, only the words "iPhone" and "performance" get highlighted in the results. But what if the product description says "Powered by the A15 Bionic chip, scores over 1 million in benchmarks, runs smoothly with no lag"? This clearly answers the performance question, yet nothing gets highlighted because it doesn't contain the exact word "performance".
+
+ **Semantic Highlight** solves this problem by understanding meaning, not just matching words. It highlights text segments that are semantically relevant to your query, even if they don't contain the exact keywords. This is crucial in RAG (Retrieval-Augmented Generation) scenarios, where users need to quickly identify relevant information in long retrieved documents.
+
+ ### Why a Lightweight Model?
+
+ Highlighting happens on every search query, so it needs to be fast and cost-effective. Large language models would be too slow and expensive for this real-time task. This model is designed to be:
+ - **Small**: ~560MB, deployable on standard servers
+ - **Fast**: Millisecond-level inference
+ - **Accurate**: Trained on context-relevance datasets
+
+ ## Model Details
+
+ - **Base Model**: BAAI/bge-reranker-v2-m3
+ - **Languages**: Chinese and English
+ - **Task**: Context relevance prediction for semantic highlighting
+ - **Status**: ⚠️ **Preview Version** - This is an experimental release
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### Usage
+
+ #### English Example
+
+ ```python
+ from transformers import AutoModel
+
+ model = AutoModel.from_pretrained(
+     "Zilliz/semantic-highlight-bilingual-pre",
+     trust_remote_code=True
+ )
+
+ question = "How to improve Python code performance?"
+ context = """
+ Python optimization techniques include using numpy for vectorized operations,
+ avoiding object creation in loops, and utilizing built-in functions.
+ List comprehensions are faster than traditional loops.
+ Profiling tools like cProfile help identify bottlenecks.
+ """
+
+ result = model.process(
+     question=question,
+     context=context,
+     threshold=0.5,
+     language="en",
+ )
+
+ print("Relevant sentences:")
+ print(result["pruned_context"])
+ ```
+
+ #### Chinese Example
+
+ ```python
+ from transformers import AutoModel
+
+ model = AutoModel.from_pretrained(
+     "Zilliz/semantic-highlight-bilingual-pre",
+     trust_remote_code=True
+ )
+
+ question = "北京有哪些著名景点?"  # "What are Beijing's famous attractions?"
+ context = """
+ 故宫是明清两代的皇家宫殿,占地面积约72万平方米。
+ 长城是中国古代的军事防御工程,东起山海关,西至嘉峪关。
+ 颐和园是清朝时期的皇家园林,以昆明湖和万寿山为主体。
+ 天安门广场是世界上最大的城市广场之一。
+ """
+
+ result = model.process(
+     question=question,
+     context=context,
+     threshold=0.5,
+     language="zh",
+ )
+
+ print("相关句子:")  # "Relevant sentences:"
+ print(result["pruned_context"])
+ ```
+
+ ## Parameters
+
+ - `question`: Query text
+ - `context`: Document text to highlight
+ - `threshold`: Relevance threshold (0-1), default 0.5. Lower values include more sentences.
+ - `language`: Language code ("en", "zh", or "auto")
+ - `return_sentence_metrics`: Return per-sentence relevance scores
+
+ ## Output
+
+ - `pruned_context`: Highlighted text (relevant sentences only)
+ - `compression_rate`: Percentage of text removed
+ - `sentence_probabilities`: Relevance score for each sentence (if `return_sentence_metrics=True`)
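+
+ As a rough illustration of how these fields relate, the sketch below shows thresholding in plain Python with made-up scores. The helper `prune_by_threshold` is hypothetical (the model computes the scores and pruning internally); it only mirrors the documented semantics: sentences at or above `threshold` are kept, and `compression_rate` is the share removed (expressed here as a fraction).
+
+ ```python
+ # Hypothetical sketch, NOT the model's implementation: illustrates how
+ # `threshold` relates to the documented outputs `pruned_context`,
+ # `compression_rate`, and `sentence_probabilities`.
+ def prune_by_threshold(sentences, probabilities, threshold=0.5):
+     # Keep sentences whose relevance score clears the threshold.
+     kept = [s for s, p in zip(sentences, probabilities) if p >= threshold]
+     return {
+         "pruned_context": " ".join(kept),
+         "compression_rate": 1 - len(kept) / len(sentences),
+         "sentence_probabilities": probabilities,
+     }
+
+ sentences = [
+     "Use numpy for vectorized operations.",
+     "Python was first released in 1991.",
+     "Profile with cProfile to find bottlenecks.",
+ ]
+ scores = [0.91, 0.08, 0.77]  # made-up relevance scores
+
+ result = prune_by_threshold(sentences, scores, threshold=0.5)
+ print(result["pruned_context"])      # the two relevant sentences
+ print(result["compression_rate"])    # one of three sentences removed
+ ```
+
+ Lowering `threshold` keeps more sentences (lower compression); raising it keeps only the most relevant ones.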
+
+ ## Notes
+
+ ⚠️ **This is a preview version.** The model is still under development and improvements are ongoing.
+
+ ## License
+
+ Same as base model: MIT License
+
+ ## Citation
+
+ If you use this model, please cite the base model:
+
+ ```bibtex
+ @misc{bge-reranker-v2-m3,
+   title={BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation},
+   author={Jianlv Chen and Shitao Xiao and Peitian Zhang and Kun Luo and Defu Lian and Zheng Liu},
+   year={2024},
+   eprint={2402.03216},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,177 @@
+ {
+   "architectures": [
+     "OpenProvenceForSequenceClassification"
+   ],
+   "auto_map": {
+     "AutoConfig": "modeling_open_provence_standalone.OpenProvenceConfig",
+     "AutoModel": "modeling_open_provence_standalone.OpenProvenceForSequenceClassification",
+     "AutoModelForSequenceClassification": "modeling_open_provence_standalone.OpenProvenceForSequenceClassification",
+     "AutoModelForTokenClassification": "modeling_open_provence_standalone.OpenProvenceForTokenClassification"
+   },
+   "base_model_config": {
+     "_name_or_path": "BAAI/bge-reranker-v2-m3",
+     "add_cross_attention": false,
+     "architectures": [
+       "XLMRobertaForSequenceClassification"
+     ],
+     "attention_probs_dropout_prob": 0.1,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": 0,
+     "chunk_size_feed_forward": 0,
+     "classifier_dropout": null,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dtype": "float32",
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": 2,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0"
+     },
+     "initializer_range": 0.02,
+     "intermediate_size": 4096,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0
+     },
+     "layer_norm_eps": 1e-05,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 8194,
+     "min_length": 0,
+     "model_type": "xlm-roberta",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 16,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 24,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_past": true,
+     "output_scores": false,
+     "pad_token_id": 1,
+     "position_embedding_type": "absolute",
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torchscript": false,
+     "transformers_version": "4.57.1",
+     "type_vocab_size": 1,
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_cache": true,
+     "vocab_size": 250002
+   },
+   "base_model_name_or_path": "BAAI/bge-reranker-v2-m3",
+   "default_threadshold": null,
+   "default_threshold": null,
+   "encoder_architecture": "xlm-roberta",
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "max_length": 512,
+   "mode": "reranking_pruning",
+   "model_type": "open_provence",
+   "num_pruning_labels": 2,
+   "pruning_config": {
+     "_name_or_path": "",
+     "add_cross_attention": false,
+     "architectures": null,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "classifier_dropout": 0.1,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "dtype": null,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_size": 1024,
+     "id2label": {
+       "0": "LABEL_0",
+       "1": "LABEL_1"
+     },
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "LABEL_0": 0,
+       "LABEL_1": 1
+     },
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "min_length": 0,
+     "model_type": "open_provence_head",
+     "no_repeat_ngram_size": 0,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": null,
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sentence_pooling": "mean",
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torchscript": false,
+     "transformers_version": "4.57.1",
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_weighted_pooling": false
+   },
+   "tokenizer_name_or_path": null,
+   "transformers_version": "4.57.1",
+   "vocab_size": 250002
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:17222e5588c9913e14292bb6df4a35e863ccdff1346ca94599cf26edafe17a54
+ size 2271085700
modeling_open_provence_standalone.py ADDED
The diff for this file is too large to render. See raw diff
 
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+ size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8bf8afbfd11306bd872018c53bfdf2e160a56f8edbcf49933324404791c148d3
+ size 17082900
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "model_max_length": 8192,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }
training_args.json ADDED
@@ -0,0 +1,206 @@
+ {
+   "model_args": {
+     "model_name_or_path": "BAAI/bge-reranker-v2-m3",
+     "num_labels": null,
+     "classifier_dropout": 0.1,
+     "max_length": 512,
+     "config_name": null,
+     "tokenizer_name": null,
+     "cache_dir": null
+   },
+   "data_args": {
+     "dataset_name": "hotchpotch/wip-msmarco-context-relevance",
+     "subset": "msmarco-ja-minimal",
+     "teacher_column": null,
+     "datasets": [
+       {
+         "dataset_name": "hotchpotch/msmarco-context-relevance",
+         "subset": "freq2",
+         "teacher_column": "teacher_scores.gte-reranker-modernbert-base"
+       },
+       {
+         "dataset_name": "hotchpotch/natural-questions-context-relevance",
+         "subset": "nodup_freq2",
+         "teacher_column": "teacher_scores.gte-reranker-modernbert-base",
+         "items": 6
+       },
+       {
+         "dataset_name": "hotchpotch/gooaq-context-relevance-130k",
+         "subset": "default",
+         "teacher_column": "teacher_scores.gte-reranker-modernbert-base",
+         "items": 6
+       },
+       {
+         "dataset_name": "zc277584121/dureader-context-relevance-with-think",
+         "subset": "default"
+       },
+       {
+         "dataset_name": "zc277584121/chinese_wiki_0_300k-context-relevance-with-think",
+         "subset": "default"
+       }
+     ],
+     "items": null,
+     "max_train_samples": null,
+     "max_eval_samples": null,
+     "validation_split": null,
+     "validation_split_samples": null,
+     "validation_split_name": "validation",
+     "preprocessing_num_workers": null,
+     "filter_zero_relevance_max_items": null,
+     "filter_zero_relevance_max_items_reverse": false,
+     "filter_keep_first_item": false,
+     "upsample_factor": null
+   },
+   "training_args": {
+     "output_dir": "./output/bilingual-chinese-english-m3-v10_20251213_152447",
+     "overwrite_output_dir": true,
+     "do_train": true,
+     "do_eval": true,
+     "do_predict": false,
+     "eval_strategy": "steps",
+     "prediction_loss_only": false,
+     "per_device_train_batch_size": 2,
+     "per_device_eval_batch_size": 8,
+     "per_gpu_train_batch_size": null,
+     "per_gpu_eval_batch_size": null,
+     "gradient_accumulation_steps": 32,
+     "eval_accumulation_steps": null,
+     "eval_delay": 0,
+     "torch_empty_cache_steps": null,
+     "learning_rate": 5e-05,
+     "weight_decay": 0.01,
+     "adam_beta1": 0.9,
+     "adam_beta2": 0.999,
+     "adam_epsilon": 1e-08,
+     "max_grad_norm": 1.0,
+     "num_train_epochs": 3,
+     "max_steps": -1,
+     "lr_scheduler_type": "cosine",
+     "lr_scheduler_kwargs": {},
+     "warmup_ratio": 0.1,
+     "warmup_steps": 0,
+     "log_level": "passive",
+     "log_level_replica": "warning",
+     "log_on_each_node": true,
+     "logging_dir": "trainer_output/runs/Dec13_15-24-45_nvidiadgx",
+     "logging_strategy": "steps",
+     "logging_first_step": false,
+     "logging_steps": 363,
+     "logging_nan_inf_filter": true,
+     "save_strategy": "steps",
+     "save_steps": 500,
+     "save_total_limit": 3,
+     "save_safetensors": true,
+     "save_on_each_node": false,
+     "save_only_model": false,
+     "restore_callback_states_from_checkpoint": false,
+     "no_cuda": false,
+     "use_cpu": false,
+     "use_mps_device": false,
+     "seed": 42,
+     "data_seed": null,
+     "jit_mode_eval": false,
+     "bf16": true,
+     "fp16": false,
+     "fp16_opt_level": "O1",
+     "half_precision_backend": "auto",
+     "bf16_full_eval": false,
+     "fp16_full_eval": false,
+     "tf32": null,
+     "local_rank": 7,
+     "ddp_backend": null,
+     "tpu_num_cores": null,
+     "tpu_metrics_debug": false,
+     "debug": [],
+     "dataloader_drop_last": false,
+     "eval_steps": 1815,
+     "dataloader_num_workers": 4,
+     "dataloader_prefetch_factor": null,
+     "past_index": -1,
+     "run_name": "bilingual-chinese-english-m3-v10-20251213_152447",
+     "disable_tqdm": false,
+     "remove_unused_columns": false,
+     "label_names": null,
+     "load_best_model_at_end": true,
+     "metric_for_best_model": "eval_loss",
+     "greater_is_better": false,
+     "ignore_data_skip": false,
+     "fsdp": [],
+     "fsdp_min_num_params": 0,
+     "fsdp_config": {
+       "min_num_params": 0,
+       "xla": false,
+       "xla_fsdp_v2": false,
+       "xla_fsdp_grad_ckpt": false
+     },
+     "fsdp_transformer_layer_cls_to_wrap": null,
+     "accelerator_config": "AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False)",
+     "parallelism_config": null,
+     "deepspeed": null,
+     "label_smoothing_factor": 0.0,
+     "optim": "adafactor",
+     "optim_args": null,
+     "adafactor": false,
+     "group_by_length": false,
+     "length_column_name": "length",
+     "report_to": [
+       "wandb"
+     ],
+     "project": "huggingface",
+     "trackio_space_id": "trackio",
+     "ddp_find_unused_parameters": null,
+     "ddp_bucket_cap_mb": null,
+     "ddp_broadcast_buffers": null,
+     "dataloader_pin_memory": true,
+     "dataloader_persistent_workers": false,
+     "skip_memory_metrics": true,
+     "use_legacy_prediction_loop": false,
+     "push_to_hub": false,
+     "resume_from_checkpoint": null,
+     "hub_model_id": null,
+     "hub_strategy": "every_save",
+     "hub_token": null,
+     "hub_private_repo": null,
+     "hub_always_push": false,
+     "hub_revision": null,
+     "gradient_checkpointing": false,
+     "gradient_checkpointing_kwargs": null,
+     "include_inputs_for_metrics": false,
+     "include_for_metrics": [],
+     "eval_do_concat_batches": true,
+     "fp16_backend": "auto",
+     "push_to_hub_model_id": null,
+     "push_to_hub_organization": null,
+     "push_to_hub_token": null,
+     "mp_parameters": "",
+     "auto_find_batch_size": false,
+     "full_determinism": false,
+     "torchdynamo": null,
+     "ray_scope": "last",
+     "ddp_timeout": 1800,
+     "torch_compile": false,
+     "torch_compile_backend": null,
+     "torch_compile_mode": null,
+     "include_tokens_per_second": false,
+     "include_num_input_tokens_seen": "no",
+     "neftune_noise_alpha": null,
+     "optim_target_modules": null,
+     "batch_eval_metrics": false,
+     "eval_on_start": false,
+     "use_liger_kernel": false,
+     "liger_kernel_config": null,
+     "eval_use_gather_object": false,
+     "average_tokens_across_devices": true,
+     "ranking_weight": 0.0,
+     "pruning_weight": 1.0,
+     "use_teacher_scores": true,
+     "sentence_level_pruning": true,
+     "eval_datasets": {
+       "config": "configs/eval_datasets/bilingual_nano.yaml",
+       "threshold": 0.1,
+       "batch_size": 32
+     },
+     "distributed_state": "Distributed environment: DistributedType.MULTI_GPU Backend: nccl\nNum processes: 8\nProcess index: 7\nLocal process index: 7\nDevice: cuda:7\n",
+     "deepspeed_plugin": null
+   }
+ }