beckerv committed · Commit ac908fb · verified · 1 Parent(s): 7e50eed

Upload from model-test

Files changed (5)
  1. README.md +140 -0
  2. config.json +17 -0
  3. model.pt +3 -0
  4. tokenizer.json +31 -0
  5. vocab.txt +84 -0
README.md ADDED
@@ -0,0 +1,140 @@
+ ---
+ language:
+ - en
+ license: mit
+ library_name: transformers
+ tags:
+ - bert
+ - text-classification
+ - nlp
+ - test
+ model_name: "Dummy BERT for Testing"
+ model_id: "test/bert-dummy"
+ inference: true
+ ---
+
+ # Dummy BERT Model
+
+ This is a test model created to exercise experimental uploads to the Hugging Face Hub using dmf-ng.
+
+ ## Model Details
+
+ ### Model Description
+
+ A minimal BERT model for testing artifact upload workflows from dmf-ng to the Hugging Face Hub.
+
+ - **Developed by:** dmf-ng Test Suite
+ - **Model type:** Transformer-based language model
+ - **Library:** Transformers
+ - **License:** MIT
+
+ ### Model Architecture
+
+ - **Architecture:** BERT (Bidirectional Encoder Representations from Transformers)
+ - **Hidden Size:** 768
+ - **Number of Hidden Layers:** 12
+ - **Number of Attention Heads:** 12
+ - **Intermediate Size:** 3,072
+ - **Maximum Position Embeddings:** 512
+ - **Vocabulary Size:** 30,522
+
+ ### Model Configuration
+
+ ```json
+ {
+   "model_type": "bert",
+   "hidden_size": 768,
+   "num_hidden_layers": 12,
+   "num_attention_heads": 12,
+   "intermediate_size": 3072,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "attention_probs_dropout_prob": 0.1,
+   "max_position_embeddings": 512,
+   "type_vocab_size": 2,
+   "initializer_range": 0.02,
+   "layer_norm_eps": 1e-12,
+   "pad_token_id": 0
+ }
+ ```
+
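+ A minimal sketch of turning this configuration into a randomly initialized model with the `transformers` library; since model.pt is only a placeholder, random weights are exactly what this repo provides:
+
+ ```python
+ from transformers import BertConfig, BertModel
+
+ # Build the config from the values documented above.
+ config = BertConfig(
+     hidden_size=768,
+     num_hidden_layers=12,
+     num_attention_heads=12,
+     intermediate_size=3072,
+     max_position_embeddings=512,
+     vocab_size=30522,
+ )
+
+ # BertModel(config) creates a model with randomly initialized weights.
+ model = BertModel(config)
+ print(model.config.hidden_size)  # 768
+ ```
+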
+ ## Files
+
+ - **model.pt** - PyTorch model weights (placeholder)
+ - **config.json** - Model configuration in Hugging Face format
+ - **tokenizer.json** - Tokenizer configuration
+ - **vocab.txt** - Vocabulary file with token mappings
+ - **README.md** - This model card
+
+ ## Intended Use
+
+ This model is **for testing purposes only** and should not be used for actual inference or production workloads.
+
+ ### Primary Intended Use
+
+ - Testing artifact upload workflows with dmf-ng
+ - Validating model card metadata
+ - Experimenting with Hugging Face Hub integration
+ - Testing lineage tracking with MLflow
+
+ ## Out-of-Scope Use Cases
+
+ - Production inference
+ - Real-world text classification tasks
+ - Fine-tuning on real datasets
+ - Deploying to inference endpoints
+
+ ## Technical Details
+
+ ### Model Inputs
+
+ - **input_ids**: Token IDs (shape: `[batch_size, sequence_length]`)
+ - **attention_mask**: Binary mask for padding (shape: `[batch_size, sequence_length]`)
+ - **token_type_ids**: Segment IDs for sentence pairs (shape: `[batch_size, sequence_length]`)
+
+ ### Model Outputs
+
+ - **Hidden states** from the last transformer layer (shape: `[batch_size, sequence_length, 768]`)
+ - **[CLS] token representation** for sequence classification tasks
+
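+ To make these shapes concrete, a minimal sketch that encodes a sentence pair and inspects the tensors; `bert-base-uncased` is used only to obtain a compatible tokenizer, and `model` is the randomly initialized instance from the configuration sketch above:
+
+ ```python
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
+
+ # Encoding a sentence pair populates all three inputs described above.
+ inputs = tokenizer("First sentence.", "Second sentence.", return_tensors="pt")
+ print(inputs["input_ids"].shape)       # [1, sequence_length]
+ print(inputs["attention_mask"].shape)  # [1, sequence_length]
+ print(inputs["token_type_ids"].shape)  # [1, sequence_length]
+
+ # Outputs from the untrained model are random but correctly shaped.
+ outputs = model(**inputs)
+ print(outputs.last_hidden_state.shape)  # [1, sequence_length, 768]
+ cls_repr = outputs.last_hidden_state[:, 0]  # [CLS] representation, [1, 768]
+ ```
+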
+ ## Limitations and Biases
+
+ This is a dummy model created for testing purposes and does not represent a real, trained model. It has not been trained on any data and produces random outputs.
+
+ ## Training Data
+
+ None - this model was generated as test data.
+
+ ## Evaluation Results
+
+ Not applicable - this is a test model.
+
+ ## Environmental Impact
+
+ Minimal - this is a test model used only for software development and testing.
+
+ ## How to Get Started
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+
+ model_id = "your-username/test-model"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForMaskedLM.from_pretrained(model_id)
+
+ # This model is not trained, so outputs are random
+ inputs = tokenizer("Hello, world!", return_tensors="pt")
+ outputs = model(**inputs)
+ ```
+
+ ## Model Card Contact
+
+ For issues related to this test model, please open an issue on the dmf-ng repository.
+
+ ---
+
+ **Note:** This is a test artifact. For production models, ensure comprehensive model cards with real training data, evaluation metrics, and bias analysis.
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "model_type": "bert",
+   "hidden_size": 768,
+   "num_hidden_layers": 12,
+   "num_attention_heads": 12,
+   "intermediate_size": 3072,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "attention_probs_dropout_prob": 0.1,
+   "max_position_embeddings": 512,
+   "type_vocab_size": 2,
+   "initializer_range": 0.02,
+   "layer_norm_eps": 1e-12,
+   "pad_token_id": 0,
+   "vocab_size": 30522,
+   "description": "Dummy BERT model configuration for testing"
+ }
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1c6bed4e8beea2ccd1c7e1ce5c86c9317c3651524d9e0fbe65789f2e2f5a431b
+ size 170
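model.pt is stored via Git LFS, so the diff shows only the pointer file: `oid` is the SHA-256 of the actual content and `size` is its byte count (170 bytes, consistent with a tiny placeholder). A minimal sketch, assuming the real file has been fetched with `git lfs pull`, for verifying a local copy against the pointer:

```python
import hashlib

# Hash the downloaded model.pt and compare it with the oid in the pointer.
digest = hashlib.sha256(open("model.pt", "rb").read()).hexdigest()
expected = "1c6bed4e8beea2ccd1c7e1ce5c86c9317c3651524d9e0fbe65789f2e2f5a431b"
print(digest == expected)
```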
tokenizer.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "version": "1.0",
+   "truncation": null,
+   "padding": null,
+   "added_tokens": [],
+   "normalizer": {
+     "type": "Sequence",
+     "normalizers": [
+       {"type": "Lowercase"},
+       {"type": "StripAccents"}
+     ]
+   },
+   "pre_tokenizer": {
+     "type": "WhitespaceSplit"
+   },
+   "post_processor": {
+     "type": "TemplateProcessing",
+     "single": "[CLS] $A [SEP]",
+     "pair": "[CLS] $A [SEP] $B:1 [SEP]:1"
+   },
+   "decoder": {
+     "type": "WordPiece",
+     "unknown": "[UNK]",
+     "prefix": "##"
+   },
+   "model": {
+     "type": "BPE",
+     "vocab_size": 30522,
+     "merges": []
+   }
+ }
vocab.txt ADDED
@@ -0,0 +1,84 @@
+ [PAD]
+ [unused0]
+ [unused1]
+ [unused2]
+ [unused3]
+ [unused4]
+ [unused5]
+ [unused6]
+ [unused7]
+ [unused8]
+ [unused9]
+ [unused10]
+ [unused11]
+ [unused12]
+ [unused13]
+ [unused14]
+ [unused15]
+ [unused16]
+ [unused17]
+ [unused18]
+ [unused19]
+ [unused20]
+ [UNK]
+ [CLS]
+ [SEP]
+ [MASK]
+ !
+ "
+ #
+ $
+ %
+ &
+ '
+ (
+ )
+ *
+ +
+ ,
+ -
+ .
+ /
+ 0
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+ :
+ ;
+ <
+ =
+ >
+ ?
+ @
+ a
+ b
+ c
+ d
+ e
+ f
+ g
+ h
+ i
+ j
+ k
+ l
+ m
+ n
+ o
+ p
+ q
+ r
+ s
+ t
+ u
+ v
+ w
+ x
+ y
+ z
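The shipped tokenizer.json is itself a placeholder (its `model` section declares BPE with an empty merge list and no vocabulary, so it will not load as a working tokenizer), but vocab.txt is a valid, if tiny, WordPiece vocabulary. A minimal sketch, assuming the `tokenizers` library, of building a BERT-style tokenizer from it; with only single-character pieces and no `##` continuations in the vocabulary, multi-character words fall back to `[UNK]`:

```python
from tokenizers import BertWordPieceTokenizer

# Build a WordPiece tokenizer from the 84-entry vocab.txt above.
tokenizer = BertWordPieceTokenizer("vocab.txt", lowercase=True)

encoding = tokenizer.encode("a b c hello")
# 'a', 'b', 'c' are in the vocab; "hello" needs a '##e' continuation
# piece that does not exist, so the whole word maps to [UNK].
print(encoding.tokens)  # ['[CLS]', 'a', 'b', 'c', '[UNK]', '[SEP]']
```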