Halfotter committed on
Commit 14ebc37 · verified · 1 parent: ab6a2a2

Upload 16 files
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,73 @@
- ---
- license: mit
- ---
+ # Steel Industry Material Classification Model
+
+ This model classifies steel industry materials and products from text descriptions. It uses XLM-RoBERTa as the base model and assigns input text to one of 66 steel-related categories.
+
+ ## Model Details
+
+ - **Base Model**: XLM-RoBERTa
+ - **Task**: Sequence Classification
+ - **Number of Labels**: 66
+ - **Languages**: Korean, English (multilingual support)
+ - **Model Size**: ~1GB
+
+ ## Supported Labels
+
+ The model can classify the following steel industry materials:
+
+ - Raw Materials: 철광석, μ„νšŒμ„, μ„μœ  μ½”ν¬μŠ€, 무연탄, κ°ˆνƒ„, 아역청탄, ν”ΌνŠΈ (Peat), 였일 셰일
+ - Fuels: μ²œμ—°κ°€μŠ€, μ•‘ν™”μ²œμ—°κ°€μŠ€, 경유, 휘발유, λ“±μœ , λ‚˜ν”„νƒ€, 페트둀 및 SBP, μž”λ₯˜ μ—°λ£Œμœ 
+ - Gases: μΌμ‚°ν™”νƒ„μ†Œ, 메탄, 에탄, κ³ λ‘œκ°€μŠ€, μ½”ν¬μŠ€ 였븐 κ°€μŠ€, μ‚°μ†Œ μ œκ°•λ‘œ κ°€μŠ€, μ†Œμ„±κ°€μŠ€, κ°€μŠ€κ³΅μž₯ κ°€μŠ€
+ - Products: κ°•μ² , μ„ μ² , μ² , μ—΄κ°„μ„±ν˜•μ²  (HBI), 고온 μ„±ν˜• ν™˜μ›μ² , 직접 ν™˜μ›μ² 
+ - By-products: 고둜 슬래그, μ••μ—° μŠ€μΌ€μΌ, λΆ„μ§„, μŠ¬λŸ¬μ§€, μ ˆμ‚­μΉ©
+ - Others: μ „κΈ°, λƒ‰κ°μˆ˜, μœ€ν™œμœ , 포μž₯재, μ—΄μœ μž…, μ˜€λ¦¬λ©€μ „, νŽ λ ›
+
+ ## Usage
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "your-username/steel-material-classifier"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Prepare input
+ text = "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ # Predict
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=1).item()
+
+ # Get label
+ label = model.config.id2label[predicted_class]
+ confidence = predictions[0][predicted_class].item()
+
+ print(f"Predicted: {label}")
+ print(f"Confidence: {confidence:.4f}")
+ ```
+
+ ## Training Data
+
+ The model was trained on steel industry material descriptions and technical documents, focusing on Korean and English text related to steel manufacturing processes.
+
+ ## Performance
+
+ - **Label Independence**: Good (average similarity: 0.1166)
+ - **Orthogonality**: Good (average dot product: 0.2043)
+ - **Overall Assessment**: The model shows good separation between different material categories
+
+ ## License
+
+ [Add your license information here]
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ [Add citation information here]
+ ```
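The softmax/argmax step in the usage example above can be sketched without torch, using the standard library alone. The logits below are made-up illustrative values, and `id2label` is a three-entry stand-in for the model's 66-label mapping:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-label slice of the classifier head.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

id2label = {0: "철광석", 1: "μ„ μ² ", 2: "μ²œμ—°κ°€μŠ€"}  # stand-in mapping
print(f"Predicted: {id2label[predicted]} (Confidence: {probs[predicted]:.4f})")
```

The `predictions[0][predicted_class].item()` call in the torch version plays the same role as `probs[predicted]` here: the softmax output for the winning class is what the README reports as "confidence".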
UPLOAD_GUIDE.md ADDED
@@ -0,0 +1,100 @@
+ # Steel Material Classification Model Upload Guide
+
+ ## Step 1: Get a Hugging Face Token
+
+ 1. Go to https://huggingface.co/settings/tokens
+ 2. Click "New token"
+ 3. Give it a name (e.g., "model-upload-token")
+ 4. Select the "Write" role
+ 5. Copy the token
+
+ ## Step 2: Log In to Hugging Face
+
+ ```bash
+ huggingface-cli login
+ # Enter your token when prompted
+ ```
+
+ ## Step 3: Create a Model Repository
+
+ ```bash
+ huggingface-cli repo create steel-material-classifier --type model
+ ```
+
+ ## Step 4: Upload the Model
+
+ ```bash
+ # Clone the repository
+ git clone https://huggingface.co/YOUR_USERNAME/steel-material-classifier
+ cd steel-material-classifier
+
+ # Copy all files from the model_v24 directory,
+ # then commit and push
+ git add .
+ git commit -m "Initial commit: Steel material classification model"
+ git push
+ ```
+
+ ## Alternative: Direct Upload
+
+ ```bash
+ # From the model_v24 directory (--include takes space-separated glob patterns)
+ huggingface-cli upload YOUR_USERNAME/steel-material-classifier . --include "*.json" "*.safetensors" "*.pkl" "*.md" "*.txt" "*.py"
+ ```
+
+ ## Files to Upload
+
+ ### Required Files:
+ - βœ… config.json
+ - βœ… model.safetensors
+ - βœ… tokenizer.json
+ - βœ… tokenizer_config.json
+ - βœ… special_tokens_map.json
+ - βœ… label_mapping.json
+
+ ### Optional Files:
+ - βœ… classifier.pkl
+ - βœ… label_embeddings.pkl
+ - βœ… label_embeddings.pkl.backup
+
+ ### Documentation Files:
+ - βœ… README.md
+ - βœ… requirements.txt
+ - βœ… inference.py
+ - βœ… preprocessor.py
+ - βœ… model_card.md
+ - βœ… usage.md
+
+ ## Model Information
+
+ - **Model Name**: steel-material-classifier
+ - **Base Model**: XLM-RoBERTa
+ - **Task**: Sequence Classification
+ - **Labels**: 66 steel industry materials
+ - **Languages**: Korean, English
+ - **Model Size**: ~1GB
+
+ ## Usage After Upload
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model
+ model_name = "YOUR_USERNAME/steel-material-classifier"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Predict
+ text = "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=1).item()
+
+ label = model.config.id2label[predicted_class]
+ confidence = predictions[0][predicted_class].item()
+ print(f"Predicted: {label} (Confidence: {confidence:.4f})")
+ ```
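The include patterns in the direct-upload command can also be applied programmatically. The sketch below is an assumption-laden illustration: `should_upload` mirrors the CLI's glob matching with `fnmatch`, and `upload_model` is a hypothetical helper wrapping `huggingface_hub`'s `HfApi.upload_folder` (which accepts `allow_patterns`); the repo id is a placeholder and a write token must already be configured:

```python
from fnmatch import fnmatch

# The same glob patterns the CLI command above passes via --include.
ALLOWED_PATTERNS = ["*.json", "*.safetensors", "*.pkl", "*.md", "*.txt", "*.py"]

def should_upload(filename: str) -> bool:
    """Return True if the file matches one of the include patterns."""
    return any(fnmatch(filename, p) for p in ALLOWED_PATTERNS)

def upload_model(repo_id: str, folder: str = ".") -> None:
    # Hypothetical programmatic equivalent of the CLI upload; requires
    # huggingface_hub and a token configured via `huggingface-cli login`.
    from huggingface_hub import HfApi
    HfApi().upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="model",
        allow_patterns=ALLOWED_PATTERNS,
    )
```

Note one subtlety these patterns expose: `label_embeddings.pkl.backup` from the "Optional Files" list does not match `*.pkl`, so a direct upload with these patterns would skip it unless a `*.backup` pattern is added.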
classifier.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf2ac4313a1006caa5b470331fcddcf7dd2d368e5822b1c4df3d3926929c8a5e
+ size 204311
config.json ADDED
@@ -0,0 +1,165 @@
+ {
+   "_name_or_path": "xlm-roberta-base",
+   "architectures": [
+     "XLMRobertaForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": 0.1,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "xlm-roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "num_labels": 66,
+   "output_past": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 250002,
+   "id2label": {
+     "0": "점결탄",
+     "1": "μ‚°ν™”λ§ˆκ·Έλ„€μŠ˜",
+     "2": "였븐 μ½”ν¬μŠ€",
+     "3": "μ½œνƒ€λ₯΄",
+     "4": "직접 ν™˜μ›μ² ",
+     "5": "μΌμ‚°ν™”νƒ„μ†Œ",
+     "6": "μ²œμ—°κ°€μŠ€",
+     "7": "κ°ˆνƒ„",
+     "8": "페트둀 및 SBP",
+     "9": "μ—­μ²­",
+     "10": "λƒ‰κ°μˆ˜",
+     "11": "κ°•μ² ",
+     "12": "μ„νšŒμ„",
+     "13": "산업폐기물",
+     "14": "메탄",
+     "15": "고둜 슬래그",
+     "16": "철 슀크랩",
+     "17": "λΆ„μ§„",
+     "18": "μœ€ν™œμœ ",
+     "19": "μ•‘ν™”μ„μœ κ°€μŠ€",
+     "20": "κ°•μ²  슀크랩",
+     "21": "νƒ„μ‚°λ¦¬νŠ¬",
+     "22": "경유",
+     "23": "μž”λ₯˜ μ—°λ£Œμœ ",
+     "24": "μ „κΈ°",
+     "25": "무연탄",
+     "26": "였일 셰일",
+     "27": "철광석",
+     "28": "νƒ„μ‚°μˆ˜μ†Œλ‚˜νŠΈλ₯¨",
+     "29": "탄산바λ₯¨",
+     "30": "포μž₯재",
+     "31": "μ•‘ν™” μ²œμ—°κ°€μŠ€",
+     "32": "μŠ¬λŸ¬μ§€",
+     "33": "μ†Œλ‹€νšŒ",
+     "34": "μ‚°ν™”λ°”λ₯¨",
+     "35": "κ°€μŠ€κ³΅μž₯ κ°€μŠ€",
+     "36": "폐유",
+     "37": "EAF νƒ„μ†Œ μ „κ·Ή",
+     "38": "μ••μ—° μŠ€μΌ€μΌ",
+     "39": "μ½”ν¬μŠ€ 였븐 κ°€μŠ€",
+     "40": "EAF μΆ©μ „ νƒ„μ†Œ",
+     "41": "κ³ λ‘œκ°€μŠ€",
+     "42": "μ—΄κ°„μ„±ν˜•μ²  (HBI)",
+     "43": "ν”ΌνŠΈ (Peat)",
+     "44": "μ„ μ² ",
+     "45": "μ›μœ ",
+     "46": "μ‚°μ†Œ μ œκ°•λ‘œ κ°€μŠ€",
+     "47": "μ—΄μœ μž…",
+     "48": "μ ˆμ‚­μΉ©",
+     "49": "아역청탄",
+     "50": "λ§ˆκ·Έλ„€μ‚¬μ΄νŠΈ",
+     "51": "μ„μœ  μ½”ν¬μŠ€",
+     "52": "νŽ λ ›",
+     "53": "μ˜€λ¦¬λ©€μ „",
+     "54": "μ•‘ν™” μ„μœ κ°€μŠ€",
+     "55": "λ“±μœ ",
+     "56": "μ†Œμ„±κ°€μŠ€",
+     "57": "에탄",
+     "58": "μ‚°ν™”μΉΌμŠ˜",
+     "59": "λ‚˜ν”„νƒ€",
+     "60": "μ² ",
+     "61": "λŠ₯μ² κ΄‘",
+     "62": "μ†Œκ²°κ΄‘",
+     "63": "고온 μ„±ν˜• ν™˜μ›μ² ",
+     "64": "휘발유",
+     "65": "νƒ„μ‚°μŠ€νŠΈλ‘ νŠ¬"
+   },
+   "label2id": {
+     "점결탄": 0,
+     "μ‚°ν™”λ§ˆκ·Έλ„€μŠ˜": 1,
+     "였븐 μ½”ν¬μŠ€": 2,
+     "μ½œνƒ€λ₯΄": 3,
+     "직접 ν™˜μ›μ² ": 4,
+     "μΌμ‚°ν™”νƒ„μ†Œ": 5,
+     "μ²œμ—°κ°€μŠ€": 6,
+     "κ°ˆνƒ„": 7,
+     "페트둀 및 SBP": 8,
+     "μ—­μ²­": 9,
+     "λƒ‰κ°μˆ˜": 10,
+     "κ°•μ² ": 11,
+     "μ„νšŒμ„": 12,
+     "산업폐기물": 13,
+     "메탄": 14,
+     "고둜 슬래그": 15,
+     "철 슀크랩": 16,
+     "λΆ„μ§„": 17,
+     "μœ€ν™œμœ ": 18,
+     "μ•‘ν™”μ„μœ κ°€μŠ€": 19,
+     "κ°•μ²  슀크랩": 20,
+     "νƒ„μ‚°λ¦¬νŠ¬": 21,
+     "경유": 22,
+     "μž”λ₯˜ μ—°λ£Œμœ ": 23,
+     "μ „κΈ°": 24,
+     "무연탄": 25,
+     "였일 셰일": 26,
+     "철광석": 27,
+     "νƒ„μ‚°μˆ˜μ†Œλ‚˜νŠΈλ₯¨": 28,
+     "탄산바λ₯¨": 29,
+     "포μž₯재": 30,
+     "μ•‘ν™” μ²œμ—°κ°€μŠ€": 31,
+     "μŠ¬λŸ¬μ§€": 32,
+     "μ†Œλ‹€νšŒ": 33,
+     "μ‚°ν™”λ°”λ₯¨": 34,
+     "κ°€μŠ€κ³΅μž₯ κ°€μŠ€": 35,
+     "폐유": 36,
+     "EAF νƒ„μ†Œ μ „κ·Ή": 37,
+     "μ••μ—° μŠ€μΌ€μΌ": 38,
+     "μ½”ν¬μŠ€ 였븐 κ°€μŠ€": 39,
+     "EAF μΆ©μ „ νƒ„μ†Œ": 40,
+     "κ³ λ‘œκ°€μŠ€": 41,
+     "μ—΄κ°„μ„±ν˜•μ²  (HBI)": 42,
+     "ν”ΌνŠΈ (Peat)": 43,
+     "μ„ μ² ": 44,
+     "μ›μœ ": 45,
+     "μ‚°μ†Œ μ œκ°•λ‘œ κ°€μŠ€": 46,
+     "μ—΄μœ μž…": 47,
+     "μ ˆμ‚­μΉ©": 48,
+     "아역청탄": 49,
+     "λ§ˆκ·Έλ„€μ‚¬μ΄νŠΈ": 50,
+     "μ„μœ  μ½”ν¬μŠ€": 51,
+     "νŽ λ ›": 52,
+     "μ˜€λ¦¬λ©€μ „": 53,
+     "μ•‘ν™” μ„μœ κ°€μŠ€": 54,
+     "λ“±μœ ": 55,
+     "μ†Œμ„±κ°€μŠ€": 56,
+     "에탄": 57,
+     "μ‚°ν™”μΉΌμŠ˜": 58,
+     "λ‚˜ν”„νƒ€": 59,
+     "μ² ": 60,
+     "λŠ₯μ² κ΄‘": 61,
+     "μ†Œκ²°κ΄‘": 62,
+     "고온 μ„±ν˜• ν™˜μ›μ² ": 63,
+     "휘발유": 64,
+     "νƒ„μ‚°μŠ€νŠΈλ‘ νŠ¬": 65
+   }
+ }
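One detail of config.json worth checking when editing it by hand: JSON object keys are always strings, so `id2label` uses `"0"`, `"1"`, … while `label2id` stores integer values. The sketch below verifies that the two maps are mutual inverses on a three-entry excerpt of the mapping above (the full file has all 66 entries):

```python
import json

# Excerpt of the mappings in config.json (string keys, as JSON requires).
config = json.loads("""
{
  "id2label": {"0": "점결탄", "27": "철광석", "44": "μ„ μ² "},
  "label2id": {"점결탄": 0, "철광석": 27, "μ„ μ² ": 44}
}
""")

# Cast id2label keys to int so they compare against the integer
# values stored in label2id, then check both directions.
id2label = {int(k): v for k, v in config["id2label"].items()}
label2id = config["label2id"]

assert all(label2id[label] == idx for idx, label in id2label.items())
assert all(id2label[idx] == label for label, idx in label2id.items())
print("id2label and label2id are consistent")
```

After `from_pretrained` loads the config, transformers exposes `model.config.id2label` with integer keys, which is why the usage examples can index it directly with `predicted_class`.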
inference.py ADDED
@@ -0,0 +1,157 @@
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import pickle
+ import os
+
+ class SteelMaterialClassifier:
+     def __init__(self, model_path):
+         """
+         Initialize the steel material classifier.
+
+         Args:
+             model_path: Path to the model directory
+         """
+         self.model_path = model_path
+         self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+         # Load model and tokenizer
+         self.tokenizer = AutoTokenizer.from_pretrained(model_path)
+         self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
+         self.model.to(self.device)
+         self.model.eval()
+
+         # Load additional components
+         self._load_additional_components()
+
+     def _load_additional_components(self):
+         """Load the classifier and label embeddings if they exist."""
+         try:
+             # Load the auxiliary classifier if it exists
+             classifier_path = os.path.join(self.model_path, "classifier.pkl")
+             if os.path.exists(classifier_path):
+                 with open(classifier_path, "rb") as f:
+                     self.classifier = pickle.load(f)
+             else:
+                 self.classifier = None
+
+             # Load label embeddings if they exist
+             embeddings_path = os.path.join(self.model_path, "label_embeddings.pkl")
+             if os.path.exists(embeddings_path):
+                 with open(embeddings_path, "rb") as f:
+                     self.label_embeddings = pickle.load(f)
+             else:
+                 self.label_embeddings = None
+
+         except Exception as e:
+             print(f"Warning: Could not load additional components: {e}")
+             self.classifier = None
+             self.label_embeddings = None
+
+     def predict(self, text, top_k=5):
+         """
+         Predict the steel material classification for a text.
+
+         Args:
+             text: Input text to classify
+             top_k: Number of top predictions to return
+
+         Returns:
+             dict: Prediction results with labels and probabilities
+         """
+         # Tokenize input
+         inputs = self.tokenizer(
+             text,
+             return_tensors="pt",
+             truncation=True,
+             max_length=512,
+             padding=True
+         )
+         inputs = {k: v.to(self.device) for k, v in inputs.items()}
+
+         # Get model predictions
+         with torch.no_grad():
+             outputs = self.model(**inputs)
+             logits = outputs.logits
+             probabilities = torch.nn.functional.softmax(logits, dim=-1)
+
+         # Get the top-k predictions
+         top_probs, top_indices = torch.topk(probabilities, top_k, dim=1)
+
+         # Convert to result dicts
+         results = []
+         for i in range(top_k):
+             label_id = top_indices[0][i].item()
+             probability = top_probs[0][i].item()
+             label = self.model.config.id2label[label_id]
+
+             results.append({
+                 "label": label,
+                 "label_id": label_id,
+                 "probability": probability
+             })
+
+         return {
+             "predictions": results,
+             "input_text": text,
+             "model_info": {
+                 "model_name": self.model.config._name_or_path,
+                 "num_labels": self.model.config.num_labels,
+                 "device": str(self.device)
+             }
+         }
+
+     def predict_batch(self, texts, top_k=5):
+         """
+         Predict for multiple texts.
+
+         Args:
+             texts: List of input texts
+             top_k: Number of top predictions to return
+
+         Returns:
+             list: List of prediction results
+         """
+         results = []
+         for text in texts:
+             result = self.predict(text, top_k)
+             results.append(result)
+         return results
+
+     def get_label_info(self):
+         """
+         Get information about all available labels.
+
+         Returns:
+             dict: Label information
+         """
+         return {
+             "num_labels": self.model.config.num_labels,
+             "id2label": self.model.config.id2label,
+             "label2id": self.model.config.label2id
+         }
+
+ # Example usage
+ if __name__ == "__main__":
+     # Initialize the classifier from the current directory
+     model_path = "."
+     classifier = SteelMaterialClassifier(model_path)
+
+     # Example predictions
+     test_texts = [
+         "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •",
+         "μ²œμ—°κ°€μŠ€λ₯Ό μ—°λ£Œλ‘œ μ‚¬μš©ν•˜μ—¬ 고둜λ₯Ό κ°€μ—΄",
+         "μ„νšŒμ„μ„ μ²¨κ°€ν•˜μ—¬ 슬래그λ₯Ό ν˜•μ„±"
+     ]
+
+     print("=== Steel Material Classification Results ===")
+     for text in test_texts:
+         result = classifier.predict(text)
+         print(f"\nInput: {text}")
+         print(f"Top prediction: {result['predictions'][0]['label']} ({result['predictions'][0]['probability']:.4f})")
+
+         # Show the top 3 predictions
+         print("Top 3 predictions:")
+         for i, pred in enumerate(result['predictions'][:3]):
+             print(f"  {i+1}. {pred['label']}: {pred['probability']:.4f}")
label_embeddings.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80277db7a3eb26fca6c66c48e4410ca6f591cfc7242e698cddf8ed13ae583026
+ size 206147
label_mapping.json ADDED
@@ -0,0 +1,68 @@
+ {
+   "점결탄": 0,
+   "μ‚°ν™”λ§ˆκ·Έλ„€μŠ˜": 1,
+   "였븐 μ½”ν¬μŠ€": 2,
+   "μ½œνƒ€λ₯΄": 3,
+   "직접 ν™˜μ›μ² ": 4,
+   "μΌμ‚°ν™”νƒ„μ†Œ": 5,
+   "μ²œμ—°κ°€μŠ€": 6,
+   "κ°ˆνƒ„": 7,
+   "페트둀 및 SBP": 8,
+   "μ—­μ²­": 9,
+   "λƒ‰κ°μˆ˜": 10,
+   "κ°•μ² ": 11,
+   "μ„νšŒμ„": 12,
+   "산업폐기물": 13,
+   "메탄": 14,
+   "고둜 슬래그": 15,
+   "철 슀크랩": 16,
+   "λΆ„μ§„": 17,
+   "μœ€ν™œμœ ": 18,
+   "μ•‘ν™”μ„μœ κ°€μŠ€": 19,
+   "κ°•μ²  슀크랩": 20,
+   "νƒ„μ‚°λ¦¬νŠ¬": 21,
+   "경유": 22,
+   "μž”λ₯˜ μ—°λ£Œμœ ": 23,
+   "μ „κΈ°": 24,
+   "무연탄": 25,
+   "였일 셰일": 26,
+   "철광석": 27,
+   "νƒ„μ‚°μˆ˜μ†Œλ‚˜νŠΈλ₯¨": 28,
+   "탄산바λ₯¨": 29,
+   "포μž₯재": 30,
+   "μ•‘ν™” μ²œμ—°κ°€μŠ€": 31,
+   "μŠ¬λŸ¬μ§€": 32,
+   "μ†Œλ‹€νšŒ": 33,
+   "μ‚°ν™”λ°”λ₯¨": 34,
+   "κ°€μŠ€κ³΅μž₯ κ°€μŠ€": 35,
+   "폐유": 36,
+   "EAF νƒ„μ†Œ μ „κ·Ή": 37,
+   "μ••μ—° μŠ€μΌ€μΌ": 38,
+   "μ½”ν¬μŠ€ 였븐 κ°€μŠ€": 39,
+   "EAF μΆ©μ „ νƒ„μ†Œ": 40,
+   "κ³ λ‘œκ°€μŠ€": 41,
+   "μ—΄κ°„μ„±ν˜•μ²  (HBI)": 42,
+   "ν”ΌνŠΈ (Peat)": 43,
+   "μ„ μ² ": 44,
+   "μ›μœ ": 45,
+   "μ‚°μ†Œ μ œκ°•λ‘œ κ°€μŠ€": 46,
+   "μ—΄μœ μž…": 47,
+   "μ ˆμ‚­μΉ©": 48,
+   "아역청탄": 49,
+   "λ§ˆκ·Έλ„€μ‚¬μ΄νŠΈ": 50,
+   "μ„μœ  μ½”ν¬μŠ€": 51,
+   "νŽ λ ›": 52,
+   "μ˜€λ¦¬λ©€μ „": 53,
+   "μ•‘ν™” μ„μœ κ°€μŠ€": 54,
+   "λ“±μœ ": 55,
+   "μ†Œμ„±κ°€μŠ€": 56,
+   "에탄": 57,
+   "μ‚°ν™”μΉΌμŠ˜": 58,
+   "λ‚˜ν”„νƒ€": 59,
+   "μ² ": 60,
+   "λŠ₯μ² κ΄‘": 61,
+   "μ†Œκ²°κ΄‘": 62,
+   "고온 μ„±ν˜• ν™˜μ›μ² ": 63,
+   "휘발유": 64,
+   "νƒ„μ‚°μŠ€νŠΈλ‘ νŠ¬": 65
+ }
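label_mapping.json stores the mapping in label β†’ id direction, while config.json needs the inverse with string keys. A sketch of that conversion, run here on a four-entry excerpt rather than the file itself:

```python
# Excerpt of label_mapping.json (label -> integer id).
label_mapping = {"점결탄": 0, "철광석": 27, "휘발유": 64, "νƒ„μ‚°μŠ€νŠΈλ‘ νŠ¬": 65}

# Invert into the string-keyed id2label form that config.json stores.
id2label = {str(idx): label for label, idx in label_mapping.items()}
label2id = dict(label_mapping)

# If two labels shared an id, the inversion would silently drop one;
# equal lengths confirm the ids are unique.
assert len(id2label) == len(label_mapping)
print(id2label["27"])
```

In practice one would load the real file with `json.load` and write the two maps back into config.json; the excerpt keeps the sketch self-contained.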
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa9f78463531db7ec98f441bf5676f517c701cc4554814198711c1b465e9c3b8
+ size 1112197096
model_card.md ADDED
@@ -0,0 +1,62 @@
+ # Hugging Face Model Card for Steel Material Classification
+
+ ## Model Description
+
+ This model is designed to classify steel industry materials and products based on text descriptions. It uses XLM-RoBERTa as the base model and can classify input text into 66 different steel-related categories.
+
+ - **Developed by:** [Your Name/Organization]
+ - **Model type:** Text Classification
+ - **Language(s):** Korean, English (multilingual)
+ - **License:** [Your License]
+ - **Finetuned from model:** xlm-roberta-base
+
+ ## Intended Uses & Limitations
+
+ ### Intended Uses
+
+ This model is intended to be used for:
+ - Classifying steel industry materials from text descriptions
+ - Supporting LCA (Life Cycle Assessment) analysis in steel manufacturing
+ - Automating material categorization in steel industry documentation
+
+ ### Limitations
+
+ - The model is specifically trained for steel industry materials and may not perform well on other domains
+ - Performance may vary with different text styles or technical terminology
+ - The model requires Korean or English text input
+
+ ## Training and Evaluation Data
+
+ ### Training Data
+
+ The model was trained on steel industry material descriptions and technical documents, focusing on Korean and English text related to steel manufacturing processes.
+
+ ### Evaluation Data
+
+ [Add information about evaluation data]
+
+ ## Training Results
+
+ ### Training Infrastructure
+
+ [Add training infrastructure details]
+
+ ### Results
+
+ - **Label Independence**: Good (average similarity: 0.1166)
+ - **Orthogonality**: Good (average dot product: 0.2043)
+ - **Overall Assessment**: The model shows good separation between different material categories
+
+ ## Environmental Impact
+
+ [Add environmental impact information]
+
+ ## Citation
+
+ [Add citation information]
+
+ ## Glossary
+
+ - **LCA**: Life Cycle Assessment
+ - **Steel Industry Materials**: Raw materials, fuels, gases, products, and by-products used in steel manufacturing
+ - **XLM-RoBERTa**: Cross-lingual language model based on RoBERTa architecture
preprocessor.py ADDED
@@ -0,0 +1,127 @@
+ import torch
+ from transformers import AutoTokenizer
+
+ def preprocess_function(examples, tokenizer, max_length=512):
+     """
+     Preprocess text data for the steel material classification model.
+
+     Args:
+         examples: Dataset examples containing text
+         tokenizer: Tokenizer instance
+         max_length: Maximum sequence length
+
+     Returns:
+         dict: Tokenized inputs
+     """
+     # Tokenize the texts
+     result = tokenizer(
+         examples["text"],
+         truncation=True,
+         padding="max_length",
+         max_length=max_length,
+         return_tensors="pt"
+     )
+
+     return result
+
+ def postprocess_function(predictions, id2label):
+     """
+     Postprocess model predictions.
+
+     Args:
+         predictions: Raw model predictions (logits)
+         id2label: Mapping from label IDs to label names
+
+     Returns:
+         list: Processed predictions with labels and probabilities
+     """
+     # Convert logits to probabilities
+     probabilities = torch.nn.functional.softmax(torch.tensor(predictions), dim=-1)
+
+     # Get the top 5 predictions per sample
+     top_probs, top_indices = torch.topk(probabilities, k=5, dim=1)
+
+     results = []
+     for i in range(len(predictions)):
+         sample_results = []
+         for j in range(5):
+             label_id = top_indices[i][j].item()
+             probability = top_probs[i][j].item()
+             label = id2label[label_id]
+
+             sample_results.append({
+                 "label": label,
+                 "label_id": label_id,
+                 "probability": probability
+             })
+         results.append(sample_results)
+
+     return results
+
+ def validate_input(text):
+     """
+     Validate input text for classification.
+
+     Args:
+         text: Input text to validate
+
+     Returns:
+         bool: True if valid, False otherwise
+     """
+     if not isinstance(text, str):
+         return False
+
+     if len(text.strip()) == 0:
+         return False
+
+     if len(text) > 1000:  # Reasonable limit for steel material descriptions
+         return False
+
+     return True
+
+ def clean_text(text):
+     """
+     Clean and normalize input text.
+
+     Args:
+         text: Raw input text
+
+     Returns:
+         str: Cleaned text
+     """
+     # Remove extra whitespace
+     text = " ".join(text.split())
+
+     # Normalize Korean characters (if needed)
+     # Add any specific text cleaning rules here
+
+     return text.strip()
+
+ # Example usage
+ if __name__ == "__main__":
+     # Load the tokenizer from the current directory
+     tokenizer = AutoTokenizer.from_pretrained(".")
+
+     # Example preprocessing
+     example_texts = [
+         "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •",
+         "μ²œμ—°κ°€μŠ€λ₯Ό μ—°λ£Œλ‘œ μ‚¬μš©ν•˜μ—¬ 고둜λ₯Ό κ°€μ—΄",
+         "μ„νšŒμ„μ„ μ²¨κ°€ν•˜μ—¬ 슬래그λ₯Ό ν˜•μ„±"
+     ]
+
+     # Clean and validate the texts
+     cleaned_texts = []
+     for text in example_texts:
+         if validate_input(text):
+             cleaned_texts.append(clean_text(text))
+
+     # Preprocess
+     examples = {"text": cleaned_texts}
+     tokenized = preprocess_function(examples, tokenizer)
+
+     print("=== Preprocessing Example ===")
+     print(f"Input texts: {cleaned_texts}")
+     print(f"Tokenized shape: {tokenized['input_ids'].shape}")
+     print(f"Attention mask shape: {tokenized['attention_mask'].shape}")
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ torch>=1.9.0
+ transformers>=4.35.0
+ numpy>=1.21.0
+ scikit-learn>=1.0.0
+ scipy>=1.7.0
+ matplotlib>=3.5.0
+ seaborn>=0.11.0
+ pandas>=1.3.0
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1cc44ad7faaeec47241864835473fd5403f2da94673f3f764a77ebcb0a803ec
+ size 17083009
tokenizer_config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }
usage.md ADDED
@@ -0,0 +1,86 @@
+ # Steel Material Classification Model
+
+ ## Quick Start
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model
+ model_name = "your-username/steel-material-classifier"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Predict
+ text = "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=1).item()
+
+ label = model.config.id2label[predicted_class]
+ confidence = predictions[0][predicted_class].item()
+ print(f"Predicted: {label} (Confidence: {confidence:.4f})")
+ ```
+
+ ## Model Information
+
+ - **Base Model**: XLM-RoBERTa
+ - **Task**: Sequence Classification
+ - **Labels**: 66 steel industry materials
+ - **Languages**: Korean, English
+ - **Model Size**: ~1GB
+
+ ## Supported Labels
+
+ The model can classify 66 different steel industry materials, including:
+
+ - **Raw Materials**: 철광석, μ„νšŒμ„, μ„μœ  μ½”ν¬μŠ€, 무연탄, κ°ˆνƒ„
+ - **Fuels**: μ²œμ—°κ°€μŠ€, μ•‘ν™”μ²œμ—°κ°€μŠ€, 경유, 휘발유, λ“±μœ 
+ - **Gases**: μΌμ‚°ν™”νƒ„μ†Œ, 메탄, 에탄, κ³ λ‘œκ°€μŠ€, μ½”ν¬μŠ€ 였븐 κ°€μŠ€
+ - **Products**: κ°•μ² , μ„ μ² , μ² , μ—΄κ°„μ„±ν˜•μ²  (HBI), 고온 μ„±ν˜• ν™˜μ›μ² 
+ - **By-products**: 고둜 슬래그, μ••μ—° μŠ€μΌ€μΌ, λΆ„μ§„, μŠ¬λŸ¬μ§€, μ ˆμ‚­μΉ©
+ - **Others**: μ „κΈ°, λƒ‰κ°μˆ˜, μœ€ν™œμœ , 포μž₯재, μ—΄μœ μž…
+
+ ## Performance
+
+ - **Label Independence**: Good (average similarity: 0.1166)
+ - **Orthogonality**: Good (average dot product: 0.2043)
+ - **Overall Assessment**: The model shows good separation between different material categories
+
+ ## Usage Examples
+
+ ### Single Prediction
+ ```python
+ text = "μ²œμ—°κ°€μŠ€λ₯Ό μ—°λ£Œλ‘œ μ‚¬μš©ν•˜μ—¬ 고둜λ₯Ό κ°€μ—΄"
+ # Returns: "μ²œμ—°κ°€μŠ€" with a confidence score
+ ```
+
+ ### Batch Prediction
+ ```python
+ texts = [
+     "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •",
+     "μ„νšŒμ„μ„ μ²¨κ°€ν•˜μ—¬ 슬래그λ₯Ό ν˜•μ„±"
+ ]
+ # Returns: ["철광석", "μ„νšŒμ„"] with confidence scores
+ ```
+
+ ## Installation
+
+ ```bash
+ pip install torch transformers
+ ```
+
+ ## License
+
+ [Add your license information]
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ [Add citation information here]
+ ```
μ›Ήμ‚¬μ΄νŠΈ_μ—…λ‘œλ“œ_κ°€μ΄λ“œ.md ADDED
@@ -0,0 +1,76 @@
+ # How to Upload via the Hugging Face Website
+
+ ## Step 1: Create a Hugging Face Account / Log In
+ 1. Go to https://huggingface.co
+ 2. Sign up or log in
+
+ ## Step 2: Create a New Model Repository
+ 1. Click the "New" button in the top right
+ 2. Select "Model"
+ 3. Enter the repository name: `steel-material-classifier`
+ 4. Click "Create repository"
+
+ ## Step 3: Upload the Files
+ 1. On the new repository page, click the "Files and versions" tab
+ 2. Click "Add file" β†’ "Upload files"
+ 3. Select and upload all of the following files:
+
+ ### Required files:
+ - `config.json`
+ - `model.safetensors`
+ - `tokenizer.json`
+ - `tokenizer_config.json`
+ - `special_tokens_map.json`
+ - `label_mapping.json`
+
+ ### Additional files:
+ - `classifier.pkl`
+ - `label_embeddings.pkl`
+ - `label_embeddings.pkl.backup`
+ - `README.md`
+ - `requirements.txt`
+ - `inference.py`
+ - `preprocessor.py`
+ - `model_card.md`
+ - `usage.md`
+
+ ## Step 4: Write the Commit Message
+ - Enter "Initial commit: Steel material classification model" as the commit message
+ - Click "Commit changes to main"
+
+ ## Step 5: Configure the Model Information
+ 1. Click the "Settings" tab on the repository page
+ 2. Edit the model information in the "Model Card" section:
+    - License: choose an appropriate license
+    - Model Card: write it based on the contents of model_card.md
+
+ ## Step 6: Test the Upload
+ After the upload completes, test it with the following code:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load the model
+ model_name = "YOUR_USERNAME/steel-material-classifier"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Test a prediction
+ text = "철광석을 κ³ λ‘œμ—μ„œ ν™˜μ›ν•˜μ—¬ 선철을 μ œμ‘°ν•˜λŠ” κ³Όμ •"
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     predicted_class = torch.argmax(predictions, dim=1).item()
+
+ label = model.config.id2label[predicted_class]
+ confidence = predictions[0][predicted_class].item()
+ print(f"Predicted: {label} (Confidence: {confidence:.4f})")
+ ```
+
+ ## Notes
+ - Uploads of large files may take a while
+ - `model.safetensors` is about 1GB, so a stable internet connection is required
+ - Do not close your browser while the upload is in progress