Upload 16 files
Browse files
- .gitattributes +1 -0
- README.md +73 -3
- UPLOAD_GUIDE.md +100 -0
- classifier.pkl +3 -0
- config.json +165 -0
- inference.py +157 -0
- label_embeddings.pkl +3 -0
- label_mapping.json +68 -0
- model.safetensors +3 -0
- model_card.md +62 -0
- preprocessor.py +127 -0
- requirements.txt +8 -0
- special_tokens_map.json +15 -0
- tokenizer.json +3 -0
- tokenizer_config.json +54 -0
- usage.md +86 -0
- μΉμ¬μ΄νΈ_μ λ‘λ_κ°μ΄λ.md +76 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
@@ -1,3 +1,73 @@
# Steel Industry Material Classification Model

This model is trained to classify steel industry materials and products based on text descriptions. It uses XLM-RoBERTa as the base model and can classify input text into 66 different steel-related categories.

## Model Details

- **Base Model**: XLM-RoBERTa
- **Task**: Sequence Classification
- **Number of Labels**: 66
- **Languages**: Korean, English (multilingual support)
- **Model Size**: ~1GB

## Supported Labels

The model can classify the following steel industry materials:

- Raw Materials: μ² κ΄μ, μνμ, μμ μ½ν¬μ€, 무μ°ν, κ°ν, μμμ²ν, νΌνΈ (Peat), μ€μΌ μ°μΌ
- Fuels: μ²μ°κ°μ€, μ‘νμ²μ°κ°μ€, κ²½μ , νλ°μ , λ±μ , λνν, ννΈλ‘€ λ° SBP, μλ₯ μ°λ£μ
- Gases: μΌμ°ννμ, λ©ν, μν, κ³ λ‘κ°μ€, μ½ν¬μ€ μ€λΈ κ°μ€, μ°μ μ κ°λ‘ κ°μ€, μμ±κ°μ€, κ°μ€κ³΅μ₯ κ°μ€
- Products: κ°μ² , μ μ² , μ² , μ΄κ°μ±νμ² (HBI), κ³ μ¨ μ±ν νμμ² , μ§μ νμμ²
- By-products: κ³ λ‘ μ¬λκ·Έ, μμ° μ€μΌμΌ, λΆμ§, μ¬λ¬μ§, μ μμΉ©
- Others: μ κΈ°, λκ°μ, μ€νμ , ν¬μ₯μ¬, μ΄μ μ, μ€λ¦¬λ©μ , ν λ

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "your-username/steel-material-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "μ² κ΄μμ κ³ λ‘μμ νμνμ¬ μ μ² μ μ μ‘°νλ κ³Όμ "
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=1).item()

# Get label
label = model.config.id2label[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Predicted: {label}")
print(f"Confidence: {confidence:.4f}")
```

## Training Data

The model was trained on steel industry material descriptions and technical documents, focusing on Korean and English text related to steel manufacturing processes.

## Performance

- **Label Independence**: Good (average similarity: 0.1166)
- **Orthogonality**: Good (average dot product: 0.2043)
- **Overall Assessment**: The model shows good separation between different material categories

## License

[Add your license information here]

## Citation

If you use this model in your research, please cite:

```bibtex
[Add citation information here]
```
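The confidence value printed in the snippet above is simply the softmax probability of the top logit. A minimal sketch of that computation in plain Python, using made-up logits rather than real model output:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 4-label slice of the classifier's output
logits = [2.0, 0.5, -1.0, 0.1]
probs = softmax(logits)

# argmax over probabilities gives the predicted class id
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
confidence = probs[predicted_class]
print(predicted_class, round(confidence, 4))
```

This mirrors what `torch.nn.functional.softmax` followed by `torch.argmax` does in the README example, just without the torch dependency.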
UPLOAD_GUIDE.md
ADDED
@@ -0,0 +1,100 @@
# Steel Material Classification Model Upload Guide

## Step 1: Get Hugging Face Token

1. Go to https://huggingface.co/settings/tokens
2. Click "New token"
3. Give it a name (e.g., "model-upload-token")
4. Select "Write" role
5. Copy the token

## Step 2: Login to Hugging Face

```bash
huggingface-cli login
# Enter your token when prompted
```

## Step 3: Create Model Repository

```bash
huggingface-cli repo create steel-material-classifier --type model
```

## Step 4: Upload Model

```bash
# Clone the repository
git clone https://huggingface.co/YOUR_USERNAME/steel-material-classifier
cd steel-material-classifier

# Copy all files from the model_v24 directory
# Then commit and push
git add .
git commit -m "Initial commit: Steel material classification model"
git push
```

## Alternative: Direct Upload

```bash
# From the model_v24 directory
huggingface-cli upload YOUR_USERNAME/steel-material-classifier . --include "*.json,*.safetensors,*.pkl,*.md,*.txt,*.py"
```

## Files to Upload

### Required Files:
- ✅ config.json
- ✅ model.safetensors
- ✅ tokenizer.json
- ✅ tokenizer_config.json
- ✅ special_tokens_map.json
- ✅ label_mapping.json

### Optional Files:
- ✅ classifier.pkl
- ✅ label_embeddings.pkl
- ✅ label_embeddings.pkl.backup

### Documentation Files:
- ✅ README.md
- ✅ requirements.txt
- ✅ inference.py
- ✅ preprocessor.py
- ✅ model_card.md
- ✅ usage.md

## Model Information

- **Model Name**: steel-material-classifier
- **Base Model**: XLM-RoBERTa
- **Task**: Sequence Classification
- **Labels**: 66 steel industry materials
- **Languages**: Korean, English
- **Model Size**: ~1GB

## Usage After Upload

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "YOUR_USERNAME/steel-material-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict
text = "μ² κ΄μμ κ³ λ‘μμ νμνμ¬ μ μ² μ μ μ‘°νλ κ³Όμ "
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=1).item()

label = model.config.id2label[predicted_class]
confidence = predictions[0][predicted_class].item()
print(f"Predicted: {label} (Confidence: {confidence:.4f})")
```
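The `--include` argument in the direct-upload command is a comma-separated list of glob patterns. A small standard-library sketch of which files those patterns would pick up from the list above; `huggingface-cli`'s own pattern matching may differ in details, so treat this only as an approximation of the filtering:

```python
import fnmatch

# The patterns from the --include argument above
include = "*.json,*.safetensors,*.pkl,*.md,*.txt,*.py"
patterns = include.split(",")

files = [
    "config.json", "model.safetensors", "classifier.pkl",
    "README.md", "requirements.txt", "inference.py",
    "tokenizer.json", "label_embeddings.pkl.backup",
]

# Keep a file if any pattern matches its name
uploaded = [f for f in files if any(fnmatch.fnmatch(f, p) for p in patterns)]
print(uploaded)
```

Note that `label_embeddings.pkl.backup` does not match `*.pkl`, so the optional backup file would need its own pattern to be included.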
classifier.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cf2ac4313a1006caa5b470331fcddcf7dd2d368e5822b1c4df3d3926929c8a5e
size 204311
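Because `classifier.pkl` is tracked by Git LFS, what the repository actually stores is the small text pointer above: a version line, a sha256 object id, and the size in bytes. A minimal sketch of parsing such a pointer (the helper name is made up for illustration):

```python
def parse_lfs_pointer(text):
    # Each line of a Git LFS pointer file is "key value"
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    return {
        "version": fields["version"],
        "oid": fields["oid"].removeprefix("sha256:"),
        "size": int(fields["size"]),
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cf2ac4313a1006caa5b470331fcddcf7dd2d368e5822b1c4df3d3926929c8a5e
size 204311
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # 204311
```

The `size` field here (204 KB) is why the diff shows only `+3` lines for a binary artifact.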
config.json
ADDED
@@ -0,0 +1,165 @@
{
  "_name_or_path": "xlm-roberta-base",
  "architectures": [
    "XLMRobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": 0.1,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "xlm-roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 66,
  "output_past": true,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.35.2",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 250002,
  "id2label": {
    "0": "μ κ²°ν",
    "1": "μ°νλ§κ·Έλ€μ",
    "2": "μ€λΈ μ½ν¬μ€",
    "3": "μ½νλ₯΄",
    "4": "μ§μ νμμ² ",
    "5": "μΌμ°ννμ",
    "6": "μ²μ°κ°μ€",
    "7": "κ°ν",
    "8": "ννΈλ‘€ λ° SBP",
    "9": "μμ²",
    "10": "λκ°μ",
    "11": "κ°μ² ",
    "12": "μνμ",
    "13": "μ°μνκΈ°λ¬Ό",
    "14": "λ©ν",
    "15": "κ³ λ‘ μ¬λκ·Έ",
    "16": "μ² μ€ν¬λ©",
    "17": "λΆμ§",
    "18": "μ€νμ ",
    "19": "μ‘νμμ κ°μ€",
    "20": "κ°μ² μ€ν¬λ©",
    "21": "νμ°λ¦¬ν¬",
    "22": "κ²½μ ",
    "23": "μλ₯ μ°λ£μ ",
    "24": "μ κΈ°",
    "25": "무μ°ν",
    "26": "μ€μΌ μ°μΌ",
    "27": "μ² κ΄μ",
    "28": "νμ°μμλνΈλ₯¨",
    "29": "νμ°λ°λ₯¨",
    "30": "ν¬μ₯μ¬",
    "31": "μ‘ν μ²μ°κ°μ€",
    "32": "μ¬λ¬μ§",
    "33": "μλ€ν",
    "34": "μ°νλ°λ₯¨",
    "35": "κ°μ€κ³΅μ₯ κ°μ€",
    "36": "νμ ",
    "37": "EAF νμ μ κ·Ή",
    "38": "μμ° μ€μΌμΌ",
    "39": "μ½ν¬μ€ μ€λΈ κ°μ€",
    "40": "EAF μΆ©μ νμ",
    "41": "κ³ λ‘κ°μ€",
    "42": "μ΄κ°μ±νμ² (HBI)",
    "43": "νΌνΈ (Peat)",
    "44": "μ μ² ",
    "45": "μμ ",
    "46": "μ°μ μ κ°λ‘ κ°μ€",
    "47": "μ΄μ μ",
    "48": "μ μμΉ©",
    "49": "μμμ²ν",
    "50": "λ§κ·Έλ€μ¬μ΄νΈ",
    "51": "μμ μ½ν¬μ€",
    "52": "ν λ ",
    "53": "μ€λ¦¬λ©μ ",
    "54": "μ‘ν μμ κ°μ€",
    "55": "λ±μ ",
    "56": "μμ±κ°μ€",
    "57": "μν",
    "58": "μ°νμΉΌμ",
    "59": "λνν",
    "60": "μ² ",
    "61": "λ₯μ² κ΄",
    "62": "μκ²°κ΄",
    "63": "κ³ μ¨ μ±ν νμμ² ",
    "64": "νλ°μ ",
    "65": "νμ°μ€νΈλ‘ ν¬"
  },
  "label2id": {
    "μ κ²°ν": 0,
    "μ°νλ§κ·Έλ€μ": 1,
    "μ€λΈ μ½ν¬μ€": 2,
    "μ½νλ₯΄": 3,
    "μ§μ νμμ² ": 4,
    "μΌμ°ννμ": 5,
    "μ²μ°κ°μ€": 6,
    "κ°ν": 7,
    "ννΈλ‘€ λ° SBP": 8,
    "μμ²": 9,
    "λκ°μ": 10,
    "κ°μ² ": 11,
    "μνμ": 12,
    "μ°μνκΈ°λ¬Ό": 13,
    "λ©ν": 14,
    "κ³ λ‘ μ¬λκ·Έ": 15,
    "μ² μ€ν¬λ©": 16,
    "λΆμ§": 17,
    "μ€νμ ": 18,
    "μ‘νμμ κ°μ€": 19,
    "κ°μ² μ€ν¬λ©": 20,
    "νμ°λ¦¬ν¬": 21,
    "κ²½μ ": 22,
    "μλ₯ μ°λ£μ ": 23,
    "μ κΈ°": 24,
    "무μ°ν": 25,
    "μ€μΌ μ°μΌ": 26,
    "μ² κ΄μ": 27,
    "νμ°μμλνΈλ₯¨": 28,
    "νμ°λ°λ₯¨": 29,
    "ν¬μ₯μ¬": 30,
    "μ‘ν μ²μ°κ°μ€": 31,
    "μ¬λ¬μ§": 32,
    "μλ€ν": 33,
    "μ°νλ°λ₯¨": 34,
    "κ°μ€κ³΅μ₯ κ°μ€": 35,
    "νμ ": 36,
    "EAF νμ μ κ·Ή": 37,
    "μμ° μ€μΌμΌ": 38,
    "μ½ν¬μ€ μ€λΈ κ°μ€": 39,
    "EAF μΆ©μ νμ": 40,
    "κ³ λ‘κ°μ€": 41,
    "μ΄κ°μ±νμ² (HBI)": 42,
    "νΌνΈ (Peat)": 43,
    "μ μ² ": 44,
    "μμ ": 45,
    "μ°μ μ κ°λ‘ κ°μ€": 46,
    "μ΄μ μ": 47,
    "μ μμΉ©": 48,
    "μμμ²ν": 49,
    "λ§κ·Έλ€μ¬μ΄νΈ": 50,
    "μμ μ½ν¬μ€": 51,
    "ν λ ": 52,
    "μ€λ¦¬λ©μ ": 53,
    "μ‘ν μμ κ°μ€": 54,
    "λ±μ ": 55,
    "μμ±κ°μ€": 56,
    "μν": 57,
    "μ°νμΉΌμ": 58,
    "λνν": 59,
    "μ² ": 60,
    "λ₯μ² κ΄": 61,
    "μκ²°κ΄": 62,
    "κ³ μ¨ μ±ν νμμ² ": 63,
    "νλ°μ ": 64,
    "νμ°μ€νΈλ‘ ν¬": 65
  }
}
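`id2label` and `label2id` must stay mutual inverses, or predictions will map to the wrong names. A quick consistency check over a toy stand-in mapping (the real config has 66 entries; note that JSON stores the `id2label` keys as strings, so they need casting before comparison):

```python
# Toy stand-ins for the two mappings in config.json
id2label = {"0": "iron ore", "1": "natural gas", "2": "crude steel"}
label2id = {"iron ore": 0, "natural gas": 1, "crude steel": 2}

def mappings_consistent(id2label, label2id):
    # Same number of entries, and every (id, label) pair round-trips
    if len(id2label) != len(label2id):
        return False
    return all(label2id.get(label) == int(idx) for idx, label in id2label.items())

print(mappings_consistent(id2label, label2id))  # True
```

Running the same check over the real `config.json` (and `label_mapping.json`, which duplicates `label2id`) is a cheap sanity test before uploading.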
inference.py
ADDED
@@ -0,0 +1,157 @@
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import pickle
import json
import os

class SteelMaterialClassifier:
    def __init__(self, model_path):
        """
        Initialize the steel material classifier

        Args:
            model_path: Path to the model directory
        """
        self.model_path = model_path
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Load model and tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.model.to(self.device)
        self.model.eval()

        # Load additional components
        self._load_additional_components()

    def _load_additional_components(self):
        """Load classifier and label embeddings if they exist"""
        try:
            # Load classifier if exists
            classifier_path = os.path.join(self.model_path, "classifier.pkl")
            if os.path.exists(classifier_path):
                with open(classifier_path, 'rb') as f:
                    self.classifier = pickle.load(f)
            else:
                self.classifier = None

            # Load label embeddings if exists
            embeddings_path = os.path.join(self.model_path, "label_embeddings.pkl")
            if os.path.exists(embeddings_path):
                with open(embeddings_path, 'rb') as f:
                    self.label_embeddings = pickle.load(f)
            else:
                self.label_embeddings = None

        except Exception as e:
            print(f"Warning: Could not load additional components: {e}")
            self.classifier = None
            self.label_embeddings = None

    def predict(self, text, top_k=5):
        """
        Predict steel material classification

        Args:
            text: Input text to classify
            top_k: Number of top predictions to return

        Returns:
            dict: Prediction results with labels and probabilities
        """
        # Tokenize input
        inputs = self.tokenizer(
            text,
            return_tensors="pt",
            truncation=True,
            max_length=512,
            padding=True
        )
        inputs = {k: v.to(self.device) for k, v in inputs.items()}

        # Get model predictions
        with torch.no_grad():
            outputs = self.model(**inputs)
            logits = outputs.logits
            probabilities = torch.nn.functional.softmax(logits, dim=-1)

        # Get top-k predictions
        top_probs, top_indices = torch.topk(probabilities, top_k, dim=1)

        # Convert to results
        results = []
        for i in range(top_k):
            label_id = top_indices[0][i].item()
            probability = top_probs[0][i].item()
            label = self.model.config.id2label[label_id]

            results.append({
                "label": label,
                "label_id": label_id,
                "probability": probability
            })

        return {
            "predictions": results,
            "input_text": text,
            "model_info": {
                "model_name": self.model.config._name_or_path,
                "num_labels": self.model.config.num_labels,
                "device": str(self.device)
            }
        }

    def predict_batch(self, texts, top_k=5):
        """
        Predict for multiple texts

        Args:
            texts: List of input texts
            top_k: Number of top predictions to return

        Returns:
            list: List of prediction results
        """
        results = []
        for text in texts:
            result = self.predict(text, top_k)
            results.append(result)
        return results

    def get_label_info(self):
        """
        Get information about all available labels

        Returns:
            dict: Label information
        """
        return {
            "num_labels": self.model.config.num_labels,
            "id2label": self.model.config.id2label,
            "label2id": self.model.config.label2id
        }

# Example usage
if __name__ == "__main__":
    # Initialize classifier
    model_path = "."  # Current directory
    classifier = SteelMaterialClassifier(model_path)

    # Example predictions
    test_texts = [
        "μ² κ΄μμ κ³ λ‘μμ νμνμ¬ μ μ² μ μ μ‘°νλ κ³Όμ ",
        "μ²μ°κ°μ€λ₯Ό μ°λ£λ‘ μ¬μ©νμ¬ κ³ λ‘λ₯Ό κ°μ΄",
        "μνμμ 첨κ°νμ¬ μ¬λκ·Έλ₯Ό νμ±"
    ]

    print("=== Steel Material Classification Results ===")
    for text in test_texts:
        result = classifier.predict(text)
        print(f"\nInput: {text}")
        print(f"Top prediction: {result['predictions'][0]['label']} ({result['predictions'][0]['probability']:.4f})")

        # Show top 3 predictions
        print("Top 3 predictions:")
        for i, pred in enumerate(result['predictions'][:3]):
            print(f"  {i+1}. {pred['label']}: {pred['probability']:.4f}")
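The top-k step in `predict()` relies on `torch.topk`; the same selection can be sketched in plain Python, which is handy for checking results independently of torch. The labels and probabilities below are illustrative, not real model output:

```python
def top_k(probabilities, id2label, k=3):
    # Pair each probability with its label id, sort descending, keep the first k
    ranked = sorted(enumerate(probabilities), key=lambda p: p[1], reverse=True)
    return [
        {"label": id2label[i], "label_id": i, "probability": prob}
        for i, prob in ranked[:k]
    ]

id2label = {0: "iron ore", 1: "coke", 2: "blast furnace gas", 3: "slag"}
probs = [0.05, 0.62, 0.03, 0.30]

results = top_k(probs, id2label)
print([e["label"] for e in results])  # ['coke', 'slag', 'iron ore']
```

Each entry carries the same `label`/`label_id`/`probability` keys as the dictionaries built inside `SteelMaterialClassifier.predict`.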
label_embeddings.pkl
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:80277db7a3eb26fca6c66c48e4410ca6f591cfc7242e698cddf8ed13ae583026
size 206147
label_mapping.json
ADDED
@@ -0,0 +1,68 @@
{
  "μ κ²°ν": 0,
  "μ°νλ§κ·Έλ€μ": 1,
  "μ€λΈ μ½ν¬μ€": 2,
  "μ½νλ₯΄": 3,
  "μ§μ νμμ² ": 4,
  "μΌμ°ννμ": 5,
  "μ²μ°κ°μ€": 6,
  "κ°ν": 7,
  "ννΈλ‘€ λ° SBP": 8,
  "μμ²": 9,
  "λκ°μ": 10,
  "κ°μ² ": 11,
  "μνμ": 12,
  "μ°μνκΈ°λ¬Ό": 13,
  "λ©ν": 14,
  "κ³ λ‘ μ¬λκ·Έ": 15,
  "μ² μ€ν¬λ©": 16,
  "λΆμ§": 17,
  "μ€νμ ": 18,
  "μ‘νμμ κ°μ€": 19,
  "κ°μ² μ€ν¬λ©": 20,
  "νμ°λ¦¬ν¬": 21,
  "κ²½μ ": 22,
  "μλ₯ μ°λ£μ ": 23,
  "μ κΈ°": 24,
  "무μ°ν": 25,
  "μ€μΌ μ°μΌ": 26,
  "μ² κ΄μ": 27,
  "νμ°μμλνΈλ₯¨": 28,
  "νμ°λ°λ₯¨": 29,
  "ν¬μ₯μ¬": 30,
  "μ‘ν μ²μ°κ°μ€": 31,
  "μ¬λ¬μ§": 32,
  "μλ€ν": 33,
  "μ°νλ°λ₯¨": 34,
  "κ°μ€κ³΅μ₯ κ°μ€": 35,
  "νμ ": 36,
  "EAF νμ μ κ·Ή": 37,
  "μμ° μ€μΌμΌ": 38,
  "μ½ν¬μ€ μ€λΈ κ°μ€": 39,
  "EAF μΆ©μ νμ": 40,
  "κ³ λ‘κ°μ€": 41,
  "μ΄κ°μ±νμ² (HBI)": 42,
  "νΌνΈ (Peat)": 43,
  "μ μ² ": 44,
  "μμ ": 45,
  "μ°μ μ κ°λ‘ κ°μ€": 46,
  "μ΄μ μ": 47,
  "μ μμΉ©": 48,
  "μμμ²ν": 49,
  "λ§κ·Έλ€μ¬μ΄νΈ": 50,
  "μμ μ½ν¬μ€": 51,
  "ν λ ": 52,
  "μ€λ¦¬λ©μ ": 53,
  "μ‘ν μμ κ°μ€": 54,
  "λ±μ ": 55,
  "μμ±κ°μ€": 56,
  "μν": 57,
  "μ°νμΉΌμ": 58,
  "λνν": 59,
  "μ² ": 60,
  "λ₯μ² κ΄": 61,
  "μκ²°κ΄": 62,
  "κ³ μ¨ μ±ν νμμ² ": 63,
  "νλ°μ ": 64,
  "νμ°μ€νΈλ‘ ν¬": 65
}
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fa9f78463531db7ec98f441bf5676f517c701cc4554814198711c1b465e9c3b8
size 1112197096
model_card.md
ADDED
@@ -0,0 +1,62 @@
# Hugging Face Model Card for Steel Material Classification

## Model Description

This model is designed to classify steel industry materials and products based on text descriptions. It uses XLM-RoBERTa as the base model and can classify input text into 66 different steel-related categories.

- **Developed by:** [Your Name/Organization]
- **Model type:** Text Classification
- **Language(s):** Korean, English (multilingual)
- **License:** [Your License]
- **Finetuned from model:** xlm-roberta-base

## Intended Uses & Limitations

### Intended Uses

This model is intended to be used for:
- Classifying steel industry materials from text descriptions
- Supporting LCA (Life Cycle Assessment) analysis in steel manufacturing
- Automating material categorization in steel industry documentation

### Limitations

- The model is specifically trained for steel industry materials and may not perform well on other domains
- Performance may vary with different text styles or technical terminology
- The model requires Korean or English text input

## Training and Evaluation Data

### Training Data

The model was trained on steel industry material descriptions and technical documents, focusing on Korean and English text related to steel manufacturing processes.

### Evaluation Data

[Add information about evaluation data]

## Training Results

### Training Infrastructure

[Add training infrastructure details]

### Training Results

- **Label Independence**: Good (average similarity: 0.1166)
- **Orthogonality**: Good (average dot product: 0.2043)
- **Overall Assessment**: The model shows good separation between different material categories

## Environmental Impact

[Add environmental impact information]

## Citation

[Add citation information]

## Glossary

- **LCA**: Life Cycle Assessment
- **Steel Industry Materials**: Raw materials, fuels, gases, products, and by-products used in steel manufacturing
- **XLM-RoBERTa**: Cross-lingual language model based on the RoBERTa architecture
preprocessor.py
ADDED
@@ -0,0 +1,127 @@
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

def preprocess_function(examples, tokenizer, max_length=512):
    """
    Preprocess text data for the steel material classification model

    Args:
        examples: Dataset examples containing text
        tokenizer: Tokenizer instance
        max_length: Maximum sequence length

    Returns:
        dict: Tokenized inputs
    """
    # Tokenize the texts
    result = tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=max_length,
        return_tensors="pt"
    )

    return result

def postprocess_function(predictions, id2label):
    """
    Postprocess model predictions

    Args:
        predictions: Raw model predictions
        id2label: Mapping from label IDs to label names

    Returns:
        dict: Processed predictions with labels and probabilities
    """
    # Convert logits to probabilities
    probabilities = torch.nn.functional.softmax(torch.tensor(predictions), dim=-1)

    # Get top predictions
    top_probs, top_indices = torch.topk(probabilities, k=5, dim=1)

    results = []
    for i in range(len(predictions)):
        sample_results = []
        for j in range(5):
            label_id = top_indices[i][j].item()
            probability = top_probs[i][j].item()
            label = id2label[label_id]

            sample_results.append({
                "label": label,
                "label_id": label_id,
                "probability": probability
            })
        results.append(sample_results)

    return results

def validate_input(text):
    """
    Validate input text for classification

    Args:
        text: Input text to validate

    Returns:
        bool: True if valid, False otherwise
    """
    if not isinstance(text, str):
        return False

    if len(text.strip()) == 0:
        return False

    if len(text) > 1000:  # Reasonable limit for steel material descriptions
        return False

    return True

def clean_text(text):
    """
    Clean and normalize input text

    Args:
        text: Raw input text

    Returns:
        str: Cleaned text
    """
    # Remove extra whitespace
    text = " ".join(text.split())

    # Normalize Korean characters (if needed)
    # Add any specific text cleaning rules here

    return text.strip()

# Example usage
if __name__ == "__main__":
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(".")

    # Example preprocessing
    example_texts = [
        "μ² κ΄μμ κ³ λ‘μμ νμνμ¬ μ μ² μ μ μ‘°νλ κ³Όμ ",
        "μ²μ°κ°μ€λ₯Ό μ°λ£λ‘ μ¬μ©νμ¬ κ³ λ‘λ₯Ό κ°μ΄",
        "μνμμ 첨κ°νμ¬ μ¬λκ·Έλ₯Ό νμ±"
    ]

    # Clean and validate texts
    cleaned_texts = []
    for text in example_texts:
        if validate_input(text):
            cleaned_text = clean_text(text)
            cleaned_texts.append(cleaned_text)

    # Preprocess
    examples = {"text": cleaned_texts}
    tokenized = preprocess_function(examples, tokenizer)

    print("=== Preprocessing Example ===")
    print(f"Input texts: {cleaned_texts}")
    print(f"Tokenized shape: {tokenized['input_ids'].shape}")
    print(f"Attention mask shape: {tokenized['attention_mask'].shape}")
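The validation and cleaning steps above are simple string operations; a short self-contained usage sketch (the two helpers are re-stated inline so the example runs without the rest of the file):

```python
def validate_input(text):
    # Mirrors preprocessor.validate_input: non-empty string of at most 1000 chars
    return isinstance(text, str) and len(text.strip()) > 0 and len(text) <= 1000

def clean_text(text):
    # Mirrors preprocessor.clean_text: collapse runs of whitespace (incl. newlines)
    return " ".join(text.split())

raw = "  iron   ore \n charged into the blast furnace  "
assert validate_input(raw)
cleaned = clean_text(raw)
print(cleaned)  # "iron ore charged into the blast furnace"
```

Note the 1000-character cap applies to raw characters, while the tokenizer separately truncates to 512 tokens, so both limits can apply to the same input.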
requirements.txt
ADDED
@@ -0,0 +1,8 @@
torch>=1.9.0
transformers>=4.35.0
numpy>=1.21.0
scikit-learn>=1.0.0
scipy>=1.7.0
matplotlib>=3.5.0
seaborn>=0.11.0
pandas>=1.3.0
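To check an environment against these lower bounds, the standard library's `importlib.metadata` reports installed versions. A quick sketch (the dictionary below covers only a subset of the file above):

```python
from importlib.metadata import version, PackageNotFoundError

# Subset of the pins in requirements.txt
minimums = {"torch": "1.9.0", "transformers": "4.35.0", "numpy": "1.21.0"}

for package, minimum in minimums.items():
    try:
        print(f"{package}: installed {version(package)}, requires >= {minimum}")
    except PackageNotFoundError:
        print(f"{package}: not installed (requires >= {minimum})")
```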
special_tokens_map.json
ADDED
@@ -0,0 +1,15 @@
{
  "bos_token": "<s>",
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "unk_token": "<unk>"
}
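These entries tell the tokenizer which strings play which structural role. For XLM-RoBERTa, a single sequence is wrapped as `<s> … </s>` and right-padded with `<pad>`. A toy illustration of that layout (not the real tokenizer, which also handles subword splitting):

```python
CLS, SEP, PAD = "<s>", "</s>", "<pad>"

def wrap(tokens, max_len=8):
    # Wrap a token list in <s> ... </s> and right-pad to max_len,
    # mirroring how the tokenizer uses these special tokens.
    seq = [CLS, *tokens, SEP]
    return seq + [PAD] * (max_len - len(seq))

print(wrap(["steel", "plate"]))
# → ['<s>', 'steel', 'plate', '</s>', '<pad>', '<pad>', '<pad>', '<pad>']
```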
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f1cc44ad7faaeec47241864835473fd5403f2da94673f3f764a77ebcb0a803ec
size 17083009
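Note that this is not the tokenizer itself but a Git LFS pointer (the `.gitattributes` rule added in this commit routes `tokenizer.json` through LFS); the real ~17 MB file is fetched by LFS on checkout. The pointer's `key value` lines are trivial to parse:

```python
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f1cc44ad7faaeec47241864835473fd5403f2da94673f3f764a77ebcb0a803ec
size 17083009"""

# Each pointer line is "key value"; split on the first space only.
meta = dict(line.split(" ", 1) for line in pointer.splitlines())
print(meta["size"])  # → 17083009
```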
tokenizer_config.json
ADDED
@@ -0,0 +1,54 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "250001": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "XLMRobertaTokenizer",
  "unk_token": "<unk>"
}
usage.md
ADDED
@@ -0,0 +1,86 @@
# Steel Material Classification Model

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "your-username/steel-material-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict
text = "철광석을 고로에서 환원하여 선철을 제조하는 과정"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=1).item()

label = model.config.id2label[predicted_class]
confidence = predictions[0][predicted_class].item()
print(f"Predicted: {label} (Confidence: {confidence:.4f})")
```
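The softmax call in the snippet above is what turns the raw logits into the reported confidence. As a standalone illustration with toy logits (three made-up labels, not the model's 66):

```python
import math

def softmax(logits):
    # Numerically stable softmax: shift by the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # toy logits
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(probs[predicted], 3))  # → 0 0.659
```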
## Model Information

- **Base Model**: XLM-RoBERTa
- **Task**: Sequence Classification
- **Labels**: 66 steel industry materials
- **Languages**: Korean, English
- **Model Size**: ~1GB
## Supported Labels

The model can classify 66 different steel industry materials, including:

- **Raw Materials**: 철광석, 석회석, 석유 코크스, 무연탄, 갈탄
- **Fuels**: 천연가스, 액화천연가스, 경유, 휘발유, 등유
- **Gases**: 일산화탄소, 메탄, 에탄, 고로가스, 코크스 오븐 가스
- **Products**: 강철, 선철, 철, 열간성형철(HBI), 고온 성형 환원철
- **By-products**: 고로 슬래그, 압연 스케일, 분진, 슬러지, 절삭칩
- **Others**: 전기, 냉각수, 윤활유, 포장재, etc.
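Label names are resolved through `model.config.id2label` (backed by `label_mapping.json` in this repository), a plain id-to-string dictionary. A hypothetical three-label excerpt to show the shape (the real config maps all 66 ids):

```python
# Hypothetical excerpt of the id2label mapping in config.json
id2label = {0: "철광석", 1: "석회석", 2: "천연가스"}
# The inverse mapping is used when encoding training labels
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[2], label2id["석회석"])  # → 천연가스 1
```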
## Performance

- **Label Independence**: Good (average similarity: 0.1166)
- **Orthogonality**: Good (average dot product: 0.2043)
- **Overall Assessment**: The model shows good separation between different material categories
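The independence figure above is an average pairwise similarity across label embeddings (stored in `label_embeddings.pkl`). A minimal sketch of how such an average is computed, using toy 3-d vectors rather than the real embeddings:

```python
import math
from itertools import combinations

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    # Cosine similarity: dot product over the product of norms.
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

# Toy embeddings for three hypothetical labels
embeddings = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.1, 1.0]]
pairs = list(combinations(embeddings, 2))
avg_sim = sum(cosine(a, b) for a, b in pairs) / len(pairs)
print(round(avg_sim, 4))  # → 0.1347
```

Lower averages mean the labels occupy more distinct directions in embedding space, which is the property the numbers above report.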
## Usage Examples

### Single Prediction
```python
text = "천연가스를 연료로 사용하여 고로를 가열"
# Returns: "천연가스" with a confidence score
```

### Batch Prediction
```python
texts = [
    "철광석을 고로에서 환원하여 선철을 제조하는 과정",
    "석회석을 첨가하여 슬래그를 형성"
]
# Returns: ["철광석", "석회석"] with confidence scores
```
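For batch prediction, tokenize the list with `padding=True`, run a single forward pass, and take a row-wise argmax over the logits. The argmax step, illustrated with a toy 2×3 logit matrix instead of real model output:

```python
# Toy batch of logits: 2 inputs x 3 hypothetical labels
batch_logits = [
    [2.0, 0.5, 0.1],
    [0.2, 0.1, 1.5],
]
# One predicted label index per row
predicted = [max(range(len(row)), key=row.__getitem__) for row in batch_logits]
print(predicted)  # → [0, 2]
```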
## Installation

```bash
pip install torch transformers
```

## License

[Add your license information]

## Citation

If you use this model in your research, please cite:

```bibtex
[Add citation information here]
```
웹사이트_업로드_가이드.md
ADDED
@@ -0,0 +1,76 @@
# How to Upload via the Hugging Face Website

## Step 1: Create a Hugging Face Account / Log In
1. Go to https://huggingface.co
2. Sign up or log in

## Step 2: Create a New Model Repository
1. Click the "New" button at the top right
2. Select "Model"
3. Enter the repository name: `steel-material-classifier`
4. Click "Create repository"

## Step 3: Upload Files
1. On the new repository's page, open the "Files and versions" tab
2. Click "Add file" → "Upload files"
3. Select and upload all of the following files:

### Required files:
- `config.json`
- `model.safetensors`
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
- `label_mapping.json`

### Additional files:
- `classifier.pkl`
- `label_embeddings.pkl`
- `label_embeddings.pkl.backup`
- `README.md`
- `requirements.txt`
- `inference.py`
- `preprocessor.py`
- `model_card.md`
- `usage.md`
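As an alternative to the browser upload (useful for the ~1 GB `model.safetensors`), the `huggingface_hub` library can push a whole folder programmatically. A sketch, assuming `pip install huggingface_hub` and a prior `huggingface-cli login`:

```python
def upload_model(repo_id, folder="."):
    # Pushes every file in `folder` to the Hub repository in one commit.
    # Requires the huggingface_hub package and an authenticated session.
    from huggingface_hub import HfApi
    HfApi().upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")

# Example call (not run here):
# upload_model("YOUR_USERNAME/steel-material-classifier")
```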
## Step 4: Write a Commit Message
- Enter "Initial commit: Steel material classification model" in the "Commit message" field
- Click "Commit changes to main"

## Step 5: Configure Model Information
1. On the repository page, open the "Settings" tab
2. Edit the model information in the "Model Card" section:
   - License: select an appropriate license
   - Model Card: write it based on the contents of `model_card.md`

## Step 6: Test the Upload
Once the upload finishes, verify it with the following code:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model
model_name = "YOUR_USERNAME/steel-material-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Run a test prediction
text = "철광석을 고로에서 환원하여 선철을 제조하는 과정"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=1).item()

label = model.config.id2label[predicted_class]
confidence = predictions[0][predicted_class].item()
print(f"Predicted: {label} (Confidence: {confidence:.4f})")
```

## Notes
- Uploading may take a while when files are large
- `model.safetensors` is about 1 GB, so a stable internet connection is required
- Do not close your browser while the upload is in progress