---
license: gpl-3.0
language:
- en
---

# Model Card for RoBERTa-OrgCulture-Classifier

Fischer et al. (2014) showed that organizational practices are best measured along three dimensions: employee orientation, formalization practices, and innovation practices.

- **Employee orientation** assesses the balance between the interests of employees and those of the organization.
- **Formalization practices** assess the balance between employees' independence in organizing their own work and the organization's need for control and centralization.
- **Innovation practices** assess the balance between stability and change.

This model is a RoBERTa-based multi-label classifier for identifying organizational practices in text. It predicts the salience of each of these three types of organizational practices independently, so a single text can be assigned none, one, or several of them.

## Paper

Fischer, R., Ferreira, M. C., Assmar, E. M. L., Baris, G., Berberoglu, G., Dalyan, F., Wong, C. C., Hassan, A., Hanke, K., & Boer, D. (2014). Organizational practices across cultures: An exploration in six cultural contexts. *International Journal of Cross Cultural Management*, *14*(1), 105-125. https://doi.org/10.1177/1470595813510644

## Model Details

### Model Description

- **Developed by:** M. Murat Ardag
- **Shared via:** Hugging Face
- **Model type:** Multi-label text classification
- **Language(s) (NLP):** English
- **License:** GPL-3.0
- **Finetuned from model:** roberta-base

## Uses

### Direct Use

The model can be used to analyze text data (e.g., company reviews, internal documents, and company mission and vision statements) and identify the types of organizational practices mentioned.

### Downstream Use

This model could be integrated into larger HR analytics or organizational culture assessment tools.

### Out-of-Scope Use

The model is not designed for sentiment analysis, topic modeling, or any other NLP task beyond multi-label classification of organizational practices.
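Because this is a multi-label task rather than a multi-class one, each practice type is scored on its own: a sigmoid is applied to each logit and the result is thresholded at 0.5 (the rule used in this card's usage example). A sentence can therefore receive none, one, or several labels, and the three probabilities need not sum to 1. The minimal sketch below isolates that decoding step in plain Python. The label strings and logits are hypothetical placeholders, not real model output; the actual label names are stored in the repository's `label_names.json`.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical placeholder label names for illustration only; the real
# strings ship with the model in label_names.json.
LABELS: List[str] = ["employee_orientation", "formalization", "innovation"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(
    logits: Sequence[float],
    labels: Sequence[str] = LABELS,
    threshold: float = 0.5,
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Score each label independently (sigmoid, not softmax), then threshold it."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    probs = {label: sigmoid(z) for label, z in zip(labels, logits)}
    preds = {label: int(p > threshold) for label, p in probs.items()}
    return preds, probs


# Made-up logits for a single sentence (not real model output).
preds, probs = decode_multilabel([2.0, -1.5, 0.3])
print(preds)  # → {'employee_orientation': 1, 'formalization': 0, 'innovation': 1}
```

Here two labels are active at once and the probabilities (about 0.88, 0.18, and 0.57) sum to roughly 1.64; a softmax would instead force the three types to compete for a total of 1.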
This model should not be used for:

- Classifying text in languages other than English
- Making decisions about individuals or organizations without human oversight

## Bias, Risks, and Limitations

The model's performance may vary across industries, company sizes, and cultural contexts, and it may be sensitive to the specific wording of the input text. The model could also perpetuate biases present in its training data.

### Recommendations

Users should interpret the model's predictions with caution and keep its potential biases and limitations in mind. The model is best used as one tool in a broader assessment of organizational culture, alongside other qualitative and quantitative methods.

## How to Get Started with the Model

Example usage:

```python
import json

import torch
from huggingface_hub import hf_hub_download
from transformers import RobertaForSequenceClassification, RobertaTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model_path = "MMADS/RoBERTa-OrgCulture-Classifier"
model = RobertaForSequenceClassification.from_pretrained(model_path)
tokenizer = RobertaTokenizer.from_pretrained(model_path)

# Load label names. model_path is a Hub repo ID, not a local directory,
# so download label_names.json from the model repository before opening it.
label_names_file = hf_hub_download(repo_id=model_path, filename="label_names.json")
with open(label_names_file, "r") as f:
    label_names = json.load(f)


def predict(text):
    """Return per-label 0/1 predictions and sigmoid probabilities for one text."""
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Run the model without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)

    # Multi-label: apply a sigmoid to each logit independently
    probabilities = torch.sigmoid(outputs.logits).squeeze(0).numpy()

    # Predict 1 if probability > 0.5, else 0
    predictions = (probabilities > 0.5).astype(int)

    # Map each label name to its prediction
    result = {label: int(pred) for label, pred in zip(label_names, predictions)}
    return result, probabilities


# Example usage
text_to_predict = "Testing model predictions for organizational practices."
prediction, probabilities = predict(text_to_predict)

print("Predictions:")
for label, pred in prediction.items():
    print(f"{label}: {'Yes' if pred == 1 else 'No'}")

print("\nProbabilities:")
for label, prob in zip(label_names, probabilities):
    print(f"{label}: {prob:.4f}")
```

## Training Details

### Training Data

The model was trained on sentences labeled with three types of organizational practices (employee orientation, formalization practices, and innovation practices). The training data is a subset of more than 1.3M sentences from employee reviews and more than 16K sentences from company mission and vision statements. The data was preprocessed to remove missing values and to convert all text to strings.

### Training Procedure

#### Preprocessing

- Sentences were tokenized with the RoBERTa tokenizer.
- Texts were truncated and padded to a fixed length.

#### Training Hyperparameters

- **epochs:** 10
- **batch_size:** 8
- **warmup_steps:** 500
- **weight_decay:** 0.1
- **learning_rate:** Not specified (default AdamW optimizer settings)
- **label_smoothing:** 0.1

## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated on a held-out test set (20% of the original data) using the following metrics:

- Accuracy
- F1-score
- Precision
- Recall

### Results

- **Accuracy:** 0.98
- **F1-score:** 0.97
- **Precision:** 0.98
- **Recall:** 0.97

## Environmental Impact

**Minimal**

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Google Colab GPU
- **Hours used:** 8
- **Cloud Provider:** Google
- **Compute Region:** South Carolina

## Model Card Authors

M. Murat Ardag

## Model Card Contact

Via my personal website.

## Citation

***If you use this model in your research or applications, please cite it as follows:***

Ardag, M.M. (2024) RoBERTa-OrgCulture-Classifier (Revision 94b6fdd). HuggingFace.
https://doi.org/10.57967/hf/2774 https://doi.org/10.57967/hf/2794