	[---
	language:
	- en
	license: apache-2.0
	tags:
	- education
	- course-outcomes
	- program-outcomes
	- co-po-mapping
	- outcome-based-education
	- sklearn
	- random-forest
	- regression
	- multi-output-regression
	- text-classification
	- accreditation
	- abet
	- nba
	datasets:
	- custom
	metrics:
	- mae
	- rmse
	- r2
	library_name: sklearn
	pipeline_tag: text-classification
	---

	# CO-PO Mapping Model

	<div align="center">

	[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
	[![Framework](https://img.shields.io/badge/Framework-scikit--learn-orange)](https://scikit-learn.org/)
	[![Model](https://img.shields.io/badge/Model-Random%20Forest-green)](https://huggingface.co/Jrine/co-po)

	Automatically map Course Outcomes to Program Outcomes for Outcome-Based Education

	[Model Card](#model-description) • [Quick Start](#quick-start) • [Usage](#usage) • [Performance](#performance)

	</div>

	---

	## 🎯 Model Description

	This model automatically predicts correlation strengths between Course Outcomes (COs) and Program Outcomes (POs) for engineering education and accreditation systems. It helps educators efficiently create CO-PO mapping matrices required for outcome-based education (OBE) and accreditation processes like ABET and NBA.

	### Key Features

	- 📊 11 Program Outcomes: Predicts correlations for all standard POs (PO1-PO11)
	- 🎯 4-Level Scale: 0 (None), 1 (Low), 2 (Medium), 3 (High)
	- ⚡ Fast Inference: < 1 second per prediction
	- 🌲 Random Forest: 2,200 trees (200 per PO)
	- 🎓 Real Data: Trained on 374 engineering course outcomes
- 📈 Low Error: MAE of 0.35 on the held-out test set (0-3 scale)

	### What Problems Does It Solve?

- ✅ Automates CO-PO mapping for curriculum design
- ✅ Saves hours of manual mapping work
- ✅ Ensures consistency across courses
- ✅ Supports accreditation documentation
- ✅ Helps calculate program outcome attainment

	---

	## 📊 Performance

	### Overall Metrics (Test Set)

| Metric | Value | Description |
|--------|-------|-------------|
| MAE | 0.3517 | Mean Absolute Error (lower is better) |
| RMSE | 0.5829 | Root Mean Squared Error |
| R² Score | 0.7243 | Coefficient of determination |
| Training Samples | 261 | 70% of dataset |
| Validation Samples | 56 | 15% of dataset |
| Test Samples | 57 | 15% of dataset |

	### Per-PO Performance

| PO | Description | MAE | RMSE | R² Score |
|----|-------------|-----|------|----------|
| PO1 | Engineering Knowledge | 0.3421 | 0.5612 | 0.7389 |
| PO2 | Problem Analysis | 0.3684 | 0.5947 | 0.7156 |
| PO3 | Design/Development of Solutions | 0.3298 | 0.5438 | 0.7521 |
| PO4 | Conduct Investigations | 0.3156 | 0.5123 | 0.7842 |
| PO5 | Modern Tool Usage | 0.3789 | 0.6124 | 0.6987 |
| PO6 | Engineer and Society | 0.3512 | 0.5789 | 0.7234 |
| PO7 | Environment and Sustainability | 0.3298 | 0.5456 | 0.7498 |
| PO8 | Ethics | 0.3645 | 0.5891 | 0.7189 |
| PO9 | Individual and Team Work | 0.3421 | 0.5634 | 0.7367 |
| PO10 | Communication | 0.3567 | 0.5812 | 0.7298 |
| PO11 | Project Management and Finance | 0.3789 | 0.6089 | 0.7012 |

	### Interpretation

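- MAE of 0.35: On average, a prediction is off by about a third of a level on the 0-3 scale
- R² of 0.72: Overall, the model explains about 72% of the variance in the expert ratings
- Consistent across POs: Per-PO MAE ranges from 0.3156 (PO4) to 0.3789 (PO5, PO11), and per-PO R² from 0.6987 (PO5) to 0.7842 (PO4)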
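
The per-PO figures above can be recomputed for any pair of true and predicted matrices. The following is a minimal sketch, assuming `y_true` and `y_pred` are arrays with one column per PO; the helper name `per_po_metrics` and the toy values are illustrative and are not the model's test data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def per_po_metrics(y_true, y_pred):
    """Return a list of (po_name, mae, rmse, r2) tuples, one per output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rows = []
    for j in range(y_true.shape[1]):
        mae = mean_absolute_error(y_true[:, j], y_pred[:, j])
        rmse = np.sqrt(mean_squared_error(y_true[:, j], y_pred[:, j]))
        r2 = r2_score(y_true[:, j], y_pred[:, j])
        rows.append((f"PO{j + 1}", mae, rmse, r2))
    return rows


# Toy data: 4 course outcomes x 2 POs (illustrative values only)
y_true = np.array([[3, 0], [2, 1], [1, 0], [0, 3]])
y_pred = np.array([[2.5, 0.0], [2.0, 1.5], [1.5, 0.0], [0.0, 2.0]])

rows = per_po_metrics(y_true, y_pred)
for po, mae, rmse, r2 in rows:
    print(f"{po}: MAE={mae:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```

Reporting the metrics per column makes it easy to spot a single PO that lags behind the others, which an averaged score would hide.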

	---

	## 🚀 Quick Start

	### Installation

```bash
pip install scikit-learn pandas numpy huggingface-hub
```

	### Basic Usage

```python
import pickle
import numpy as np
from huggingface_hub import hf_hub_download

# Download and load the model package
model_path = hf_hub_download(
    repo_id="Jrine/co-po",
    filename="co_po_model_complete.pkl"
)

with open(model_path, 'rb') as f:
    package = pickle.load(f)

model = package['model']
vectorizer = package['vectorizer']

# Example course outcome
co_statement = "Apply machine learning algorithms to solve classification problems"

# Predict PO correlations; predict() returns one row per input statement,
# so take row 0 for the single statement passed here
vec = vectorizer.transform([co_statement])
prediction = model.predict(vec)[0]
prediction_rounded = np.clip(np.round(prediction), 0, 3).astype(int)

# Display results
po_names = ['PO1', 'PO2', 'PO3', 'PO4', 'PO5', 'PO6',
            'PO7', 'PO8', 'PO9', 'PO10', 'PO11']

print(f"Course Outcome: {co_statement}\n")
print("PO Mapping:")
for po, score in zip(po_names, prediction_rounded):
    level = ['None', 'Low', 'Medium', 'High'][score]
    print(f" {po}: {score} ({level})")
```

Output (the `#` annotations are added here for readability and are not printed):

```text
Course Outcome: Apply machine learning algorithms to solve classification problems

PO Mapping:
 PO1: 3 (High)     # Engineering Knowledge
 PO2: 3 (High)     # Problem Analysis
 PO3: 2 (Medium)   # Design/Development
 PO4: 1 (Low)      # Investigation
 PO5: 3 (High)     # Modern Tool Usage
 PO6: 0 (None)     # Engineer and Society
 PO7: 0 (None)     # Environment
 PO8: 0 (None)     # Ethics
 PO9: 1 (Low)      # Team Work
 PO10: 1 (Low)     # Communication
 PO11: 2 (Medium)  # Project Management
```

	---

	## 💡 Usage

	### Detailed Example with All Features

```python
import pickle
import numpy as np
import pandas as pd
from huggingface_hub import hf_hub_download


def load_co_po_model():
    """Load the CO-PO mapping model from Hugging Face"""
    model_path = hf_hub_download(
        repo_id="Jrine/co-po",
        filename="co_po_model_complete.pkl"
    )

    with open(model_path, 'rb') as f:
        package = pickle.load(f)

    return package


def predict_co_po(co_statement, package):
    """Predict PO correlations for a single course outcome"""
    model = package['model']
    vectorizer = package['vectorizer']

    # Vectorize and predict; take row 0 because a single statement is passed
    vec = vectorizer.transform([co_statement])
    pred_raw = model.predict(vec)[0]
    pred_rounded = np.clip(np.round(pred_raw), 0, 3).astype(int)

    return pred_raw, pred_rounded


def display_predictions(co_statement, pred_rounded):
    """Display predictions in a formatted table"""
    po_descriptions = [
        'Engineering Knowledge',
        'Problem Analysis',
        'Design/Development of Solutions',
        'Conduct Investigations of Complex Problems',
        'Modern Tool Usage',
        'The Engineer and Society',
        'Environment and Sustainability',
        'Ethics',
        'Individual and Team Work',
        'Communication',
        'Project Management and Finance'
    ]

    correlation_levels = {0: 'None', 1: 'Low', 2: 'Medium', 3: 'High'}
    symbols = {0: '❌', 1: '🟡', 2: '🟠', 3: '🔴'}

    print(f"\nCourse Outcome: {co_statement}\n")
    print("=" * 80)
    print(f"{'PO':<6} {'Description':<45} {'Score':<8} {'Level':<10} {'Symbol'}")
    print("=" * 80)

    for i, (desc, score) in enumerate(zip(po_descriptions, pred_rounded), 1):
        level = correlation_levels[score]
        symbol = symbols[score]
        print(f"PO{i:<4} {desc:<45} {score:<8} {level:<10} {symbol}")

    print("=" * 80)

    # Summary statistics
    print("\nSummary:")
    print(f" Average Correlation: {np.mean(pred_rounded):.2f}")
    print(f" High (3): {np.sum(pred_rounded == 3)} POs")
    print(f" Medium (2): {np.sum(pred_rounded == 2)} POs")
    print(f" Low (1): {np.sum(pred_rounded == 1)} POs")
    print(f" None (0): {np.sum(pred_rounded == 0)} POs")


# Example usage
if __name__ == "__main__":
    # Load model
    print("Loading CO-PO mapping model...")
    package = load_co_po_model()
    print("✅ Model loaded!\n")

    # Example course outcomes
    examples = [
        "Understand fundamental concepts of data structures and algorithms",
        "Design and implement database management systems",
        "Analyze the performance and scalability of software systems",
        "Evaluate ethical implications of AI in healthcare",
        "Create innovative solutions for sustainable energy systems"
    ]

    for co in examples:
        pred_raw, pred_rounded = predict_co_po(co, package)
        display_predictions(co, pred_rounded)
        print("\n")
```

	### Batch Processing Multiple COs

```python
def batch_predict(co_statements, package):
    """Process multiple course outcomes at once"""
    model = package['model']
    vectorizer = package['vectorizer']

    # Vectorize all statements
    vec = vectorizer.transform(co_statements)
    predictions = model.predict(vec)
    predictions_rounded = np.clip(np.round(predictions), 0, 3).astype(int)

    # Create DataFrame
    po_cols = [f'PO{i+1}' for i in range(11)]
    df = pd.DataFrame(predictions_rounded, columns=po_cols)
    df.insert(0, 'Course_Outcome', [co[:50] + '...' for co in co_statements])

    return df


# Example (reuses `package`, `np` and `pd` from the detailed example)
cos = [
    "Apply software engineering principles to develop applications",
    "Analyze complex engineering problems using mathematical models",
    "Design experiments to investigate material properties"
]

results_df = batch_predict(cos, package)
print(results_df)
```

	### Generate CO-PO Matrix

```python
def generate_co_po_matrix(course_outcomes, package):
    """Generate complete CO-PO mapping matrix"""
    # Plotting libraries are not in the install command above:
    # pip install matplotlib seaborn
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Get predictions
    results_df = batch_predict(course_outcomes, package)

    # Extract matrix (drop the Course_Outcome text column)
    matrix = results_df.iloc[:, 1:].values

    # Visualize
    plt.figure(figsize=(12, 8))
    sns.heatmap(matrix, annot=True, fmt='d', cmap='YlOrRd',
                xticklabels=[f'PO{i+1}' for i in range(11)],
                yticklabels=[f'CO{i+1}' for i in range(len(course_outcomes))],
                cbar_kws={'label': 'Correlation (0-3)'})

    plt.title('CO-PO Mapping Matrix', fontsize=16, fontweight='bold')
    plt.xlabel('Program Outcomes', fontsize=12)
    plt.ylabel('Course Outcomes', fontsize=12)
    plt.tight_layout()
    plt.show()

    return results_df


# Example
matrix_df = generate_co_po_matrix(cos, package)
```

	---

	## 🏗️ Model Architecture

	### Algorithm Details

	- Type: Random Forest Regressor (Multi-Output)
	- Base Estimators: 200 decision trees per PO
	- Total Trees: 2,200 (200 × 11 POs)
	- Max Depth: 20
	- Min Samples Split: 5
	- Min Samples Leaf: 2

	### Text Processing Pipeline

```text
Input Text (CO Statement)
        ↓
TF-IDF Vectorizer
  - Max Features: 2,000
  - N-grams: (1, 3)
  - Min DF: 2
  - Max DF: 0.8
        ↓
Feature Matrix (2,000 features)
        ↓
Random Forest Regressor (11 outputs)
        ↓
11 PO Correlation Scores (0-3)
```
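
The pipeline and hyperparameters listed above could be assembled in scikit-learn roughly as follows. This is a sketch under stated assumptions, not the author's training script: wrapping the forest in `MultiOutputRegressor` is an assumption inferred from the "200 trees per PO" figure, and the toy corpus uses 2 target columns instead of 11 so that it runs in a moment.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputRegressor

# Toy corpus (illustrative only): 4 statements repeated 3 times, 2 PO columns
statements = [
    "Apply data structures to solve computational problems",
    "Apply algorithms to solve engineering problems",
    "Analyze ethical issues in engineering practice",
    "Analyze ethical implications of engineering decisions",
] * 3
targets = [[3, 0], [3, 0], [0, 3], [0, 3]] * 3

# TF-IDF settings as listed in the pipeline above
vectorizer = TfidfVectorizer(
    max_features=2000, ngram_range=(1, 3), min_df=2, max_df=0.8
)
X = vectorizer.fit_transform(statements)

# One 200-tree forest per PO column (assumed wrapper, see the note above)
forest = RandomForestRegressor(
    n_estimators=200,
    max_depth=20,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=0,
)
model = MultiOutputRegressor(forest).fit(X, targets)

pred = model.predict(vectorizer.transform(["Apply algorithms to solve problems"]))
total_trees = sum(len(est.estimators_) for est in model.estimators_)
print(pred.shape, total_trees)  # (1, 2) 400
```

With 11 target columns the same construction yields 11 forests of 200 trees each, which matches the 2,200 total quoted above.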

	### Input Format

	- Type: Text string
	- Description: Course Outcome statement
	- Example: "Apply data structures to solve computational problems"
	- Recommended Length: 10-50 words
	- Language: English

	### Output Format

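- Type: Numerical array
- Shape: [11] per input statement (one value per PO); `model.predict` returns one such row for each statement passed in
- Range: 0-3; the raw regressor outputs are continuous, and the usage examples round and clip them to integers
  - 0: No correlation
  - 1: Low correlation
  - 2: Medium correlation
  - 3: High correlation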
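
The rounding and clipping step used throughout the usage examples can be isolated in a small helper. A minimal sketch; the name `to_levels` and the sample scores are illustrative and not part of the model package.

```python
import numpy as np

LEVELS = ("None", "Low", "Medium", "High")


def to_levels(raw_scores):
    """Round raw regression scores, clip them to 0-3, and attach level names."""
    scores = np.clip(np.round(raw_scores), 0, 3).astype(int)
    return [(int(s), LEVELS[s]) for s in np.ravel(scores)]


result = to_levels([2.7, 0.4, 1.5, 3.6, -0.2])
print(result)
# [(3, 'High'), (0, 'None'), (2, 'Medium'), (3, 'High'), (0, 'None')]
```

Note that `np.round` rounds halves to the nearest even integer, so a raw score of 2.5 becomes 2 rather than 3.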

	---

	## 📚 Training Data

	### Dataset Characteristics

	- Total Samples: 374 course outcomes
	- Source: Engineering courses across multiple disciplines
- Domains:
  - Computer Science & Engineering
  - Electronics & Communication Engineering
  - Mechanical Engineering
  - Civil Engineering
  - Information Technology
  - Electrical Engineering

	### Data Distribution

```text
Training Set:   261 samples (70%)
Validation Set:  56 samples (15%)
Test Set:        57 samples (15%)
```
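
A split with these exact sizes can be produced with two calls to `train_test_split`. This is only a sketch: the source does not say how the split was drawn, so random shuffling with a fixed seed is an assumption, and the integers stand in for the 374 CO statements.

```python
from sklearn.model_selection import train_test_split

samples = list(range(374))  # stand-ins for the 374 CO statements

# Hold out 57 test samples first, then 56 validation samples from the rest
train_val, test = train_test_split(samples, test_size=57, random_state=42)
train, val = train_test_split(train_val, test_size=56, random_state=42)

print(len(train), len(val), len(test))  # 261 56 57
```

Passing integer sizes rather than fractions avoids rounding surprises, since 70% of 374 is not a whole number.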

	### Bloom's Taxonomy Distribution

	The dataset includes course outcomes from all Bloom's cognitive levels:

	- Remember: 15%
	- Understand: 25%
	- Apply: 30%
	- Analyze: 18%
	- Evaluate: 8%
	- Create: 4%

### Sample Course Outcomes

```text
"Understand and apply fundamental computer vision concepts"
"Analyze camera sensor architectures and their impact on image quality"
"Design and implement advanced feature extraction techniques"
"Evaluate image segmentation methodologies for various applications"
"Create motion detection and object tracking algorithms"
```

	---

	## 🎓 Program Outcomes (POs)

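This model predicts correlations for the following 11 Program Outcomes, which follow the graduate-attribute style used in outcome-based accreditation frameworks such as NBA and ABET. Accrediting bodies number and word their outcomes differently, so check these definitions against your own framework: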

| PO | Description |
|----|-------------|
| PO1 | Engineering knowledge: Apply knowledge of mathematics, science, engineering fundamentals |
| PO2 | Problem analysis: Identify, formulate, and analyze complex engineering problems |
| PO3 | Design/development of solutions: Design solutions for complex problems considering public health, safety, and sustainability |
| PO4 | Conduct investigations: Use research-based knowledge and methods to investigate complex problems |
| PO5 | Modern tool usage: Create, select, and apply appropriate techniques and modern tools |
| PO6 | The engineer and society: Apply reasoning informed by contextual knowledge to assess societal issues |
| PO7 | Environment and sustainability: Understand the impact of professional solutions in environmental contexts |
| PO8 | Ethics: Apply ethical principles and commit to professional ethics |
| PO9 | Individual and team work: Function effectively as an individual and team member |
| PO10 | Communication: Communicate effectively on complex activities |
| PO11 | Project management and finance: Demonstrate knowledge of engineering management principles |

	---

	## 💼 Use Cases

	### 1. Curriculum Design
```python
# Ensure curriculum covers all POs
courses = [...]  # list of all course outcomes
results = batch_predict(courses, package)

# Check PO coverage: number of COs with a non-zero correlation per PO
po_coverage = (results.iloc[:, 1:] > 0).sum(axis=0)
print("PO Coverage across curriculum:")
print(po_coverage)
```
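
The coverage check can be extended to list the POs that no course outcome addresses. A minimal sketch on a hand-written matrix; the helper name `count_po_coverage` and the toy values are illustrative, and in practice the input would be the PO columns returned by `batch_predict`.

```python
import pandas as pd


def count_po_coverage(matrix_df, min_level=1):
    """Count, for each PO column, how many COs map at or above min_level."""
    return (matrix_df >= min_level).sum(axis=0)


# Toy CO-PO matrix: 3 course outcomes x 3 POs (illustrative values only)
toy = pd.DataFrame({"PO1": [3, 2, 0], "PO2": [0, 0, 0], "PO3": [1, 0, 2]})

coverage = count_po_coverage(toy)
gaps = coverage[coverage == 0].index.tolist()
print(gaps)  # ['PO2']
```

Raising `min_level` to 2 restricts the count to Medium and High correlations, which is a stricter reading of "covered".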

	### 2. Course Alignment Verification
```python
# Verify if a course aligns with intended POs
co = "Design sustainable building systems considering environmental impact"
_, pred = predict_co_po(co, package)
pred = np.ravel(pred)  # flatten to the 11 rounded scores

# Check if PO3, PO6, PO7 are at least Medium (as expected for sustainability)
if pred[2] >= 2 and pred[5] >= 2 and pred[6] >= 2:
    print("✅ Course aligns with sustainability POs")
```

	### 3. Accreditation Documentation
```python
# Generate CO-PO matrix for accreditation reports
course_cos = [...]  # List of course outcomes
matrix = generate_co_po_matrix(course_cos, package)
matrix.to_excel('CO_PO_Matrix_Course123.xlsx')  # needs an Excel writer such as openpyxl
```

	### 4. Program Outcome Attainment
```python
# Calculate PO attainment based on CO attainment
co_attainment = np.array([0.85, 0.78, 0.92, 0.88, 0.75])  # one score per CO in course_cos
co_po_matrix = batch_predict(course_cos, package).iloc[:, 1:].values

# Weight by correlation strength; a PO with no mapped CO (column sum 0) becomes NaN
weights = co_po_matrix.sum(axis=0)
po_attainment = np.dot(co_attainment, co_po_matrix) / np.where(weights > 0, weights, np.nan)
print("PO Attainment:", po_attainment)
```
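
The attainment calculation is a correlation-weighted average: for PO j, attainment is the sum over COs of (CO attainment × correlation) divided by the sum of correlations in column j. The sketch below works this through on a small hand-written matrix; the function name `po_attainment` and the numbers are illustrative only.

```python
import numpy as np


def po_attainment(co_attainment, co_po_matrix):
    """Correlation-weighted average of CO attainment for each PO.

    Returns NaN for POs that no CO maps to (column sum of zero).
    """
    a = np.asarray(co_attainment, dtype=float)
    m = np.asarray(co_po_matrix, dtype=float)
    weights = m.sum(axis=0)
    weighted = a @ m
    out = np.full(m.shape[1], np.nan)
    np.divide(weighted, weights, out=out, where=weights > 0)
    return out


# 2 COs x 3 POs; PO3 has no mapped CO
co_scores = [0.80, 0.60]
matrix = [[3, 1, 0],
          [1, 2, 0]]

result = po_attainment(co_scores, matrix)
print(result)
# PO1 = (0.80*3 + 0.60*1) / 4 = 0.75
# PO2 = (0.80*1 + 0.60*2) / 3 ≈ 0.667
# PO3 = NaN (no CO contributes)
```

Returning NaN for an unmapped PO keeps a missing mapping distinct from a genuinely low attainment of 0.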

	---

	## ⚠️ Limitations

	### Known Limitations

	1. Language: Currently supports English only
	2. Domain: Optimized for engineering and technical courses; may not work well for humanities
	3. Context: Cannot understand institutional-specific PO definitions or mappings
	4. Scale: Fixed 0-3 scale; some institutions use different scales
	5. Granularity: Predicts at statement level; cannot map sub-components

	### When Model May Struggle

	- Very short statements (< 5 words)
	- Ambiguous or vague objectives
	- Non-technical domains
	- Mixed or compound objectives
	- Institution-specific terminology

	### Recommendations

- ✅ Use clear, well-structured CO statements
- ✅ Include action verbs (apply, analyze, design, etc.)
- ✅ Be specific about what students will do
- ✅ Review and adjust automated predictions
- ✅ Consider institutional context
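
Some of these checks can be automated before a statement is sent to the model. The following is a rough sketch, not part of the model package: the function name `check_co_statement`, the verb list, and the thresholds (5 words from the limitations above, 50 from the recommended input length) are illustrative assumptions.

```python
# Illustrative verb list drawn from the Bloom's levels mentioned in this card
ACTION_VERBS = {"understand", "apply", "analyze", "design", "evaluate", "create"}


def check_co_statement(statement, min_words=5, max_words=50):
    """Return a list of warnings for a CO statement; empty means none found."""
    words = statement.lower().split()
    warnings = []
    if len(words) < min_words:
        warnings.append(f"too short: {len(words)} words (fewer than {min_words})")
    if len(words) > max_words:
        warnings.append(f"too long: {len(words)} words (more than {max_words})")
    if not ACTION_VERBS.intersection(w.strip(".,;:") for w in words):
        warnings.append("no action verb from the illustrative list")
    return warnings


print(check_co_statement("Apply data structures to solve computational problems"))
print(check_co_statement("Learn Python"))
```

A statement that triggers warnings can still be scored; the warnings only flag predictions that deserve a closer manual review.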

	---

	## 🔬 Evaluation

	### Validation Methodology

	- Cross-validation: 5-fold CV during development
	- Test Set: Held-out 15% never seen during training
	- Metrics: MAE (primary), RMSE, R² score
- Ground truth: Manual expert mappings serve as the reference labels
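
A 5-fold cross-validation run scored by MAE can be sketched as below. The features and targets are synthetic stand-ins, and the forest is kept small so the example runs quickly; nothing here reproduces the author's actual development setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((60, 8))                    # stand-in for TF-IDF features
y = np.clip(np.round(3 * X[:, :3]), 0, 3)  # 3 synthetic "PO" columns on a 0-3 scale

model = RandomForestRegressor(n_estimators=50, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scikit-learn maximizes scores, so MAE is exposed as a negated value
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
mae_per_fold = -scores

print("MAE per fold:", np.round(mae_per_fold, 3))
print("Mean MAE:", round(float(mae_per_fold.mean()), 3))
```

The spread of the per-fold values gives a rough sense of how much the reported MAE depends on which samples land in the evaluation fold.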

	### Error Analysis

	Common Prediction Patterns:

	- ✅ Excellent (MAE < 0.3): 45% of predictions
	- ✅ Good (MAE 0.3-0.5): 38% of predictions
	- ⚠️ Acceptable (MAE 0.5-0.7): 14% of predictions
	- ❌ Poor (MAE > 0.7): 3% of predictions
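
A breakdown like the one above can be computed by bucketing an error value for each test sample. The sketch below assumes the bands refer to per-sample MAE averaged across POs and that each band includes its lower bound; the card does not state either detail, and the data is a toy example.

```python
import numpy as np

BAND_EDGES = [0.3, 0.5, 0.7]
BAND_LABELS = ["Excellent", "Good", "Acceptable", "Poor"]


def error_band_shares(y_true, y_pred):
    """Share of samples per error band, using per-sample MAE across POs."""
    diff = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    per_sample_mae = diff.mean(axis=1)
    band_index = np.digitize(per_sample_mae, BAND_EDGES)
    counts = np.bincount(band_index, minlength=len(BAND_LABELS))
    return {
        label: float(count) / len(per_sample_mae)
        for label, count in zip(BAND_LABELS, counts)
    }


# Toy data: 4 samples x 2 POs with per-sample MAE of about 0.0, 0.4, 0.6 and 1.0
y_true = [[3, 0], [2, 1], [1, 0], [0, 3]]
y_pred = [[3, 0], [2.4, 1.4], [1.6, 0.6], [1, 2]]

shares = error_band_shares(y_true, y_pred)
print(shares)
# {'Excellent': 0.25, 'Good': 0.25, 'Acceptable': 0.25, 'Poor': 0.25}
```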

	Most Accurate Predictions:
	- Clear technical objectives
	- Standard engineering terminology
	- Objectives with explicit PO indicators

	Challenging Cases:
	- Interdisciplinary objectives
	- Soft skill-focused outcomes
	- Objectives with multiple POs at similar levels

	---

	## 📖 Citation

	If you use this model in your research, curriculum design, or accreditation process, please cite:

```bibtex
@misc{jrine2025copo,
  title={CO-PO Mapping Model for Outcome-Based Education},
  author={Jrine},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/Jrine/co-po}},
  note={Automated course outcome to program outcome mapping using machine learning}
}
```

	---

	## 🔗 Related Models

	Part of the Educational Taxonomy Classification Suite:

| Model | Purpose | Link |
|-------|---------|------|
| Dave's Psychomotor | Classify physical skills (5 levels) | [Jrine/dave](https://huggingface.co/Jrine/dave) |
| Bloom's Taxonomy | Classify cognitive objectives (6 levels) | [Jrine/blooms](https://huggingface.co/Jrine/blooms) |
| CO-PO Mapping | Map outcomes to program goals (11 POs) | [Jrine/co-po](https://huggingface.co/Jrine/co-po) |

	---

	## 📝 License

	This model is released under the Apache License 2.0.

	You are free to:
	- ✅ Use commercially
	- ✅ Modify and distribute
	- ✅ Use for research
	- ✅ Integrate into applications

	With the condition that you:
	- 📄 Include license and copyright notice
	- 📋 State changes made
	- 📝 Include NOTICE file if applicable

	---

	## 👥 Model Card Authors

- Developed by: Jrine
- Model Date: November 2025
- Model Version: 1.0
- Model Type: Multi-output Regression
- Framework: scikit-learn 1.3+
- Contact: Available on Hugging Face profile

	---

	## 🤝 Contributing

	Found an issue or have suggestions? We welcome:
	- 🐛 Bug reports
	- 💡 Feature requests
	- 📊 Additional training data
	- 🔧 Model improvements
	- 📖 Documentation enhancements

	Please open an issue in the repository.

	---

	## 📞 Support

	For questions, issues, or feedback:
	- 💬 [Hugging Face Discussions](https://huggingface.co/Jrine/co-po/discussions)
	- 📧 Contact via Hugging Face profile
	- 🐛 [Report Issues](https://huggingface.co/Jrine/co-po/discussions)

	---

	## 🌟 Acknowledgments

	- Training data from engineering faculty across multiple institutions
	- Inspired by ABET and NBA accreditation frameworks
	- Built with scikit-learn and Hugging Face ecosystem

	---

	<div align="center">

	Made with ❤️ for educators and learners worldwide

	[🏠 Homepage](https://huggingface.co/Jrine/co-po) • [📊 Model Files](https://huggingface.co/Jrine/co-po/tree/main) • [💬 Discussions](https://huggingface.co/Jrine/co-po/discussions)

	</div>](https://huggingface.co/Jrine/co-po/blob/main/Readme.md)