# Component 1: Project Setup (Windows + RTX 4060 8GB)

## What This Component Does
- Creates a clean folder structure for the full coding-assistant project.
- Sets up a Python virtual environment.
- Installs all core dependencies needed across Components 2-10.
- Verifies that Python, PyTorch, CUDA visibility, and key libraries work.

## Folder Structure Created
- `data/raw` -> raw datasets you will provide later
- `data/interim` -> temporary cleaned data
- `data/processed` -> training-ready tokenized data
- `data/external` -> any third-party resources
- `src/tokenizer` -> Component 2 code tokenizer
- `src/dataset_pipeline` -> Component 3 preprocessing pipeline
- `src/model_architecture` -> Component 4 transformer code
- `src/training_pipeline` -> Component 5 training loop
- `src/evaluation_system` -> Component 6 evaluation code
- `src/inference_engine` -> Component 7 inference code
- `src/chat_interface` -> Component 8 Gradio interface
- `src/finetuning_system` -> Component 9 LoRA fine-tuning
- `src/export_optimization` -> Component 10 quantization/export tools
- `configs` -> config files for all components
- `scripts` -> setup, verification, and utility scripts
- `tests` -> quick checks for each component
- `checkpoints` -> model checkpoints saved during training
- `models/base` -> base trained model files
- `models/lora` -> LoRA adapters
- `models/quantized` -> optimized quantized models
- `artifacts` -> generated reports, metrics, and outputs
- `logs` -> training and runtime logs

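
For readers who want to see what creating this layout amounts to, here is a minimal Python sketch. It is not the contents of `setup_windows_environment.ps1`; the names `PROJECT_DIRS` and `create_project_dirs` are hypothetical, and the folder list is simply copied from the structure above.

```python
from pathlib import Path

# Folder list copied from the "Folder Structure Created" section.
PROJECT_DIRS = [
    "data/raw", "data/interim", "data/processed", "data/external",
    "src/tokenizer", "src/dataset_pipeline", "src/model_architecture",
    "src/training_pipeline", "src/evaluation_system", "src/inference_engine",
    "src/chat_interface", "src/finetuning_system", "src/export_optimization",
    "configs", "scripts", "tests", "checkpoints",
    "models/base", "models/lora", "models/quantized",
    "artifacts", "logs",
]


def create_project_dirs(root: Path) -> list[Path]:
    """Create every project folder under root; existing folders are left untouched."""
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        # parents=True builds intermediate folders such as "data" or "src";
        # exist_ok=True makes the call safe to repeat.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs(Path.cwd())` from the project root would build the tree in place, and running it a second time changes nothing.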
## Exact Commands To Run (in this order)
Run from:
`D:\Desktop 31st Jan 2026\MIND-AI-MODEL`

0. Install Python 3.11 (required for package compatibility):
- Download page: https://www.python.org/downloads/release/python-3119/
- Windows installer file: `python-3.11.9-amd64.exe`
- During install, check: `Add python.exe to PATH`

1. Allow script execution for this terminal only:
```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
```

2. If you already attempted setup once, remove the old virtual environment first:
```powershell
if (Test-Path .\.venv) { Remove-Item -Recurse -Force .\.venv }
```

3. Create the folders and virtual environment, then install dependencies:
```powershell
.\scripts\setup_windows_environment.ps1
```

4. Activate the virtual environment:
```powershell
.\.venv\Scripts\Activate.ps1
```

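
After activation, it can help to confirm that `python` now points at a 3.11 interpreter inside `.venv`. The following is a small sketch of such a check, not part of the project's scripts; the function names and the `REQUIRED` constant are hypothetical.

```python
import sys

REQUIRED = (3, 11)  # Python 3.11, as required in step 0


def is_required_python(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version is exactly 3.11."""
    return tuple(version_info[:2]) == REQUIRED


def in_virtual_env() -> bool:
    """Return True when running inside a virtual environment.

    Inside a venv, sys.prefix points at the environment folder while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print(f"Python {sys.version.split()[0]} is 3.11: {is_required_python()}")
print(f"Virtual environment active: {in_virtual_env()}")
```

If either line reports `False`, the terminal is probably using a different Python than the one installed in step 0, or step 4 was not run in this terminal.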
5. Verify setup:
```powershell
python .\scripts\verify_component1_setup.py
```

## Expected Verification Result
- Prints Python version
- Prints PyTorch version
- Shows whether CUDA is available
- Shows GPU name if available
- Confirms critical libraries import correctly

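
To make these checks concrete, here is a rough sketch of how a verification script of this kind could be written. It is not the actual `verify_component1_setup.py`; `check_imports` and `report` are hypothetical names, and the library list passed in is an assumption, since the exact set of critical libraries is not listed here.

```python
import importlib
import platform


def check_imports(names):
    """Try to import each module; map name -> None on success, or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # a failed DLL load is not always an ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def report(libraries):
    """Print the environment summary and return True if every library imported."""
    print("Python:", platform.python_version())
    try:
        import torch
        print("PyTorch:", torch.__version__)
        cuda_ok = torch.cuda.is_available()
        print("CUDA available:", cuda_ok)
        if cuda_ok:
            print("GPU:", torch.cuda.get_device_name(0))
    except Exception as exc:
        print("PyTorch not usable:", exc)
    failures = {n: e for n, e in check_imports(libraries).items() if e}
    for name, err in failures.items():
        print(f"FAILED {name}: {err}")
    return not failures
```

A call such as `report(["torch", "gradio"])` would print the summary and list any library that failed to import together with the reason.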
Note:
- `codebleu` is excluded from the base install on Windows due to a `tree-sitter` dependency conflict on Python 3.11.
- Component 6 will use Windows-stable evaluation metrics and add code-quality checks without breaking setup.
- `bitsandbytes` is optional on native Windows because some CUDA/driver combinations fail to load its DLL.
- Base setup and all early components continue without it.
- For Component 5, we will:
  - try `bitsandbytes` if available, and
  - automatically fall back to a stable optimizer on your machine if it is not.

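
The `bitsandbytes` fallback described in the note could look roughly like the sketch below. This is an illustration under assumptions, not the Component 5 code: the note does not name the optimizers, so `bnb.optim.AdamW8bit` and `torch.optim.AdamW` are stand-ins, and `bitsandbytes_usable` and `build_optimizer` are hypothetical names.

```python
def bitsandbytes_usable() -> bool:
    """Return True only if bitsandbytes imports cleanly on this machine."""
    try:
        import bitsandbytes  # noqa: F401
        return True
    except Exception:
        # Covers both a missing package and a DLL that fails to load.
        return False


def build_optimizer(params, lr=3e-4):
    """Use 8-bit AdamW when bitsandbytes loads, otherwise plain PyTorch AdamW.

    Both optimizer choices are assumptions made for this sketch.
    """
    import torch

    if bitsandbytes_usable():
        import bitsandbytes as bnb
        return bnb.optim.AdamW8bit(params, lr=lr)
    return torch.optim.AdamW(params, lr=lr)
```

Catching a broad `Exception` is deliberate here, because a failed DLL load does not always surface as an `ImportError`. Some `bitsandbytes` versions may only print a warning instead of raising, so a real implementation might need a stricter check.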
If verification fails, copy the full terminal output and share it with me.