# Component 1: Project Setup (Windows + RTX 4060 8GB)

## What This Component Does
- Creates a clean folder structure for the full coding-assistant project.
- Sets up a Python virtual environment.
- Installs all core dependencies needed across Components 2-10.
- Verifies that Python, PyTorch, CUDA visibility, and key libraries work.

## Folder Structure Created
- `data/raw` -> raw datasets you will provide later
- `data/interim` -> temporary cleaned data
- `data/processed` -> training-ready tokenized data
- `data/external` -> any third-party resources
- `src/tokenizer` -> Component 2 code tokenizer
- `src/dataset_pipeline` -> Component 3 preprocessing pipeline
- `src/model_architecture` -> Component 4 transformer code
- `src/training_pipeline` -> Component 5 training loop
- `src/evaluation_system` -> Component 6 evaluation code
- `src/inference_engine` -> Component 7 inference code
- `src/chat_interface` -> Component 8 Gradio interface
- `src/finetuning_system` -> Component 9 LoRA fine-tuning
- `src/export_optimization` -> Component 10 quantization/export tools
- `configs` -> config files for all components
- `scripts` -> setup, verification, and utility scripts
- `tests` -> quick checks for each component
- `checkpoints` -> model checkpoints saved during training
- `models/base` -> base trained model files
- `models/lora` -> LoRA adapters
- `models/quantized` -> optimized quantized models
- `artifacts` -> generated reports, metrics, and outputs
- `logs` -> training and runtime logs
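The layout above can be reproduced with a few lines of Python. This is only an illustrative sketch: the real work is done by `scripts\setup_windows_environment.ps1`, whose contents are not shown here, and the names `PROJECT_DIRS` and `create_project_dirs` are hypothetical helpers introduced for this example.

```python
from pathlib import Path

# Folder list copied from the "Folder Structure Created" section.
PROJECT_DIRS = [
    "data/raw", "data/interim", "data/processed", "data/external",
    "src/tokenizer", "src/dataset_pipeline", "src/model_architecture",
    "src/training_pipeline", "src/evaluation_system", "src/inference_engine",
    "src/chat_interface", "src/finetuning_system", "src/export_optimization",
    "configs", "scripts", "tests", "checkpoints",
    "models/base", "models/lora", "models/quantized",
    "artifacts", "logs",
]


def create_project_dirs(root: Path) -> list[Path]:
    """Create every project folder under root (hypothetical helper).

    Folders that already exist are left untouched, so the function
    is safe to call more than once.
    """
    created = []
    for rel in PROJECT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs(Path.cwd())` from the project root would create any folders that are missing and leave existing ones alone.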

## Exact Commands To Run (in this order)
Run from:
`D:\Desktop 31st Jan 2026\MIND-AI-MODEL`

0. Install Python 3.11 (required for package compatibility):
- Download page: https://www.python.org/downloads/release/python-3119/
- Windows installer file: `python-3.11.9-amd64.exe`
- During install, check: `Add python.exe to PATH`

1. Allow script execution for this terminal only:
```powershell
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
```

2. If you already attempted setup once, remove the old virtual environment first:
```powershell
if (Test-Path .\.venv) { Remove-Item -Recurse -Force .\.venv }
```

3. Create the folders and virtual environment, then install dependencies:
```powershell
.\scripts\setup_windows_environment.ps1
```

4. Activate virtual environment:
```powershell
.\.venv\Scripts\Activate.ps1
```

5. Verify setup:
```powershell
python .\scripts\verify_component1_setup.py
```

## Expected Verification Result
- Prints Python version
- Prints PyTorch version
- Shows whether CUDA is available
- Shows GPU name if available
- Confirms critical libraries import correctly
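The checks listed above could look roughly like the sketch below. It is not the actual content of `scripts\verify_component1_setup.py`. The function names and the library list passed to `check_imports` are assumptions made for illustration.

```python
import importlib
import platform


def check_imports(names):
    """Try to import each module and return {name: (ok, detail)}.

    detail is the module's __version__ when the import succeeds,
    or the error text when it fails.
    """
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            results[name] = (True, getattr(module, "__version__", "unknown"))
        except Exception as exc:  # ImportError, or OSError if a DLL fails to load
            results[name] = (False, f"{type(exc).__name__}: {exc}")
    return results


def cuda_summary():
    """Report CUDA visibility through PyTorch, or say why it cannot be checked."""
    try:
        import torch
    except Exception as exc:
        return f"PyTorch not importable ({type(exc).__name__})"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "CUDA not available (CPU only)"


def main():
    print("Python:", platform.python_version())
    # Assumed library list; the real script may check different packages.
    for name, (ok, detail) in check_imports(["torch", "gradio"]).items():
        print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
    print(cuda_summary())
```

Running `main()` inside the activated virtual environment prints one line per check. A `FAILED` line carries the exception text, which is the part worth including when reporting a problem.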

Notes:
- `codebleu` is excluded from the base install on Windows because of a `tree-sitter` dependency conflict on Python 3.11.
- Component 6 will instead use evaluation metrics that install reliably on Windows, plus code-quality checks, so setup is not affected.
- `bitsandbytes` is optional on native Windows because some CUDA/driver combinations fail to load its DLL.
- The base setup and all early components work without it.
- For Component 5, we will:
  - try `bitsandbytes` if available, and
  - automatically fall back to a stable optimizer on your machine if it is not.
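The fallback described for Component 5 can be sketched as follows. The guide does not say which optimizer counts as the "stable" one, so `torch.optim.AdamW` is an assumption here, as are the function names. `bitsandbytes.optim.AdamW8bit` is used as one example of an optimizer that the package provides.

```python
def choose_optimizer_backend():
    """Return "bitsandbytes" if the package imports cleanly, else "torch".

    A broad except is deliberate: on native Windows the failure may be a
    DLL load error rather than a plain ImportError.
    """
    try:
        import bitsandbytes  # noqa: F401
    except Exception:
        return "torch"
    return "bitsandbytes"


def build_optimizer(params, lr=3e-4):
    """Build an AdamW-style optimizer, preferring the 8-bit variant.

    The learning rate default is a placeholder, not a recommended value.
    """
    if choose_optimizer_backend() == "bitsandbytes":
        import bitsandbytes as bnb
        return bnb.optim.AdamW8bit(params, lr=lr)
    import torch
    return torch.optim.AdamW(params, lr=lr)
```

With this shape, the training code would call `build_optimizer(model.parameters())` and never needs to know which backend was selected.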

If verification fails, copy the full terminal output and share it with me.