CST383 - Module 5
What did I learn in the fifth week of CST383? This week covered machine learning, with the core idea being classification using k-Nearest Neighbors (kNN). The theme was understanding the preprocessing and evaluation steps that kNN needs: handling missing data, scaling features, and assessing performance. The most surprising aspect this week was how cross-validation solves the problem of wasting training data. Instead of permanently setting aside one validation set, it splits the training data into k folds and rotates through them over k iterations, so every fold is used once for validation and k - 1 times for training, which lets you estimate future performance. Furthermore, the way you can estimate test-set accuracy from these folds was both really interesting and somewhat difficult to understand: in each of the k iterations a different fold is held out for validation while the model trains on the remaining folds, and the mean of the k validation accuracies serves as the estimate. A concept I am still unsure about is the relationship between precision and recall. Basically, I understand that precision measures how often positive predicti...
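To make the fold rotation described above more concrete for myself, here is a minimal sketch in plain Python. It is not the course's code and not a library API: the function names `knn_predict` and `cross_val_accuracy`, the parameter `n_folds`, and the toy data are all made up for illustration. I use `n_folds` for the number of folds so it does not get confused with the `k` in kNN.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x and keep the k closest.
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, n_folds=5, k=3):
    """Return the validation accuracy of kNN on each of n_folds folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th row, starting at `fold`, is held out for validation.
        val_idx = set(range(fold, len(X), n_folds))
        # All remaining rows form the training set for this iteration.
        train_X = [X[i] for i in range(len(X)) if i not in val_idx]
        train_y = [y[i] for i in range(len(X)) if i not in val_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return scores

# Toy data (made up for illustration): two well-separated clusters.
X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4),
     (5.0, 5.1), (5.2, 5.0), (5.1, 5.3), (5.3, 5.2), (5.0, 5.4)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = cross_val_accuracy(X, y, n_folds=5, k=3)
# The mean of the per-fold accuracies is the cross-validated estimate.
mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```

Each row ends up in a validation fold exactly once, so no data is permanently set aside. In practice the rows would usually be shuffled before being split into folds; this sketch skips that step to stay short.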