CST383 - Module 4

What did I learn in the fourth week of CST383?

This week covered visualizing multiple variables, including the case of two discrete variables. The core idea was choosing the correct type of plot for a data set based on the nature of the data rather than personal preference.

The most surprising aspects of this week were the new visualization tools: violin plots, bar() and crosstab(). Violin plots weren't covered in much depth. The other two both end up as bars, much like a histogram, but they present the data differently. bar() draws one bar for each individual section of the data as a whole (e.g. 0-25), while crosstab() is not a plot by itself: it builds a table of counts for two discrete variables, and plotting that table as bars adds detail by breaking each bar down by the second variable.
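To make the bar() versus crosstab() difference concrete for myself, here is a minimal sketch using pandas. The data and the column names (sex, smoker) are made up for illustration and are not from the course labs.

```python
import pandas as pd

# Made-up data with two discrete variables.
df = pd.DataFrame({
    "sex": ["F", "M", "F", "M", "F", "M"],
    "smoker": ["no", "no", "yes", "no", "no", "yes"],
})

# One discrete variable: count each category.
# counts.plot.bar() would draw one bar per category.
counts = df["sex"].value_counts()

# Two discrete variables: count each combination of categories.
# table.plot.bar() would draw one group of bars per row of the table.
table = pd.crosstab(df["sex"], df["smoker"])

print(counts)
print(table)
```

The crosstab result is just a DataFrame of counts, so the same .plot.bar() call works on it as on the single-variable counts.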

A concept I am still unsure about is choosing the right visualization for certain data. I understand that a continuous variable has three options (box, density, and histogram), while a discrete variable is mostly shown with bar plots or a crosstab, but there wasn't much to go on for pairing one continuous variable with one discrete variable. I simply tied it to the new lecture material, which I hope was close to what it should be.
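One common approach I have seen for one continuous and one discrete variable is a box plot split by category, so each category gets its own box. This is only a sketch under my own assumptions: the data and the column names (day, tip) are invented, and I can't say this is the approach the course intends.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: one discrete variable (day) and one continuous variable (tip).
df = pd.DataFrame({
    "day": ["Thu", "Thu", "Thu", "Fri", "Fri", "Fri", "Sat", "Sat", "Sat"],
    "tip": [2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 3.0, 5.0, 7.0],
})

# One box per category: the discrete variable splits the continuous one.
fig, ax = plt.subplots()
df.boxplot(column="tip", by="day", ax=ax)
ax.set_ylabel("tip")

# The same comparison as numbers: the median tip within each day.
medians = df.groupby("day")["tip"].median()
print(medians)
```

The Agg backend line is only there so the script runs outside a notebook. In a notebook the figure would display without it.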

Some ideas and questions came up while going over the labs. A visual rule for choosing the normalize= argument would make crosstab decisions quicker, and scatterplots should have a practical limit on how much data they show, since past that point they become too confusing to look at. In the end, what is the intended approach for visualizing one continuous and one discrete variable, and is there a standard way to handle this in a real-world data science setting?
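On the normalize= question, the rule of thumb I'm settling on is to ask what should add up to 1: each row ("index"), each column ("columns"), or the whole table ("all"). Here is a small sketch with made-up data (the sex and smoker columns are my own invention) that checks this.

```python
import pandas as pd

# Made-up data with two discrete variables.
df = pd.DataFrame({
    "sex": ["F", "F", "F", "F", "M", "M"],
    "smoker": ["no", "no", "no", "yes", "no", "yes"],
})

# Each row sums to 1: "of the people in this row, what fraction smoke?"
by_row = pd.crosstab(df["sex"], df["smoker"], normalize="index")

# Each column sums to 1: "of the people in this column, what fraction are F or M?"
by_col = pd.crosstab(df["sex"], df["smoker"], normalize="columns")

# The whole table sums to 1: "what fraction of everyone falls in this cell?"
overall = pd.crosstab(df["sex"], df["smoker"], normalize="all")

print(by_row.sum(axis=1))   # row totals
print(by_col.sum(axis=0))   # column totals
print(overall.values.sum()) # grand total
```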
