Question 1

Continuous data are numerical data that can take any value within a range, such as height, weight, and length. Both the magnitude of the values and the differences between them are meaningful. Ordinal data are categories with a natural order or scale; only the order matters, not the difference between values. Examples include the order in which people finish a task, satisfaction ratings, and education level. Arithmetic on ordinal data is meaningless. Nominal data are categorical data with no inherent order, such as country, gender, and race.

Model Example: In a model predicting a person’s health index, the features are the person’s weight, daily sugar intake in grams, and hours of sleep per day. These features are continuous data that can take on a range of values. The health index is the target and indicates the person’s health level; it is ordinal data. Based on their index, people are grouped into types: obese, overweight, normal, and underweight, which are categorical data.
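
The distinction between the three data types can be sketched in plain Python. This is only an illustration: the values, the `EDUCATION_ORDER` list, and the `rank` helper are assumptions made up for this example and are not part of the model described above.

```python
from collections import Counter

# Continuous: values lie anywhere in a range, so arithmetic is meaningful.
weights_kg = [58.2, 91.5, 70.0]  # hypothetical sample values
mean_weight = sum(weights_kg) / len(weights_kg)

# Ordinal: levels have an order, but the gaps between them are undefined.
# We only ever compare ranks; we never add or average them.
EDUCATION_ORDER = ["primary", "secondary", "bachelor", "master"]

def rank(level):
    """Return the position of a level in the assumed ordering."""
    return EDUCATION_ORDER.index(level)

sorted_levels = sorted(["master", "primary", "bachelor"], key=rank)

# Nominal: no order at all, so counting and equality checks are
# the only sensible operations.
countries = ["France", "Japan", "France"]
counts = Counter(countries)

print(mean_weight, sorted_levels, counts.most_common(1))
```

Sorting by rank works for ordinal values, while taking the mean of the ranks would silently assume equal spacing between levels, which ordinal data do not guarantee.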

Question 2

Normal Distribution:

Mean: 0.5008942053275507
Median: 0.5025070488362504
[Figure: plot of the normal distribution]

Left Skewed:

Mean: 0.8334088906226477
Median: 0.8678073027102118
[Figure: plot of the left-skewed distribution]

Right Skewed:

Mean: 0.16748919817019833
Median: 0.1260774776528926
[Figure: plot of the right-skewed distribution]
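
The pattern in the numbers above (mean roughly equal to the median for the symmetric case, mean below the median for left skew, mean above the median for right skew) can be reproduced with a small simulation. This is a sketch under assumptions: the original sampling code is not shown, so the choice of a Gaussian and of Beta(5, 1) and Beta(1, 5) distributions, the sample size, and the seed are all guesses made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 10_000              # assumed sample size

# Symmetric: Gaussian centred at 0.5 (assumed parameters).
normal = [rng.gauss(0.5, 0.1) for _ in range(N)]
# Left-skewed: Beta(5, 1) piles mass near 1 with a long tail toward 0.
left_skewed = [rng.betavariate(5, 1) for _ in range(N)]
# Right-skewed: Beta(1, 5) piles mass near 0 with a long tail toward 1.
right_skewed = [rng.betavariate(1, 5) for _ in range(N)]

def summarize(sample):
    """Return (mean, median) of a sample."""
    return statistics.mean(sample), statistics.median(sample)

for name, sample in [("Normal", normal),
                     ("Left skewed", left_skewed),
                     ("Right skewed", right_skewed)]:
    mean, median = summarize(sample)
    print(f"{name}: mean={mean:.4f}, median={median:.4f}")
```

The long tail pulls the mean toward it, while the median stays closer to the bulk of the data, which is why the two statistics separate in the skewed cases.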

Question 3

Raw Data

[Figure: life expectancy, raw data]

Logarithmic Transformation

[Figure: life expectancy, logarithmic transformation]

Comparing the two graphs, I think the log-transformed one is better. Unlike the raw graph, which has many ups and downs, the transformed graph clearly shows that a growing number of countries have higher life expectancy. The difference in the life expectancy distribution between 1952 and 2007 stands out clearly, which makes this graph easier to interpret.

Question 4

Original plot

[Figure: population, original plot]

Logarithmic Transformation

[Figure: population, logarithmic transformation]

Compared with the log-transformed graph, the first graph is much harder to read, because what we are interested in is the major changes in population rather than the few outliers. The transformed graph not only shows those few outliers but also clearly presents the median and quartiles, which better represent the change in population across these years.
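
The effect of the transformation on the quartiles can be checked numerically. The sketch below uses made-up population values, not the data behind the plots, and `box_share` is a hypothetical helper that measures how much of the axis range the interquartile box occupies.

```python
import math
import statistics

# Hypothetical country populations with one extreme outlier
# (illustrative values only, not the original data set).
populations = [2.5e5, 1.2e6, 4.8e6, 9.0e6, 2.3e7, 6.1e7, 1.3e9]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

def box_share(summary):
    """Fraction of the full data range covered by the box (IQR / range)."""
    lo, q1, _, q3, hi = summary
    return (q3 - q1) / (hi - lo)

raw = five_number_summary(populations)
logged = five_number_summary([math.log10(p) for p in populations])

print(f"raw box share: {box_share(raw):.3f}")
print(f"log box share: {box_share(logged):.3f}")
```

On the raw scale the outlier stretches the axis so far that the box collapses into a thin sliver. After taking the base-10 logarithm the box covers a much larger share of the range, so the median and quartiles become readable while the outlier remains visible.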