Lecture 1¶
Classification - Data to classes
Regression - Predicting a numeric value
Clustering
Different types of problems¶
Classification Problem - MNIST Dataset
Regression - Predicting stock value
Clustering
- Automatically identify groups in the data
Data integration¶
- Data are created independently
- A higher-level abstraction
Statistical analysis¶
Collecting data¶
Collecting, exploring and presenting large amounts of data to discover underlying patterns and trends
Data come in two types:
- Discrete
- Continuous

Common ways to present data:
- Bar chart
- Pie chart
- Stem-and-leaf plot
- Scatterplot: uses Cartesian coordinates to display the values of two variables for a set of data. Its pattern is described by:
  - Form
  - Direction
Numerical descriptive measures of data
Central tendency:
- Mean
- Median
- Mode

Extremes of the data:
- Min
- Max

A sampling method is a procedure for selecting sample elements from a population.
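As a small illustration of these measures, the sketch below computes them with Python's standard `statistics` module. The `scores` list is made-up data, not taken from the lecture.

```python
import statistics

# Hypothetical sample of exam scores (illustrative values only).
scores = [55, 60, 60, 70, 85]

summary = {
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value of the sorted data
    "mode": statistics.mode(scores),      # most frequent value
    "min": min(scores),                   # smallest observation
    "max": max(scores),                   # largest observation
}
print(summary)
```

Here the mean (66) is higher than the median (60) because the single high score of 85 pulls the average up.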
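One possible sampling method is simple random sampling, where every element has the same chance of being selected. The sketch below assumes a hypothetical population of 100 numbered elements and uses a fixed seed only so that the draw is repeatable.

```python
import random

# Hypothetical population: 100 elements labelled 1..100.
population = list(range(1, 101))

# Seeded generator so the example gives the same sample on every run.
rng = random.Random(0)

# Simple random sampling without replacement: pick 10 distinct elements.
sample = rng.sample(population, k=10)
print(sample)
```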
Relationship between variables:¶
- Eyeball fit: pick two points on the plot so that the line passing through them gives a fairly good fit.
- Least squares fit: fit a line \(y = a + bX\) that minimizes the error \(S\), the sum of squared differences between the observed and predicted values of \(y\).
- Correlation coefficient, denoted as r, measures the degree to which the movements of two variables are associated.
- r = 1 means a perfect positive relationship
- r = -1 means a perfect negative relationship
- r = 0 means no linear relationship
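The least squares fit and the correlation coefficient can be sketched in a few lines of plain Python. This is a minimal illustration using the standard closed-form formulas; the function names `least_squares_fit` and `correlation` and the sample data are assumptions made for the example.

```python
import math

def least_squares_fit(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def correlation(xs, ys):
    """Return the correlation coefficient r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = least_squares_fit(xs, ys)
r = correlation(xs, ys)
print(a, b, r)
```

Because the points lie exactly on a rising line, the fit recovers the intercept 1 and slope 2, and r comes out as 1.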
Forecasting¶
- An experiment is an action where the result is uncertain.
- A sample space is the set of all possible outcomes of an experiment, denoted as \(S\).
- An event is a subset of \(S\).
Probability is a measure of how likely an event is to occur. When all outcomes are equally likely, it is the number of outcomes in the event divided by the number of outcomes in the sample space:
$p = \frac{\text{number of outcomes in the event}}{\text{number of outcomes in the sample space}}$
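As a worked example, consider rolling a fair six-sided die and asking for the probability of an even number. The die and the event are assumptions chosen for illustration; the outcomes are taken to be equally likely.

```python
from fractions import Fraction

# Sample space S: all possible outcomes of one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event: the subset of S where the roll is even.
event = {outcome for outcome in sample_space if outcome % 2 == 0}

# Probability = outcomes in the event / outcomes in the sample space.
p = Fraction(len(event), len(sample_space))
print(p)  # 1/2
```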
Parameters¶
- A sample can be generated by a probability model; the parameters are the characteristics of that model.
Variance¶
- Variance is another parameter of a probability model.
- It is a measure of how spread out the values are around the mean.
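To make "spread" concrete, the sketch below computes the population variance as the average squared distance from the mean, and checks it against `statistics.pvariance` from the standard library. The `data` list is a made-up example.

```python
import statistics

# Hypothetical data set (illustrative values only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)

# Population variance: mean of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)

print(mean, variance)
print(statistics.pvariance(data))  # same quantity from the standard library
```

When the data are a sample used to estimate the variance of a larger population, `statistics.variance` divides by n - 1 instead of n.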