Lecture 1

Classification - Data to classes

Regression - Predicting a numeric value

Clustering

Different types of problems

Classification Problem - MNIST Dataset

Regression - Predicting stock value

Clustering

  • Automatically identify groups in the data

Data integration

  • Data are created independently

  • A higher-level abstraction

Statistical analysis

Collecting data

Collecting, exploring and presenting large amounts of data to discover underlying patterns and trends

Data come in two types:

  • Discrete
  • Continuous

Common ways to present data:

  • Bar chart
  • Pie chart
  • Stem-and-leaf plot
  • Scatterplot (uses Cartesian coordinates to display the values of two variables for a set of data), described by its:

    • Form
    • Direction

Numerical descriptive measures of data

  • Central tendency: mean, median, mode
  • Extremes: min, max
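As a minimal sketch, the measures above can be computed with Python's standard `statistics` module. The `scores` list is a hypothetical sample invented for illustration; it is not from the lecture.

```python
import statistics

# Hypothetical sample of exam scores, invented for illustration.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # arithmetic average of the values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value
lowest, highest = min(scores), max(scores)  # extremes of the sample

print(mean, median, mode, lowest, highest)
```

Here the mean (86) and the median (85) differ slightly because the largest score pulls the average upward.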

A sampling method is a procedure for selecting sample elements from a population.
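One common sampling method is simple random sampling, where every element has the same chance of being selected. The sketch below assumes a hypothetical population of 100 numbered elements and a fixed seed so the draw is reproducible; the notes do not prescribe a particular method.

```python
import random

# Hypothetical population: 100 elements labeled 1..100.
population = list(range(1, 101))

# A seeded generator makes the example reproducible.
rng = random.Random(0)

# Simple random sampling without replacement: 10 distinct elements.
sample = rng.sample(population, k=10)

print(sorted(sample))
```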

Relationship between variables:

  • Eyeball fit: Pick two points on the plot so that the line passing through them gives a fairly good fit.

  • Least squares fit: Fit a line \(y = a + bX\) such that it minimizes the sum of squared errors \(S = \sum_i (y_i - a - bX_i)^2\)

  • Correlation coefficient, denoted as r, measures the degree to which the movements of two variables are associated.

    • r = 1 means perfect positive relationship
    • r = -1 means a perfect negative relationship
    • r = 0 means no linear relationship
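The least squares line and the correlation coefficient can be sketched in plain Python using the standard closed-form expressions. The function names `least_squares_fit` and `correlation` and the data points are assumptions made for illustration, not part of the lecture.

```python
from math import sqrt

def least_squares_fit(xs, ys):
    """Return (a, b) for the line y = a + b*x minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

def correlation(xs, ys):
    """Return the Pearson correlation coefficient r of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = least_squares_fit(xs, ys)
r = correlation(xs, ys)
print(a, b, r)
```

Because the points lie exactly on an increasing line, the fit recovers a = 1 and b = 2, and r is 1. Reversing `ys` gives a decreasing line and r = -1.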

Forecasting

  • An experiment is an action whose result is uncertain.
  • A sample space is the set of all possible outcomes of an experiment, denoted as \(S\).
  • An event is a subset of \(S\).

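Probability is a measure of how likely an event is to occur. When all outcomes are equally likely, it is the number of outcomes in the event divided by the number of possible outcomes.

$p = \frac{\text{number of outcomes in the event}}{\text{number of outcomes in the sample space}}$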
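A small sketch of this counting definition, assuming a fair six-sided die so that all outcomes are equally likely. The die and the event "roll an even number" are illustrative choices, not taken from the lecture.

```python
from fractions import Fraction

# Sample space S: all outcomes of one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event: a subset of S, here "roll an even number".
event = {x for x in sample_space if x % 2 == 0}

# Probability = outcomes in the event / outcomes in the sample space.
p = Fraction(len(event), len(sample_space))

print(p)
```

Using `Fraction` keeps the result exact (1/2) instead of a rounded float.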

Parameters

  • A sample can be generated by a probability model; parameters are the characteristics of that model

Variance

  • Variance is another parameter of a probability model

  • It measures how spread out the values are around the mean
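As a rough sketch, variance can be computed from data as the average squared distance from the mean. The `data` values are hypothetical. Dividing by n treats the data as the whole population, while dividing by n - 1 is the usual estimate of the model's variance from a sample.

```python
import statistics

# Hypothetical observations, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: average squared deviation from the mean.
population_variance = sum(squared_deviations) / len(data)

# Sample variance: divide by n - 1 instead of n.
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance, sample_variance)
```

Both results agree with `statistics.pvariance(data)` and `statistics.variance(data)` from the standard library.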
