Supervised learning¶
- Supervised learning
- Unsupervised learning
- Reinforcement learning
What is machine learning¶
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Linear Regression¶
- A part of machine learning
- Given a training set of pairs (x, y)
- Find a good approximation to f: \(x \to y\)
- Examples:
  - Spam detection (classification)
  - Digit recognition (classification)
  - House price prediction (regression)
Terminology¶
- Given a data point (x, y), x is called the feature vector and y is called the label
- The dataset given for learning is called the training data
- The dataset used for testing is called the testing data
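As a small illustration of this terminology, the sketch below builds a toy dataset of (feature vector, label) pairs and splits it into training and testing data. The data values, the 75/25 split, and the helper name `split_dataset` are assumptions made for illustration only.

```python
import random

def split_dataset(data, train_fraction=0.75, seed=0):
    """Shuffle (x, y) pairs and split them into training and testing data."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Each data point is (feature vector x, label y); the values are made up.
dataset = [
    ((50.0, 1.0), 150.0),
    ((80.0, 2.0), 240.0),
    ((120.0, 3.0), 360.0),
    ((65.0, 2.0), 200.0),
]

training_data, testing_data = split_dataset(dataset)
print(len(training_data), "training points,", len(testing_data), "testing points")
```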
Machine learning 3 steps¶
- Collect data, extract features
- Determine a model
- Train the model with the data
Loss¶
Loss on training set
We measure the error using a loss function \(L(y, \hat{y})\)
For regression, squared error is often used: \(L(y_i, f(x_i)) = (y_i - f(x_i))^2\)
Loss on testing set
Empirical loss is the loss measured on the training set
We assume both the training set and the testing set are drawn i.i.d. from the same distribution D, so minimizing the loss on the training set tends to make the loss on the testing set small
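The squared error and the empirical loss described above can be sketched in a few lines. The model `f` and the training pairs are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def squared_loss(y, y_hat):
    """Squared error L(y, y_hat) = (y - y_hat)^2 for a single data point."""
    return (y - y_hat) ** 2

def empirical_loss(f, data):
    """Average squared loss of model f over a list of (x, y) pairs."""
    return sum(squared_loss(y, f(x)) for x, y in data) / len(data)

# Hypothetical model and training set, for illustration only.
def f(x):
    return 2.0 * x

training_set = [(1.0, 2.0), (2.0, 5.0), (3.0, 5.0)]

# Per-point losses are 0, 1 and 1, so the empirical loss is their mean.
print(empirical_loss(f, training_set))
```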
Minimizing loss functions¶
- The minimizers of some loss functions have analytical solutions: exact solutions you can derive explicitly by analyzing the formula.
- However, most popular supervised learning models use loss functions with no analytical solution.
- We use gradient descent to approximate the minimum of such a function.
- Gradient: a vector that points in the direction in which the function value increases fastest.
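One loss with an analytical minimizer is the squared error of a linear model (least squares). The sketch below uses NumPy's `np.linalg.lstsq` to compute it; the data is made up and generated exactly by y = 3x + 1, so the recovered weights should be close to 3 and 1.

```python
import numpy as np

# Made-up training data generated exactly by y = 3*x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x + 1.0

# Add a column of ones so the model f(x) = w_1 * x + w_0 has an intercept.
X = np.column_stack([x, np.ones_like(x)])

# Closed-form minimizer of the squared loss (least squares); no iteration needed.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)
```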
Method¶
- For a function G, randomly guess an initial value \(x_0\)
- Repeat \(x_{i+1} = x_i - r \times \nabla G(x_i)\), where \(\nabla G\) denotes the gradient of G and r denotes the learning rate
- Until convergence
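from sympy import symbols, diff
r = 0.1  # learning rate
f_i = (1, 1, 1)  # initial guess for (x, y, z)
x, y, z = symbols('x y z', real=True)
f = (y + 2 * x)**2 + y + 2*x
g = (diff(f, x), diff(f, y), diff(f, z))  # symbolic gradient of f
g
(8*x + 4*y + 2, 4*x + 2*y + 1, 0)
import numpy as np
# Evaluate the gradient at the initial guess, then take one step against it
point = dict(zip((x, y, z), f_i))
grad = np.array([float(d.subs(point)) for d in g])
result = np.array(f_i) - r * grad
result
array([-0.4,  0.3,  1. ])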
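The update rule from the Method section can also be run as a loop until the gradient is close to zero. This is a minimal sketch: the gradient is hard-coded from the symbolic result above, and the learning rate 0.05, the tolerance, and the iteration cap are assumed values, not taken from the notes.

```python
import numpy as np

def grad_G(p):
    """Gradient of G(x, y, z) = (y + 2x)^2 + y + 2x."""
    x, y, z = p
    return np.array([8 * x + 4 * y + 2, 4 * x + 2 * y + 1, 0.0])

def gradient_descent(grad, p0, r=0.05, tol=1e-8, max_iter=1000):
    """Repeat p <- p - r * grad(p) until the gradient is (almost) zero."""
    p = np.array(p0, dtype=float)
    for _ in range(max_iter):
        g = grad(p)
        if np.linalg.norm(g) < tol:  # convergence check
            break
        p = p - r * g
    return p

p_min = gradient_descent(grad_G, (1, 1, 1))
x, y, z = p_min
G_min = (y + 2 * x) ** 2 + y + 2 * x
print(p_min, G_min)
```

G depends only on y + 2x, so its minimizers form a line rather than a single point; the loop stops at whichever point on that line the starting guess leads to.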
Linear Classification¶
- Use a line to separate data points
- Use \(x = (x_1, x_2)\), \(w = (w_1, w_2)\), i.e., x and w are vectors in 2D space
Fitting the labels with linear regression doesn't work well for classification problems
- Label y as either 1 or -1
- Find \(f_w(x) = w^Tx\) that minimizes the loss function \(L(f_w(x)) = \frac{1}{n}\sum_{i=1}^n(w^Tx_i-y_i)^2\)
- Find a line that minimizes the distance between the red and blue points
If there is an outlier in the graph, the separation line will misclassify some points
- If \(w^Tx_i\) gets very large, the squared loss is large even when the sign of the prediction is correct.
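To make the problem concrete, the sketch below evaluates the squared loss for a point with label 1 at a few hypothetical scores \(w^Tx\). The scores are made-up numbers, not the output of a trained model.

```python
def squared_loss(score, label):
    """Squared loss (w^T x - y)^2 for one data point with label +1 or -1."""
    return (score - label) ** 2

label = 1

# Positive scores are classified correctly, yet the loss grows with the score.
for score in (1.0, 3.0, 10.0):
    print("correct, score", score, "loss", squared_loss(score, label))

# A misclassified point (negative score) can have a smaller loss
# than a confidently correct one.
print("wrong, score", -0.5, "loss", squared_loss(-0.5, label))
```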
Solution:
We use the sigmoid function to squash the value into the range between 0 and 1: \(\sigma(a) = \frac{1}{1+\exp(-a)}\)
- Similar to a step function
- Continuous and easy to compute
Some properties of sigmoid function¶
- \(\sigma(a) = \frac{1}{1+\exp(-a)} \in (0, 1)\)
- Symmetric: \(1 - \sigma(a) = \sigma(-a)\)
- Easy to compute gradients: \(\sigma'(a) = \sigma(a)(1 - \sigma(a))\)
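These properties can be checked numerically. The sketch below uses only the standard library; `sigmoid_grad` relies on the standard identity that the derivative of the sigmoid is \(\sigma(a)(1 - \sigma(a))\), and the sample inputs are arbitrary.

```python
import math

def sigmoid(a):
    """Sigmoid function 1 / (1 + exp(-a)), with values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    """Derivative of the sigmoid: sigma(a) * (1 - sigma(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)

for a in (-2.0, 0.0, 2.0):
    # Symmetry: sigma(-a) should equal 1 - sigma(a).
    print(a, sigmoid(a), sigmoid(-a), 1.0 - sigmoid(a), sigmoid_grad(a))
```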
Logistic Regression¶
- Better approach (cross-entropy loss function): find w that minimizes the loss function
- If the i-th data point, with label 1, is misclassified, \(-\log(\sigma(w^Tx_i))\) is very large
- No analytical solution; needs gradient descent
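A minimal sketch of logistic regression trained by gradient descent follows. It assumes labels in {+1, -1}, as in the linear classification section, and writes the cross-entropy loss as the mean of \(-\log(\sigma(y_i w^Tx_i))\). The data, the learning rate, and the number of steps are made-up values.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy_loss(w, X, y):
    """Mean of -log(sigma(y_i * w^T x_i)) for labels y_i in {+1, -1}."""
    return np.mean(-np.log(sigmoid(y * (X @ w))))

def cross_entropy_grad(w, X, y):
    """Gradient of the loss above with respect to w."""
    margins = y * (X @ w)
    return -((1.0 - sigmoid(margins)) * y) @ X / len(y)

# Made-up 2D data; the last column of ones acts as a bias term.
X = np.array([[2.0, 1.0, 1.0], [3.0, 2.0, 1.0],
              [-1.0, -2.0, 1.0], [-2.0, -1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(3)
r = 0.5  # learning rate (assumed)
initial_loss = cross_entropy_loss(w, X, y)
for _ in range(200):
    w = w - r * cross_entropy_grad(w, X, y)

final_loss = cross_entropy_loss(w, X, y)
predictions = np.sign(X @ w)
print(w, initial_loss, final_loss, predictions)
```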
SVM¶
An SVM (support vector machine) performs classification by finding the hyperplane that maximizes the margin between the two classes
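The sketch below does not train an SVM. It only computes the margin of a given hyperplane \(w^Tx + b = 0\), taken here as the smallest value of \(y_i(w^Tx_i + b)/\|w\|\) over the data, and compares two hand-picked candidates. The data and both hyperplanes are hypothetical.

```python
import numpy as np

def margin(w, b, X, y):
    """Smallest signed distance y_i * (w^T x_i + b) / ||w|| over the data.

    A positive value means every point lies on the correct side.
    """
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Made-up, linearly separable 2D data with labels +1 / -1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Two hypothetical separating hyperplanes through the origin.
m_a = margin(np.array([1.0, 1.0]), 0.0, X, y)
m_b = margin(np.array([1.0, 0.0]), 0.0, X, y)
print(m_a, m_b)
```

Both hyperplanes separate the data, but the first leaves more room between the classes, which is the one a maximum-margin classifier would prefer of the two.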
K-Nearest neighbor methods¶
- Learning algorithm: just store the training examples
- Prediction algorithm:
  - Regression: take the average value of the k nearest neighbors
  - Classification: assign the most frequent class among the k nearest neighbors
- Easy to train, but with a high storage requirement and a high computation cost at prediction
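The two prediction rules can be sketched with the standard library alone. Euclidean distance is an assumption here (the notes do not name a distance), and the function names and data are illustrative. "Training" is just keeping the list of examples.

```python
from collections import Counter
import math

def k_nearest(train, x, k):
    """Return the k training pairs (x_i, y_i) closest to x (Euclidean distance)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]

def knn_regress(train, x, k):
    """Regression: average label of the k nearest neighbors."""
    neighbors = k_nearest(train, x, k)
    return sum(y for _, y in neighbors) / len(neighbors)

def knn_classify(train, x, k):
    """Classification: most frequent label among the k nearest neighbors."""
    neighbors = k_nearest(train, x, k)
    return Counter(y for _, y in neighbors).most_common(1)[0][0]

# Made-up stored examples.
points = [((0.0, 0.0), "blue"), ((1.0, 0.0), "blue"), ((0.0, 1.0), "blue"),
          ((5.0, 5.0), "red"), ((6.0, 5.0), "red")]
values = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(points, (0.5, 0.5), k=3))
print(knn_classify(points, (5.5, 5.0), k=3))
print(knn_regress(values, (2.1,), k=3))
```

Every prediction sorts the whole stored dataset by distance, which is where the prediction-time cost comes from.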
| | Linear | k-NN |
|---|---|---|
| Advantages | Easy to fit | Makes no strong assumption of a linear relationship |
| Disadvantages | Hard to classify data that a line cannot separate | Takes a lot of computation power at prediction |
Decision Tree¶
Entropy is used to measure how informative a probability distribution is. The higher the entropy, the greater the uncertainty.
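As a quick illustration, the sketch below computes Shannon entropy for a few distributions. The base-2 logarithm (entropy in bits) is an assumption, since the notes do not state a formula, and the example distributions are made up.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: sum of -p * log2(p), skipping zero probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair = entropy([0.5, 0.5])      # fair coin: most uncertain two-outcome case
biased = entropy([0.9, 0.1])    # biased coin: less uncertain
certain = entropy([1.0, 0.0])   # certain outcome: no uncertainty
print(fair, biased, certain)
```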
Wrap up¶
- Collect data, extract features
- Determine a model
  - Select a good model for your data