Tags: Week 1 : Introduction to Machine Learning
C1_W1_Lab03_Cost_function_Soln.ipynb
Definition of Regression
Regression predicts numbers.
Any supervised learning model that predicts a number, such as 220,000, 1.5, or -33.2, is addressing what's called a regression problem.
Linear Regression
- Fitting a straight line to your dataset
 - A regression that is linear, not nonlinear
 - Linear regression builds a model which establishes a relationship between features and targets
 - The model has two parameters $w$ and $b$ whose values are ‘fitted’ using training data.
 - Supervised Learning
 With a single input feature, this is also called "univariate (one-variable) linear regression"

(Predictive) Function
\[f_{w,b}(x)=wx+b\]
- $f$ : Hypothesis (historically, this function used to be called the hypothesis)
 - $\hat{y}$ : Estimated or predicted value of $y$
 - $y$ : Target (or label), the actual true value in the training set
 - $x$ : Input or input feature
 - $w$ : parameter: weight
 - $b$ : parameter: bias
 - Subscripts $w$, $b$ of $f_{w,b}(x)$ : indicate that $w$ and $b$ are fixed, i.e., **always constant values** for a given model
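
As a concrete illustration, here is a minimal sketch of this model in Python (NumPy assumed; the function name `f_wb` and the toy data are made up for this example, not taken from the lab):

```python
import numpy as np

def f_wb(x, w, b):
    """Linear model f_{w,b}(x) = w*x + b; x may be a scalar or a NumPy array."""
    return w * x + b

x_train = np.array([1.0, 2.0])   # input features x (illustrative)
w, b = 200, 100                  # one fixed choice of the parameters
y_hat = f_wb(x_train, w, b)      # predictions y-hat for each training input
print(y_hat)                     # [300. 500.]
```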
 
Parameters
- $w$ and $b$ will determine the prediction $\hat{y}$ based on the input features $x$. The function takes $x$ as input, and depending on the values of $w$ and $b$, $f$ will output a prediction $\hat{y}$
 - In machine learning, parameters of the model are the variables you can adjust during training in order to improve the model
 - Parameters are also referred to as coefficients or weights
 
Cost Function
A machine learning model updates its parameters ($w$, $b$) in the direction that minimizes the difference between predicted and actual values.
\[\min_{w,b} \ J(w, b)\]
- The mathematical expression for finding the $w$, $b$ (written as a subscript under min) that minimize the cost function $J(w, b)$
 
Function
Note: There are many other types of cost functions besides the one below
\[J(w,b) = \frac{1}{2m}\sum^m_{i=1}(f_{w,b}(x^{(i)})-y^{(i)})^2\]
- $m$ : number of training examples
 - $f_{w,b}(x^{(i)})$ = $\hat{y}^{(i)}$ : Prediction value
 - $y^{(i)}$ : Target
 - Error : $f_{w,b}(x^{(i)})-y^{(i)}$
 - By convention, the cost function divides by $2m$ rather than $m$. The extra division by 2 makes later calculations neater: differentiating the squared term produces a factor of 2 that cancels it. The cost function still works whether or not you include this division by 2
 - This is called the squared error cost function
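
The formula above translates almost line for line into code. A minimal sketch (the function name `compute_cost` and the toy data are assumptions for illustration, not a fixed API):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Squared error cost: J(w,b) = (1/(2m)) * sum_i (f_wb(x^(i)) - y^(i))^2."""
    m = x.shape[0]                 # number of training examples
    errors = w * x + b - y         # f_wb(x^(i)) - y^(i) for all examples at once
    return np.sum(errors ** 2) / (2 * m)

x_train = np.array([1.0, 2.0])     # illustrative data
y_train = np.array([300.0, 500.0])
print(compute_cost(x_train, y_train, w=200, b=100))  # 0.0    (perfect fit)
print(compute_cost(x_train, y_train, w=150, b=100))  # 3125.0 (errors are squared)
```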
 
Intuition
The predictive linear function $f_{w,b}$ is a function of the input $x$.
The cost function $J$ is a function of the parameter $w$: when $J$ is plotted, the horizontal axis is labeled $w$ (not $x$) and the vertical axis is $J$ (not $y$).
(1) With parameter $w$ only, fixing $b = 0$
- Model : $f_{w}(x)=wx$
 - Parameters : $w$
 - Cost function : $J(w) = \frac{1}{2m}\sum^m_{i=1}(f_{w}(x^{(i)})-y^{(i)})^2$
 - Goal : $\min_{w} \ J(w)$
 Understanding it intuitively with a graph

(figure: the cost curve $J(w)$ plotted against $w$)

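Even without the graph, the bowl shape is easy to see by sweeping $w$ and evaluating $J(w)$ at each value. A minimal sketch, assuming toy data that lies exactly on $y = 100x$:

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([100.0, 200.0])   # assumed data lying exactly on y = 100*x

def cost_w(w):
    """J(w) with b fixed at 0."""
    m = x_train.shape[0]
    return np.sum((w * x_train - y_train) ** 2) / (2 * m)

for w in [0, 50, 100, 150, 200]:
    print(f"w = {w:3d}  ->  J(w) = {cost_w(w):7.1f}")
# The cost drops to 0 at w = 100 and rises symmetrically on either side: a parabola in w.
```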
(2) With parameters $w$ and $b$

(figure: the cost surface $J(w,b)$ over both parameters)
- Model : $f_{w,b}(x)=wx+b$
 - Parameters : $w, b$
 - Cost function : $J(w,b) = \frac{1}{2m}\sum^m_{i=1}(f_{w,b}(x^{(i)})-y^{(i)})^2$
 - Goal : $\min_{w, b} \ J(w, b)$
 Example 1: the value of the cost function is far from the minimum

Example 2: the value of the cost function is at the minimum
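
Both examples can be checked numerically by evaluating the cost at a parameter choice far from the optimum and at the optimum itself. A minimal sketch, assuming toy data generated by $y = 200x + 100$ (the data and parameter values are illustrative):

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])   # assumed data, generated by y = 200*x + 100

def compute_cost(x, y, w, b):
    """Squared error cost J(w,b) for the linear model w*x + b."""
    m = x.shape[0]
    return np.sum((w * x + b - y) ** 2) / (2 * m)

# Example 1: parameters far from the minimum -> large cost, the line fits poorly
print(compute_cost(x_train, y_train, w=100, b=0))    # 32500.0
# Example 2: parameters at the minimum -> zero cost, the line fits the data exactly
print(compute_cost(x_train, y_train, w=200, b=100))  # 0.0
```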