Minimizing RSS by taking partial derivative
I am learning about linear regression, and the goal is to find the parameters $\beta$ that minimize the RSS. My textbook accomplishes this by solving $\partial\,\text{RSS}/\partial\beta = 0$. However, I am slightly stuck on the following step.
They define
$$\text{RSS}(\beta) = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y}-\mathbf{X}\beta),$$
where $\beta$ are scalars, $\mathbf{y}$ is a column vector, and $\mathbf{X}$ is a matrix.
They find that
$$\frac{\partial\,\text{RSS}}{\partial \beta} = -2\mathbf{X}^T(\mathbf{y}-\mathbf{X}\beta).$$
I tried deriving this result. I first wrote
$$(\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y}-\mathbf{X}\beta) = (\mathbf{y}^T - \mathbf{X}^T\beta)(\mathbf{y} - \mathbf{X}\beta).$$
I then expanded the two terms in brackets:
$$\mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\beta - \mathbf{y}\mathbf{X}^T\beta + \mathbf{X}^T\mathbf{X}\beta^2.$$
Now I differentiate this with respect to $\beta$:
$$-\mathbf{y}^T\mathbf{X} - \mathbf{y}\mathbf{X}^T + 2\beta\,\mathbf{X}^T\mathbf{X}.$$
This is where I get stuck: comparing my result with the derived result, we both have the $2\beta\,\mathbf{X}^T\mathbf{X}$ term, but I don't know how my first two terms should simplify to give $-2\mathbf{X}^T\mathbf{y}$.
calculus statistics optimization maxima-minima
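As an editorial aside (not part of the original question): the textbook's formula can be sanity-checked numerically against a finite-difference gradient. The numpy sketch below is only an illustration; the sizes `N`, `p` and the random data are arbitrary assumptions.

```python
# Minimal numpy sketch (editorial addition): compare -2 X^T (y - X beta) with a
# central finite-difference gradient of RSS. Sizes and data are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 3                       # hypothetical sample size and parameter count
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
beta = rng.normal(size=p)

def rss(b):
    r = y - X @ b                  # residual vector y - X beta
    return r @ r                   # (y - X beta)^T (y - X beta)

grad_formula = -2 * X.T @ (y - X @ beta)   # the textbook's gradient

eps = 1e-6
grad_fd = np.array([(rss(beta + eps * e) - rss(beta - eps * e)) / (2 * eps)
                    for e in np.eye(p)])   # one unit vector per coordinate

print(np.allclose(grad_formula, grad_fd, atol=1e-4))   # expected: True
```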
asked Jul 31 at 15:29 by Thomas Moore; edited Jul 31 at 16:09 by Foobaz John
Did you appreciate any of the answers? You should accept the best answer to mark this question as answered.
– LinAlg, Jul 31 at 19:46
4 Answers
Accepted answer (Clarinetist, answered Jul 31 at 17:12)
Note that $\beta$ is not a scalar, but a vector.
Let
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix},$$
$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Np} \end{bmatrix},$$
and
$$\beta = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{bmatrix}\text{.}$$
Then $\mathbf{X}\beta \in \mathbb{R}^N$ and
$$\mathbf{X}\beta = \begin{bmatrix} \sum_{j=1}^{p} b_j x_{1j} \\ \sum_{j=1}^{p} b_j x_{2j} \\ \vdots \\ \sum_{j=1}^{p} b_j x_{Nj} \end{bmatrix} \implies \mathbf{y}-\mathbf{X}\beta = \begin{bmatrix} y_1 - \sum_{j=1}^{p} b_j x_{1j} \\ y_2 - \sum_{j=1}^{p} b_j x_{2j} \\ \vdots \\ y_N - \sum_{j=1}^{p} b_j x_{Nj} \end{bmatrix}\text{.}$$
Therefore,
$$(\mathbf{y}-\mathbf{X}\beta)^T(\mathbf{y}-\mathbf{X}\beta) = \|\mathbf{y}-\mathbf{X}\beta\|^2 = \sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)^2\text{.}$$
We have, for each $k = 1, \dots, p$,
$$\frac{\partial\,\text{RSS}}{\partial b_k} = 2\sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)(-x_{ik}) = -2\sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{ik}\text{.}$$
Then
$$\begin{align}
\frac{\partial\,\text{RSS}}{\partial \beta} &= \begin{bmatrix} \dfrac{\partial\,\text{RSS}}{\partial b_1} \\ \dfrac{\partial\,\text{RSS}}{\partial b_2} \\ \vdots \\ \dfrac{\partial\,\text{RSS}}{\partial b_p} \end{bmatrix} \\
&= \begin{bmatrix} -2\sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{i1} \\ -2\sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{i2} \\ \vdots \\ -2\sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{ip} \end{bmatrix} \\
&= -2\begin{bmatrix} \sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{i1} \\ \sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{i2} \\ \vdots \\ \sum_{i=1}^{N}\left(y_i - \sum_{j=1}^{p} b_j x_{ij}\right)x_{ip} \end{bmatrix} \\
&= -2\mathbf{X}^T(\mathbf{y}-\mathbf{X}\beta)\text{.}
\end{align}$$
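To make the stacking step concrete, here is a small numpy sketch (an editorial addition, not part of the answer): it computes each $\partial\,\text{RSS}/\partial b_k$ with explicit loops over the double sums and checks that the resulting vector equals $-2\mathbf{X}^T(\mathbf{y}-\mathbf{X}\beta)$. The sizes and data are arbitrary.

```python
# Editorial sketch: loop-based partial derivatives vs. the matrix formula.
import numpy as np

rng = np.random.default_rng(1)
N, p = 8, 4
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
b = rng.normal(size=p)

# dRSS/db_k = -2 * sum_i (y_i - sum_j b_j x_ij) * x_ik, written out with loops
grad_loops = np.zeros(p)
for k in range(p):
    total = 0.0
    for i in range(N):
        residual_i = y[i] - sum(b[j] * X[i, j] for j in range(p))
        total += residual_i * X[i, k]
    grad_loops[k] = -2 * total

grad_matrix = -2 * X.T @ (y - X @ b)          # the final line of the derivation
print(np.allclose(grad_loops, grad_matrix))   # expected: True
```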
Hi. Thanks for this very detailed derivation! Just a quick question: you denote the entries as $x_{ip}$ above, which would mean the transpose has entries $x_{pi}$, but in your final line you have $x_{ip}$, not $x_{pi}$, so I am wondering where the $\mathbf{X}^T$ comes from in the last equality.
– Thomas Moore, Aug 1 at 16:09
@ThomasMoore I would recommend that you do the multiplication yourself to see that the above is true, but at a very simplistic level: remember that all you are doing in matrix multiplication is taking dot products of rows of the first matrix with columns of the second matrix. The $p$th column of $\mathbf{X}$ becomes the $p$th row of $\mathbf{X}^T$ under transposition. Thus, when you take the dot product of the $p$th row of $\mathbf{X}^T$ with $\mathbf{y} - \mathbf{X}\beta$, you are ultimately using all of the entries of the $p$th column of $\mathbf{X}$.
– Clarinetist, Aug 1 at 16:28
Answer (LinAlg, answered Jul 31 at 15:48)
The correct transpose (see property 3) is
$$(\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y}-\mathbf{X}\beta) = (\mathbf{y}^T - \beta^T\mathbf{X}^T)(\mathbf{y} - \mathbf{X}\beta).$$
The correct expansion is
$$\mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\beta - \beta^T \mathbf{X}^T \mathbf{y} + \beta^T\mathbf{X}^T\mathbf{X}\beta.$$
You can simplify the expansion to
$$\mathbf{y}^T\mathbf{y} + (-\mathbf{X}^T \mathbf{y})^T \beta + (-\mathbf{X}^T \mathbf{y})^T \beta + \beta^T\mathbf{X}^T\mathbf{X}\beta.$$
And the result readily follows.
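A quick numerical check of these identities (an editorial addition, not LinAlg's; the shapes and random data are arbitrary assumptions):

```python
# Editorial sketch: the expansion and its simplified form agree with the
# compact quadratic form (y - X beta)^T (y - X beta).
import numpy as np

rng = np.random.default_rng(2)
N, p = 10, 3
X = rng.normal(size=(N, p))
y = rng.normal(size=N)
beta = rng.normal(size=p)

lhs = (y - X @ beta) @ (y - X @ beta)
expanded = y @ y - y @ X @ beta - beta @ X.T @ y + beta @ X.T @ X @ beta
simplified = y @ y + 2 * (-(X.T @ y)) @ beta + beta @ (X.T @ X) @ beta

print(np.allclose(lhs, expanded), np.allclose(lhs, simplified))   # expected: True True
```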
Hi. Thanks for this. But since $\beta$ is a scalar, isn't $\beta^T = \beta$?
– Thomas Moore, Jul 31 at 15:49
@ThomasMoore my derivation applies both to scalars and to vectors; if you focus on scalars, your derivation goes wrong where you write $yX^T$ in the expansion: that is a matrix and should be $X^Ty$. You can then use that $X^Ty = y^TX$.
– LinAlg, Jul 31 at 15:52
Answer (Foobaz John, answered Jul 31 at 15:50, edited Jul 31 at 16:06)
Expand the brackets to write
$$\begin{align}
RSS(\beta) &= y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \\
&= y'y - 2\beta'X'y + \beta'X'X\beta
\end{align}$$
where primes denote the transpose and $y'X\beta = \beta'X'y = (y'X\beta)'$ since $y'X\beta$ is a $1\times 1$ matrix. Now we can differentiate to get that
$$\frac{\partial RSS(\beta)}{\partial \beta} = -2X'y + 2X'X\beta = -2X'(y - X\beta).$$
Here we used two properties. First, if $u = \alpha'x$ where $\alpha, x \in \mathbb{R}^n$, then
$$\frac{\partial u}{\partial x_j} = \alpha_j \implies \frac{\partial u}{\partial x} = \alpha.$$
One should notice that $\frac{\partial u}{\partial x}$ in this case represents the gradient. Second, if $u = x'Ax = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j$ where $A \in M_{n\times n}(\mathbb{R})$ and $x \in \mathbb{R}^n$, then
$$\frac{\partial u}{\partial x_\ell} = \sum_{i=1}^{n} a_{i\ell}x_i + \sum_{i=1}^{n} a_{\ell i}x_i = [(A'+A)x]_\ell \implies \frac{\partial u}{\partial x} = (A'+A)x.$$
In particular, if $A$ is symmetric (like $X'X$ above), we have that $\frac{\partial u}{\partial x} = 2Ax$.
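Both properties can be verified numerically with finite differences; the numpy sketch below is an editorial addition (the vector $\alpha$, matrix $A$, and point $x$ are arbitrary random choices).

```python
# Editorial sketch: finite-difference check of d(a'x)/dx = a and
# d(x'Ax)/dx = (A' + A) x.
import numpy as np

rng = np.random.default_rng(3)
n = 5
alpha = rng.normal(size=n)
A = rng.normal(size=(n, n))            # not necessarily symmetric
x = rng.normal(size=n)
eps = 1e-6

def fd_grad(f, x0):
    """Central finite-difference gradient of a scalar function f at x0."""
    return np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
                     for e in np.eye(len(x0))])

print(np.allclose(fd_grad(lambda v: alpha @ v, x), alpha))           # property 1
print(np.allclose(fd_grad(lambda v: v @ A @ v, x), (A.T + A) @ x))   # property 2
```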
Answer (David, answered Jul 31 at 17:02, edited Jul 31 at 17:16)
Remark: $\beta$ is a vector.
In multiple regression with $n$ independent variables, you have $n+1$ parameters to estimate (intercept included), that is,
$$y_t = \beta_0 + \beta_1 X_{1t} + \dots + \beta_n X_{nt} + e_t,$$
where each $\beta_i$ is a scalar.
We can write this in matrix notation (your problem is stated in matrix notation):
$$y = X\beta + e,$$
where $X$ is a matrix and $y$, $\beta$, and $e$ are vectors.
More precisely, each $\beta_i$ is a scalar, but $\beta$ is a vector. Furthermore, note that the unique solution of the problem you mention is
$$\beta = (X^TX)^{-1}X^Ty,$$
from which you can easily see that $\beta$ is a vector.
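To see this solution in action, here is a brief numpy sketch (an editorial addition with arbitrary random data): it computes $\hat\beta$ from the normal equations, confirms that the gradient $-2X^T(y - X\beta)$ vanishes there, and compares the result with numpy's least-squares routine.

```python
# Editorial sketch: the normal-equations solution zeroes the RSS gradient and
# matches np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(4)
N, p = 30, 4
X = rng.normal(size=(N, p))
y = rng.normal(size=N)

# beta_hat = (X^T X)^{-1} X^T y; np.linalg.solve avoids forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

grad_at_solution = -2 * X.T @ (y - X @ beta_hat)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(grad_at_solution, 0.0, atol=1e-9))   # gradient is numerically zero
print(np.allclose(beta_hat, beta_lstsq))               # agrees with the lstsq solution
```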