Matrix multiplication: interpreting and understanding the process
























I have just watched the first half of the 3rd lecture of Gilbert Strang on MIT OpenCourseWare:



http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/



It seems that in a matrix multiplication $AB = C$, the entries, as scalars, are formed from dot products of the rows of $A$ with the columns of $B$. Visual interpretations from mechanics, of overlapping forces, come to mind immediately, because that is the source of the dot product (inner product).
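For concreteness, here is a tiny numerical check of that row-times-column description (a NumPy sketch of my own, not from the lecture):

```python
# Each entry C[i, j] of C = A B is the dot product of row i of A with column j of B.
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C = A @ B                                   # the usual matrix product
assert C[0, 1] == np.dot(A[0, :], B[:, 1])  # row 0 of A with column 1 of B
print(C)                                    # [[19 22]
                                            #  [43 50]]
```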



I see the rows of $C$ as being the dot products of the rows of $B$ with a particular row of $A$. Similar to the above, it is easy to see from the individual entries in the matrix $C$ which elements change to give which dot products.



For understanding matrix multiplication there is also the geometrical interpretation: matrix multiplication is a change of reference system, since the matrix $B$ can be seen as a transformation operator for rotation, scaling, reflection and skew. It is easy to see this by constructing example $B$ matrices with these effects on $A$. This decomposition is a strong argument and is convincing of its generality. But this interpretation is not smooth: I would prefer an explanation that begins from the dot product of vectors and uses it to explain the process and the interpretation of the results (one that is a bit easier to see without the many plug-in-numbers-and-see-what-comes-out examples that students go through).



My hope was that sticking to dot products throughout the explanation, and THEN seeing how these produce scalings, rotations and skews, would be better. But after some simple graphical examples I saw that this doesn't work, since the order of the columns of $B$ is important and doesn't show up in the graphical representation.



The best explanation I could find is on Yahoo Answers. It is convincing but a bit disappointing (it explains why this approach preserves the "composition of linear transformations"; thanks @Arturo Magidin). So the question is: why does matrix multiplication happen as it does, and are there good practical examples to support it? Preferably not via rotations/scalings/skews (thanks @lhf).
























  • 6




    What is your question?
    – lhf
    Mar 1 '11 at 19:01






  • 12




    Matrix multiplication is defined the way it is because it then coincides with composition of linear transformations. There are other matrix multiplications (e.g., the Hadamard product, which is entry-by-entry), but the great advantage of the "usual" matrix multiplication is that it corresponds to composition of linear transformations.
    – Arturo Magidin
    Mar 1 '11 at 19:09






  • 9




    Matrix multiplication doesn't "happen", it is defined a certain way. Why it is defined that way is precisely so that it corresponds to compositions of linear transformations: nothing more and nothing else. For "practical examples", write out linear transformations in terms of the basis, write out what the composition is, and you'll see it corresponds exactly to matrix multiplication. There is nothing to "support", the definition was made with one particular purpose in mind, and it achieves that purpose, period.
    – Arturo Magidin
    Mar 1 '11 at 19:26






  • 4




    Supporting Arturo: Matrices are just used to visualise linear transformations, and in this manner the composition of two linear transformations $L_1$ and $L_2$ is a new linear transformation $L_3 = L_1 \circ L_2$, which when you write it out as matrices is the product of the matrices.
    – AD.
    Mar 1 '11 at 19:45






  • 2




    I feel like this is then a question about history: why did (= what historical motivations) matrix multiplication get defined like it is? Did the dot product (which is itself magical in that it gives a projection) come first?
    – Mitch
    Mar 1 '11 at 21:11














edited Feb 27 '15 at 19:28 by epimorphic

asked Mar 1 '11 at 18:51 by Vass













3 Answers

















up vote 95 down vote accepted










Some comments first. There are several serious confusions in what you write. For example, in the third paragraph, having seen that the entries of $AB$ are obtained by taking the dot product of the corresponding row of $A$ with the corresponding column of $B$, you write that you view $AB$ as a dot product of rows of $B$ and rows of $A$. It's not.



For another example, you talk about matrix multiplication "happening". Matrices aren't running wild in the hidden jungles of the Amazon, where things "happen" without human beings. Matrix multiplication is defined a certain way, and then the definition is why matrix multiplication is done the way it is done. You may very well ask why matrix multiplication is defined the way it is defined, and whether there are other ways of defining a "multiplication" on matrices (yes, there are; read further), but that's a completely separate question. "Why does matrix multiplication happen the way it does?" is pretty incoherent on its face.



Another example of confusion is that not every matrix corresponds to a "change in reference system". This is only true, viewed from the correct angle, for invertible matrices.



Standard matrix multiplication. Matrix multiplication is defined the way it is because it corresponds to composition of linear transformations. Though this is valid in extremely great generality, let's focus on linear transformations $T\colon \mathbb{R}^n\to\mathbb{R}^m$. Since linear transformations satisfy $T(\alpha\mathbf{x}+\beta\mathbf{y}) = \alpha T(\mathbf{x})+\beta T(\mathbf{y})$, if you know the value of $T$ at each of $\mathbf{e}_1,\ldots,\mathbf{e}_n$, where $\mathbf{e}_i$ is the (column) $n$-vector that has $0$s in each coordinate except the $i$th coordinate where it has a $1$, then you know the value of $T$ at every single vector of $\mathbb{R}^n$.



So in order to describe the value of $T$, I just need to tell you what $T(\mathbf{e}_i)$ is. For example, we can take
$$T(\mathbf{e}_i) = \left(\begin{array}{c}a_{1i}\\a_{2i}\\ \vdots\\ a_{mi}\end{array}\right).$$
Then, since
$$\left(\begin{array}{c}k_1\\k_2\\ \vdots\\k_n\end{array}\right) = k_1\mathbf{e}_1 + \cdots + k_n\mathbf{e}_n,$$ we have
$$T\left(\begin{array}{c}k_1\\k_2\\ \vdots\\ k_n\end{array}\right) = k_1T(\mathbf{e}_1) + \cdots + k_nT(\mathbf{e}_n) = k_1\left(\begin{array}{c}a_{11}\\a_{21}\\ \vdots\\a_{m1}\end{array}\right) + \cdots + k_n\left(\begin{array}{c}a_{1n}\\a_{2n}\\ \vdots\\ a_{mn}\end{array}\right).$$



It is very fruitful, then, to keep track of the $a_{ij}$ in some way, and given the expression above, we keep track of them in a matrix, which is just a rectangular array of real numbers. We then think of $T$ as being "given" by the matrix
$$\left(\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}\right).$$
If we want to keep track of $T$ this way, then for an arbitrary vector $\mathbf{x} = (x_1,\ldots,x_n)^t$ (the $^t$ means "transpose": turn every row into a column and every column into a row), we have that $T(\mathbf{x})$ corresponds to:
$$\left(\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}\right) \left(\begin{array}{c}
x_1\\x_2\\ \vdots\\ x_n\end{array}\right) = \left(\begin{array}{c}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{array}\right).$$
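As a quick numerical illustration (a NumPy sketch of my own, with an arbitrary $3\times 2$ matrix): the product $A\mathbf{x}$ is exactly the linear combination of the columns of $A$ with coefficients $x_1,\ldots,x_n$, as in the formula above.

```python
# A x equals x_1 T(e_1) + ... + x_n T(e_n), i.e. a linear combination of the columns of A.
import numpy as np

A = np.array([[1., 4.],
              [2., 5.],
              [3., 6.]])                    # the matrix of some T : R^2 -> R^3
x = np.array([10., -1.])

combo = x[0] * A[:, 0] + x[1] * A[:, 1]     # x_1 T(e_1) + x_2 T(e_2)
assert np.allclose(A @ x, combo)
print(A @ x)                                # [ 6. 15. 24.]
```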



What happens when we have two linear transformations, $T\colon \mathbb{R}^n\to\mathbb{R}^m$ and $S\colon\mathbb{R}^p\to\mathbb{R}^n$? If $T$ corresponds as above to a certain $m\times n$ matrix, then $S$ will likewise correspond to a certain $n\times p$ matrix, say
$$\left(\begin{array}{cccc}
b_{11} & b_{12} & \cdots & b_{1p}\\
b_{21} & b_{22} & \cdots & b_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
b_{n1} & b_{n2} & \cdots & b_{np}
\end{array}\right).$$
What is $T\circ S$? First, it is a linear transformation, because composition of linear transformations yields a linear transformation. Second, it goes from $\mathbb{R}^p$ to $\mathbb{R}^m$, so it should correspond to an $m\times p$ matrix. Which matrix? If we let $\mathbf{f}_1,\ldots,\mathbf{f}_p$ be the (column) $p$-vectors given by letting $\mathbf{f}_j$ have $0$s everywhere and a $1$ in the $j$th entry, then the matrix above tells us that
$$S(\mathbf{f}_j) = \left(\begin{array}{c}b_{1j}\\b_{2j}\\ \vdots \\b_{nj}\end{array}\right) = b_{1j}\mathbf{e}_1+\cdots + b_{nj}\mathbf{e}_n.$$



So, what is $T\circ S(\mathbf{f}_j)$? This is what goes in the $j$th column of the matrix that corresponds to $T\circ S$. Evaluating, we have:
\begin{align*}
T\circ S(\mathbf{f}_j) &= T\Bigl( S(\mathbf{f}_j)\Bigr)\\
&= T\Bigl( b_{1j}\mathbf{e}_1 + \cdots + b_{nj}\mathbf{e}_n\Bigr)\\
&= b_{1j} T(\mathbf{e}_1) + \cdots + b_{nj}T(\mathbf{e}_n)\\
&= b_{1j}\left(\begin{array}{c}
a_{11}\\ a_{21}\\ \vdots\\ a_{m1}\end{array}\right) + \cdots + b_{nj}\left(\begin{array}{c} a_{1n}\\a_{2n}\\ \vdots\\ a_{mn}\end{array}\right)\\
&= \left(\begin{array}{c}
a_{11}b_{1j} + a_{12}b_{2j} + \cdots + a_{1n}b_{nj}\\
a_{21}b_{1j} + a_{22}b_{2j} + \cdots + a_{2n}b_{nj}\\
\vdots\\
a_{m1}b_{1j} + a_{m2}b_{2j} + \cdots + a_{mn}b_{nj}
\end{array}\right).
\end{align*}
So if we want to write down the matrix that corresponds to $T\circ S$, then the $(i,j)$th entry will be
$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$
So we define the "composition" or product of the matrix of $T$ with the matrix of $S$ to be precisely the matrix of $T\circ S$. We can make this definition without reference to the linear transformations that gave it birth: if the matrix of $T$ is $m\times n$ with entries $a_{ij}$ (let's call it $A$); and the matrix of $S$ is $n\times p$ with entries $b_{rs}$ (let's call it $B$), then the matrix of $T\circ S$ (let's call it $A\circ B$ or $AB$) is $m\times p$ and with entries $c_{k\ell}$, where
$$c_{k\ell} = a_{k1}b_{1\ell} + a_{k2}b_{2\ell} + \cdots + a_{kn}b_{n\ell}$$
by definition. Why? Because then the matrix of the composition of two functions is precisely the product of the matrices of the two functions. We can work with the matrices directly without having to think about the functions.
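To see this bookkeeping at work numerically, here is a small NumPy sketch (random integer matrices, my own check): building the matrix of $T\circ S$ column by column, as above, gives exactly the product $AB$.

```python
# The j-th column of the matrix of T∘S is (T∘S)(f_j); stacking these columns gives A B.
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 4, 3, 2
A = rng.integers(-5, 5, size=(m, n))        # matrix of T : R^n -> R^m
B = rng.integers(-5, 5, size=(n, p))        # matrix of S : R^p -> R^n

T = lambda v: A @ v
S = lambda v: B @ v

cols = [T(S(np.eye(p, dtype=int)[:, j])) for j in range(p)]   # (T∘S)(f_j) for each j
assert np.array_equal(np.column_stack(cols), A @ B)
```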



In point of fact, there is nothing about the dot product which is at play in this definition. It is essentially by happenstance that the $(i,j)$ entry can be obtained as a dot product of something. In fact, the $(i,j)$th entry is obtained as the matrix product of the $1\times n$ matrix consisting of the $i$th row of $A$ with the $n\times 1$ matrix consisting of the $j$th column of $B$. Only if you transpose this column can you try to interpret this as a dot product. (In fact, the modern view is the other way around: we define the dot product of two vectors as a special case of a more general inner product, called the Frobenius inner product, which is defined in terms of matrix multiplication, $\langle\mathbf{x},\mathbf{y}\rangle = \mathrm{trace}(\overline{\mathbf{y}}^t\mathbf{x})$.)
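For real column vectors this trace formula does reduce to the familiar dot product; a quick check (my own sketch, real entries so the conjugation does nothing):

```python
# For real column vectors x, y: trace(y^T x) is a 1x1 matrix whose single entry is x · y.
import numpy as np

x = np.array([[1.], [2.], [3.]])            # 3x1 column vectors
y = np.array([[4.], [5.], [6.]])

assert np.isclose(np.trace(y.T @ x), np.dot(x.ravel(), y.ravel()))   # both equal 32
```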



And because product of matrices corresponds to composition of linear transformations, all the nice properties that composition of linear functions has will automatically also be true for product of matrices, because products of matrices is nothing more than a book-keeping device for keeping track of the composition of linear transformations. So $(AB)C = A(BC)$, because composition of functions is associative. $A(B+C) = AB + AC$ because composition of linear transformations distributes over sums of linear transformations (sums of matrices are defined entry-by-entry because that agrees precisely with the sum of linear transformations). $A(\alpha B) = \alpha(AB) = (\alpha A)B$, because composition of linear transformations behaves that way with scalar multiplication (products of matrices by scalars are defined the way they are precisely so that they will correspond to the operation with linear transformations).



So we define product of matrices explicitly so that it will match up composition of linear transformations. There really is no deeper hidden reason. It seems a bit incongruous, perhaps, that such a simple reason results in such a complicated formula, but such is life.



Another reason why it is somewhat misguided to try to understand matrix product in terms of dot product is that the matrix product keeps track of all the information lying around about the two compositions, but the dot product loses a lot of information about the two vectors in question. Knowing that $\mathbf{x}\cdot\mathbf{y}=0$ only tells you that $\mathbf{x}$ and $\mathbf{y}$ are perpendicular, it doesn't really tell you anything else. There is a lot of informational loss in the dot product, and trying to explain matrix product in terms of the dot product requires that we "recover" all of this lost information in some way. In practice, it means keeping track of all the original information, which makes trying to shoehorn the dot product into the explanation unnecessary, because you will already have all the information to get the product directly.



Examples that are not just "changes in reference system". Note that any linear transformation corresponds to a matrix. But the only linear transformations that can be thought of as "changes in perspective" are the linear transformations that map $\mathbb{R}^n$ to itself and which are one-to-one and onto. There are lots of linear transformations that aren't like that. For example, the linear transformation $D$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ defined by
$$D\left(\begin{array}{c}
a\\b\\c\end{array}\right) = \left(\begin{array}{c}b\\2c\end{array}\right)$$
is not a "change in reference system" (because lots of nonzero vectors go to zero, but there is no way to just "change your perspective" and start seeing a nonzero vector as zero) but is a linear transformation nonetheless. The corresponding matrix is $2\times 3$, and is
$$\left(\begin{array}{ccc}
0 & 1 & 0\\
0 & 0 & 2
\end{array}\right).$$
Now consider the linear transformation $U\colon\mathbb{R}^2\to\mathbb{R}^2$ given by
$$U\left(\begin{array}{c}x\\y\end{array}\right) = \left(\begin{array}{c}3x+2y\\
9x + 6y\end{array}\right).$$
Again, this is not a "change in perspective", because the vector $\binom{2}{-3}$ is mapped to $\binom{0}{0}$. It has a matrix, $2\times 2$, which is
$$\left(\begin{array}{cc}
3 & 2\\
9 & 6
\end{array}\right).$$
So the composition $U\circ D$ has matrix:
$$\left(\begin{array}{cc}
3 & 2\\
9 & 6
\end{array}\right) \left(\begin{array}{ccc}
0 & 1 & 0\\
0 & 0 & 2
\end{array}\right) = \left(\begin{array}{ccc}
0 & 3 & 4\\
0 & 9 & 12
\end{array}\right),$$
which tells me that
$$U\circ D\left(\begin{array}{c}x\\y\\z\end{array}\right) = \left(\begin{array}{c} 3y + 4z\\ 9y+12z\end{array}\right).$$
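The same computation, checked in NumPy (a small sketch of the example above):

```python
# Matrix of U∘D equals (matrix of U)(matrix of D); applied to (x, y, z) it gives (3y+4z, 9y+12z).
import numpy as np

D = np.array([[0, 1, 0],
              [0, 0, 2]])      # D(a, b, c) = (b, 2c)
U = np.array([[3, 2],
              [9, 6]])         # U(x, y) = (3x + 2y, 9x + 6y)

UD = U @ D
print(UD)                      # [[ 0  3  4]
                               #  [ 0  9 12]]

v = np.array([7, 1, 2])        # some (x, y, z)
assert np.array_equal(UD @ v, U @ (D @ v))   # composing the maps agrees with the product
```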



Other matrix products. Are there other ways to define the product of two matrices? Sure. There's the Hadamard product, which is the "obvious" thing to try: you can multiply two matrices of the same size (and only of the same size), and you do it entry by entry, just the same way that you add two matrices. This has some nice properties, but it has nothing to do with linear transformations. There's the Kronecker product, which takes an $m\times n$ matrix times a $p\times q$ matrix and gives an $mp\times nq$ matrix. This one is associated to the tensor product of linear transformations. They are defined differently because they are meant to model other operations that one does with matrices or vectors.
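Both of these other products are available in NumPy, for comparison (a small sketch):

```python
# Hadamard: entrywise, same-shape matrices.  Kronecker: an (m x n) and a (p x q) give an (mp x nq).
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A * B)          # Hadamard (entrywise) product, shape (2, 2)
print(np.kron(A, B))  # Kronecker product, shape (4, 4)
print(A @ B)          # ordinary product: corresponds to composing the linear maps
```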























  • 11




    Wow. I would have liked to give +2 rep for this, but that's not possible...
    – Gottfried Helms
    Mar 1 '11 at 21:02






  • 5




    an amazing answer. this is better than some textbooks or internet resources
    – Vass
    Mar 2 '11 at 12:53










  • is a linear combination also a linear transform?
    – Vass
    Mar 2 '11 at 14:57






  • 1




    @Vass, as you clearly seem to be happy with this answer, you should "accept" it as best answer by clicking the tick/check mark on the left of it.
    – Rahul
    Apr 8 '11 at 17:13






  • 1




    Khan Academy has a more beginner friendly video of the same explanation at khanacademy.org/math/linear-algebra/matrix_transformations/…
    – Pramod
    Jan 17 '15 at 6:02

















up vote 10 down vote













I think part of the problem people have with getting used to linear transformations vs. matrices is that they have probably never seen an example of a linear transformation defined without reference to a matrix or a basis. So here is such an example. Let $V$ be the vector space of real polynomials of degree at most $3$, and let $f : V \to V$ be the derivative.



$V$ does not come equipped with a natural choice of basis. You might argue that $\{ 1, x, x^2, x^3 \}$ is natural, but it's only convenient: there's no reason to privilege this basis over $\{ 1, (x+c), (x+c)^2, (x+c)^3 \}$ for any $c \in \mathbb{R}$ (and, depending on what my definitions are, it is literally impossible to do so). More generally, $\{ a_0(x), a_1(x), a_2(x), a_3(x) \}$ is a basis for any collection of polynomials $a_i$ of degree $i$.



$V$ also does not come equipped with a natural choice of dot product, so there's no way to include those in the discussion without making an arbitrary choice. It really is just a vector space equipped with a linear transformation.



Since we want to talk about composition, let's write down a second linear transformation. $g : V \to V$ will send a polynomial $p(x)$ to the polynomial $p(x + 1)$. Note that, once again, I do not need to refer to a basis to define $g$.



Then the abstract composition $gf : V \to V$ is well-defined; it sends a polynomial $p(x)$ to the polynomial $p'(x + 1)$. I don't need to refer to a basis or multiply any matrices to see this; all I am doing is composing two functions.



Now let's do everything in a particular basis to see that we get the same answer using the correct and natural definition of matrix multiplication. We'll use the basis $\{ 1, x, x^2, x^3 \}$. In this basis $f$ has matrix



$$\left[ \begin{array}{cccc} 0 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 3 \\
0 & 0 & 0 & 0 \end{array} \right]$$



and $g$ has matrix



$$\left[ \begin{array}{cccc} 1 & 1 & 1 & 1 \\
0 & 1 & 2 & 3 \\
0 & 0 & 1 & 3 \\
0 & 0 & 0 & 1 \end{array} \right].$$



Now I encourage you to go through all the generalities in Arturo's post in this example to verify that $gf$ has the matrix it is supposed to have.
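If you want to check this mechanically, here is a small SymPy sketch (my own, not part of the argument): it builds the matrices of $f$ and $g$ in the basis $\{1, x, x^2, x^3\}$ and confirms that the matrix of $g\circ f$ is the product of the matrix of $g$ with the matrix of $f$.

```python
# Build the matrices of f = d/dx and g : p(x) -> p(x+1) in the basis {1, x, x^2, x^3},
# and verify that the matrix of the composition g∘f equals (matrix of g)(matrix of f).
import sympy as sp

x = sp.symbols('x')
basis = [sp.Integer(1), x, x**2, x**3]

def matrix_of(transform):
    """Column j holds the coordinates of transform(basis[j]) in the basis."""
    cols = []
    for b in basis:
        coeffs = sp.Poly(transform(b), x).all_coeffs()[::-1]   # lowest degree first
        coeffs += [0] * (len(basis) - len(coeffs))             # pad up to degree 3
        cols.append(coeffs)
    return sp.Matrix(cols).T

f = lambda p: sp.diff(p, x)                   # the derivative
g = lambda p: sp.expand(p.subs(x, x + 1))     # the shift x -> x + 1

F, G = matrix_of(f), matrix_of(g)
GF = matrix_of(lambda p: g(f(p)))             # matrix of g∘f, built directly

assert G * F == GF                            # matrix product = matrix of the composition
```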


































up vote 0 down vote













Let $T:\mathsf{V}\to\mathsf{W}$ be a linear transformation and let $\beta=\{v_1,v_2\}$, $\gamma=\{w_1,w_2\}$ be bases of $\mathsf{V}$ and $\mathsf{W}$ respectively. The value of interest is

$$T(v).$$



Let $v=xv_1+yv_2$, then

$$\begin{align}
T(v)&=T(xv_1+yv_2)\\
&=xT(v_1)+yT(v_2).
\end{align}$$



No matter what the value of $v$ is, $T(v_1)$ and $T(v_2)$ are needed, so the notation can be simplified. Let

$$T(v_1)=aw_1+bw_2,\\
T(v_2)=cw_1+dw_2,$$



represent $T(v_1), T(v_2)$ in columns

$$
\begin{array}{ll}
T(v_1) & T(v_2)\\
aw_1 & cw_1\\
+ & +\\
bw_2 & dw_2\\
\end{array}
$$



Put $w_1, w_2$ on the left side as a note and omit the plus signs

$$
\begin{array}{lll}
& T(v_1) & T(v_2) \\
w_1 & a & c \\
w_2 & b & d \\
\end{array}
$$



Since $T(v)=xT(v_1)+yT(v_2)$,

$$
\begin{array}{cccc}
& x & y & \\
& T(v_1)\;+ & T(v_2)\;= & T(v) \\
w_1 & a & c & e \\
w_2 & b & d & f \\
\end{array}
$$



An $\color{blue}{\text{operation}}$ can be defined such that

$$
e=\color{blue}{x}a+\color{blue}{y}c\\
f=\color{blue}{x}b+\color{blue}{y}d
$$



that is

$$
\begin{bmatrix}a & c\\b & d\end{bmatrix}
\color{blue}{\text{ oper. }}
\begin{bmatrix}x\\y\end{bmatrix}
=
\begin{bmatrix}e\\f\end{bmatrix},
$$



The order in which $w_1, w_2$ are listed is tied to this notation, so the idea of an ordered basis is required to denote the linear transformation matrix

$$\large [T]_\beta^\gamma$$



which means $T$ only accepts $\begin{bmatrix}x\\y\end{bmatrix}$ in basis $\beta$, or in other words

$$\Large [v]_\beta,$$

the coordinate vector relative to $\beta$.



The $\color{blue}{\text{operation}}$ is

$$\large [T(v)]_\gamma = [T]_\beta^\gamma \;\color{blue}{\Large\cdot}\; [v]_\beta$$
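A tiny numerical instance of this operation (my own sketch, with made-up values of $a,b,c,d$ and coordinates $x,y$):

```python
# [T(v)]_gamma = [T]_beta^gamma · [v]_beta : multiply the coordinate vector by the matrix.
import numpy as np

T_matrix = np.array([[1, 3],     # rows: a c / b d  (columns are [T(v_1)]_gamma, [T(v_2)]_gamma)
                     [2, 4]])
v_coords = np.array([5, -1])     # x, y, i.e. v = 5*v_1 - 1*v_2

e, f = T_matrix @ v_coords       # e = xa + yc,  f = xb + yd
print(e, f)                      # 2 6
```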



    --



Since that coordinate vector $[v]_\beta$ can be seen as one of the columns of another linear transformation matrix $[U]_\alpha^\beta$, the composition of $[T]_\beta^\gamma$ and $[U]_\alpha^\beta$ is

$$\begin{align}
[T([U]_\alpha^\beta)]_\gamma &= [T( [\overbrace{u_1,u_2,\dots,u_n}^{\lVert\alpha\rVert}]_\beta )]_\gamma\\
&= [T( U )]_\alpha^\gamma.
\end{align}$$



Hope this helps.






    share|cite|improve this answer























      Your Answer




      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("mathjaxEditing", function ()
      StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
      StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
      );
      );
      , "mathjax-editing");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "69"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      noCode: true, onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );








       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f24456%2fmatrix-multiplication-interpreting-and-understanding-the-process%23new-answer', 'question_page');

      );

      Post as a guest






























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      95
      down vote



      accepted










      Some comments first. There are several serious confusions in what you write. For example, in the third paragraph, having seen that the entries of $AB$ are obtained by taking the dot product of the corresponding row of $A$ with column of $B$, you write that you view $AB$ as a dot product of rows of $B$ and rows of $A$. It's not.



      For another example, you talk about matrix multiplication "happening". Matrices aren't running wild in the hidden jungles of the Amazon, where things "happen" without human beings. Matrix multiplication is defined a certain way, and then the definition is why matrix multiplication is done the way it is done. You may very well ask why matrix multiplication is defined the way it is defined, and whether there are other ways of defining a "multiplication" on matrices (yes, there are; read further), but that's a completely separate question. "Why does matrix multiplication happen the way it does?" is pretty incoherent on its face.



      Another example of confusion is that not every matrix corresponds to a "change in reference system". This is only true, viewed from the correct angle, for invertible matrices.



      Standard matrix multiplication. Matrix multiplication is defined the way it is because it corresponds to composition of linear transformations. Though this is valid in extremely great generality, let's focus on linear transformations $Tcolon mathbbR^ntomathbbR^m$. Since linear transformations satisfy $T(alphamathbfx+betamathbfy) = alpha T(mathbfx)+beta T(mathbfy)$, if you know the value of $T$ at each of $mathbfe_1,ldots,mathbfe_n$, where $mathbfe^n_i$ is the (column) $n$-vector that has $0$s in each coordinate except the $i$th coordinate where it has a $1$, then you know the value of $T$ at every single vector of $mathbbR^n$.



      So in order to describe the value of $T$, I just need to tell you what $T(mathbfe_i)$ is. For example, we can take
      $$T(mathbfe_i) = left(beginarrayca_1i\a_2i\ vdots\ a_miendarrayright).$$
      Then, since
      $$left(beginarrayck_1\k_2\ vdots\k_nendarrayright) = k_1mathbfe_1 + cdots +k_nmathbfe_n,$$ we have
      $$Tleft(beginarrayck_1\k_2\ vdots\ k_nendarrayright) = k_1T(mathbfe_1) + cdots +k_nT(mathbfe_n) = k_1left(beginarrayca_11\a_21\ vdots\a_m1endarrayright) + cdots + k_nleft(beginarrayca_1n\a_2n\ vdots\ a_mnendarrayright).$$



      It is very fruitful, then to keep track of the $a_ij$ in some way, and given the expression above, we keep track of them in a matrix, which is just a rectangular array of real numbers. We then think of $T$ as being "given" by the matrix
      $$left(beginarraycccc
      a_11 & a_12 & cdots & a_1n\
      a_21 & a_22 & cdots & a_2n\
      vdots & vdots & ddots & vdots\
      a_m1 & a_m2 & cdots & a_mn
      endarrayright).$$
      If we want to keep track of $T$ this way, then for an arbitrary vector $mathbfx = (x_1,ldots,x_n)^t$ (the $^t$ means "transpose"; turn every rown into a column, every column into a row), then we have that $T(mathbfx)$ corresponds to:
      $$left(beginarraycccc
      a_11 & a_12 & cdots & a_1n\
      a_21 & a_22 & cdots & a_2n\
      vdots & vdots & ddots & vdots\
      a_m1 & a_m2 & cdots & a_mn
      endarrayright) left(beginarrayc
      x_1\x_2\ vdots\ x_nendarrayright) = left(beginarrayc
      a_11x_1 + a_12x_2 + cdots + a_1nx_n\
      a_21x_1 + a_22x_2 + cdots + a_2nx_n\
      vdots\
      a_m1x_1 + a_m2x_2 + cdots + a_mnx_n
      endarrayright).$$



      What happens when we have two linear transformations, $Tcolon mathbbR^ntomathbbR^m$ and $ScolonmathbbR^ptomathbbR^n$? If $T$ corresponds as above to a certain $mtimes n$ matrix, then $S$ will likewise correspond to a certain $ntimes p$ matrix, say
      $$left(beginarraycccc
      b_11 & b_12 & cdots & b_1p\
      b_21 & b_22 & cdots & b_2p\
      vdots & vdots & ddots & vdots\
      b_n1 & b_n2 & cdots & b_np
      endarrayright).$$
      What is $Tcirc S$? First, it is a linear transformation because composition of linear transformations yields a linear transformation. Second, it goes from $mathbbR^p$ to $mathbbR^m$, so it should correspond to an $mtimes p$ matrix. Which matrix? If we let $mathbff_1,ldots,mathbff_p$ be the (column) $p$-vectors given by letting $mathbff_j$ have $0$s everywhere and a $1$ in the $j$th entry, then the matrix above tells us that
      $$S(mathbff_j) = left(beginarraycb_1j\b_2j\ vdots \b_njendarrayright) = b_1jmathbfe_1+cdots + b_njmathbfe_n.$$



      So, what is $Tcirc S(mathbff_j)$? This is what goes in the $j$th column of the matrix that corresponds to $Tcirc S$. Evaluating, we have:
      beginalign*
      Tcirc S(mathbff_j) &= TBigl( S(mathbff_j)Bigr)\
      &= TBigl( b_1jmathbfe_1 + cdots + b_njmathbfe_nBigr)\
      &= b_1j T(mathbfe_1) + cdots + b_njT(mathbfe_n)\
      &= b_1jleft(beginarrayc
      a_11\ a_21\ vdots\ a_m1endarrayright) + cdots + b_njleft(beginarrayc a_1n\a_2n\ vdots\ a_mnendarrayright)\
      &= left(beginarrayc
      a_11b_1j + a_12b_2j + cdots + a_1nb_nj\
      a_21b_1j + a_22b_2j + cdots + a_2nb_nj\
      vdots\
      a_m1b_1j + a_m2b_2j + cdots + a_mnb_nj
      endarrayright).
      endalign*
      So if we want to write down the matrix that corresponds to $Tcirc S$, then the $(i,j)$th entry will be
      $$a_i1b_1j + a_i2b_2j + cdots + a_inb_nj.$$
      So we define the "composition" or product of the matrix of $T$ with the matrix of $S$ to be precisely the matrix of $Tcirc S$. We can make this definition without reference to the linear transformations that gave it birth: if the matrix of $T$ is $mtimes n$ with entries $a_ij$ (let's call it $A$); and the matrix of $S$ is $ntimes p$ with entries $b_rs$ (let's call it $B$), then the matrix of $Tcirc S$ (let's call it $Acirc B$ or $AB$) is $mtimes p$ and with entries $c_kell$, where
      $$c_kell = a_k1b_1ell + a_k2b_2ell + cdots + a_knb_nell$$
      by definition. Why? Because then the matrix of the composition of two functions is precisely the product of the matrices of the two functions. We can work with the matrices directly without having to think about the functions.



      In point of fact, there is nothing about the dot product which is at play in this definition. It is essentially by happenstance that the $(i,j)$ entry can be obtained as a dot product of something. In fact, the $(i,j)$th entry is obtained as the matrix product of the $1times n$ matrix consisting of the $i$th row of $A$, with the $ntimes 1$ matrix consisting of the $j$th column of $B$. Only if you transpose this column can you try to interpret this as a dot product. (In fact, the modern view is the other way around: we define the dot product of two vectors as a special case of a more general inner product, called the Frobenius inner product, which is defined in terms of matrix multiplication, $langlemathbfx,mathbfyrangle =mathrmtrace(overlinemathbfy^tmathbfx)$).



      And because product of matrices corresponds to composition of linear transformations, all the nice properties that composition of linear functions has will automatically also be true for product of matrices, because products of matrices is nothing more than a book-keeping device for keeping track of the composition of linear transformations. So $(AB)C = A(BC)$, because composition of functions is associative. $A(B+C) = AB + AC$ because composition of linear transformations distributes over sums of linear transformations (sums of matrices are defined entry-by-entry because that agrees precisely with the sum of linear transformations). $A(alpha B) = alpha(AB) = (alpha A)B$, because composition of linear transformations behaves that way with scalar multiplication (products of matrices by scalar are defined the way they are precisely so that they will correspond to the operation with linear transformations).



      So we define product of matrices explicitly so that it will match up composition of linear transformations. There really is no deeper hidden reason. It seems a bit incongruous, perhaps, that such a simple reason results in such a complicated formula, but such is life.



      Another reason why it is somewhat misguided to try to understand matrix product in terms of dot product is that the matrix product keeps track of all the information lying around about the two compositions, but the dot product loses a lot of information about the two vectors in question. Knowing that $mathbfxcdotmathbfy=0$ only tells you that $mathbfx$ and $mathbfy$ are perpendicular, it doesn't really tell you anything else. There is a lot of informational loss in the dot product, and trying to explain matrix product in terms of the dot product requires that we "recover" all of this lost information in some way. In practice, it means keeping track of all the original information, which makes trying to shoehorn the dot product into the explanation unnecessary, because you will already have all the information to get the product directly.



      Examples that are not just "changes in reference system". Note that any linear transformation corresponds to a matrix. But the only linear transformations that can be thought of as "changes in perspective" are the linear transformations that map $mathbbR^n$ to itself, and which are one-to-one and onto. There are lots of linear transfomrations that aren't like that. For example, the linear transformation $D$ from $mathbbR^3$ to $mathbbR^2$ defined by
      $$Dleft(beginarrayc
      a\b\cendarrayright) = left(beginarraycb\2cendarrayright)$$
      is not a "change in reference system" (because lots of nonzero vectors go to zero, but there is no way to just "change your perspective" and start seeing a nonzero vector as zero) but is a linear transformation nonetheless. The corresponding matrix is $2times 3$, and is
      $$left(beginarraycc
      0 & 1 & 0\
      0 & 0 & 2
      endarrayright).$$
      Now consider the linear transformation $UcolonmathbbR^2tomathbbR^2$ given by
      $$Uleft(beginarraycx\yendarrayright) = left(beginarrayc3x+2y\
      9x + 6yendarrayright).$$
      Again, this is not a "change in perspective", because the vector $binom2-3$ is mapped to $binom00$. It has a matrix, $2times 2$, which is
      $$left(beginarraycc
      3 & 2\
      9 & 6
      endarrayright).$$
      So the composition $Ucirc T$ has matrix:
      $$left(beginarraycc
      3 & 2\
      9 & 6
      endarrayright) left(beginarrayccc
      0 & 1 & 0\
      0 & 0 & 2
      endarrayright) = left(beginarrayccc
      0 & 3 & 4\
      0 & 9 & 12
      endarrayright),$$
      which tells me that
      $$Ucirc Tleft(beginarraycx\y\zendarrayright) = left(beginarrayc 3y + 4z\ 9y+12zendarrayright).$$



      Other matrix products. Are there other ways to define the product of two matrices? Sure. There's the Hadamard product, which is the "obvious" thing to try: you can multiply two matrices of the same size (and only of the same size), and you do it entry by entry, just the same way that you add two matrices. This has some nice properties, but it has nothing to do with linear transformations. There's the Kronecker product, which takes an $mtimes n$ matrix times a $ptimes q$ matrix and gives an $mptimes nq$ matrix. This one is associated to the tensor product of linear transformations. They are defined differently because they are meant to model other operations that one does with matrices or vectors.






      share|cite|improve this answer

















      • 11




        Wow. I would have liked to give +2 rep for this, but that's not possible...
        – Gottfried Helms
        Mar 1 '11 at 21:02






      • 5




        an amazing answer. this is better than some textbooks or internet resources
        – Vass
        Mar 2 '11 at 12:53










      • is a linear combination also a linear transform?
        – Vass
        Mar 2 '11 at 14:57






      • 1




        @Vass, as you clearly seem to be happy with this answer, you should "accept" it as best answer by clicking the tick/check mark on the left of it.
        – Rahul
        Apr 8 '11 at 17:13






      • 1




        Khan Academy has a more beginner friendly video of the same explanation at khanacademy.org/math/linear-algebra/matrix_transformations/…
        – Pramod
        Jan 17 '15 at 6:02














      up vote
      95
      down vote



      accepted










      Some comments first. There are several serious confusions in what you write. For example, in the third paragraph, having seen that the entries of $AB$ are obtained by taking the dot product of the corresponding row of $A$ with column of $B$, you write that you view $AB$ as a dot product of rows of $B$ and rows of $A$. It's not.



      For another example, you talk about matrix multiplication "happening". Matrices aren't running wild in the hidden jungles of the Amazon, where things "happen" without human beings. Matrix multiplication is defined a certain way, and then the definition is why matrix multiplication is done the way it is done. You may very well ask why matrix multiplication is defined the way it is defined, and whether there are other ways of defining a "multiplication" on matrices (yes, there are; read further), but that's a completely separate question. "Why does matrix multiplication happen the way it does?" is pretty incoherent on its face.



      Another example of confusion is that not every matrix corresponds to a "change in reference system". This is only true, viewed from the correct angle, for invertible matrices.



      Standard matrix multiplication. Matrix multiplication is defined the way it is because it corresponds to composition of linear transformations. Though this is valid in extremely great generality, let's focus on linear transformations $Tcolon mathbbR^ntomathbbR^m$. Since linear transformations satisfy $T(alphamathbfx+betamathbfy) = alpha T(mathbfx)+beta T(mathbfy)$, if you know the value of $T$ at each of $mathbfe_1,ldots,mathbfe_n$, where $mathbfe^n_i$ is the (column) $n$-vector that has $0$s in each coordinate except the $i$th coordinate where it has a $1$, then you know the value of $T$ at every single vector of $mathbbR^n$.



      So in order to describe the value of $T$, I just need to tell you what $T(mathbfe_i)$ is. For example, we can take
      $$T(mathbfe_i) = left(beginarrayca_1i\a_2i\ vdots\ a_miendarrayright).$$
      Then, since
      $$left(beginarrayck_1\k_2\ vdots\k_nendarrayright) = k_1mathbfe_1 + cdots +k_nmathbfe_n,$$ we have
      $$Tleft(beginarrayck_1\k_2\ vdots\ k_nendarrayright) = k_1T(mathbfe_1) + cdots +k_nT(mathbfe_n) = k_1left(beginarrayca_11\a_21\ vdots\a_m1endarrayright) + cdots + k_nleft(beginarrayca_1n\a_2n\ vdots\ a_mnendarrayright).$$



      It is very fruitful, then to keep track of the $a_ij$ in some way, and given the expression above, we keep track of them in a matrix, which is just a rectangular array of real numbers. We then think of $T$ as being "given" by the matrix
      $$left(beginarraycccc
      a_11 & a_12 & cdots & a_1n\
      a_21 & a_22 & cdots & a_2n\
      vdots & vdots & ddots & vdots\
      a_m1 & a_m2 & cdots & a_mn
      endarrayright).$$
      If we want to keep track of $T$ this way, then for an arbitrary vector $mathbfx = (x_1,ldots,x_n)^t$ (the $^t$ means "transpose"; turn every rown into a column, every column into a row), then we have that $T(mathbfx)$ corresponds to:
      $$left(beginarraycccc
      a_11 & a_12 & cdots & a_1n\
      a_21 & a_22 & cdots & a_2n\
      vdots & vdots & ddots & vdots\
      a_m1 & a_m2 & cdots & a_mn
      endarrayright) left(beginarrayc
      x_1\x_2\ vdots\ x_nendarrayright) = left(beginarrayc
      a_11x_1 + a_12x_2 + cdots + a_1nx_n\
      a_21x_1 + a_22x_2 + cdots + a_2nx_n\
      vdots\
      a_m1x_1 + a_m2x_2 + cdots + a_mnx_n
      endarrayright).$$



      What happens when we have two linear transformations, $Tcolon mathbbR^ntomathbbR^m$ and $ScolonmathbbR^ptomathbbR^n$? If $T$ corresponds as above to a certain $mtimes n$ matrix, then $S$ will likewise correspond to a certain $ntimes p$ matrix, say
      $$left(beginarraycccc
      b_11 & b_12 & cdots & b_1p\
      b_21 & b_22 & cdots & b_2p\
      vdots & vdots & ddots & vdots\
      b_n1 & b_n2 & cdots & b_np
      endarrayright).$$
      What is $Tcirc S$? First, it is a linear transformation because composition of linear transformations yields a linear transformation. Second, it goes from $mathbbR^p$ to $mathbbR^m$, so it should correspond to an $mtimes p$ matrix. Which matrix? If we let $mathbff_1,ldots,mathbff_p$ be the (column) $p$-vectors given by letting $mathbff_j$ have $0$s everywhere and a $1$ in the $j$th entry, then the matrix above tells us that
      $$S(mathbff_j) = left(beginarraycb_1j\b_2j\ vdots \b_njendarrayright) = b_1jmathbfe_1+cdots + b_njmathbfe_n.$$



      So, what is $Tcirc S(mathbff_j)$? This is what goes in the $j$th column of the matrix that corresponds to $Tcirc S$. Evaluating, we have:
      beginalign*
      Tcirc S(mathbff_j) &= TBigl( S(mathbff_j)Bigr)\
      &= TBigl( b_1jmathbfe_1 + cdots + b_njmathbfe_nBigr)\
      &= b_1j T(mathbfe_1) + cdots + b_njT(mathbfe_n)\
      &= b_1jleft(beginarrayc
      a_11\ a_21\ vdots\ a_m1endarrayright) + cdots + b_njleft(beginarrayc a_1n\a_2n\ vdots\ a_mnendarrayright)\
      &= left(beginarrayc
      a_11b_1j + a_12b_2j + cdots + a_1nb_nj\
      a_21b_1j + a_22b_2j + cdots + a_2nb_nj\
      vdots\
      a_m1b_1j + a_m2b_2j + cdots + a_mnb_nj
      endarrayright).
      endalign*
      So if we want to write down the matrix that corresponds to $Tcirc S$, then the $(i,j)$th entry will be
      $$a_i1b_1j + a_i2b_2j + cdots + a_inb_nj.$$
      So we define the "composition" or product of the matrix of $T$ with the matrix of $S$ to be precisely the matrix of $Tcirc S$. We can make this definition without reference to the linear transformations that gave it birth: if the matrix of $T$ is $mtimes n$ with entries $a_ij$ (let's call it $A$); and the matrix of $S$ is $ntimes p$ with entries $b_rs$ (let's call it $B$), then the matrix of $Tcirc S$ (let's call it $Acirc B$ or $AB$) is $mtimes p$ and with entries $c_kell$, where
      $$c_kell = a_k1b_1ell + a_k2b_2ell + cdots + a_knb_nell$$
      by definition. Why? Because then the matrix of the composition of two functions is precisely the product of the matrices of the two functions. We can work with the matrices directly without having to think about the functions.



      In point of fact, there is nothing about the dot product which is at play in this definition. It is essentially by happenstance that the $(i,j)$ entry can be obtained as a dot product of something. In fact, the $(i,j)$th entry is obtained as the matrix product of the $1times n$ matrix consisting of the $i$th row of $A$, with the $ntimes 1$ matrix consisting of the $j$th column of $B$. Only if you transpose this column can you try to interpret this as a dot product. (In fact, the modern view is the other way around: we define the dot product of two vectors as a special case of a more general inner product, called the Frobenius inner product, which is defined in terms of matrix multiplication, $langlemathbfx,mathbfyrangle =mathrmtrace(overlinemathbfy^tmathbfx)$).



      And because the product of matrices corresponds to composition of linear transformations, all the nice properties that composition of linear functions has will automatically also be true for the product of matrices, because the product of matrices is nothing more than a book-keeping device for keeping track of the composition of linear transformations. So $(AB)C = A(BC)$, because composition of functions is associative. $A(B+C) = AB + AC$, because composition of linear transformations distributes over sums of linear transformations (sums of matrices are defined entry-by-entry because that agrees precisely with the sum of linear transformations). $A(\alpha B) = \alpha(AB) = (\alpha A)B$, because composition of linear transformations behaves that way with scalar multiplication (products of matrices by scalars are defined the way they are precisely so that they will correspond to the operation with linear transformations).
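
      These inherited identities are easy to spot-check numerically; a minimal sketch with arbitrary random matrices (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
B2 = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))
alpha = 2.5

assert np.allclose((A @ B) @ C, A @ (B @ C))          # associativity
assert np.allclose(A @ (B + B2), A @ B + A @ B2)      # distributes over matrix sums
assert np.allclose(A @ (alpha * B), alpha * (A @ B))  # compatible with scalar multiples
assert np.allclose(A @ (alpha * B), (alpha * A) @ B)
```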



      So we define the product of matrices explicitly so that it will match up with composition of linear transformations. There really is no deeper hidden reason. It seems a bit incongruous, perhaps, that such a simple reason results in such a complicated formula, but such is life.



      Another reason why it is somewhat misguided to try to understand the matrix product in terms of the dot product is that the matrix product keeps track of all the information lying around about the two transformations being composed, but the dot product loses a lot of information about the two vectors in question. Knowing that $\mathbf{x}\cdot\mathbf{y}=0$ only tells you that $\mathbf{x}$ and $\mathbf{y}$ are perpendicular; it doesn't really tell you anything else. There is a lot of informational loss in the dot product, and trying to explain the matrix product in terms of the dot product requires that we "recover" all of this lost information in some way. In practice, it means keeping track of all the original information, which makes trying to shoehorn the dot product into the explanation unnecessary, because you will already have all the information to get the product directly.



      Examples that are not just "changes in reference system". Note that any linear transformation corresponds to a matrix. But the only linear transformations that can be thought of as "changes in perspective" are the linear transformations that map $\mathbb{R}^n$ to itself, and which are one-to-one and onto. There are lots of linear transformations that aren't like that. For example, the linear transformation $D$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ defined by
      $$D\left(\begin{array}{c}
      a\\b\\c\end{array}\right) = \left(\begin{array}{c}b\\2c\end{array}\right)$$
      is not a "change in reference system" (because lots of nonzero vectors go to zero, but there is no way to just "change your perspective" and start seeing a nonzero vector as zero) but is a linear transformation nonetheless. The corresponding matrix is $2\times 3$, and is
      $$\left(\begin{array}{ccc}
      0 & 1 & 0\\
      0 & 0 & 2
      \end{array}\right).$$
      Now consider the linear transformation $U\colon\mathbb{R}^2\to\mathbb{R}^2$ given by
      $$U\left(\begin{array}{c}x\\y\end{array}\right) = \left(\begin{array}{c}3x+2y\\
      9x + 6y\end{array}\right).$$
      Again, this is not a "change in perspective", because the vector $\binom{2}{-3}$ is mapped to $\binom{0}{0}$. It has a matrix, $2\times 2$, which is
      $$\left(\begin{array}{cc}
      3 & 2\\
      9 & 6
      \end{array}\right).$$
      So the composition $U\circ D$ has matrix:
      $$\left(\begin{array}{cc}
      3 & 2\\
      9 & 6
      \end{array}\right) \left(\begin{array}{ccc}
      0 & 1 & 0\\
      0 & 0 & 2
      \end{array}\right) = \left(\begin{array}{ccc}
      0 & 3 & 4\\
      0 & 9 & 12
      \end{array}\right),$$
      which tells me that
      $$U\circ D\left(\begin{array}{c}x\\y\\z\end{array}\right) = \left(\begin{array}{c} 3y + 4z\\ 9y+12z\end{array}\right).$$
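
      Reproducing that concrete computation in code (a small sketch, assuming NumPy):

```python
import numpy as np

U = np.array([[3, 2],
              [9, 6]])          # matrix of U : R^2 -> R^2
D = np.array([[0, 1, 0],
              [0, 0, 2]])       # matrix of D : R^3 -> R^2

UD = U @ D
print(UD)                       # [[ 0  3  4]
                                #  [ 0  9 12]]

# Pick arbitrary x, y, z and compare with the formula (3y + 4z, 9y + 12z).
x, y, z = 1.0, 2.0, 3.0
assert np.allclose(UD @ np.array([x, y, z]),
                   np.array([3*y + 4*z, 9*y + 12*z]))
```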



      Other matrix products. Are there other ways to define the product of two matrices? Sure. There's the Hadamard product, which is the "obvious" thing to try: you can multiply two matrices of the same size (and only of the same size), and you do it entry by entry, just the same way that you add two matrices. This has some nice properties, but it has nothing to do with linear transformations. There's the Kronecker product, which takes an $m\times n$ matrix times a $p\times q$ matrix and gives an $mp\times nq$ matrix. This one is associated to the tensor product of linear transformations. They are defined differently because they are meant to model other operations that one does with matrices or vectors.
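
      For comparison, here is how those two alternative products look numerically (arbitrary illustrative matrices; `*` is NumPy's entrywise product and `np.kron` is its Kronecker product):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

hadamard = A * B           # entrywise product; only defined for matrices of the same size
kronecker = np.kron(A, B)  # (m*p) x (n*q) block matrix, modeling the tensor product

print(hadamard)            # [[ 0 10]
                           #  [18 28]]
print(kronecker.shape)     # (4, 4)
```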






      answered Mar 1 '11 at 20:32 by Arturo Magidin (accepted answer, 95 votes)
      • 11




        Wow. I would have liked to give +2 rep for this, but that's not possible...
        – Gottfried Helms
        Mar 1 '11 at 21:02






      • 5




        an amazing answer. this is better than some textbooks or internet resources
        – Vass
        Mar 2 '11 at 12:53










      • is a linear combination also a linear transform?
        – Vass
        Mar 2 '11 at 14:57






      • 1




        @Vass, as you clearly seem to be happy with this answer, you should "accept" it as best answer by clicking the tick/check mark on the left of it.
        – Rahul
        Apr 8 '11 at 17:13






      • 1




        Khan Academy has a more beginner friendly video of the same explanation at khanacademy.org/math/linear-algebra/matrix_transformations/…
        – Pramod
        Jan 17 '15 at 6:02












      up vote
      10
      down vote













      I think part of the problem people have with getting used to linear transformations vs. matrices is that they have probably never seen an example of a linear transformation defined without reference to a matrix or a basis. So here is such an example. Let $V$ be the vector space of real polynomials of degree at most $3$, and let $f : V \to V$ be the derivative.



      $V$ does not come equipped with a natural choice of basis. You might argue that $\{ 1, x, x^2, x^3 \}$ is natural, but it's only convenient: there's no reason to privilege this basis over $\{ 1, (x+c), (x+c)^2, (x+c)^3 \}$ for any $c \in \mathbb{R}$ (and, depending on what my definitions are, it is literally impossible to do so). More generally, $\{ a_0(x), a_1(x), a_2(x), a_3(x) \}$ is a basis for any collection of polynomials $a_i$ of degree $i$.



      $V$ also does not come equipped with a natural choice of dot product, so there's no way to include those in the discussion without making an arbitrary choice. It really is just a vector space equipped with a linear transformation.



      Since we want to talk about composition, let's write down a second linear transformation. $g : V \to V$ will send a polynomial $p(x)$ to the polynomial $p(x + 1)$. Note that, once again, I do not need to refer to a basis to define $g$.



      Then the abstract composition $gf : V \to V$ is well-defined; it sends a polynomial $p(x)$ to the polynomial $p'(x + 1)$. I don't need to refer to a basis or multiply any matrices to see this; all I am doing is composing two functions.



      Now let's do everything in a particular basis to see that we get the same answer using the correct and natural definition of matrix multiplication. We'll use the basis $\{ 1, x, x^2, x^3 \}$. In this basis $f$ has matrix



      $$\left[ \begin{array}{cccc} 0 & 1 & 0 & 0 \\
      0 & 0 & 2 & 0 \\
      0 & 0 & 0 & 3 \\
      0 & 0 & 0 & 0 \end{array} \right]$$



      and $g$ has matrix



      $$\left[ \begin{array}{cccc} 1 & 1 & 1 & 1 \\
      0 & 1 & 2 & 3 \\
      0 & 0 & 1 & 3 \\
      0 & 0 & 0 & 1 \end{array} \right].$$



      Now I encourage you to go through all the generalities in Arturo's post in this example to verify that $gf$ has the matrix it is supposed to have.
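
      Taking up that invitation with a quick computation (a sketch of my own, assuming NumPy; polynomials are represented by their coefficient vectors in the basis $\{1, x, x^2, x^3\}$, and the helper names below are mine):

```python
import numpy as np
from math import comb

def deriv_coeffs(c):
    """Coefficients of p'(x), given coefficients c of p(x) in the basis 1, x, x^2, x^3."""
    return np.array([c[1], 2*c[2], 3*c[3], 0.0])

def shift_coeffs(c):
    """Coefficients of p(x+1), given coefficients c of p(x)."""
    out = np.zeros(4)
    for k, ck in enumerate(c):            # expand c_k (x+1)^k with the binomial theorem
        for i in range(k + 1):
            out[i] += ck * comb(k, i)
    return out

basis = np.eye(4)                         # rows are the standard coordinate vectors of 1, x, x^2, x^3
F = np.column_stack([deriv_coeffs(e) for e in basis])   # matrix of f = d/dx
G = np.column_stack([shift_coeffs(e) for e in basis])   # matrix of g : p(x) -> p(x+1)

# Matrix of the composition g∘f, built directly from the composed map...
GF_direct = np.column_stack([shift_coeffs(deriv_coeffs(e)) for e in basis])

# ...agrees with the matrix product computed by the usual rule.
assert np.allclose(GF_direct, G @ F)
print(G @ F)
```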






      answered Mar 1 '11 at 21:16 by Qiaochu Yuan
              up vote
              0
              down vote













              Let $T:\mathsf{V}\to\mathsf{W}$ be a linear transformation and let $\beta=\{v_1,v_2\}$ and $\gamma=\{w_1,w_2\}$ be bases of $\mathsf{V}$ and $\mathsf{W}$ respectively. The value of interest is



              $$T(v).$$



              Let $v=xv_1+yv_2$, then



              $$\begin{align}
              T(v)&=T(xv_1+yv_2)\\
              &=xT(v_1)+yT(v_2).
              \end{align}$$



              No matter what the value of $v$ is, $T(v_1)$ and $T(v_2)$ are needed, so the notation can be simplified. Let



              $$T(v_1)=aw_1+bw_2,\\
              T(v_2)=cw_1+dw_2,$$



              represent $T(v_1), T(v_2)$ in columns



              $$
              \begin{array}{ll}
              T(v_1) & T(v_2)\\
              aw_1 & cw_1\\
              + & +\\
              bw_2 & dw_2\\
              \end{array}
              $$



              Put $w_1, w_2$ on the left side as a note and omit the plus signs



              $$
              \begin{array}{lll}
              & T(v_1) & T(v_2) \\
              w_1 & a & c \\
              w_2 & b & d \\
              \end{array}
              $$



              Since $T(v)=xT(v_1)+yT(v_2)$



              $$
              \begin{array}{llll}
              & x & y & \\
              & T(v_1)\,+ & T(v_2)\,= & T(v) \\
              w_1 & a & c & e \\
              w_2 & b & d & f \\
              \end{array}
              $$



              An $\color{blue}{\text{operation}}$ can be defined such that



              $$
              e=\color{blue}{x}a+\color{blue}{y}c\\
              f=\color{blue}{x}b+\color{blue}{y}d
              $$



              that is



              $$
              \begin{bmatrix}a & c\\b & d\end{bmatrix}
              \color{blue}{\text{ oper. }}
              \begin{bmatrix}x\\y\end{bmatrix}
              =
              \begin{bmatrix}e\\f\end{bmatrix},
              $$



              The order in which $w_1, w_2$ are listed matters in this notation, so the idea of an ordered basis is needed to denote the linear transformation matrix



              $$\large[T]_\beta^\gamma$$



              which means this matrix only accepts $\begin{bmatrix}x\\y\end{bmatrix}$ written in the basis $\beta$, or in other words



              $$\Large[v]_\beta$$



              the coordinate vector of $v$ relative to $\beta$.



              The $\color{blue}{\text{operation}}$ is



              $$\large[T(v)]_\gamma = [T]_\beta^\gamma \color{blue}{\Large\cdot}\, [v]_\beta$$
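
              A tiny numeric instance of this bookkeeping (illustrative numbers of my own choosing, assuming NumPy):

```python
import numpy as np

# Suppose T(v1) = a*w1 + b*w2 and T(v2) = c*w1 + d*w2, with illustrative values:
a, b, c, d = 2, 5, -1, 3
T_matrix = np.array([[a, c],
                     [b, d]])        # [T]_beta^gamma: columns are [T(v1)]_gamma and [T(v2)]_gamma

x, y = 4, -2                         # v = x*v1 + y*v2, so [v]_beta = (x, y)
coords_v = np.array([x, y])

# [T(v)]_gamma = [T]_beta^gamma . [v]_beta gives (e, f) with e = xa + yc and f = xb + yd.
e, f = T_matrix @ coords_v
assert (e, f) == (x*a + y*c, x*b + y*d)
print(e, f)                          # 10 14
```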



              --



              Since that coordinate vector $\large[v]_\beta$ can be seen as one of the columns of another linear transformation matrix $\large[U]_\alpha^\beta$, the composition of $\large[T]_\beta^\gamma$ and $\large[U]_\alpha^\beta$ works column by column:



              $$\begin{align}
              \large[T([U]_\alpha^\beta)]_\gamma &= \large[T( [\overbrace{u_1,u_2,\dots,u_n}^{\lVert\alpha\rVert}]_\beta )]_\gamma\\
              &=\large[T( U )]_\alpha^\gamma
              \end{align}$$



              Hope this helps.






              answered Apr 12 at 16:03, edited Jul 3 at 3:38, by Niing