Markov Chain question?

Let $M=(S,A,\{A(s)\}_{s\in S},P)$ be a Markov chain, where $S$ is the set of states, $S \subset \mathbb{N}$, and $A$ is a finite set of actions. For each $s \in S$, $A(s) \subset A$ is the non-empty set of admissible actions at state $s \in S$.



Ok, this is a construction that my teacher gave us, but I don't understand the idea of the set $A(s)$. What is the purpose of this set? Why are its elements called admissible actions, and how does it differ from the set of all actions?



Along with this there is an observation that says:



Without loss of generality we may take $A = \cup_{s \in S} A(s)$. Then $\mathbb{K} = \{(s,a) : s \in S,\ a \in A(s)\}$ is the set of admissible state-action pairs. The array $p_{ijk}$ is a stationary controlled transition matrix (why is it called controlled? what does that mean?), which defines a stochastic kernel on $S$ given $\mathbb{K}$ (I don't understand this implication), where



$$p_{ijk} := P(X_{t+1}=s_j \mid X_t=s_i, A_t=a_k), \quad \forall t \in \mathbb{N}$$



So is this the probability associated with the transition from state $s_i$ to $s_j$, $i,j=1,\dots,n$, under an action $a_k \in A(s_i)$?
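
To check my understanding, here is how I picture the data involved, as a minimal sketch in code (the names and numbers are mine and purely illustrative, not from the course):

    # States, per-state admissible actions A(s), and a controlled transition
    # kernel stored as {(state, action): {next_state: probability}}.
    S = [0, 1]
    A_s = {0: ["stay", "go"],
           1: ["stay"]}
    P = {(0, "stay"): {0: 1.0},
         (0, "go"):   {0: 0.3, 1: 0.7},
         (1, "stay"): {1: 1.0}}
    # For every admissible pair (s, a), P[(s, a)] is a distribution over S:
    for (s, a), dist in P.items():
        assert a in A_s[s] and abs(sum(dist.values()) - 1.0) < 1e-9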



Could someone explain this construction or definition of a Markov chain, please?



Thanks for your help and time everyone.







  • This sounds like a Markov Decision Process, which is a bit more involved than a regular Markov chain. The general idea is that at each time $n$, some action $a \in A$ may be taken, and if so, then the distribution of $X_{n+1}$ is given by the transition matrix corresponding to $a$.
    – Math1000
    Jul 24 at 2:04











  • This seems equivalent to a hidden Markov chain.
    – Did
    Jul 24 at 15:32














asked Jul 23 at 15:12









Knight

1 Answer
Perhaps I can explain this concept more clearly with an example. As a brief note, what you are describing is not a Markov chain, but rather a Markov Decision Process (MDP). Consider the following MDP:



A robot is allowed to move on a grid of dimension $(n,n)$. He starts on the grid at $(0,0)$ and at each timestep may choose to move to a square that is either one unit away vertically or one unit away horizontally, so long as he stays on the grid. After some time, he might look like this:



|------------|------------|------------|-----------|
| | | | |
| (0,0) | (0,1) | ... | |
|------------|------------|------------|-----------|
| | | | |
| (1,0) | | | |
|------------|------------|------------|-----------|
| | | | |
| ... | | | |
|------------|------------|------------|-----------|
| | | | O_O |
| | | | || |
|------------|------------|------------|-----------|
| | | | |
| | | | |
|------------|------------|------------|-----------|


What are all the states? We can denote them by their coordinates: $(a,b)$ for all $a,b \in \{0,1,\dots,n-1\}$. What is the set of all actions? That is simply $\{$up, left, right, down$\}$. However, the robot cannot perform all of those actions from every state. For example, in his starting state he may only move down or right, as he is restricted to staying on the grid. Thus we can say $A((0,0)) = \{\text{down}, \text{right}\}$, as those are the only admissible or "allowed" actions in state $(0,0)$, which is clearly different from the set of all actions. Furthermore, one could say that $\cup_{s \in S} A(s) = A$ for any MDP.
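
Just to illustrate (this is my own sketch, not part of any formal definition), the map $s \mapsto A(s)$ for this robot can be computed directly from the grid geometry:

    # Illustrative sketch: states are (row, col) on an n-by-n grid, and A(s)
    # keeps only the moves that stay on the grid.
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def admissible_actions(state, n):
        row, col = state
        return [a for a, (dr, dc) in MOVES.items()
                if 0 <= row + dr < n and 0 <= col + dc < n]

    print(admissible_actions((0, 0), 5))  # ['down', 'right']
    print(admissible_actions((2, 2), 5))  # ['up', 'down', 'left', 'right']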



As for the probabilities, it may be the case that in certain states the robot prefers certain actions over others (or chooses completely uniformly). Suppose we observed the robot for a very, very long time and recorded, for each state $s$ and action taken, the fraction of the time he then transitioned to each state $s'$. We could write out a matrix of such transition probabilities, one distribution over next states per admissible state-action pair. This is the stochastic kernel, and it is "controlled" because the distribution of the next state depends on the action (the control) taken.
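
A rough sketch of that estimation step (illustrative only; in an MDP the kernel is usually given as part of the model rather than estimated): count the observed transitions per state-action pair and normalize the counts into a distribution.

    from collections import Counter, defaultdict

    # Illustrative: estimate p(s' | s, a) from one observed trajectory
    # [(s_0, a_0), (s_1, a_1), ..., (s_T, None)] by normalizing counts.
    def estimate_kernel(trajectory):
        counts = defaultdict(Counter)
        for (s, a), (s_next, _) in zip(trajectory, trajectory[1:]):
            counts[(s, a)][s_next] += 1
        return {sa: {s2: c / sum(ctr.values()) for s2, c in ctr.items()}
                for sa, ctr in counts.items()}

    traj = [((0, 0), "right"), ((0, 1), "down"), ((1, 1), "down"), ((2, 1), None)]
    print(estimate_kernel(traj))
    # {((0, 0), 'right'): {(0, 1): 1.0}, ((0, 1), 'down'): {(1, 1): 1.0}, ...}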






answered Jul 24 at 14:31
Nucl3ic

  • For example, I was reading about what an MDP is, and it talked about histories and policies, i.e., the space $\mathbb{H}_t$ of possible histories up to time $t \in \mathbb{N}$ is defined by $\mathbb{H}_0 := S$ and $\mathbb{H}_t := \mathbb{K}^t \times S$, $t \geq 1$, with $h_t = (s_0,a_0,\dots,s_i,a_i,\dots,s_t)$, and a control policy $\pi = \{\pi_t\}$ is a special sequence of stochastic kernels. What does that mean here? And what is this set of histories?
    – Knight
    Jul 24 at 15:52










  • @Knight this is in reference to the ability of agents to act on their environment; a policy describes how an agent chooses its actions over time.
    – Nucl3ic
    Jul 24 at 15:53










  • I don't know; it is complicated to understand these concepts.
    – Knight
    Jul 24 at 15:54
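
To make the histories-and-policies comment above concrete, here is a rough sketch (purely illustrative, assuming only the definitions quoted in the comment): a history is the alternating record $(s_0,a_0,\dots,s_t)$, and a policy maps each history to a choice among the admissible actions at the current state.

    import random

    # Illustrative only: a history h_t = (s_0, a_0, ..., s_t) kept as a list,
    # and a (randomized) policy as a function of the history.
    A_s = {0: ["stay", "go"], 1: ["stay"]}          # admissible actions A(s)
    P = {(0, "stay"): [(0, 1.0)],                   # p(. | s, a) as (s', prob) pairs
         (0, "go"):   [(0, 0.3), (1, 0.7)],
         (1, "stay"): [(1, 1.0)]}

    def uniform_policy(history):
        s_t = history[-1]                           # current state
        return random.choice(A_s[s_t])              # uniform over A(s_t)

    history = [0]                                   # h_0 = (s_0)
    for _ in range(3):
        a = uniform_policy(history)
        states, probs = zip(*P[(history[-1], a)])
        s_next = random.choices(states, weights=probs)[0]  # s_{t+1} ~ p(.|s_t, a)
        history += [a, s_next]
    print(history)                                  # e.g. [0, 'go', 1, 'stay', 1, 'stay', 1]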









