Markov Chain question?
Let $M=(S, A, \{A(s)\}_{s \in S}, P)$ be a Markov chain, where $S$ is the set of states, $S \subset \mathbb{N}$, and $A$ is a finite set of actions. For each $s \in S$, $A(s) \subset A$ is the non-empty set of admissible actions at state $s \in S$.
OK, this is a construction that my teacher gave us, but I don't understand the idea of the set $A(s)$. What is the role of this set? Why is it called the set of admissible actions, and how does it differ from the set of actions $A$?
Along with this there is an observation that says:
Without loss of generality we may take $A = \cup_{s \in S} A(s)$, and $\mathbb{K} = \{(s,a) : s \in S,\ a \in A(s)\}$ is the set of admissible state-action pairs. The variable $p_{ik}$ is a stationary controlled transition matrix (why is it controlled? what does that mean?), which defines a stochastic kernel on $S$ given $\mathbb{K}$ (I don't understand this implication), where
$$p_{ik} := P(X_{t+1} = s_j \mid X_t = s_i, A_t = a_k), \quad \forall t \in \mathbb{N}$$
So is this the probability of the transition from state $s_i$ to state $s_j$, $i, j = 1, \dots, n$, under an action $a_k \in A(s_i)$?
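For instance, if I understand correctly (this small example is my own attempt, not from the notes), with two states and two actions the controlled transition matrix would be one row-stochastic matrix per action,
$$P(a_1) = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}, \qquad P(a_2) = \begin{pmatrix} 0.3 & 0.7 \\ 0.6 & 0.4 \end{pmatrix},$$
so that the matrix governing the step from $X_t$ to $X_{t+1}$ is selected by the action $A_t$, and each row sums to $1$.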
Could someone explain this construction or definition of a Markov chain, please?
Thanks for your help and time, everyone.
stochastic-processes markov-chains
asked Jul 23 at 15:12 by Knight
This sounds like a Markov Decision Process, which is a bit more involved than a regular Markov chain. The general idea is that at each time $n$, some action $a \in A$ may be taken, and if so, then the distribution of $X_{n+1}$ is given by the transition matrix corresponding to $a$.
– Math1000
Jul 24 at 2:04
This seems equivalent to a hidden Markov chain.
– Did
Jul 24 at 15:32
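A minimal sketch of the idea in Math1000's comment (illustrative only; the states, actions, and numbers are made up):

    import random

    # One row-stochastic transition matrix per action (made-up numbers).
    P = {
        "a1": [[0.9, 0.1], [0.2, 0.8]],
        "a2": [[0.3, 0.7], [0.6, 0.4]],
    }

    def step(state, action):
        """Sample X_{n+1} from the row of the matrix selected by `action`."""
        row = P[action][state]
        return random.choices(range(len(row)), weights=row)[0]

    # From state 0 under action "a2", the next state is 1 with probability 0.7.
    print(step(0, "a2"))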
1 Answer
Perhaps I can explain this concept more clearly with an example. As a brief note, what you are describing is not a Markov chain, but rather a Markov decision process (MDP). Consider the following MDP:
A robot is allowed to move on a grid of dimension $(n, n)$. He starts on the grid at $(0,0)$, and at each timestep he may choose to move to a square that is either one unit away vertically or one unit away horizontally, so long as he stays on the grid. After some time, he might look like this:
|------------|------------|------------|-----------|
| | | | |
| (0,0) | (0,1) | ... | |
|------------|------------|------------|-----------|
| | | | |
| (1,0) | | | |
|------------|------------|------------|-----------|
| | | | |
| ... | | | |
|------------|------------|------------|-----------|
| | | | O_O |
| | | | || |
|------------|------------|------------|-----------|
| | | | |
| | | | |
|------------|------------|------------|-----------|
What are all the states? We can denote them by their coordinates: $(a,b)$ for $a, b \in \{0, 1, \dots, n-1\}$. What is the set of all actions? That is simply $\{\text{up}, \text{left}, \text{right}, \text{down}\}$. However, the robot cannot perform all of those actions from every state. For example, in his starting state, he may only move down or right, as he is restricted to staying on the grid. Thus we can say $A((0,0)) = \{\text{down}, \text{right}\}$, as those are the only admissible or "allowed" actions in state $(0,0)$, which is clearly different from the set of all actions. Furthermore, one can say that $\cup_{s \in S} A(s) = A$ for any MDP.
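To make this concrete, here is a minimal Python sketch (my own illustration; the function name is made up for this answer) that computes $A(s)$ for every cell of the $n \times n$ grid:

    # Minimal gridworld sketch (illustrative; not a standard library API).
    # States are cells (row, col) of an n-by-n grid; each of the four actions
    # moves one cell, and an action is admissible only if it stays on the grid.

    ACTIONS = {
        "up": (-1, 0),
        "down": (1, 0),
        "left": (0, -1),
        "right": (0, 1),
    }

    def admissible_actions(state, n):
        """Return A(s): the actions that keep the robot on the n-by-n grid."""
        row, col = state
        return {
            name
            for name, (dr, dc) in ACTIONS.items()
            if 0 <= row + dr < n and 0 <= col + dc < n
        }

    # The corner (0, 0) only admits "down" and "right"; an interior cell admits all four.
    assert admissible_actions((0, 0), 4) == {"down", "right"}
    assert admissible_actions((2, 2), 4) == set(ACTIONS)

Taking the union of these sets over all cells recovers the full action set, which is exactly $\cup_{s \in S} A(s) = A$.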
As for the probabilities, it may be the case that in certain states the robot likes to take certain actions more than others (or perhaps chooses completely uniformly). Suppose we observed the robot for a very, very long time and recorded the fraction of the time he transitioned from state $s$ to state $s'$ under each admissible action $a \in A(s)$. We could write out a matrix of such transition probabilities for each action; this family of per-action matrices is the stochastic kernel on $S$ given $\mathbb{K}$.
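To sketch how such a kernel could be estimated (again my own illustration, assuming we have logged (state, action, next state) triples), the empirical kernel is just normalized counts:

    from collections import Counter, defaultdict

    def empirical_kernel(transitions):
        """Estimate p(s' | s, a) from observed (s, a, s') triples."""
        counts = defaultdict(Counter)  # (s, a) -> Counter of next states
        for s, a, s_next in transitions:
            counts[(s, a)][s_next] += 1
        kernel = {}
        for (s, a), ctr in counts.items():
            total = sum(ctr.values())
            # Each (s, a) row is a probability distribution over next states.
            kernel[(s, a)] = {nxt: c / total for nxt, c in ctr.items()}
        return kernel

    # Toy log: from (0, 0) under "right" the robot reached (0, 1) 3 times out of 4.
    log = [((0, 0), "right", (0, 1))] * 3 + [((0, 0), "right", (0, 0))]
    print(empirical_kernel(log)[((0, 0), "right")])  # {(0, 1): 0.75, (0, 0): 0.25}

Each row kernel[(s, a)] sums to one; that is what "stochastic kernel on $S$ given $\mathbb{K}$" means: one probability distribution over next states for every admissible state-action pair.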
answered Jul 24 at 14:31 by Nucl3ic
For example, I was reading about what an MDP is, and it talked about histories and policies, i.e., the space $\mathbb{H}_t$ of possible histories up to time $t \in \mathbb{N}$ is defined by $\mathbb{H}_0 := S$ and $\mathbb{H}_t := \mathbb{K}^t \times S$, $t \geq 1$, with $h_t = (s_0, a_0, \dots, s_i, a_i, \dots, s_t)$, and a control policy $\pi = \{\pi_t\}$ is a special sequence of stochastic kernels. What does that mean here? And what is this set of histories?
– Knight
Jul 24 at 15:52
@Knight this refers to the ability of agents to change their environment, and hence their policy for choosing actions, over time.
– Nucl3ic
Jul 24 at 15:53
I don't know; it is complicated to understand these concepts.
– Knight
Jul 24 at 15:54
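To give the quoted definition a concrete shape, here is a minimal sketch (my own illustration, not from the course notes): a control policy assigns to each history a distribution over admissible actions, and a stationary Markov policy is the special case that looks only at the current state $s_t$ of $h_t$.

    import random

    def markov_policy(state):
        """A stationary Markov policy: a distribution over actions, depending only on s.

        Made-up rule: in the corner (0, 0), prefer "right" over "down".
        """
        if state == (0, 0):
            return {"right": 0.7, "down": 0.3}
        # Interior cells: uniform over all four actions (edge cells would
        # similarly restrict this to the admissible set A(s)).
        return {"up": 0.25, "down": 0.25, "left": 0.25, "right": 0.25}

    def act(state):
        """Sample an action from the policy's distribution at `state`."""
        dist = markov_policy(state)
        return random.choices(list(dist), weights=list(dist.values()))[0]

A general (history-dependent) policy would take the whole $h_t = (s_0, a_0, \dots, s_t)$ as input instead of just $s_t$; the "sequence of stochastic kernels" in the definition is one such map $\pi_t$ per time step $t$.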