Markov Chain question?

Let $M=(S,A,\{A(s)\}_{s\in S},P)$ be a Markov chain, where $S$ is the set of states, $S \subset \mathbb{N}$, and $A$ is a finite set of actions. For each $s \in S$, $A(s) \subset A$ is the non-empty set of admissible actions at state $s \in S$.



Ok, this is a construction that my teacher gave us, but I don't understand the idea of the set $A(s)$. What is the purpose of this set? Why are its elements called admissible actions, and how does it differ from the set of all actions?



Along with this there is an observation that says:



Without loss of generality we may take $A = \cup_{s \in S} A(s)$. Then $\mathbb{K} = \{(s,a) : s \in S,\ a \in A(s)\}$ is the set of admissible state-action pairs. The array $p_{ijk}$ is a stationary controlled transition matrix (why is it called controlled? what does that mean?), which defines a stochastic kernel on $S$ given $\mathbb{K}$ (I don't understand this implication), where



$$p_{ijk} := P(X_{t+1}=s_j \mid X_t=s_i, A_t=a_k), \quad \forall t \in \mathbb{N}$$



So is this the probability associated with the transition from state $s_i$ to $s_j$, $i,j=1,\dots,n$, under an action $a_k \in A(s_i)$?
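
To check my understanding, here is how I picture the data involved, as a minimal sketch in code (the names and numbers are mine and purely illustrative, not from the course):

    # States, per-state admissible actions A(s), and a controlled transition
    # kernel stored as {(state, action): {next_state: probability}}.
    S = [0, 1]
    A_s = {0: ["stay", "go"],
           1: ["stay"]}
    P = {(0, "stay"): {0: 1.0},
         (0, "go"):   {0: 0.3, 1: 0.7},
         (1, "stay"): {1: 1.0}}
    # For every admissible pair (s, a), P[(s, a)] is a distribution over S:
    for (s, a), dist in P.items():
        assert a in A_s[s] and abs(sum(dist.values()) - 1.0) < 1e-9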



Could someone explain this construction or definition of a Markov chain, please?



Thanks for your help and time everyone.







  • This sounds like a Markov Decision Process, which is a bit more involved than a regular Markov chain. The general idea is that at each time $n$, some action $a \in A$ may be taken, and if so, then the distribution of $X_{n+1}$ is given by the transition matrix corresponding to $a$.
    – Math1000
    Jul 24 at 2:04











  • This seems equivalent to a hidden Markov chain.
    – Did
    Jul 24 at 15:32














asked Jul 23 at 15:12









Knight

1 Answer
Perhaps I can explain this concept more clearly with an example. As a brief note, what you are describing is not a Markov chain, but rather a Markov Decision Process (MDP). Consider the following MDP:



A robot is allowed to move on a grid of dimension $(n,n)$. He starts on the grid at $(0,0)$ and at each timestep may choose to move to a square that is either one unit away vertically or one unit away horizontally, so long as he stays on the grid. After some time, he might look like this:



|------------|------------|------------|-----------|
| | | | |
| (0,0) | (0,1) | ... | |
|------------|------------|------------|-----------|
| | | | |
| (1,0) | | | |
|------------|------------|------------|-----------|
| | | | |
| ... | | | |
|------------|------------|------------|-----------|
| | | | O_O |
| | | | || |
|------------|------------|------------|-----------|
| | | | |
| | | | |
|------------|------------|------------|-----------|


What are all the states? We can denote them by their coordinates: $(a,b)$ for all $a,b \in \{0,1,\dots,n-1\}$. What is the set of all actions? That is simply $\{$up, left, right, down$\}$. However, the robot cannot perform all of those actions from every state. For example, in his starting state he may only move down or right, as he is restricted to staying on the grid. Thus we can say $A((0,0)) = \{\text{down}, \text{right}\}$, as those are the only admissible or "allowed" actions in state $(0,0)$, which is clearly different from the set of all actions. Furthermore, one could say that $\cup_{s \in S} A(s) = A$ for any MDP.
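
Just to illustrate (this is my own sketch, not part of any formal definition), the map $s \mapsto A(s)$ for this robot can be computed directly from the grid geometry:

    # Illustrative sketch: states are (row, col) on an n-by-n grid, and A(s)
    # keeps only the moves that stay on the grid.
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def admissible_actions(state, n):
        row, col = state
        return [a for a, (dr, dc) in MOVES.items()
                if 0 <= row + dr < n and 0 <= col + dc < n]

    print(admissible_actions((0, 0), 5))  # ['down', 'right']
    print(admissible_actions((2, 2), 5))  # ['up', 'down', 'left', 'right']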



As for the probabilities, it may be the case that in certain states the robot prefers certain actions over others (or chooses completely uniformly). Suppose we observed the robot for a very, very long time and recorded, for each state $s$ and action taken, the fraction of the time he then transitioned to each state $s'$. We could write out a matrix of such transition probabilities, one distribution over next states per admissible state-action pair. This is the stochastic kernel, and it is "controlled" because the distribution of the next state depends on the action (the control) taken.
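
A rough sketch of that estimation step (illustrative only; in an MDP the kernel is usually given as part of the model rather than estimated): count the observed transitions per state-action pair and normalize the counts into a distribution.

    from collections import Counter, defaultdict

    # Illustrative: estimate p(s' | s, a) from one observed trajectory
    # [(s_0, a_0), (s_1, a_1), ..., (s_T, None)] by normalizing counts.
    def estimate_kernel(trajectory):
        counts = defaultdict(Counter)
        for (s, a), (s_next, _) in zip(trajectory, trajectory[1:]):
            counts[(s, a)][s_next] += 1
        return {sa: {s2: c / sum(ctr.values()) for s2, c in ctr.items()}
                for sa, ctr in counts.items()}

    traj = [((0, 0), "right"), ((0, 1), "down"), ((1, 1), "down"), ((2, 1), None)]
    print(estimate_kernel(traj))
    # {((0, 0), 'right'): {(0, 1): 1.0}, ((0, 1), 'down'): {(1, 1): 1.0}, ...}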






answered Jul 24 at 14:31
Nucl3ic

  • For example, I was reading about what an MDP is, and it talked about histories and policies, i.e., the space $\mathbb{H}_t$ of possible histories up to time $t \in \mathbb{N}$ is defined by $\mathbb{H}_0 := S$ and $\mathbb{H}_t := \mathbb{K}^t \times S$, $t \geq 1$, with $h_t = (s_0,a_0,\dots,s_i,a_i,\dots,s_t)$, and a control policy $\pi = \{\pi_t\}$ is a special sequence of stochastic kernels. What does that mean here? And what is this set of histories?
    – Knight
    Jul 24 at 15:52










  • @Knight this is in reference to the ability of agents to act on their environment; a policy describes how an agent chooses its actions over time.
    – Nucl3ic
    Jul 24 at 15:53










  • I don't know; it is complicated to understand these concepts.
    – Knight
    Jul 24 at 15:54
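
To make the histories-and-policies comment above concrete, here is a rough sketch (purely illustrative, assuming only the definitions quoted in the comment): a history is the alternating record $(s_0,a_0,\dots,s_t)$, and a policy maps each history to a choice among the admissible actions at the current state.

    import random

    # Illustrative only: a history h_t = (s_0, a_0, ..., s_t) kept as a list,
    # and a (randomized) policy as a function of the history.
    A_s = {0: ["stay", "go"], 1: ["stay"]}          # admissible actions A(s)
    P = {(0, "stay"): [(0, 1.0)],                   # p(. | s, a) as (s', prob) pairs
         (0, "go"):   [(0, 0.3), (1, 0.7)],
         (1, "stay"): [(1, 1.0)]}

    def uniform_policy(history):
        s_t = history[-1]                           # current state
        return random.choice(A_s[s_t])              # uniform over A(s_t)

    history = [0]                                   # h_0 = (s_0)
    for _ in range(3):
        a = uniform_policy(history)
        states, probs = zip(*P[(history[-1], a)])
        s_next = random.choices(states, weights=probs)[0]  # s_{t+1} ~ p(.|s_t, a)
        history += [a, s_next]
    print(history)                                  # e.g. [0, 'go', 1, 'stay', 1, 'stay', 1]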









