How well does a model fit a subset of the original dataset?
A linear regression model has been created based on a dataset with observations which can be sorted into several different categories.
I have been asked to assess how well this regression model (which was fitted to the entire dataset) fits the observations within each category subset. Correct me if needed, but this seems to render R-squared useless for two reasons: (1) the means of the predictions and the observations inside a specific category are not equal, and (2) the relationship SSE + SSR = SSTotal no longer holds within each category.
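To make the two concerns above concrete, here is a minimal sketch on synthetic data (the data, the three categories, and the helper name `subset_fit_summary` are all invented for illustration, not taken from the actual dataset). It fits one overall least-squares line, then reports, per category, the mean residual (bias), the RMSE, the quantity 1 - SSE/SST computed against the category's own mean, and how far SST is from SSE + SSR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (purely illustrative): three categories with different intercepts.
n = 300
category = rng.integers(0, 3, size=n)
x = rng.normal(size=n)
offsets = np.array([-1.0, 0.0, 1.5])
y = 1.0 + 2.0 * x + offsets[category] + rng.normal(scale=0.5, size=n)

# Overall model: OLS of y on x using every observation, ignoring category.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta


def subset_fit_summary(y_obs, y_fit):
    """Summaries of how well fixed predictions y_fit match y_obs."""
    resid = y_obs - y_fit
    sse = np.sum(resid ** 2)
    sst = np.sum((y_obs - y_obs.mean()) ** 2)
    ssr = np.sum((y_fit - y_obs.mean()) ** 2)
    return {
        "n": y_obs.size,
        "bias": resid.mean(),                   # zero on the full data, not per category
        "rmse": np.sqrt(sse / y_obs.size),
        "one_minus_sse_over_sst": 1.0 - sse / sst,  # can be negative on a subset
        "decomposition_gap": sst - (sse + ssr),     # zero only when the identity holds
    }


overall = subset_fit_summary(y, fitted)
by_category = {
    k: subset_fit_summary(y[category == k], fitted[category == k])
    for k in range(3)
}

print("overall", {name: round(float(v), 3) for name, v in overall.items()})
for k, summary in by_category.items():
    print(k, {name: round(float(v), 3) for name, v in summary.items()})
```

On the full data the bias and the decomposition gap are zero up to rounding, while inside a category they generally are not. Bias and RMSE remain meaningful per category because they do not rely on the decomposition.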
Calculating the squared correlation coefficient between the overall model's fits for a category and that category's observations would amount to fitting a new least-squares line specific to that category (regressing its observations on the overall fits) and calculating an R-squared, but that would not be assessing the fit of the OVERALL model to that category.
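The distinction can be checked numerically. The sketch below uses made-up data with two categories (an assumption for illustration only) and compares, for one category, the squared correlation between the overall fits and the observations, the R-squared of a recalibration regression of the observations on those fits, and 1 - SSE/SST computed from the overall fits as they stand.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative): category 1 follows a different line than category 0.
n = 200
category = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
y = np.where(category == 1, 3.0 + 0.5 * x, 2.0 * x) + rng.normal(scale=0.3, size=n)

# Overall model fitted to all observations, ignoring category.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta

# Restrict observations and overall fits to category 1.
mask = category == 1
y_c, fit_c = y[mask], fitted[mask]
sst = np.sum((y_c - y_c.mean()) ** 2)

# (a) Squared correlation between overall fits and observations in the category.
r_squared_corr = np.corrcoef(fit_c, y_c)[0, 1] ** 2

# (b) R-squared of a new least-squares line y_c ~ a + b * fit_c (a recalibration).
A = np.column_stack([np.ones(fit_c.size), fit_c])
coef, *_ = np.linalg.lstsq(A, y_c, rcond=None)
r_squared_recal = 1.0 - np.sum((y_c - A @ coef) ** 2) / sst

# (c) The overall fits taken as they are, with no refitting.
r_squared_as_is = 1.0 - np.sum((y_c - fit_c) ** 2) / sst

print(r_squared_corr, r_squared_recal, r_squared_as_is)
```

Here (a) and (b) agree, and (c) can be far lower, even negative, because the correlation is blind to a shift or rescaling of the predictions.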
What other options would you recommend here? Would some kind of likelihood ratio test perhaps be appropriate? I confess I'm not sure what else to do.
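One way a likelihood ratio test could enter, sketched under assumptions not stated in the question (Gaussian errors, a single predictor, and invented data): compare the pooled model with a larger model that gives every category its own intercept and slope. The pooled model is nested in the larger one, so the drop in residual sum of squares gives an F statistic and the equivalent Gaussian likelihood ratio statistic.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (illustrative): one category has a different slope, one a different intercept.
n, n_categories = 240, 3
category = rng.integers(0, n_categories, size=n)
x = rng.normal(size=n)
slopes = np.array([1.0, 1.0, 2.0])
intercepts = np.array([0.0, 0.5, 0.0])
y = intercepts[category] + slopes[category] * x + rng.normal(scale=0.5, size=n)


def rss(design, response):
    """Residual sum of squares of the least-squares fit of response on design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return float(np.sum((response - design @ coef) ** 2))


# Pooled model: one intercept and one slope for everyone.
pooled_design = np.column_stack([np.ones(n), x])

# Larger model: a separate intercept and slope per category.
indicators = (category[:, None] == np.arange(n_categories)).astype(float)
full_design = np.column_stack([indicators, indicators * x[:, None]])

rss_pooled = rss(pooled_design, y)
rss_full = rss(full_design, y)

extra_params = full_design.shape[1] - pooled_design.shape[1]
resid_df = n - full_design.shape[1]

# F statistic for the nested comparison, and the Gaussian likelihood ratio statistic.
f_stat = ((rss_pooled - rss_full) / extra_params) / (rss_full / resid_df)
lr_stat = n * np.log(rss_pooled / rss_full)

print(extra_params, resid_df, f_stat, lr_stat)
```

Under the null hypothesis that the pooled model is adequate, `f_stat` follows an F distribution with (`extra_params`, `resid_df`) degrees of freedom and `lr_stat` is approximately chi-squared with `extra_params` degrees of freedom. A large value suggests the overall model misses category-specific structure, although it does not say which category is poorly fitted.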
Thanks for your help.
regression
Can you elaborate a little more on your data set and the fitted model? If you can upload a sample of the data and write down the fitted model, it would help to identify the problem.
– V. Vancak
Jul 18 at 18:10
I'm sorry, but I can't due to company policy. I also do not think that it would be necessary to view the dataset to answer this question, seeing how general it is.
– Greg
Jul 31 at 0:32
asked Jul 17 at 2:51
Greg