Regressions in case of non-normality
Our variables and residuals are not normally distributed. We found that regression is usually quite robust against violations of normality, but we don't know to what degree, and our sample size is not large either (n = 71). We found some options for dealing with this: (1) use a non-parametric alternative (but we could not find one for regression), (2) transform the data to be more normally distributed (but what are the implications?), or (3) use a more conservative p-value threshold to assess significance (e.g., 0.01 instead of 0.05). How would you deal with non-normality in this case? Is the sample size big enough to simply assume robustness against non-normality, or should we go for one of the three options?
(Please provide references where applicable.)
statistics regression linear-regression descriptive-statistics
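Option (2) can be prototyped before committing to it. A minimal sketch (made-up log-normal data standing in for the skewed variable, n = 71 as in the question) of a Box-Cox power transform with a before/after normality check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up right-skewed variable (log-normal), n = 71 as in the question.
y = rng.lognormal(mean=0.0, sigma=1.0, size=71)

# Box-Cox searches for the power transform that makes y most nearly normal.
# (Requires strictly positive data.)
y_bc, lam = stats.boxcox(y)

# Shapiro-Wilk normality test before and after the transform.
_, p_raw = stats.shapiro(y)
_, p_bc = stats.shapiro(y_bc)
print(f"lambda = {lam:.2f}, Shapiro p: raw = {p_raw:.4f}, transformed = {p_bc:.4f}")
```

One implication of transforming: coefficients are then interpreted on the transformed scale, and (for a log transform) back-transformed predictions estimate the conditional median rather than the mean.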
I'd suggest cross-validation to estimate p-values: en.m.wikipedia.org/wiki/Cross-validation_(statistics)
– ericf
Jul 22 at 23:29
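The comment's suggestion can be sketched with scikit-learn; the data here are simulated with skewed (centred exponential) errors, and note that plain k-fold cross-validation scores out-of-sample fit rather than producing p-values directly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 71  # sample size from the question

# Simulated predictor and a response with skewed (centred exponential) errors.
X = rng.uniform(0.0, 10.0, size=(n, 1))
y = 1.0 + 0.5 * X[:, 0] + (rng.exponential(scale=1.0, size=n) - 1.0)

# 5-fold cross-validated R^2 for an ordinary linear regression.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(f"CV R^2 per fold: {np.round(scores, 3)}, mean = {scores.mean():.3f}")
```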
asked Jul 22 at 13:32
Kiko Fernandez
1 Answer
Regression analysis does not assume normality per se; normality is needed only for certain properties that derive from the multivariate normal distribution. Without special modifications, the assumptions are that the observations are i.i.d. and that whatever is random has finite variance; everything else is a bonus. Except for infinite variance, every violation has reasonable remedies, but the exact remedy depends on the specific violation. For example, you can use WLS (weighted least squares) to handle non-constant variance, non-linear regression to handle parametric non-linearity, and certain non-parametric procedures (e.g., Kruskal-Wallis) to handle asymmetry.
answered Jul 24 at 1:19
V. Vancak
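The WLS remedy mentioned in the answer can be sketched in plain NumPy. The heteroscedastic data below are made up, and the weights assume the error variance is known up to proportionality (in practice it would have to be modelled or estimated):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1.0, 10.0, size=n)

# Made-up heteroscedastic data: error standard deviation grows with x.
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)

X = np.column_stack([np.ones(n), x])

# Ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Weighted least squares with weights proportional to 1/Var(error):
# rescale each row by sqrt(weight), then run OLS on the rescaled system.
w = 1.0 / x**2                     # Var(e_i) proportional to x_i^2
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print("OLS  intercept, slope:", np.round(beta_ols, 3))
print("WLS  intercept, slope:", np.round(beta_wls, 3))
```

With correct weights, WLS is the efficient estimator here; OLS remains unbiased but noisier.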
While estimation via least squares doesn't require normality (as long as you don't care too much about relative efficiency, which can go to 0 as you move sufficiently far from the normal), the usual tests, and confidence and prediction intervals assume it. The hypothesis tests will have approximately the right significance level in sufficiently large samples (though relative power may be poor), and CIs will have approximately the right coverage (though the average width may be relatively wider than necessary), but prediction intervals rely on it more heavily (large samples don't really help).
– Glen_b
Jul 24 at 7:46
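Glen_b's point that the usual t-based intervals keep roughly the right coverage in moderate samples can be checked by simulation. A minimal sketch with made-up skewed (centred exponential) errors at n = 71, estimating the empirical coverage of the 95% t-interval for the slope:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, true_slope = 71, 2000, 0.5
tcrit = stats.t.ppf(0.975, df=n - 2)

covered = 0
for _ in range(reps):
    x = rng.uniform(0.0, 10.0, size=n)
    # Skewed (exponential) errors, centred to mean zero.
    e = rng.exponential(scale=1.0, size=n) - 1.0
    y = 1.0 + true_slope * x + e

    # OLS slope and its usual standard error.
    xc = x - x.mean()
    b = (xc @ (y - y.mean())) / (xc @ xc)
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    se = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))

    if abs(b - true_slope) <= tcrit * se:
        covered += 1

print(f"empirical coverage of the 95% CI: {covered / reps:.3f}")
```

The estimated coverage should land close to the nominal 0.95 despite the skewed errors; a prediction-interval version of the same experiment would show much worse behaviour, as the comment notes.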