Regressions in case of non-normality

Our variables and residuals are not normally distributed. From what we have read, regression is usually quite robust to violations of normality, but we don't know to what degree, especially since our sample is not large (n = 71). We found three options for dealing with this: (1) use a non-parametric alternative (we were not able to find one for regression), (2) transform the data so that they are closer to normal (but what are the implications?), or (3) use a more conservative significance level (e.g. 0.01 instead of 0.05). How do you deal with non-normality in this case? Is the sample large enough to simply assume robustness to non-normality, or should we choose one of the three options?

(Please provide references where applicable.)
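
For option (1), one distribution-free possibility is a permutation test of the slope. The following is only a minimal sketch for a single predictor, assuming that under the null hypothesis the responses are exchangeable with respect to the predictor. The data are simulated stand-ins (n = 71, exponential noise), and the function names are hypothetical, not taken from the question.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple regression with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for H0: slope = 0."""
    rng = random.Random(seed)
    observed = abs(ols_slope(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any link between x and y
        if abs(ols_slope(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Simulated stand-in data (an assumption): skewed noise, true slope 0.5.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(71)]
y = [1.0 + 0.5 * xi + rng.expovariate(1.0) for xi in x]
p_value = permutation_pvalue(x, y)
print(f"permutation p-value: {p_value:.4f}")
```

No normality assumption enters the p-value here, because the reference distribution is built from the data themselves. With several predictors, deciding what to permute is less straightforward, so this sketch should not be carried over unchanged.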








  • I'd suggest cross validation to estimate p values en.m.wikipedia.org/wiki/Cross-validation_(statistics)
    – ericf
    Jul 22 at 23:29














asked Jul 22 at 13:32
Kiko Fernandez

1 Answer

Regression analysis does not assume normality per se. Normality is needed only to obtain certain properties that relate mainly to the multivariate normal distribution. Without special modifications, the assumptions are that the observations are i.i.d. and that whatever is random has finite variance. Everything else is a bonus. Except for infinite variance, every violation has some reasonable remedy, but the exact remedy depends on the specific violation. For example, you can use WLS to handle non-constant variance, non-linear regression to handle parametric non-linearity, and certain non-parametric, rank-based methods (e.g., Kruskal-Wallis, which strictly speaking is a test for comparing groups rather than a regression) to handle asymmetry.
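
As an illustration of a non-parametric regression, here is a minimal sketch of the Theil-Sen estimator, which takes the median of all pairwise slopes. The answer does not name this estimator; it is swapped in here as one commonly used rank-based choice for a single predictor. The data are simulated under assumed values (n = 71, exponential noise, true slope 0.5).

```python
import random
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) from the median of pairwise slopes."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with tied x values
    ]
    slope = median(slopes)
    # Intercept: median of the residuals left after removing the slope.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Simulated stand-in data (an assumption): asymmetric noise around a line.
rng = random.Random(0)
x = [float(i) for i in range(71)]
y = [2.0 + 0.5 * xi + rng.expovariate(1.0) for xi in x]
slope, intercept = theil_sen(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Because the noise is not centered at zero, the fitted intercept absorbs the median of the noise rather than recovering the value 2.0; the slope is the quantity this sketch is meant to estimate.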







  • While estimation via least squares doesn't require normality (as long as you don't care too much about relative efficiency, which can go to 0 as you move sufficiently far from the normal), the usual tests, and confidence and prediction intervals assume it. The hypothesis tests will have approximately the right significance level in sufficiently large samples (though relative power may be poor), and CIs will have approximately the right coverage (though the average width may be relatively wider than necessary), but prediction intervals rely on it more heavily (large samples don't really help).
    – Glen_b
    Jul 24 at 7:46










answered Jul 24 at 1:19
V. Vancak
