Can I include the product of two random variables? Or do I risk collinearity?
I have a model in which I want to predict Y.
My regressors are x1 and x2.
For some reason I believe it would also be useful to include in the model:
x3 = x1 * x2
x4 = x1 / x2
Can I use the regressors x1, x2, x3 and x4 together, or do I risk a perfect collinearity problem?
I know, for instance, that using x5 = x1 + x2 would yield perfect collinearity and hence a completely useless regressor.
regression linear-model multicollinearity
asked 6 hours ago
scugn1zz0
x3 is commonly called an interaction term.
– COOLSerdash
6 hours ago
2 Answers
No, you don't risk perfect collinearity, because the $x_i$ are not linearly dependent in general; i.e., the equation below has only one solution that holds for all possible values of the $x_i$:
$$a_1x_1+a_2x_2+a_3x_3+a_4x_4=0$$
namely $a_i=0$ for all $i$.
In the $x_5=x_1+x_2$ case, the following equation has non-zero solutions with $a_1=a_2=-a_5$:
$$a_1x_1+a_2x_2+a_5x_5=0$$
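As a rough illustration of the linear-dependence argument (a sketch added here, not code from the answer), the following Python snippet computes the exact rank of two small design matrices. The data values and the helper name `matrix_rank` are made up for the example, and no intercept column is included, matching the equations above.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


# Hypothetical sample values, chosen only for illustration.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

# Columns x1, x2, x3 = x1 * x2, x4 = x1 / x2.
with_product_and_ratio = [[a, b, a * b, Fraction(a, b)] for a, b in zip(x1, x2)]

# Columns x1, x2, x5 = x1 + x2.
with_sum = [[a, b, a + b] for a, b in zip(x1, x2)]

rank_product_ratio = matrix_rank(with_product_and_ratio)
rank_sum = matrix_rank(with_sum)
print(rank_product_ratio, "of 4 columns are linearly independent")
print(rank_sum, "of 3 columns are linearly independent")
```

A rank equal to the number of columns means there is no exact collinearity in this sample; the matrix containing x5 falls one short because its third column is the sum of the first two.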
edited 6 hours ago
answered 6 hours ago
gunes
While you won't have perfect collinearity (as per your question), you do risk multicollinearity issues with your two additional regressors.
While they're not algebraically linear combinations of the two predictors, in a particular sample these variables (x1 to x4) might lie close to a linear subspace, with the typical consequences of near-multicollinearity.
For example, if the two original variates both have very small coefficients of variation, then their product can be quite closely related to their sum (or to some other linear combination, if they're dissimilar in size). This can happen even if the original variables are not highly correlated.
Similarly, the ratio can be highly correlated with the difference.
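To make the small-coefficient-of-variation case concrete, here is a minimal simulation sketch (an added illustration, not part of the answer). The sample size, the mean of 100 with standard deviation 1, the seed, and the helper `pearson` are all assumptions chosen for the example.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Independent draws with a small coefficient of variation (sd / mean = 0.01).
x1 = [rng.gauss(100.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(100.0, 1.0) for _ in range(n)]

product = [a * b for a, b in zip(x1, x2)]
total = [a + b for a, b in zip(x1, x2)]
ratio = [a / b for a, b in zip(x1, x2)]  # x2 stays far from zero here
difference = [a - b for a, b in zip(x1, x2)]

r_inputs = pearson(x1, x2)
r_product_sum = pearson(product, total)
r_ratio_difference = pearson(ratio, difference)

print("corr(x1, x2)           =", round(r_inputs, 4))
print("corr(x1*x2, x1 + x2)   =", round(r_product_sum, 4))
print("corr(x1/x2, x1 - x2)   =", round(r_ratio_difference, 4))
```

Under these assumptions the two inputs are essentially uncorrelated, yet the product tracks the sum and the ratio tracks the difference almost exactly, so a regression on all four columns would be close to rank-deficient.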
edited 3 hours ago
answered 4 hours ago
Glen_b♦
+1. As I understand things, you want to center your variables (subtract the mean), which will help to reduce the correlation between x1, x2, and x1 * x2.
– Wayne
1 hour ago
It would certainly help, but then we can come back to other examples: for certain kinds of relationships between x1 and x2, the collection x1, x2, x1*x2 and x1/x2 can still exhibit near-multicollinearity.
– Glen_b♦
59 mins ago
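The effect of centering mentioned in the comments can be sketched in the same way (again an added illustration under assumed settings: independent normal draws with mean 100 and standard deviation 1, a fixed seed, and the hypothetical helpers `pearson` and `center`).

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def center(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 2000
x1 = [rng.gauss(100.0, 1.0) for _ in range(n)]
x2 = [rng.gauss(100.0, 1.0) for _ in range(n)]

# Interaction term built from the raw variables.
raw_product = [a * b for a, b in zip(x1, x2)]

# Interaction term built from the mean-centered variables.
c1, c2 = center(x1), center(x2)
centered_product = [a * b for a, b in zip(c1, c2)]

r_raw = pearson(x1, raw_product)
r_centered = pearson(c1, centered_product)

print("corr(x1, x1*x2) before centering =", round(r_raw, 4))
print("corr(x1, x1*x2) after centering  =", round(r_centered, 4))
```

In a run like this the raw interaction term is substantially correlated with x1, while the centered version is close to uncorrelated with it. As the reply points out, this does not rule out near-multicollinearity in other situations, particularly once x1/x2 is also in the model.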