Can I include the product of two random variables? Or do I risk collinearity?





I have a model in which I want to predict Y.

My regressors, X, are x1 and x2.

I believe it would also be useful to include the following in the model:

  • x3 = x1 * x2

  • x4 = x1 / x2

Can I use the regressors x1, x2, x3 and x4 all together, or do I risk a perfect collinearity problem?
I know, for instance, that using x5 = x1 + x2 would yield perfect collinearity and hence a completely useless regressor.







Tags: regression, linear-model, multicollinearity

asked 6 hours ago by scugn1zz0








  • x3 is commonly called an interaction term. – COOLSerdash, 6 hours ago
























2 Answers
No, you don't risk perfect collinearity, because the $x_i$ are not linearly dependent in general; i.e., the equation below has just one solution that holds for all possible values of $x_1$ and $x_2$ (with $x_3=x_1x_2$ and $x_4=x_1/x_2$):
$$a_1x_1+a_2x_2+a_3x_3+a_4x_4=0$$
and that solution is $a_i=0$ for every $i$.

In the $x_5=x_1+x_2$ case, the following equation has non-zero solutions, namely those with $a_1=a_2=-a_5$: $$a_1x_1+a_2x_2+a_5x_5=0$$
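As a rough numerical illustration of this argument, the sketch below builds both design matrices from simulated data and checks their column rank with NumPy. The sample size, the seed and the uniform range for x1 and x2 are assumptions chosen for illustration only (values are kept positive so that x1 / x2 is well defined).

```python
import numpy as np

# Simulated regressors (assumed for illustration; not data from the question).
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(1.0, 2.0, n)
x2 = rng.uniform(1.0, 2.0, n)

# Product and ratio: nonlinear functions of x1 and x2.
X_prod_ratio = np.column_stack([x1, x2, x1 * x2, x1 / x2])

# Sum: an exact linear combination of x1 and x2.
X_sum = np.column_stack([x1, x2, x1 + x2])

rank_prod_ratio = np.linalg.matrix_rank(X_prod_ratio)
rank_sum = np.linalg.matrix_rank(X_sum)

print(rank_prod_ratio, "of", X_prod_ratio.shape[1], "columns")
print(rank_sum, "of", X_sum.shape[1], "columns")
```

A rank equal to the number of columns means there is no exact linear dependence in that sample, while a lower rank signals perfect collinearity. Full rank says nothing about how close the columns are to being dependent.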















answered 6 hours ago (edited 6 hours ago) by gunes

























While you won't have perfect collinearity (as per your question), you do risk multicollinearity issues with your two additional regressors.

While they're not algebraically linear combinations of the two predictors, the variables x1 to x4 in a particular sample might lie close to a linear subspace, with the typical consequences of near-multicollinearity.

For example, if the two original variates both have very small coefficients of variation, then their product can be quite closely related to their sum (or to some other linear combination if they're dissimilar in size). This can happen even if the original variables are not highly correlated.

Similarly, the ratio can be highly correlated with the difference.
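One way to see why: writing $x_i=\mu_i+\delta_i$ with $\delta_i$ small relative to $\mu_i$,
$$x_1x_2=\mu_1\mu_2+\mu_2\delta_1+\mu_1\delta_2+\delta_1\delta_2,$$
so the product is a linear function of $x_1$ and $x_2$ plus the second-order term $\delta_1\delta_2$. The sketch below checks this on simulated data; the means, standard deviations, sample size and the helper `r_squared` are all assumptions made for illustration.

```python
import numpy as np

# Simulated, independent regressors with small coefficients of variation
# (all numbers here are assumptions chosen for illustration).
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(100.0, 1.0, n)   # CV about 1%
x2 = rng.normal(50.0, 1.0, n)    # CV about 2%
x3 = x1 * x2                     # interaction term

def r_squared(y, X):
    """R^2 of the least-squares fit of y on the columns of X.

    X is expected to contain an intercept column.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

# How well do an intercept, x1 and x2 linearly reproduce x1 * x2?
X_lin = np.column_stack([np.ones(n), x1, x2])
r2 = r_squared(x3, X_lin)
vif = 1.0 / (1.0 - r2)           # variance inflation factor for x3
corr12 = np.corrcoef(x1, x2)[0, 1]

print(f"corr(x1, x2)       = {corr12:.3f}")
print(f"R^2 of x3 on x1,x2 = {r2:.5f}")
print(f"VIF for x3         = {vif:.0f}")
```

Here x1 and x2 are generated independently, yet the product is almost perfectly explained by a linear combination of them, which is the situation described above. The quantity $1/(1-R^2)$ is the usual variance inflation factor for the product term in a model that also contains x1 and x2.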















answered 4 hours ago (edited 3 hours ago) by Glen_b








  • +1. As I understand things, you want to center your variables (subtract the mean), which will help to reduce the correlation between x1, x2, and x1 * x2. – Wayne, 1 hour ago

  • It would certainly help, but then we can come back to other examples: for certain kinds of relationships between x1 and x2, the collection x1, x2, x1*x2 and x1/x2 can still result in near-multicollinearity. – Glen_b, 59 mins ago
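The effect of centering mentioned in the first comment can be sketched as follows. The simulated, independent x1 and x2 (and their means and spreads) are assumptions for illustration; with other relationships between x1 and x2 the picture can differ, as the reply points out.

```python
import numpy as np

# Simulated, independent regressors with small coefficients of variation
# (assumed values for illustration only).
rng = np.random.default_rng(1)
n = 20000
x1 = rng.normal(100.0, 1.0, n)
x2 = rng.normal(50.0, 1.0, n)

def corr(a, b):
    """Sample Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Raw interaction term.
raw_x1 = corr(x1, x1 * x2)
raw_x2 = corr(x2, x1 * x2)

# Interaction term built from mean-centered variables.
c1 = x1 - x1.mean()
c2 = x2 - x2.mean()
centered_x1 = corr(c1, c1 * c2)
centered_x2 = corr(c2, c1 * c2)

print(f"raw:      corr(x1, x1*x2) = {raw_x1:.3f}, corr(x2, x1*x2) = {raw_x2:.3f}")
print(f"centered: corr(c1, c1*c2) = {centered_x1:.3f}, corr(c2, c1*c2) = {centered_x2:.3f}")
```

In this setup the raw product is clearly correlated with both x1 and x2, while the product of the centered variables is close to uncorrelated with either. This only looks at pairwise correlations with the product term and does not cover the ratio x1 / x2.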































