lead or lag function to get several values, not just the nth



























I have a tibble with one word per row. I want to create a new variable from a function that searches for a keyword and, if it finds the keyword, creates a string composed of the keyword plus-and-minus 3 words.



The code below is close, but, rather than grabbing all three words before and after my keyword, it grabs the single word 3 ahead/behind.



library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))
df <- df %>%
  mutate(chunks = ifelse(words == "times",
                         paste(lag(words, 3), words, lead(words, 3), sep = " "),
                         NA))


The most intuitive solution would be if lead (or lag) accepted a vector of offsets, something like lead(words, 1:3), but that doesn't work.



Obviously I could do this by hand fairly quickly, e.g. paste(lag(words, 3), lag(words, 2), lag(words, 1), ..., lead(words, 3)), but I'll eventually want to grab the keyword plus-and-minus 50 words, which is too much to hand-code.



A tidyverse solution would be ideal, but any solution would be helpful. Any help would be appreciated.










      r dplyr lag lead






      asked 5 hours ago by wscampbell (new contributor), edited 5 hours ago

          4 Answers














          One option would be sapply:



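          library(dplyr)

          df %>%
            mutate(
              chunks = ifelse(
                words == "times",
                # for each row, paste the window of rows i - 3 to i + 3, clipped to the column's bounds
                sapply(seq_along(words),
                       function(i) paste(words[max(1, i - 3):min(i + 3, length(words))], collapse = " ")),
                NA
              )
            )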


          Output:



          # A tibble: 12 x 2
             words chunks
             <chr> <chr>
           1 it    NA
           2 was   NA
           3 the   NA
           4 best  NA
           5 of    NA
           6 times the best of times it was the
           7 it    NA
           8 was   NA
           9 the   NA
          10 worst NA
          11 of    NA
          12 times the worst of times


          Although not an explicit lead or lag function, it can often serve the purpose as well.
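
Since the question eventually needs plus-and-minus 50 words, the same windowing idea can be wrapped in a small function with the window size as a parameter. This is only a minimal base-R sketch: the name context_chunks and its arguments are hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the original answer): for every position that
# holds `keyword`, paste the words from k positions before to k positions
# after, clipped to the ends of the vector. Other positions get NA.
context_chunks <- function(words, keyword, k = 3) {
  n <- length(words)
  vapply(seq_along(words), function(i) {
    if (is.na(words[i]) || words[i] != keyword) return(NA_character_)
    paste(words[max(1, i - k):min(n, i + k)], collapse = " ")
  }, character(1))
}

words <- c("it", "was", "the", "best", "of", "times",
           "it", "was", "the", "worst", "of", "times")
chunks <- context_chunks(words, "times", k = 3)
print(chunks[c(6, 12)])
```

Inside a pipeline it could be used as mutate(chunks = context_chunks(words, "times", k = 50)); a window larger than the data simply returns everything available.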






          answered 4 hours ago by arg0naut






            Works perfectly, arg0naut. Thanks a bunch! Really, really helpful

            – wscampbell
            4 hours ago











          • You're welcome! Consider accepting the answer if it helped.

            – arg0naut
            4 hours ago

































          Similar to @arg0naut but without dplyr:



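          r  <- seq_len(nrow(df))            # all valid row indices
          w  <- which(df$words == "times")   # rows that hold the keyword
          # for each keyword row, the window of +/- 3 rows, clipped to valid indices
          wm <- lapply(w, function(wi) intersect(r, seq(wi - 3L, wi + 3L)))

          df$chunks <- NA_character_
          df$chunks[w] <- tapply(df$words[unlist(wm)], rep(w, lengths(wm)),
                                 FUN = paste, collapse = " ")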

          # A tibble: 12 x 2
             words chunks
             <chr> <chr>
           1 it    <NA>
           2 was   <NA>
           3 the   <NA>
           4 best  <NA>
           5 of    <NA>
           6 times the best of times it was the
           7 it    <NA>
           8 was   <NA>
           9 the   <NA>
          10 worst <NA>
          11 of    <NA>
          12 times the worst of times


          The data.table translation:



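          library(data.table)
          DT = data.table(df)

          r = seq_len(nrow(DT))                     # all valid row indices
          w = DT["times", on="words", which=TRUE]   # rows that hold the keyword
          wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

          # paste each window, grouped by the keyword row it belongs to
          DT[w, chunks := DT[unlist(wm), paste(words, collapse=" "), by=rep(w, lengths(wm))]$V1]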





          answered 4 hours ago by Frank















            data.table::shift accepts a vector for the n (lag) argument and returns a list, so you can use it and then paste the list elements together with do.call(paste, ...). However, unless you're on data.table version >= 1.12, I don't think it will let you mix negative and positive n values (as below).



            With data.table:



            library(data.table)
            setDT(df)

            # shift() returns one vector per offset (3 = lag 3, ..., -3 = lead 3); paste them row-wise
            df[, chunks := trimws(ifelse(words != "times", NA, do.call(paste, shift(words, 3:-3, fill = ''))))]

            # words chunks
            # 1: it <NA>
            # 2: was <NA>
            # 3: the <NA>
            # 4: best <NA>
            # 5: of <NA>
            # 6: times the best of times it was the
            # 7: it <NA>
            # 8: was <NA>
            # 9: the <NA>
            # 10: worst <NA>
            # 11: of <NA>
            # 12: times the worst of times


            With dplyr and only using data.table for the shift function:



            library(dplyr)

            df %>%
              mutate(chunks = do.call(paste, data.table::shift(words, 3:-3, fill = '')),
                     chunks = trimws(ifelse(words != "times", NA, chunks)))

            # # A tibble: 12 x 2
            #    words chunks
            #    <chr> <chr>
            #  1 it    NA
            #  2 was   NA
            #  3 the   NA
            #  4 best  NA
            #  5 of    NA
            #  6 times the best of times it was the
            #  7 it    NA
            #  8 was   NA
            #  9 the   NA
            # 10 worst NA
            # 11 of    NA
            # 12 times the worst of times
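
To make the mechanics of "shift by several offsets, then paste row-wise" easier to see, here is a rough base-R sketch of the same idea. The shift_multi function is a hypothetical stand-in written for illustration, not data.table's implementation; it only mirrors the sign convention used with 3:-3 above (positive offsets look back, negative offsets look ahead).

```r
# Hypothetical base-R stand-in for a multi-offset shift (illustration only).
# Returns a list with one shifted copy of x per offset; positions that fall
# outside the vector are filled with `fill`.
shift_multi <- function(x, offsets, fill = "") {
  n <- length(x)
  lapply(offsets, function(o) {
    idx <- seq_len(n) - o          # source position for each output position
    out <- rep(fill, n)
    ok <- idx >= 1 & idx <= n      # keep only positions inside the vector
    out[ok] <- x[idx[ok]]
    out
  })
}

words <- c("it", "was", "the", "best", "of", "times",
           "it", "was", "the", "worst", "of", "times")

cols <- shift_multi(words, 3:-3)          # lag 3, lag 2, ..., lead 3
pasted <- trimws(do.call(paste, cols))    # paste the 7 vectors element-wise
chunks <- ifelse(words == "times", pasted, NA_character_)
print(chunks[c(6, 12)])
```

Because the fill value is an empty string, windows that run off either end leave only extra spaces, which trimws() removes. Changing 3:-3 to 50:-50 would widen the window without any other change.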



















              Here is another tidyverse solution using lag and lead:



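              library(dplyr)
              library(tidyr)   # for unite()

              # Build one expression string per offset, e.g. "lag (.,  3 , default = '')"
              laglead_f <- function(what, range)
                setNames(paste(what, "(., ", range, ", default = '')"), paste(what, range))

              # Note: funs_() has since been deprecated in dplyr; newer versions may warn or fail here
              df %>%
                mutate_at(vars(words), funs_(c(laglead_f("lag", 3:0), laglead_f("lead", 1:3)))) %>%
                unite(chunks, -words, sep = " ") %>%
                mutate(chunks = ifelse(words == "times", trimws(chunks), NA))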
              ## A tibble: 12 x 2
              #   words chunks
              #   <chr> <chr>
              # 1 it    NA
              # 2 was   NA
              # 3 the   NA
              # 4 best  NA
              # 5 of    NA
              # 6 times the best of times it was the
              # 7 it    NA
              # 8 was   NA
              # 9 the   NA
              #10 worst NA
              #11 of    NA
              #12 times the worst of times


              The idea is to store values from the three lagged and three leading vectors in new columns with mutate_at and a named function, unite those columns, and then keep the result only where words == "times" (all other rows become NA).
































              • This is SUPER clever. Thanks a bunch, Maurits. This is really great and keeps things inside the tidyverse. Really grateful.

                – wscampbell
                1 hour ago











              • You're very welcome @wscampbell and thanks for an interesting question!

                – Maurits Evers
                1 hour ago











              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });






              wscampbell is a new contributor. Be nice, and check out our Code of Conduct.










              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55010810%2flead-or-lag-function-to-get-several-values-not-just-the-nth%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              4 Answers
              4






              active

              oldest

              votes








              4 Answers
              4






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              5














              One option would be sapply:



              library(dplyr)

              df %>%
              mutate(
              chunks = ifelse(words == "times",
              sapply(1:nrow(.),
              function(x) paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")),
              NA)
              )


              Output:



              # A tibble: 12 x 2
              words chunks
              <chr> <chr>
              1 it NA
              2 was NA
              3 the NA
              4 best NA
              5 of NA
              6 times the best of times it was the
              7 it NA
              8 was NA
              9 the NA
              10 worst NA
              11 of NA
              12 times the worst of times


              Although not an explicit lead or lag function, it can often serve the purpose as well.






              share|improve this answer



















              • 1





                Works pefectly, arg0naut. Thanks a bunch! Really, really helpful

                – wscampbell
                4 hours ago











              • You're welcome! Consider accepting the answer if it helped.

                – arg0naut
                4 hours ago
















              5














              One option would be sapply:



              library(dplyr)

              df %>%
              mutate(
              chunks = ifelse(words == "times",
              sapply(1:nrow(.),
              function(x) paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")),
              NA)
              )


              Output:



              # A tibble: 12 x 2
              words chunks
              <chr> <chr>
              1 it NA
              2 was NA
              3 the NA
              4 best NA
              5 of NA
              6 times the best of times it was the
              7 it NA
              8 was NA
              9 the NA
              10 worst NA
              11 of NA
              12 times the worst of times


              Although not an explicit lead or lag function, it can often serve the purpose as well.






              share|improve this answer



















              • 1





                Works pefectly, arg0naut. Thanks a bunch! Really, really helpful

                – wscampbell
                4 hours ago











              • You're welcome! Consider accepting the answer if it helped.

                – arg0naut
                4 hours ago














              5












              5








              5







              One option would be sapply:



              library(dplyr)

              df %>%
              mutate(
              chunks = ifelse(words == "times",
              sapply(1:nrow(.),
              function(x) paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")),
              NA)
              )


              Output:



              # A tibble: 12 x 2
              words chunks
              <chr> <chr>
              1 it NA
              2 was NA
              3 the NA
              4 best NA
              5 of NA
              6 times the best of times it was the
              7 it NA
              8 was NA
              9 the NA
              10 worst NA
              11 of NA
              12 times the worst of times


              Although not an explicit lead or lag function, it can often serve the purpose as well.






              share|improve this answer













              One option would be sapply:



              library(dplyr)

              df %>%
              mutate(
              chunks = ifelse(words == "times",
              sapply(1:nrow(.),
              function(x) paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")),
              NA)
              )


              Output:



              # A tibble: 12 x 2
              words chunks
              <chr> <chr>
              1 it NA
              2 was NA
              3 the NA
              4 best NA
              5 of NA
              6 times the best of times it was the
              7 it NA
              8 was NA
              9 the NA
              10 worst NA
              11 of NA
              12 times the worst of times


              Although not an explicit lead or lag function, it can often serve the purpose as well.







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered 4 hours ago









              arg0nautarg0naut

              5,3191319




              5,3191319








              • 1





                Works pefectly, arg0naut. Thanks a bunch! Really, really helpful

                – wscampbell
                4 hours ago











              • You're welcome! Consider accepting the answer if it helped.

                – arg0naut
                4 hours ago














              • 1





                Works pefectly, arg0naut. Thanks a bunch! Really, really helpful

                – wscampbell
                4 hours ago











              • You're welcome! Consider accepting the answer if it helped.

                – arg0naut
                4 hours ago








              1




              1





              Works pefectly, arg0naut. Thanks a bunch! Really, really helpful

              – wscampbell
              4 hours ago





              Works pefectly, arg0naut. Thanks a bunch! Really, really helpful

              – wscampbell
              4 hours ago













              You're welcome! Consider accepting the answer if it helped.

              – arg0naut
              4 hours ago





              You're welcome! Consider accepting the answer if it helped.

              – arg0naut
              4 hours ago













              4














              Similar to @arg0naut but without dplyr:



              r  = 1:nrow(df)
              w = which(df$words == "times")
              wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

              df$chunks <- NA_character_
              df$chunks[w] <- tapply(df$words[unlist(wm)], rep(w, lengths(wm)), FUN = paste, collapse=" ")

              # A tibble: 12 x 2
              words chunks
              <chr> <chr>
              1 it <NA>
              2 was <NA>
              3 the <NA>
              4 best <NA>
              5 of <NA>
              6 times the best of times it was the
              7 it <NA>
              8 was <NA>
              9 the <NA>
              10 worst <NA>
              11 of <NA>
              12 times the worst of times


              The data.table translation:



              library(data.table)
              DT = data.table(df)

              w = DT["times", on="words", which=TRUE]
              wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

              DT[w, chunks := DT[unlist(wm), paste(words, collapse=" "), by=rep(w, lengths(wm))]$V1]





              share|improve this answer




























                4














                Similar to @arg0naut but without dplyr:



                r  = 1:nrow(df)
                w = which(df$words == "times")
                wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

                df$chunks <- NA_character_
                df$chunks[w] <- tapply(df$words[unlist(wm)], rep(w, lengths(wm)), FUN = paste, collapse=" ")

                # A tibble: 12 x 2
                words chunks
                <chr> <chr>
                1 it <NA>
                2 was <NA>
                3 the <NA>
                4 best <NA>
                5 of <NA>
                6 times the best of times it was the
                7 it <NA>
                8 was <NA>
                9 the <NA>
                10 worst <NA>
                11 of <NA>
                12 times the worst of times


                The data.table translation:



                library(data.table)
                DT = data.table(df)

                w = DT["times", on="words", which=TRUE]
                wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

                DT[w, chunks := DT[unlist(wm), paste(words, collapse=" "), by=rep(w, lengths(wm))]$V1]





                share|improve this answer


























                  4












                  4








                  4







                  Similar to @arg0naut but without dplyr:



                  r  = 1:nrow(df)
                  w = which(df$words == "times")
                  wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

                  df$chunks <- NA_character_
                  df$chunks[w] <- tapply(df$words[unlist(wm)], rep(w, lengths(wm)), FUN = paste, collapse=" ")

                  # A tibble: 12 x 2
                  words chunks
                  <chr> <chr>
                  1 it <NA>
                  2 was <NA>
                  3 the <NA>
                  4 best <NA>
                  5 of <NA>
                  6 times the best of times it was the
                  7 it <NA>
                  8 was <NA>
                  9 the <NA>
                  10 worst <NA>
                  11 of <NA>
                  12 times the worst of times


                  The data.table translation:



                  library(data.table)
                  DT = data.table(df)

                  w = DT["times", on="words", which=TRUE]
                  wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

                  DT[w, chunks := DT[unlist(wm), paste(words, collapse=" "), by=rep(w, lengths(wm))]$V1]





                  share|improve this answer













                  Similar to @arg0naut but without dplyr:



                  r  = 1:nrow(df)
                  w = which(df$words == "times")
                  wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

                  df$chunks <- NA_character_
                  df$chunks[w] <- tapply(df$words[unlist(wm)], rep(w, lengths(wm)), FUN = paste, collapse=" ")

                  # A tibble: 12 x 2
                  words chunks
                  <chr> <chr>
                  1 it <NA>
                  2 was <NA>
                  3 the <NA>
                  4 best <NA>
                  5 of <NA>
                  6 times the best of times it was the
                  7 it <NA>
                  8 was <NA>
                  9 the <NA>
                  10 worst <NA>
                  11 of <NA>
                  12 times the worst of times


                  The data.table translation:



                  library(data.table)
                  DT = data.table(df)

                  w = DT["times", on="words", which=TRUE]
                  wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

                  DT[w, chunks := DT[unlist(wm), paste(words, collapse=" "), by=rep(w, lengths(wm))]$V1]






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered 4 hours ago









                  FrankFrank

                  55k659133




                  55k659133























                      4














                      data.table::shift accepts a vector for the n (lag) argument and outputs a list, so you can use that and do.call(paste the list elements together. However, unless you're on data.table version >= 1.12, I don't think it will let you mix negative and positive n values (as below).



                      With data table:



                      library(data.table)
                      setDT(df)

                      df[, chunks := trimws(ifelse(words != "times", NA, do.call(paste, shift(words, 3:-3, ''))))]

                      # words chunks
                      # 1: it <NA>
                      # 2: was <NA>
                      # 3: the <NA>
                      # 4: best <NA>
                      # 5: of <NA>
                      # 6: times the best of times it was the
                      # 7: it <NA>
                      # 8: was <NA>
                      # 9: the <NA>
                      # 10: worst <NA>
                      # 11: of <NA>
                      # 12: times the worst of times


                      With dplyr and only using data.table for the shift function:



                      library(dplyr)

                      df %>%
                      mutate(chunks = do.call(paste, data.table::shift(words, 3:-3, fill = '')),
                      chunks = trimws(ifelse(words != "times", NA, chunks)))

                      # # A tibble: 12 x 2
                      # words chunks
                      # <chr> <chr>
                      # 1 it NA
                      # 2 was NA
                      # 3 the NA
                      # 4 best NA
                      # 5 of NA
                      # 6 times the best of times it was the
                      # 7 it NA
                      # 8 was NA
                      # 9 the NA
                      # 10 worst NA
                      # 11 of NA
                      # 12 times the worst of times





                      share|improve this answer






























                        4














                        data.table::shift accepts a vector for the n (lag) argument and outputs a list, so you can use that and do.call(paste the list elements together. However, unless you're on data.table version >= 1.12, I don't think it will let you mix negative and positive n values (as below).



                        With data table:



                        library(data.table)
                        setDT(df)

                        df[, chunks := trimws(ifelse(words != "times", NA, do.call(paste, shift(words, 3:-3, ''))))]

                        # words chunks
                        # 1: it <NA>
                        # 2: was <NA>
                        # 3: the <NA>
                        # 4: best <NA>
                        # 5: of <NA>
                        # 6: times the best of times it was the
                        # 7: it <NA>
                        # 8: was <NA>
                        # 9: the <NA>
                        # 10: worst <NA>
                        # 11: of <NA>
                        # 12: times the worst of times


                        With dplyr and only using data.table for the shift function:



                        library(dplyr)

                        df %>%
                        mutate(chunks = do.call(paste, data.table::shift(words, 3:-3, fill = '')),
                        chunks = trimws(ifelse(words != "times", NA, chunks)))

                        # # A tibble: 12 x 2
                        # words chunks
                        # <chr> <chr>
                        # 1 it NA
                        # 2 was NA
                        # 3 the NA
                        # 4 best NA
                        # 5 of NA
                        # 6 times the best of times it was the
                        # 7 it NA
                        # 8 was NA
                        # 9 the NA
                        # 10 worst NA
                        # 11 of NA
                        # 12 times the worst of times





                        share|improve this answer











































                          data.table::shift accepts a vector for the n (lag) argument and returns a list with one shifted vector per value, so you can pass that list to do.call(paste, ...) to paste the elements together row by row. However, unless you're on data.table version >= 1.12, I don't think it will let you mix negative and positive n values (as below).



                          With data.table:



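                          library(data.table)
                          setDT(df)  # convert df to a data.table by reference

                          # shift() returns one shifted copy of words per offset in 3:-3 (3 behind to 3 ahead);
                          # paste them row-wise and keep the result only where the keyword matches
                          df[, chunks := trimws(ifelse(words != "times", NA, do.call(paste, shift(words, 3:-3, fill = ''))))]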

                          # words chunks
                          # 1: it <NA>
                          # 2: was <NA>
                          # 3: the <NA>
                          # 4: best <NA>
                          # 5: of <NA>
                          # 6: times the best of times it was the
                          # 7: it <NA>
                          # 8: was <NA>
                          # 9: the <NA>
                          # 10: worst <NA>
                          # 11: of <NA>
                          # 12: times the worst of times


                          With dplyr and only using data.table for the shift function:



                          library(dplyr)

                          df %>%
                          mutate(chunks = do.call(paste, data.table::shift(words, 3:-3, fill = '')),
                          chunks = trimws(ifelse(words != "times", NA, chunks)))

                          # # A tibble: 12 x 2
                          # words chunks
                          # <chr> <chr>
                          # 1 it NA
                          # 2 was NA
                          # 3 the NA
                          # 4 best NA
                          # 5 of NA
                          # 6 times the best of times it was the
                          # 7 it NA
                          # 8 was NA
                          # 9 the NA
                          # 10 worst NA
                          # 11 of NA
                          # 12 times the worst of times
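
Since the question mentions eventually needing plus-and-minus 50 words, the same shift idea can be wrapped in a small function with the window size as a parameter. This is only a sketch under assumptions: `context_window` and its arguments are hypothetical names that do not appear in the answer, and it relies on a data.table version (>= 1.12, per the note above) that accepts mixed positive and negative offsets.

```r
library(data.table)

# Hypothetical helper (not part of the original answer): for every element equal
# to `keyword`, paste it together with the `n` words before and after it.
# All other elements become NA.
context_window <- function(words, keyword, n) {
  # One shifted copy of `words` per offset: n behind ... 0 ... n ahead.
  # fill = "" pads positions that fall off either end of the vector.
  shifted <- shift(words, n = n:-n, fill = "")
  # Paste the copies row-wise, then drop the padding left at the edges.
  pasted <- trimws(do.call(paste, shifted))
  ifelse(words == keyword, pasted, NA_character_)
}

words <- c("it", "was", "the", "best", "of", "times",
           "it", "was", "the", "worst", "of", "times")
chunks <- context_window(words, "times", 3)
```

Inside a data.table or a mutate() call this could be used as `chunks = context_window(words, "times", 3)`; widening the window should then only require changing the last argument.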






                          answered 4 hours ago by IceCreamToucan (edited 3 hours ago)






































                              Here is another tidyverse solution using lag and lead:



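                              library(dplyr)
                              library(tidyr)  # for unite()

                              # Build one call string per offset (e.g. a lag of 3 with default = ''), named like "lag 3"
                              laglead_f <- function(what, range)
                              setNames(paste(what, "(., ", range, ", default = '')"), paste(what, range))

                              df %>%
                              # one new column per offset: lag 3..0 (lag 0 is the word itself), then lead 1..3
                              mutate_at(vars(words), funs_(c(laglead_f("lag", 3:0), laglead_f("lead", 1:3)))) %>%
                              # join the offset columns into a single string
                              unite(chunks, -words, sep = " ") %>%
                              mutate(chunks = ifelse(words == "times", trimws(chunks), NA))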
                              ## A tibble: 12 x 2
                              # words chunks
                              # <chr> <chr>
                              # 1 it NA
                              # 2 was NA
                              # 3 the NA
                              # 4 best NA
                              # 5 of NA
                              # 6 times the best of times it was the
                              # 7 it NA
                              # 8 was NA
                              # 9 the NA
                              #10 worst NA
                              #11 of NA
                              #12 times the worst of times


                              The idea is to store the lagged and leading vectors (offsets 3 to 0 behind, then 1 to 3 ahead) in new columns with mutate_at and a set of named function strings, unite those columns into a single string, and then keep that string only for rows where words == "times"; every other row is set to NA.
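
The same steps can also be written without building function strings, by calling lag and lead once per offset and pasting the results row-wise. This is a rough sketch rather than the answer's own method: `window_paste` is a hypothetical helper name, and the window size `n` is an added parameter.

```r
library(dplyr)

# Hypothetical helper (not from the answer): paste each element of `x`
# together with the `n` elements before and after it.
window_paste <- function(x, n) {
  # n words behind, ordered from farthest to nearest
  behind <- lapply(rev(seq_len(n)), function(k) dplyr::lag(x, k, default = ""))
  # n words ahead, ordered from nearest to farthest
  ahead <- lapply(seq_len(n), function(k) dplyr::lead(x, k, default = ""))
  # paste row-wise, then trim the padding left at the start and end of the vector
  trimws(do.call(paste, c(behind, list(x), ahead)))
}

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

result <- df %>%
  mutate(chunks = if_else(words == "times", window_paste(words, 3), NA_character_))
```

Keeping the window logic in one function means a wider context only needs a different `n`, and no intermediate columns have to be united afterwards.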







                              answered 3 hours ago by Maurits Evers (edited 3 hours ago)













                              • This is SUPER clever. Thanks a bunch, Maurits. This is really great and keeps things inside the tidyverse. Really grateful.

                                – wscampbell
                                1 hour ago











                              • You're very welcome @wscampbell and thanks for an interesting question!

                                – Maurits Evers
                                1 hour ago


















