How to remove symbols from a column using awk












I have data like this:



chr1    134901  139379  -   "ENSG00000237683.5";
chr1 860260 879955 + "ENSG00000187634.6";
chr1 861264 866445 - "ENSG00000268179.1";
chr1 879584 894689 - "ENSG00000188976.6";
chr1 895967 901095 + "ENSG00000187961.9";


I generated it by parsing a GTF file.



I want to remove the " and ; characters from column 5 using awk or sed, if possible. The result would look like this:



chr1    134901  139379  -   ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9









  • You can also use multiple search-and-replace statements in sed: sed 's/"//g; s/;//g' filename

    – jbrahy
    Jan 14 '16 at 19:55











  • @DigitalTrauma ya, but Dani_l already gave that solution.

    – jbrahy
    Jan 14 '16 at 23:59
















text-processing sed awk














edited Jan 14 '16 at 19:46 by jasonwryan










asked Jan 14 '16 at 19:41 by System








7 Answers

Using gsub:



awk '{gsub(/"|;/,"")}1' file
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9


If you want to operate only on the fifth field and preserve any quotes or semicolons in other fields:



awk '{gsub(/"|;/,"",$5)}1' file 
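A minimal sketch of the difference between the two awk forms above, assuming POSIX sh and awk. The `";"` placed in field 4 is a hypothetical value added only to show that other fields are left alone. Note one side effect of the field-only form: because `$5` is modified, awk rebuilds the record with the output field separator, so runs of spaces collapse to single spaces.

```shell
# Sample line in the question's format; the quoted text in field 4 is a
# hypothetical addition to show that other fields are left alone.
line='chr1    134901  139379  ";"   "ENSG00000237683.5";'

# Strip quotes and semicolons from field 5 only. Because $5 is modified,
# awk rebuilds the record using OFS (a single space by default).
field_only=$(printf '%s\n' "$line" | awk '{gsub(/[";]/, "", $5)} 1')

# For comparison: the whole-record form keeps the original spacing
# but also strips the characters from field 4.
whole_record=$(printf '%s\n' "$line" | awk '{gsub(/[";]/, "")} 1')

printf '%s\n%s\n' "$field_only" "$whole_record"
```

If the original spacing matters, the whole-record form (or a tool that does not split fields) may be the better fit.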





  • This would remove from all columns, not just 5th, no?

    – Dani_l
    Jan 14 '16 at 19:55











  • This is what I thought initially, but after using the code it seemed to keep all columns.

    – System
    Jan 14 '16 at 19:57











  • @Dani_l Yes, it can be refined to operate only on the fifth field, but that was not a requirement...

    – jasonwryan
    Jan 14 '16 at 19:57













  • Sorry I must have not made it clear, I DO want to keep all columns. This is why it is marked as the answer.

    – System
    Jan 14 '16 at 19:58













  • @System updated to ensure it only operates on the fifth field.

    – jasonwryan
    Jan 14 '16 at 20:15



















Using sed to remove all instances of " and ; (the -i option edits the file in place):

sed -i 's/[";]//g' file

To remove them from the 5th column only, sed is probably not the best option.
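As a rough check, the bracket-expression form used here and the two-statement form from the question's comments should give the same result. The sketch below assumes POSIX sh and sed, and writes to standard output instead of using -i so that no file is modified.

```shell
# Sample line copied from the question.
line='chr1 860260 879955 + "ENSG00000187634.6";'

# One bracket expression matching either character.
bracket=$(printf '%s\n' "$line" | sed 's/[";]//g')

# Two separate substitutions, as suggested in the question's comments.
two_step=$(printf '%s\n' "$line" | sed 's/"//g; s/;//g')

printf '%s\n%s\n' "$bracket" "$two_step"
```

Both forms act on the whole line, so they are only safe when no other column contains a quote or semicolon that needs to be kept.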







    If your data is formatted exactly as shown (i.e. no other " or ; in other columns that need to be preserved), then you can simply use tr to remove these characters:



    tr -d '";' < input.txt > output.txt






      I know the original post asked for sed or awk, but if you want to remove the " and ; from only the fifth column I'd use a regex and PHP. There's probably a way to do this in awk, but I like to use the easiest tools.



      <?php

      // Print the five columns of each line, without the quotes and semicolon around column 5.
      foreach (file($argv[1]) as $line) {
          $matches = array();
          // Skip lines that do not match the expected five-column layout.
          if (!preg_match('/^(\w+)\s+(\d+)\s+(\d+)\s+(-|\+)\s+"(\w+\.\d+)";/', $line, $matches)) {
              continue;
          }
          array_shift($matches); // remove the first element (the full match)
          vprintf("%s\t%s\t%s\t%s\t%s\n", $matches);
      }


      This would output (tab-separated):



      $ php /tmp/preg_replace.php /tmp/data
      chr1 134901 139379 - ENSG00000237683.5
      chr1 860260 879955 + ENSG00000187634.6
      chr1 861264 866445 - ENSG00000268179.1
      chr1 879584 894689 - ENSG00000188976.6
      chr1 895967 901095 + ENSG00000187961.9





      • I'm not sure how this satisfies the "easiest tools" criterion; just the amount of typing alone...

        – jasonwryan
        Jan 14 '16 at 20:17











      • I prefer php to awk and sed and this is the only answer that actually does what the original post requested by removing " and ; from only the fifth column. Give me that point back.

        – jbrahy
        Jan 14 '16 at 20:18











      • I wasn't the downvoter, and no, my edited answer also only operates on the fifth field (and has other advantages besides brevity)...

        – jasonwryan
        Jan 14 '16 at 20:24











      • ah, ok. I didn't see the edited version. $5 is definitely less typing. For me PHP code is easier so I provided a solution I thought would help someone.

        – jbrahy
        Jan 14 '16 at 20:25











      • Fair enough, it is always good to see solutions using different approaches...

        – jasonwryan
        Jan 14 '16 at 20:46




















      A sed solution that makes sure we're only fiddling around with the fifth column:



      sed -E 's/^(([^ ]+ +){4})"([^"]+)";$/\1\3/' infile
      chr1 134901 139379 - ENSG00000237683.5
      chr1 860260 879955 + ENSG00000187634.6
      chr1 861264 866445 - ENSG00000268179.1
      chr1 879584 894689 - ENSG00000188976.6
      chr1 895967 901095 + ENSG00000187961.9


      This also works without ERE (-E, or -r for some older sed), but requires a lot more backslashes. The + quantifier is ERE-only according to the POSIX spec (see note 1) and can be replaced by {1,} (or \{1,\} for BRE).



      In case the columns aren't space-separated, the spaces can be replaced by the [:blank:] POSIX character class (written [[:blank:]], or [^[:blank:]] for the negated form) to also match tabs.



      The regex in detail:



      ^               # Anchored at start of line
      (               # Capture group 1 for first 4 columns
        (             # Capture group 2 for repeat count
          [^ ]+       # 1 or more non-spaces
           +          # A literal space followed by +: 1 or more spaces
        ){4}          # 4 times "word plus spaces" (columns)
      )               # End capture group 1
      "               # Column 5 starts with double quote (not captured)
      (               # Capture group 3 for column 5
        [^"]+         # One or more non-quote characters
      )               # End capture group 3
      ";              # Quote and semicolon at end of column 5
      $               # Anchored at end of line




      1 GNU sed, as an extension, allows \+ to be used in BRE as well.
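The [:blank:] variant mentioned in this answer might look like the sketch below. It assumes a sed that accepts -E (GNU and BSD sed both do), and the two input lines are sample data in the question's format, one space-separated and one tab-separated.

```shell
# One space-separated and one tab-separated line in the question's format.
input=$(
  printf '%s\n' 'chr1    134901  139379  -   "ENSG00000237683.5";'
  printf 'chr1\t860260\t879955\t+\t"ENSG00000187634.6";\n'
)

# Same substitution as in the answer, with each literal space replaced by
# [[:blank:]] (space or tab) and [^ ] by [^[:blank:]].
result=$(printf '%s\n' "$input" |
  sed -E 's/^(([^[:blank:]]+[[:blank:]]+){4})"([^"]+)";$/\1\3/')

printf '%s\n' "$result"
```

Because capture group 1 is written back unchanged, the original separators (spaces or tabs) between the first four columns are preserved.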







        If every line has a fixed length (as in the example), then



        cut -c1-28,30-46 INFILE


        will work.
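A small sketch of why the fixed-length condition matters, assuming POSIX sh and cut. The character positions fit the layout of the first sample line in the question (opening quote at position 29, closing quote and semicolon at 47 and 48). The single-spaced variant is the same record with the spacing used by the later sample lines as they appear here.

```shell
# First sample line from the question, with its original column spacing.
aligned='chr1    134901  139379  -   "ENSG00000237683.5";'

# Keep characters 1-28 and 30-46: this skips the opening quote (29)
# and drops the closing quote and semicolon (47-48).
aligned_out=$(printf '%s\n' "$aligned" | cut -c1-28,30-46)

# The same record with single spaces between columns:
# the fixed positions no longer line up with the quotes.
shifted='chr1 134901 139379 - "ENSG00000237683.5";'
shifted_out=$(printf '%s\n' "$shifted" | cut -c1-28,30-46)

printf '%s\n%s\n' "$aligned_out" "$shifted_out"
```

The first result is the cleaned line. The second still contains the quotes and loses a character from the ID, so this approach is only suitable when every line really is laid out identically.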







          In bash you can use string manipulation (parameter expansion) to achieve what you want. Here is the code:



          [root@localhost]# cat ./test.sh
          #!/usr/bin/env bash

          # Print each line with every " and ; removed.
          while IFS= read -r line; do
              printf '%s\n' "${line//[\";]/}"
          done < sample.txt


          and this is the output



          [root@localhost]# ./test.sh
          chr1 134901 139379 - ENSG00000237683.5
          chr1 860260 879955 + ENSG00000187634.6
          chr1 861264 866445 - ENSG00000268179.1
          chr1 879584 894689 - ENSG00000188976.6
          chr1 895967 901095 + ENSG00000187961.9





          – Manish R




















awk gsub answer: answered Jan 14 '16 at 19:45 by jasonwryan, edited Jan 14 '16 at 20:11








sed answer (all instances): answered Jan 14 '16 at 19:54 by Dani_l























tr answer: answered Jan 14 '16 at 23:40 by Digital Trauma























                            edited Jan 15 '16 at 16:56

























                            answered Jan 14 '16 at 20:08









jbrahy








• I'm not sure how this satisfies the "easiest tools" criteria; just the amount of typing alone...

                              – jasonwryan
                              Jan 14 '16 at 20:17











                            • I prefer php to awk and sed and this is the only answer that actually does what the original post requested by removing " and ; from only the fifth column. Give me that point back.

                              – jbrahy
                              Jan 14 '16 at 20:18











                            • I wasn't the downvoter, and no, my edited answer also only operates on the fifth field (and has other advantages besides brevity)...

                              – jasonwryan
                              Jan 14 '16 at 20:24











                            • ah, ok. I didn't see the edited version. $5 is definitely less typing. For me PHP code is easier so I provided a solution I thought would help someone.

                              – jbrahy
                              Jan 14 '16 at 20:25











                            • Fair enough, it is always good to see solutions using different approaches...

                              – jasonwryan
                              Jan 14 '16 at 20:46




























                            A sed solution that makes sure we're only fiddling around with the fifth column:



sed -E 's/^(([^ ]+ +){4})"([^"]+)";$/\1\3/' infile
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9


This also works without ERE (-E, or -r for some older sed versions), but requires a lot more backslashes. The + quantifier is ERE-only according to the POSIX spec (note 1) and can be replaced by {1,} (or \{1,\} for BRE).

If the columns aren't separated by spaces alone, the spaces in the pattern can be replaced by the POSIX character class [[:blank:]] (and [^ ] by [^[:blank:]]) to also match tabs.



                            The regex in detail:



^            # Anchored at start of line
(            # Capture group 1 for first 4 columns
  (          # Capture group 2 for repeat count
    [^ ]+    # 1 or more non-spaces
     +       # 1 or more spaces (a literal space followed by +)
  ){4}       # 4 times "word plus spaces" (columns)
)            # End capture group 1
"            # Column 5 starts with double quote (not captured)
(            # Capture group 3 for column 5
  [^"]+      # One or more non-quote characters
)            # End capture group 3
";           # Quote and semicolon at end of column 5
$            # Anchored at end of line




1 GNU sed, as an extension, allows \+ to be used in BRE as well.
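As a rough sketch of the two variations mentioned above, the following runs the substitution once as an ERE using [[:blank:]] and once as a plain POSIX BRE. The sample line is assumed to be tab-separated, and the variable names (line, ere_out, bre_out) are made up for illustration; they are not part of the original answer.

```shell
# One sample record in the question's format (tab-separated here, as an assumption).
line=$(printf 'chr1\t134901\t139379\t-\t"ENSG00000237683.5";')

# ERE with [[:blank:]] in place of the literal space, so tabs match too.
ere_out=$(printf '%s\n' "$line" |
  sed -E 's/^(([^[:blank:]]+[[:blank:]]+){4})"([^"]+)";$/\1\3/')

# The same substitution as a POSIX BRE: groups and intervals take backslashes,
# and each + becomes \{1,\}.
bre_out=$(printf '%s\n' "$line" |
  sed 's/^\(\([^[:blank:]]\{1,\}[[:blank:]]\{1,\}\)\{4\}\)"\([^"]\{1,\}\)";$/\1\3/')

# Both commands should print the record with the quotes and semicolon removed.
printf '%s\n' "$ere_out" "$bre_out"
```

Because the pattern is anchored at both ends, a line that does not have exactly this shape is passed through unchanged rather than partially edited.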






                                edited Jul 3 '18 at 13:42

























                                answered Jan 17 '16 at 6:28









Benjamin W.





































If every line has the same fixed width (as in the example), then

cut -c1-28,30-46 INFILE

will work.
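To make the character offsets concrete, here is a small sketch on a single record. It assumes the line is padded with spaces exactly as the first sample line is displayed in the question; the variable names are hypothetical. Since cut -c counts a tab as one character, these offsets would not carry over to tab-separated input.

```shell
# One record, space-padded as the first sample line is displayed
# (an assumption about the real file's layout).
line='chr1    134901  139379  -   "ENSG00000237683.5";'

# Characters 1-28 are columns 1-4 plus padding, character 29 is the opening
# quote, 30-46 are the 17-character ID, 47-48 are the closing quote and semicolon.
cut_out=$(printf '%s\n' "$line" | cut -c1-28,30-46)

printf '%s\n' "$cut_out"
```

This only holds while every ID has the same length and every line the same padding, which is the condition the answer states.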






                                        answered Jan 17 '16 at 7:13









Jshura





































In bash, you can use string manipulation (parameter expansion) to achieve what you want. Here is the code:

[root@localhost]# cat ./test.sh
#!/usr/bin/env bash

while IFS= read -r line; do
    # Delete every " and ; from the line; quoting the expansion keeps the original whitespace
    echo "${line//[\";]/}"
done < sample.txt

and this is the output:

[root@localhost]# ./test.sh
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9
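The loop above deletes the two characters anywhere on the line. If only the fifth column should be touched, one possible variation is to let read split the line into fields first. This is a sketch under assumptions: strip_col5 is a hypothetical helper name, the input is blank-separated with five columns, and the output is written tab-separated.

```shell
#!/usr/bin/env bash
# Strip " and ; from the fifth field only, leaving fields 1-4 untouched.
# strip_col5 is a hypothetical helper; it reads records from stdin.
strip_col5() {
  local c1 c2 c3 c4 c5
  # With the default IFS, read splits on spaces and tabs; anything after
  # the fourth field ends up in c5. The || test keeps a final line that
  # lacks a trailing newline.
  while read -r c1 c2 c3 c4 c5 || [ -n "$c1" ]; do
    c5=${c5//[\";]/}
    printf '%s\t%s\t%s\t%s\t%s\n' "$c1" "$c2" "$c3" "$c4" "$c5"
  done
}

strip_col5 <<'EOF'
chr1 134901 139379 - "ENSG00000237683.5";
chr1 860260 879955 + "ENSG00000187634.6";
EOF
```

For large files a shell loop like this is usually much slower than awk or sed, so it is best suited to small inputs or to scripts that already process the file line by line.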





                                                answered 22 mins ago









Manish R



