How to remove symbols from a column using awk
I have data like this:
chr1 134901 139379 - "ENSG00000237683.5";
chr1 860260 879955 + "ENSG00000187634.6";
chr1 861264 866445 - "ENSG00000268179.1";
chr1 879584 894689 - "ENSG00000188976.6";
chr1 895967 901095 + "ENSG00000187961.9";
I generated it by parsing a GTF file.
I want to remove the " and ; characters from column 5, using awk or sed if possible. The result would look like this:
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9
text-processing sed awk
edited Jan 14 '16 at 19:46 by jasonwryan
asked Jan 14 '16 at 19:41 by System
You can also use multiple search-and-replace statements in sed: sed 's/"//g; s/;//g' filename
– jbrahy
Jan 14 '16 at 19:55
@DigitalTrauma ya, but Dani_l already gave that solution.
– jbrahy
Jan 14 '16 at 23:59
7 Answers
Using gsub:
awk '{gsub(/"|;/,"")}1' file
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9
If you want to operate only on the fifth field and preserve any quotes or semicolons in other fields:
awk '{gsub(/"|;/,"",$5)}1' file
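One side effect of the $5 version: assigning to a field makes awk rebuild the whole line using OFS, which is a single space by default, so tab-separated input comes out space-separated. If your file is tab-delimited (an assumption; GTF-derived tables often are), a minimal sketch that sets both FS and OFS to a tab keeps the layout. The function name strip_col5_tsv is made up for illustration.

```shell
# Hypothetical helper: strip " and ; from field 5 of tab-separated input.
# FS and OFS are both a tab, so the rebuilt line keeps its tab separators,
# and quotes or semicolons in the other fields are left alone.
strip_col5_tsv() {
    awk 'BEGIN { FS = OFS = "\t" } { gsub(/[";]/, "", $5) } 1' "$@"
}
```

Usage: strip_col5_tsv input.tsv > output.tsv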
edited Jan 14 '16 at 20:11
answered Jan 14 '16 at 19:45 by jasonwryan
This would remove from all columns, not just 5th, no?
– Dani_l
Jan 14 '16 at 19:55
This is what I thought initially, but after using the code it seemed to keep all columns.
– System
Jan 14 '16 at 19:57
@Dani_l Yes, it can be refined to operate only on the fifth field, but that was not a requirement...
– jasonwryan
Jan 14 '16 at 19:57
Sorry, I must not have made it clear: I DO want to keep all columns. This is why it is marked as the answer.
– System
Jan 14 '16 at 19:58
@System updated to ensure it only operates on the fifth field.
– jasonwryan
Jan 14 '16 at 20:15
Using sed to remove all instances of the characters " and ; (in every column):
sed -i 's/[";]//g' file
To remove them only from the 5th column, sed is probably not the best option.
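A note on the command above: sed -i is not portable. GNU sed accepts it on its own, while BSD/macOS sed expects a backup-suffix argument (which may be empty, as in -i ''). A hedged alternative that avoids -i writes the result to a temporary file and copies it back over the original; it assumes mktemp is available (it is on GNU and BSD systems), and strip_quotes_in_place is a hypothetical name.

```shell
# Hypothetical helper: a portable stand-in for sed -i 's/[";]//g' FILE.
# sed writes to a temporary file; only if that succeeds is the result
# copied back. cat > "$1" rewrites the existing file, so its permissions
# are kept (mv would give it the temp file's restrictive mode instead).
strip_quotes_in_place() (
    tmp=$(mktemp) || exit 1
    if sed 's/[";]//g' "$1" > "$tmp"; then
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    exit "$status"
)
```

Usage: strip_quotes_in_place file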
answered Jan 14 '16 at 19:54 by Dani_l
If your data is formatted exactly as shown (i.e. no other " or ; in other columns that need to be preserved), then you can simply use tr to remove these characters:
tr -d '";' < input.txt > output.txt
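The tr shortcut is only safe when that precondition holds. A small hypothetical check (only_col5_has_quotes is an invented name, and it assumes whitespace-separated columns as awk splits them by default) succeeds only if no column other than the 5th contains a " or ;:

```shell
# Hypothetical helper: exit 0 if no field except the 5th contains " or ;,
# i.e. a blanket tr -d '";' would not touch data you want to keep.
# exit in the main block jumps to END, which sets the final status.
only_col5_has_quotes() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (i != 5 && $i ~ /[";]/) { bad = 1; exit }
    }
    END { exit bad }' "$@"
}
```

Usage: only_col5_has_quotes input.txt && tr -d '";' < input.txt > output.txt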
answered Jan 14 '16 at 23:40 by Digital Trauma
I know the original post asked for sed or awk, but if you want to remove the " and ; from only the fifth column, I'd use a regex and PHP. There's probably a way to do this in awk, but I like to use the easiest tools.
<?php
foreach (file($argv[1]) as $line) {
    $matches = array();
    // Capture the five columns; the quotes and ; around column 5 are matched but not captured
    if (!preg_match('/^(\w+)\s+(\d+)\s+(\d+)\s+(-|\+)\s+"(\w+\.\d+)";/', $line, $matches)) {
        continue; // skip lines that don't have the expected format
    }
    array_shift($matches); // remove the first element (the full match)
    vprintf("%s\t%s\t%s\t%s\t%s\n", $matches);
}
Running it prints:
$ php /tmp/preg_replace.php /tmp/data
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9
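For comparison, the "only touch lines with the expected shape" idea behind the regex can be sketched in awk as well. This is an illustration with an invented function name, not the answerer's code: field 5 is rewritten only when it looks exactly like "ID";, and every other line is printed unchanged. As with any field assignment in awk, the modified lines are rejoined with single spaces.

```shell
# Hypothetical helper: drop the leading quote and the trailing "; of field 5,
# but only when the field has exactly that shape; other lines pass through.
strip_col5_strict() {
    awk '$5 ~ /^"[^"]+";$/ { $5 = substr($5, 2, length($5) - 3) } 1' "$@"
}
```

Usage: strip_col5_strict /tmp/data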
edited Jan 15 '16 at 16:56
answered Jan 14 '16 at 20:08 by jbrahy
I'm not sure how this satisfies the "easiest tools" criterion; just the amount of typing alone...
– jasonwryan
Jan 14 '16 at 20:17
I prefer php to awk and sed and this is the only answer that actually does what the original post requested by removing " and ; from only the fifth column. Give me that point back.
– jbrahy
Jan 14 '16 at 20:18
I wasn't the downvoter, and no, my edited answer also only operates on the fifth field (and has other advantages besides brevity)...
– jasonwryan
Jan 14 '16 at 20:24
ah, ok. I didn't see the edited version. $5 is definitely less typing. For me PHP code is easier so I provided a solution I thought would help someone.
– jbrahy
Jan 14 '16 at 20:25
Fair enough, it is always good to see solutions using different approaches...
– jasonwryan
Jan 14 '16 at 20:46
A sed solution that makes sure we're only fiddling around with the fifth column:
sed -E 's/^(([^ ]+ +){4})"([^"]+)";$/\1\3/' infile
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9
This also works without ERE (-E, or -r for some older seds), but requires a lot more backslashes. The + quantifier is ERE-only according to the POSIX spec¹ and can be replaced by {1,} (or \{1,\} for BRE).
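As a sketch of what the BRE form looks like (wrapped in a hypothetical function name), the same substitution without -E could be written with \( \) for groups and \{1,\} in place of +:

```shell
# Hypothetical wrapper: the ERE command rewritten in POSIX BRE.
# \( \) groups, \{1,\} means "one or more", \{4\} repeats group 2 four times.
strip_col5_bre() {
    sed 's/^\(\([^ ]\{1,\} \{1,\}\)\{4\}\)"\([^"]\{1,\}\)";$/\1\3/' "$@"
}
```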
If the columns aren't separated by spaces alone, the spaces can be replaced by the [:blank:] POSIX character class (used inside a bracket expression, e.g. [[:blank:]]+) to also match tabs.
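For example, a sketch of the ERE command with [[:blank:]] substituted for the literal spaces (function name hypothetical); because capture group 1 keeps the original separators, tabs survive in the output:

```shell
# Hypothetical wrapper: [^[:blank:]]+ is a column, [[:blank:]]+ is one or
# more spaces or tabs; \1 reinserts the first four columns verbatim.
strip_col5_blank() {
    sed -E 's/^(([^[:blank:]]+[[:blank:]]+){4})"([^"]+)";$/\1\3/' "$@"
}
```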
The regex in detail:
^                # Anchored at start of line
(                # Capture group 1 for first 4 columns
    (            # Capture group 2 for repeat count
        [^ ]+    # 1 or more non-spaces
         +       # 1 or more spaces (a literal space followed by +)
    ){4}         # 4 times "word plus spaces" (columns)
)                # End capture group 1
"                # Column 5 starts with double quote (not captured)
(                # Capture group 3 for column 5
    [^"]+        # One or more non-quote characters
)                # End capture group 3
";               # Quote and semicolon at end of column 5
$                # Anchored at end of line
¹ GNU sed, as an extension, allows \+ to be used in BRE as well.
edited Jul 3 '18 at 13:42
answered Jan 17 '16 at 6:28 by Benjamin W.
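If every line has a fixed length (as in the example), then
cut -c1-28,30-46 INFILE
will work. It keeps characters 1-28 and 30-46, so character 29 (the opening quote) and everything after character 46 (the closing ";) are dropped; the positions have to match your file's exact layout.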
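Since cut -c depends entirely on that assumption, it may be worth checking first. A minimal hypothetical helper (the name same_line_lengths is invented) fails as soon as a line's length differs from the first line's; note that equal lengths alone do not prove the quote sits at character 29:

```shell
# Hypothetical helper: exit 0 only if every line is as long as the first,
# i.e. the "fixed length" assumption behind cut -c at least holds.
same_line_lengths() {
    awk 'NR == 1 { len = length($0); next }
         length($0) != len { bad = 1; exit }
         END { exit bad }' "$@"
}
```

Usage: same_line_lengths INFILE && cut -c1-28,30-46 INFILE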
answered Jan 17 '16 at 7:13 by Jshura
In bash, you can use string manipulation (parameter expansion) to achieve what you want. Here is the code:
[root@localhost]# cat ./test.sh
#!/usr/bin/env bash
while IFS= read -r line; do
    # Delete every " and ; in the line; quoting the expansion keeps the original spacing
    printf '%s\n' "${line//[\";]/}"
done < sample.txt
And this is the output:
[root@localhost]# ./test.sh
chr1 134901 139379 - ENSG00000237683.5
chr1 860260 879955 + ENSG00000187634.6
chr1 861264 866445 - ENSG00000268179.1
chr1 879584 894689 - ENSG00000188976.6
chr1 895967 901095 + ENSG00000187961.9
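The loop above removes " and ; anywhere on the line. If only column 5 should change, a POSIX sh sketch (the function name strip_col5_sh is hypothetical) can split each line with read and trim just the wrapping characters of field 5 using prefix/suffix removal; like the awk $5 version, it rejoins the fields with single spaces.

```shell
# Hypothetical helper: read splits each line on blanks into five fields
# plus the remainder; ${c5#\"} drops a leading quote and ${c5%\"\;} drops
# a trailing "; from field 5 only. The || test also handles a final line
# without a trailing newline.
strip_col5_sh() (
    while read -r c1 c2 c3 c4 c5 rest || [ -n "$c1" ]; do
        c5=${c5#\"}
        c5=${c5%\"\;}
        printf '%s %s %s %s %s%s\n' "$c1" "$c2" "$c3" "$c4" "$c5" "${rest:+ $rest}"
    done
)
```

Usage: strip_col5_sh < sample.txt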
answered 22 mins ago by Manish R