awk - Deleting duplicated fields within a record
I want to delete duplicate instances of the key (the first 2 fields) in each record. The specific duplicates appear reversed.
So given
a b b a stuff1 b a stuff2 stuff3 b a
where each space is a tab,
I want:
a b stuff1 stuff2 stuff3
and I thought this would do it:
awk 'BEGIN {FS=OFS="\t"} {gsub($2 "\t" $1,"")} 1' file
Alternative solutions are welcome, but I'm particularly interested in why this does not work (I have tried a dynamic regexp and gensub, btw).
As per my previous question, I'm aware that I may/will end up with duplicate tabs, and I will take care of those outside awk.
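For the leftover duplicate tabs, one option outside awk is tr -s, which squeezes runs of a repeated character into one. This is only a minimal sketch on made-up data, and it assumes the records have no legitimately empty fields, since those would be collapsed too:

```shell
# Made-up record with a doubled tab left behind after a deletion.
squeezed=$(printf 'a\tb\t\tstuff1\n' | tr -s '\t')

# Show tabs as ^ so the result is visible.
printf '%s\n' "$squeezed" | tr '\t' '^'   # prints: a^b^stuff1
```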
Edit
The solutions so far don't work. Here is the real data; ^ should be read as a tab character:
1874 ^passage de venus^ <directors> ^passage de venus^ 1874^ janssen, p.j.c.^ <keywords>^ passage de venus^ 1874^ astronomy^ astrophotography^ <genres>^ short
What I want:
1874^ passage de venus^ <directors>^ janssen, p.j.c.^ <keywords>^ astronomy^ astrophotography^ <genres>^ short
You could try this:
awk '{gsub($2 "[[:space:]]+" $1, "")}1' file
If this works and using "\t" doesn't, then you aren't using only tabs.
I checked again and there is no bug: I have spaces in the file next to the tabs.
Try:
awk 'BEGIN{FS=" *\t *"}{gsub($2 FS $1, "")}1' file
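To see why the literal "\t" pattern fails when spaces sit next to the tabs, here is a small sketch. The record is made up for illustration (it is not the real data); it has a trailing space after the first field and a leading space before the reversed copy, much like the sample in the question:

```shell
# Made-up record: stray spaces sit next to some of the tabs.
rec=$(printf 'a \tb\tb\t a\tstuff1')

# Literal-tab version: $1 is "a " (trailing space included), so the regexp
# "b<TAB>a " never matches the "b<TAB> a" that is actually in the record.
literal=$(printf '%s\n' "$rec" | awk 'BEGIN{FS=OFS="\t"} {gsub($2 "\t" $1, "")} 1')

# Space-tolerant version: FS absorbs any spaces around each tab, so the
# fields are clean and the separator in the pattern matches " *<TAB> *".
tolerant=$(printf '%s\n' "$rec" | awk 'BEGIN{FS=" *\t *"} {gsub($2 FS $1, "")} 1')

# Show tabs as ^ so the difference is visible.
printf '%s\n%s\n' "$literal" "$tolerant" | tr '\t' '^'
# prints:
# a ^b^b^ a^stuff1
# a ^b^^stuff1
```

The first line comes back unchanged, while the second has the reversed pair removed (leaving the doubled tab mentioned in the question).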
Although this answer was purely meant for troubleshooting why the gsub was not working, I have decided to add an addendum for Ed's concerns in the comments.
This will stop words other than $2 and $1 being matched, and should sort out the formatting being messed up (gensub requires GNU awk):
awk 'BEGIN{FS=" *\t *"}{$0=gensub("("FS")" $2 FS $1 "("FS")","\\1","g")}1' file
Example
Input: 1234 mal mal 1234 formal 12345678 blah
Output: 1234 mal formal 12345678 blah
This should be more robust against regexp metacharacters:
awk -F' *\t *' '{x=y;for(i=1;i<=NF;i++)(i>2&&$i==$2&&$(i+1)==$1&&i++)||x=x?x"\t"$i:$i;$0=x}1' file
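The one-liner above is dense, so here is an expanded sketch of the same loop. It compares fields as strings instead of building a regexp from them, which is why metacharacters in the key no longer matter. The function name dedupe_key is hypothetical, and the sketch differs slightly from the one-liner in that it always keeps field 1, even if it is empty:

```shell
# Hypothetical helper: reads tab-separated records on stdin and drops every
# reversed copy of the key ($2 followed by $1) after the leading key.
dedupe_key() {
    awk -F' *\t *' '
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            # Plain string comparison, no regexp built from the data.
            if (i > 2 && $i == $2 && $(i + 1) == $1) {
                i++            # also skip the field that holds $1
                continue
            }
            out = (i == 1 ? $i : out "\t" $i)
        }
        print out
    }'
}

# Usage with the example record from the answer (tabs between fields):
printf '1234\tmal\tmal\t1234\tformal\t12345678\tblah\n' | dedupe_key
```

Because the fields are rejoined with a single tab, this version also avoids leaving duplicate tabs behind.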