awk - Deleting duplicated fields within a record
I want to delete duplicate instances of the key (the first 2 fields) in each record. The duplicates appear reversed.
So given
a b b a stuff1 b a stuff2 stuff3 b a
where each space is a tab, I want:
a b stuff1 stuff2 stuff3
and thought this would do it:
awk 'BEGIN {FS=OFS="\t"} {gsub($2 "\t" $1,"")} 1' file
Alternative solutions are welcome, but I'm particularly interested in why this does not work (I have tried a dynamic regexp and gensub, btw).
As per a previous question, I am aware that I may/will end up with duplicate tabs, but I will take care of that outside awk.
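For reference, here is a minimal sketch, assuming POSIX awk and input where every field is separated by exactly one tab (the helper name drop_reversed_key is hypothetical). On such input the gsub above does remove each reversed key; it only leaves doubled tabs, plus a trailing tab when the last pair is removed. Those are squeezed inside awk here for brevity, although the question plans to handle them outside awk.

```shell
# drop_reversed_key is a hypothetical helper name used only for this sketch.
# It assumes every field is separated by exactly one tab.
drop_reversed_key() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        gsub($2 "\t" $1, "")   # dynamic regex: delete every <field2><TAB><field1>
        gsub(/\t\t+/, "\t")    # squeeze the doubled tabs left behind
        sub(/\t$/, "")         # drop a trailing tab if the last pair was removed
    } 1' "$@"
}

printf 'a\tb\tb\ta\tstuff1\tb\ta\tstuff2\tstuff3\tb\ta\n' | drop_reversed_key
```

With the sample from the question this prints a, b, stuff1, stuff2 and stuff3 separated by single tabs.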
Edit
The solutions so far don't work. Here is the real data (^ should be read as a tab character):
1874 ^passage de venus^ <directors> ^passage de venus^ 1874^ janssen, p.j.c.^ <keywords>^ passage de venus^ 1874^ astronomy^ astrophotography^ <genres>^ short
What I want:
1874^ passage de venus^ <directors>^ janssen, p.j.c.^ <keywords>^ astronomy^ astrophotography^ <genres>^ short
You could try this:
awk '{gsub($2 "[[:space:]]+" $1, "")}1' file
If this works and using "\t" doesn't, then you aren't using (only) tabs.
Checked again, and there is no bug: you have a space in the file next to the tabs.
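A small sketch of that diagnosis, on a hypothetical line where a stray space sits just before one tab: od -c (POSIX) makes the hidden space visible, the literal-tab pattern then finds nothing to remove, and the [[:space:]]+ pattern matches. It assumes an awk that supports POSIX bracket classes such as [[:space:]].

```shell
# Hypothetical sample line: note the space just before the third tab.
line=$(printf 'a\tb\tb \ta\tstuff1')

# od -c prints every byte, so the space next to the tab becomes visible.
printf '%s\n' "$line" | od -c

# Literal tab: "b<TAB>a" does not occur because of the space, so nothing changes.
tab_only=$(printf '%s\n' "$line" | awk '{ gsub($2 "\t" $1, "") } 1')

# [[:space:]]+ also accepts "<SPACE><TAB>", so the reversed key is removed.
any_space=$(printf '%s\n' "$line" | awk '{ gsub($2 "[[:space:]]+" $1, "") } 1')

printf 'tab only:  %s\n' "$tab_only"
printf 'any space: %s\n' "$any_space"
```

Here the tab-only result is the line unchanged, while the [[:space:]]+ result is a, b, an empty field and stuff1, i.e. the doubled tab the question already expects.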
Try
awk 'BEGIN{FS=" *\t *"}{gsub($2 FS $1, "")}1' file
Although this answer was purely meant for troubleshooting why the gsub was not working, I have decided to add an addendum to address Ed's concerns in the comments.
This will stop $2 $1 from being matched as part of other words, and should also keep the formatting from getting messed up (no doubled tabs are left behind). Note that gensub is a GNU awk extension:
awk 'BEGIN{FS=" *\t *"}{$0=gensub("("FS")" $2 FS $1 "("FS")","\\1","g")}1' file
Example
input:  1234 mal mal 1234 formal 12345678 blah
output: 1234 mal formal 12345678 blah
This should be more robust against metachars:
awk -F' *\t *' '{x="";for(i=1;i<=NF;i++)(i>2&&$i==$2&&$(i+1)==$1&&i++)||x=x?x"\t"$i:$i;$0=x}1' file
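The one-liner above never builds a regex from field contents; it compares whole fields with ==, so neither partial matches inside a field nor regex metacharacters in the key can cause trouble. The sketch below is an expanded, commented POSIX-awk version of that loop (drop_pairs and regex_gsub are hypothetical names, and a plain single tab is assumed as the separator instead of ' *\t *'). It runs both it and a regex-based gsub on the example from the answer and on a made-up record whose first field contains a ".". One deliberate difference from the one-liner: appending "" to each field forces string comparison, because awk compares numeric-looking fields numerically, so 1.0 would otherwise equal 1.

```shell
# Field-by-field removal of the reversed key, an expanded form of the
# one-liner. Assumes plain single-tab separators (FS = "\t").
drop_pairs() {
    awk 'BEGIN { FS = "\t" }
    {
        out = ""
        for (i = 1; i <= NF; i++) {
            # Appending "" forces string comparison, so numeric-looking
            # fields such as 1.0 and 1 are not treated as equal.
            if (i > 2 && ($i "") == ($2 "") && ($(i + 1) "") == ($1 "")) {
                i++          # skip this field and the next: the reversed pair
                continue
            }
            out = (i == 1) ? $i : out "\t" $i
        }
        print out
    }' "$@"
}

# Regex-based gsub, for comparison.
regex_gsub() {
    awk 'BEGIN { FS = "\t" } { gsub($2 "\t" $1, "") } 1' "$@"
}

# Example from the answer: "formal<TAB>12345678" contains "mal<TAB>1234".
ex1=$(printf '1234\tmal\tmal\t1234\tformal\t12345678\tblah\n')
# Hypothetical key with a regex metacharacter: "." also matches the "b" in "abc".
ex2=$(printf 'a.c\tk\tk\tabc\tk\ta.c\n')

for line in "$ex1" "$ex2"; do
    printf 'regex:  %s\n' "$(printf '%s\n' "$line" | regex_gsub)"
    printf 'fields: %s\n' "$(printf '%s\n' "$line" | drop_pairs)"
done
```

With the regex version the first record becomes 1234, mal, an empty field, for5678 and blah, and the second also drops k and abc because "." matched the "b". The field-wise version prints 1234 mal formal 12345678 blah and a.c k k abc (tab-separated).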