awk - Deleting duplicated fields within a record


I want to delete duplicate instances of the key (the first 2 fields) in each record. The duplicates appear with the two key fields in reversed order.

So, given:

a b b a stuff1 b a stuff2 stuff3 b a

where each space is a tab,

I want:

a b stuff1 stuff2 stuff3 

and I thought this would do it:

awk 'BEGIN {FS=OFS="\t"} {gsub($2 "\t" $1, "")} 1' file

Alternative solutions are welcome, but I am particularly interested in why this does not work
(I have tried a dynamic regexp and gensub, btw).

Per a previous question, I am aware that I may (will) end up with duplicate tabs, but I take care of those outside awk.
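As a sanity check, here is the command above (with awk's uppercase `BEGIN`, `FS` and `OFS`) run on the toy record, assuming the fields really are separated by single tabs. On that input it does remove every `b<TAB>a` pair; what is left over are the doubled (and trailing) tabs mentioned above. `tr -s '\t'` is shown only as one possible way to squeeze them outside awk, and `dedup_naive` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: the question's awk command wrapped in a function.
# It deletes every "$2<TAB>$1" substring from the record, which can leave
# doubled (and trailing) tabs behind.
dedup_naive() {
  awk 'BEGIN {FS=OFS="\t"} {gsub($2 "\t" $1, "")} 1'
}

# Toy record from the question, with real tabs between the fields.
printf 'a\tb\tb\ta\tstuff1\tb\ta\tstuff2\tstuff3\tb\ta\n' |
  dedup_naive |
  tr -s '\t'   # squeeze each run of tabs into a single tab
```

The trailing tab that survives comes from the last `b a` pair, whose preceding tab is not part of the match.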

Edit:

The solutions so far don't work on my real data. Here ^ should be read as a tab character:

1874    ^passage de venus^  <directors> ^passage de venus^  1874^   janssen, p.j.c.^    <keywords>^ passage de venus^   1874^   astronomy^  astrophotography^   <genres>^   short 

What I want:

1874^   passage de venus^   <directors>^    janssen, p.j.c.^    <keywords>^ astronomy^  astrophotography^   <genres>^   short 

Can you try this:

awk '{gsub($2 "[[:space:]]+" $1, "")}1' file 

If this works, then the reason using "\t" doesn't is that you aren't actually using tabs.
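To see the difference stray spaces make, here is a small sketch on a hypothetical record with a space before two of its tabs: the literal-tab pattern finds nothing to remove, while a pattern that accepts any run of blanks does. It uses `[ \t]+` instead of `[[:space:]]+` only so that it also runs on awks without POSIX character classes; for spaces and tabs they match the same thing. Note that with the default `FS` the record is split on any whitespace, so this only works while the key fields themselves contain no spaces. The helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, both relying on awk's default whitespace field splitting.
# For the record below $1 is "a" and $2 is "b".
strip_tab_only()  { awk '{gsub($2 "\t" $1, "")} 1'; }      # pattern: b<TAB>a
strip_any_blank() { awk '{gsub($2 "[ \t]+" $1, "")} 1'; }  # pattern: b, any blanks, a

# Hypothetical record: "a <TAB>b<TAB>b <TAB>a<TAB>stuff1"
printf 'a \tb\tb \ta\tstuff1\n' | strip_tab_only    # no "b<TAB>a" present: unchanged
printf 'a \tb\tb \ta\tstuff1\n' | strip_any_blank   # "b <TAB>a" is removed
```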

I checked it again and there is no bug; you just have a space in the file next to the tabs.

Try:

awk 'BEGIN{FS=" *\t *"}{gsub($2 FS $1, "")}1' file
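Setting `FS` to ` *\t *` lets `$2` hold a multi-word title, and the same regex, reused in the `gsub`, absorbs stray spaces around the tabs. A sketch on a shortened version of the question's record, where the positions of the stray spaces are an assumption: the reversed `passage de venus`/`1874` pair is removed, but the separators around it (and the stray space after the first `1874`) stay, so a doubled tab remains. `dedup_fs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper wrapping the command above: a tab plus any surrounding
# spaces counts as one separator, so $2 can contain spaces.
dedup_fs() {
  awk 'BEGIN{FS=" *\t *"} {gsub($2 FS $1, "")} 1'
}

# Shortened record from the question; the stray spaces are assumed.
printf '1874 \tpassage de venus\t<directors>\tpassage de venus \t1874\tjanssen, p.j.c.\n' |
  dedup_fs
```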

Although this answer was purely meant for troubleshooting why the gsub was not working, I have decided to add an addendum addressing Ed's concerns in the comments.

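This stops $2 and $1 from being matched when they are only part of other fields (the pair must have a separator on each side), and, since one separator is put back in place of the removed pair, it should also sort out the messed-up formatting. Note that gensub is only available in GNU awk (gawk):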

awk 'BEGIN{FS=" *\t *"}{$0=gensub("("FS")" $2 FS $1 "("FS")","\\1","g")}1' file

Example:

input:  1234    mal     mal     1234    formal  12345678        blah
output: 1234    mal     formal  12345678        blah
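The example record is built so that `formal` ends in `mal` and `12345678` starts with `1234`. Without a separator required on both sides, the tab-only pattern from the question also matches across those two fields. A sketch (the helper name is hypothetical) that reproduces the problem:

```shell
#!/bin/sh
# Hypothetical helper: the unanchored version, which removes "$2<TAB>$1"
# wherever it occurs, even inside longer fields.
dedup_unanchored() {
  awk 'BEGIN{FS="\t"} {gsub($2 "\t" $1, "")} 1'
}

# "formal<TAB>12345678" contains "mal<TAB>1234", so besides the real
# duplicate, "formal" and "12345678" are collapsed into "for5678".
printf '1234\tmal\tmal\t1234\tformal\t12345678\tblah\n' | dedup_unanchored
```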

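This version should be more robust against regex metacharacters in the key, because it compares whole fields with == instead of building a regex from them: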

awk -F' *\t *' '{x=""; n=0; for(i=1;i<=NF;i++) if(i>2 && $i==$2 && $(i+1)==$1) i++; else x=(n++ ? x "\t" $i : $i); $0=x}1' file
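The loop is easier to follow spread over several lines. The sketch below is the same field-comparison approach with comments, run on the question's record with assumed stray spaces next to some tabs; it uses only POSIX awk features (no `gensub`). Setting `FS` in a `BEGIN` block instead of with `-F`, the `n` counter, and the `dedup_fields` name are choices of this sketch, not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper: the field-comparison loop, spread out with comments.
# Fields are compared with ==, so no regex is ever built from the key.
dedup_fields() {
  awk '
  BEGIN { FS = " *\t *" }   # a tab plus any surrounding spaces is one separator
  {
    out = ""
    n = 0                   # number of fields kept so far
    for (i = 1; i <= NF; i++) {
      # After the key itself, a field equal to $2 followed by one equal to $1
      # is a reversed duplicate: skip both fields.
      if (i > 2 && $i == $2 && $(i + 1) == $1) {
        i++
        continue
      }
      # Join the kept fields with single tabs.
      out = (n++ ? out "\t" $i : $i)
    }
    print out
  }'
}

# Record from the question, with (assumed) stray spaces next to some tabs.
printf '1874 \tpassage de venus\t<directors>\tpassage de venus \t1874\tjanssen, p.j.c.\t<keywords>\tpassage de venus\t1874\tastronomy\tastrophotography\t<genres>\tshort\n' |
  dedup_fields
```

Because the key fields are compared with `==`, a record such as `1.5<TAB>x<TAB>x<TAB>1a5` keeps `x<TAB>1a5`, whereas a regex built from `1.5` would treat the `.` as any character and remove it.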
