awk - Deleting duplicated fields within a record

- June 15, 2011

I want to delete duplicate instances of the key (the first 2 fields) in each record. The specific duplicates appear reversed.

So given

a b b stuff1 b stuff2 stuff3 b

where each space is a tab,

I want:

a b stuff1 stuff2 stuff3

and I thought this would do it:

awk 'BEGIN {FS=OFS="\t"} {gsub($2 "\t" $1,"")} 1' file

Alternative solutions are welcome, but I am particularly interested in why this does not work
(I have tried a dynamic regexp and gensub, btw).

Per my previous question, I am aware that I may/will end up with duplicate tabs, and I will take care of those outside awk.

Edit

The solutions so far don't work, so here is the real data. ^ should be read as a tab character.

1874    ^passage de venus^  <directors> ^passage de venus^  1874^   janssen, p.j.c.^    <keywords>^ passage de venus^   1874^   astronomy^  astrophotography^   <genres>^   short

What I want:

1874^   passage de venus^   <directors>^    janssen, p.j.c.^    <keywords>^ astronomy^  astrophotography^   <genres>^   short

You could try this:

awk '{gsub($2 "[[:space:]]+" $1, "")}1' file

If this works and using "\t" doesn't, then your fields aren't separated by tabs alone.

I checked again: there is no bug. I have spaces in the file next to the tabs.
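A quick way to check for this kind of stray whitespace is to count the records in which a space sits directly beside a tab. This is only a sketch: the `sample` record is made up for illustration, and on real data you would point awk at the file instead.

```shell
# Hypothetical sample record: a space precedes the second tab.
sample=$(printf 'a\tb \tb\ta\tstuff1')

# Count records where a space sits directly beside a tab.
stray=$(printf '%s\n' "$sample" | awk '/ \t|\t / {n++} END {print n+0}')
echo "records with a space next to a tab: $stray"
```

With FS="\t" the stray space becomes part of the field (here $2 is "b " rather than "b"), so the pattern built from $2 "\t" $1 no longer matches the reversed key.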

Try

awk 'BEGIN{FS=" *\t *"}{gsub($2 FS $1, "")}1' file

Although this answer was purely meant for troubleshooting why gsub was not working, I have decided to add an addendum to address Ed's concerns in the comments.

This will stop words other than $2 $1 from being matched, and should sort out the formatting getting messed up (gensub requires GNU awk):

awk 'BEGIN{FS=" *\t *"}{$0=gensub("("FS")" $2 FS $1 "("FS")","\\1","g")}1' file

Example

Input
1234    mal     mal     1234    formal  12345678        blah

Output
1234    mal     formal  12345678        blah
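To see what the separator-anchored version protects against, the plain gsub can be run on the same example record. This is a sketch that assumes the fields are separated by single tabs; the variable names are only for illustration.

```shell
# The example record, with single tabs between fields (an assumption).
record=$(printf '1234\tmal\tmal\t1234\tformal\t12345678\tblah')

# Unanchored pattern: "mal<TAB>1234" also occurs inside "formal<TAB>12345678".
mangled=$(printf '%s\n' "$record" | awk 'BEGIN {FS = "\t"} {gsub($2 FS $1, "")} 1')
printf '%s\n' "$mangled"
```

The reversed key is removed, but so is the "mal", tab, "1234" stretch that spans the fields "formal" and "12345678", leaving "for5678" behind. Requiring a separator on both sides of the match avoids that.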

This should be more robust against regex metacharacters, since it compares fields instead of building a pattern from them:

awk -F' *\t *' '{x=y;for(i=1;i<=NF;i++)(i>2&&$i==$2&&$(i+1)==$1&&i++)||x=x?x"\t"$i:$i;$0=x}1' file
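The same field-by-field idea can be spelled out more verbosely. The following is a sketch rather than a drop-in replacement: it assumes every record has at least two fields, and it forces string comparison by concatenating "" so that numeric-looking fields are not compared as numbers.

```shell
record=$(printf '1234\tmal\tmal\t1234\tformal\t12345678\tblah')

cleaned=$(printf '%s\n' "$record" | awk -F' *\t *' '
{
    out = $1 "\t" $2              # always keep the key itself
    for (i = 3; i <= NF; i++) {
        # A reversed key is field 2 immediately followed by field 1.
        if (i < NF && ($i "") == ($2 "") && ($(i + 1) "") == ($1 "")) {
            i++                   # skip the second half of the pair too
            continue
        }
        out = out "\t" $i
    }
    print out
}')
printf '%s\n' "$cleaned"
```

Because no field is ever interpreted as a regular expression, characters such as `.`, `*` or `<` in the data cannot change what gets removed, and the output is rebuilt with single tabs.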
