Merge CSV files with dynamic headers in Java


I have 2 or more .csv files which have the following data:

    //csv#1
    actor.id, actor.displayname, published, target.id, target.objecttype
    1, test, 2014-04-03, 2, page

    //csv#2
    actor.id, actor.displayname, published, object.id
    2, testing, 2014-04-04, 3

The desired output file:

    //csv#output
    actor.id, actor.displayname, published, target.id, target.objecttype, object.id
    1, test, 2014-04-03, 2, page,
    2, testing, 2014-04-04, , , 3

In case anyone wonders: the "." in the header is additional information in the .csv file and shouldn't be treated as a separator (the "." results from the conversion of a JSON file to CSV, respecting the nesting level of the JSON data). My problem is that I have not found a solution so far that accepts different column counts. Is there a clean way to achieve this? I do not have any code so far, but I thought the following would work:

  • Read the 2 or more files and add each row to a HashMap<Integer,String> // Integer = line number, String = data; each file gets its own HashMap
  • Iterate through the indices and add the data to a new HashMap.

Why I think this idea is not good:

  • If the header and row data of file 1 differ from those of file 2 (etc.), the column order won't be kept right.

I think this is what might result if I did the suggested thing:

    //csv#suggested
    actor.id, actor.displayname, published, target.id, target.objecttype, object.id
    1, test, 2014-04-03, 2, page      // wrong, because 1 "," is missing
    2, testing, 2014-04-04, 3         // wrong, because 3 does not belong to target.id. Furthermore, empty values won't be considered.

Is there a handy way to merge the data of 2 or more files without(!) knowing how many elements the header contains?

This isn't an answer, but it can point you in the right direction. Merging is hard: you're going to have to give it rules, and you need to decide what those rules are. You can break it down into a handful of criteria and go from there.

I wrote a "database" to deal with situations like this a while back:

https://github.com/danielbchapman/groups 

It is a Map<Integer, Map<Integer, Map<String, String>>>, which isn't that complicated. I'd recommend you read each row into a structure similar to:

    (set one) -> Map<column, data>
    (set two) -> Map<column, data>
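A minimal sketch of reading rows into that kind of structure, assuming plain comma-separated lines without quoted fields or embedded commas. The class name `CsvRows` and the method `parse` are made up for illustration and are not taken from the linked project:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvRows {

    /**
     * Turns the lines of one CSV file into one Map<column, data> per data row.
     * Assumption: the first line is the header and no field contains a comma.
     */
    public static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // The limit -1 keeps trailing empty cells instead of dropping them.
        String[] header = lines.get(0).split(",", -1);
        for (int i = 1; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(",", -1);
            // LinkedHashMap keeps the columns in header order.
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length; c++) {
                String value = c < cells.length ? cells[c].trim() : "";
                row.put(header[c].trim(), value);
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Because every value is stored under its column name, it no longer matters how many columns a file has or in which order they appear. For real-world data with quoting, a proper CSV parser would be the safer choice.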

A bidi map (as suggested in the comments) will make your lookups faster, but it carries some pitfalls if you have duplicate values.

Once you have these structures, your lookup can be as simple as:

    // Pseudo code: Data, Row and Id stand in for your own types.
    public List<Row> process(Data one, Data two)
    {
      List<Row> result = new ArrayList<>();
      for (Row row : one)
      {
        Id id = row.getId();
        Row additional = two.lookup(id);
        if (additional != null)
          merge(row, additional);

        result.add(row);
      }
      return result;
    }

    public void merge(Row a, Row b)
    {
      // Your logic here... either mutating a or returning a copy.
    }

Nowhere in this solution are you worried about the columns, as you are just acting on the raw data types. You can remap the column names either by storing them each time you do a lookup or by recreating them at output.
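Recreating the column names at output could look roughly like the sketch below. It assumes each file has already been read into a list of Map<column, data> rows; `CsvMerge` and `merge` are hypothetical names, and the rule used here (append all rows, leave missing columns empty) is only one possible merge rule:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CsvMerge {

    /**
     * Builds the output lines for several files whose rows are keyed by column name.
     * The header is the union of all column names in first-seen order;
     * a column that a row does not have becomes an empty cell.
     */
    public static List<String> merge(List<List<Map<String, String>>> files) {
        // Collect every column name once, keeping the order of first appearance.
        // Note: columns are taken from the rows, so a file with a header
        // but no data rows contributes nothing here.
        Set<String> header = new LinkedHashSet<>();
        for (List<Map<String, String>> file : files) {
            for (Map<String, String> row : file) {
                header.addAll(row.keySet());
            }
        }

        List<String> out = new ArrayList<>();
        out.add(String.join(", ", header));
        for (List<Map<String, String>> file : files) {
            for (Map<String, String> row : file) {
                List<String> cells = new ArrayList<>();
                for (String column : header) {
                    cells.add(row.getOrDefault(column, ""));
                }
                out.add(String.join(", ", cells));
            }
        }
        return out;
    }
}
```

The header is computed from the data itself, so the code never needs to know in advance how many columns any file contains.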

The reason I linked my project is that I'm pretty sure I have a few methods in there (such as outputting column names etc...) that might save you considerable time or point you in the right direction.

I do a lot of TSV processing in my line of work, and maps are my best friends.

