Merge CSV files with dynamic headers in Java
I have 2 or more .csv files that have the following data:
//csv#1
actor.id, actor.displayname, published, target.id, target.objecttype
1, test, 2014-04-03, 2, page

//csv#2
actor.id, actor.displayname, published, object.id
2, testing, 2014-04-04, 3
Desired output file:
//csv#output
actor.id, actor.displayname, published, target.id, target.objecttype, object.id
1, test, 2014-04-03, 2, page,
2, testing, 2014-04-04, , , 3
In case you wonder: the "." in the header is additional information in the .csv file and shouldn't be treated as a separator (the "." results from the conversion of a JSON file to CSV, respecting the nesting level of the JSON data). My problem is that I have not found a solution so far that accepts different column counts. Is there a good way to achieve this? I do not have code so far, but I thought the following would work:
- Read the 2 or more files and add each row to a HashMap<Integer,String> (Integer = line number, String = data), so each file gets its own HashMap.
- Iterate through the indices and add the data to a new HashMap.
Why I think this idea is not good:
- If the header and row data of file 1 differ from those of file 2 (etc.), the order won't be kept right.
I think this might be the result if I did what I suggested:
//csv#suggested
actor.id, actor.displayname, published, target.id, target.objecttype, object.id
1, test, 2014-04-03, 2, page //wrong, because one "," is missing
2, testing, 2014-04-04, 3 //wrong, because 3 does not belong to target.id. Furthermore, empty values won't be considered.
Is there a handy way I can merge the data of 2 or more files without(!) knowing how many elements the header contains?
This isn't an answer, but I can point you in a direction. Merging is hard: you're going to have to give it rules, and you need to decide what those rules are. You can break it down into a handful of criteria and go from there.
I wrote a "database" to deal with situations like this a while back:
https://github.com/danielbchapman/groups
It is basically a Map<Integer, Map<Integer, Map<String, String>>>, which isn't that complicated. I'd recommend you read each row into a structure similar to:
(set one) -> Map<Column, Data>
(set two) -> Map<Column, Data>
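As a rough illustration of such a structure, the sketch below reads CSV lines into one column-to-value map per row. The class name CsvRows and the method parse are hypothetical, and the code assumes that fields never contain quoted commas, so a plain split on "," is sufficient; real-world CSV would probably call for a proper parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvRows {
    // Turns CSV lines (first line = header) into one column -> value map per data row.
    // Assumption: fields contain no quoted commas, so a plain split is enough.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
            // LinkedHashMap preserves the column order of the file.
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                // A row shorter than the header gets empty values for the rest.
                String value = i < cells.length ? cells[i].trim() : "";
                row.put(header[i].trim(), value);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> csv1 = List.of(
            "actor.id, actor.displayname, published, target.id, target.objecttype",
            "1, test, 2014-04-03, 2, page");
        System.out.println(parse(csv1));
    }
}
```

Because the keys are the full header names, a "." inside a name such as actor.id is never interpreted as a separator.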
A BiDi map (as suggested in the comments) would make your lookups faster, but it carries some pitfalls if you have duplicate values.
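The duplicate-value pitfall can be shown without any BiDi library. The sketch below is a plain-Java stand-in (the class InverseLookup and the method invert are made up for illustration, and this is not the API of any particular BiDi map implementation): it builds the reverse lookup by hand, and two columns that hold the same value collapse into one entry.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class InverseLookup {
    // Builds a value -> key map. When two keys share the same value,
    // the key seen later silently overwrites the earlier one.
    static Map<String, String> invert(Map<String, String> forward) {
        Map<String, String> inverse = new HashMap<>();
        for (Map.Entry<String, String> e : forward.entrySet()) {
            inverse.put(e.getValue(), e.getKey());
        }
        return inverse;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("actor.id", "2");
        row.put("target.id", "2"); // same value as actor.id
        row.put("published", "2014-04-04");
        // Three columns go in, but only two reverse entries come out.
        System.out.println(invert(row).size());
    }
}
```

How a real bidirectional map reacts to duplicates depends on the implementation, so it is worth checking its documentation before relying on reverse lookups by cell value.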
Once you have these structures, your lookup can be as simple as:
//pseudo code: Data, Row and Id are placeholders for your own types
public List<Row> process(Data one, Data two)
{
    List<Row> result = new ArrayList<>();
    for (Row row : one)
    {
        Id id = row.getId();
        Row additional = two.lookup(id);
        if (additional != null)
            merge(row, additional);
        result.add(row);
    }
    return result;
}

public void merge(Row a, Row b)
{
    //your logic here.... either mutating or returning a copy.
}
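A runnable variant of the pseudo code, using plain maps for the rows, might look like the sketch below. The names RowMerger and keyColumn are hypothetical. The merge rule chosen here (keep the values from set one and only add the columns it lacks) is just one possible rule and is an assumption, not something the approach dictates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowMerger {
    // Joins each row of set one with the row of set two that has the same
    // value in keyColumn. Rows of set one without a match are copied as-is.
    static List<Map<String, String>> process(List<Map<String, String>> one,
                                             List<Map<String, String>> two,
                                             String keyColumn) {
        // Index set two by key; on duplicate keys the last row wins.
        Map<String, Map<String, String>> index = new HashMap<>();
        for (Map<String, String> row : two) {
            String key = row.get(keyColumn);
            if (key != null) {
                index.put(key, row);
            }
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : one) {
            String key = row.get(keyColumn);
            Map<String, String> additional = (key == null) ? null : index.get(key);
            if (additional != null) {
                result.add(merge(row, additional));
            } else {
                result.add(new LinkedHashMap<>(row));
            }
        }
        return result;
    }

    // Example rule (an assumption): returns a copy of a, adding only the
    // columns of b that a does not already have.
    static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> merged = new LinkedHashMap<>(a);
        b.forEach(merged::putIfAbsent);
        return merged;
    }
}
```

Returning copies instead of mutating the input rows keeps both source sets intact, which helps when the same file takes part in more than one merge.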
Nowhere in this solution am I worried about the columns, as it is acting on the raw data types. You can remap the column names either by storing them each time you do a lookup or by recreating them at output.
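Recreating the column names at output could be sketched as follows: collect the union of all column names in first-seen order, then write every row against that union and leave an empty cell where a row lacks a column. HeaderUnion and toCsv are hypothetical names. The sketch assumes each row is an insertion-ordered map (for example a LinkedHashMap), and it does no quoting or escaping of values.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class HeaderUnion {
    // Builds merged CSV lines (header first) from rows keyed by column name.
    static List<String> toCsv(List<Map<String, String>> rows) {
        // LinkedHashSet keeps first-seen column order and drops duplicates.
        // Assumption: each row map iterates its keys in file order.
        Set<String> columns = new LinkedHashSet<>();
        for (Map<String, String> row : rows) {
            columns.addAll(row.keySet());
        }
        List<String> lines = new ArrayList<>();
        lines.add(String.join(", ", columns));
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String column : columns) {
                // A column this row does not have becomes an empty cell.
                cells.add(row.getOrDefault(column, ""));
            }
            lines.add(String.join(", ", cells));
        }
        return lines;
    }
}
```

Fed with the rows of csv#1 followed by those of csv#2, this yields the header actor.id, actor.displayname, published, target.id, target.objecttype, object.id, and the number of columns never has to be known in advance.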
The reason I linked my project is that I'm pretty sure I have a few methods in there (such as outputting column names etc...) that might save you considerable time or point you in the right direction.
I do a lot of TSV processing in my line of work, and maps are my best friends.