How to implement an ETL Process
I would like to implement synchronization between a source SQL database and a target triplestore.
However, for the sake of simplicity, let's say there are just two databases. I wonder which approaches I could use to have every change in the source database replicated in the target database. More specifically, each time a row changes in the source database, I would like that change to be seen by a process that reads the changes and populates the target database accordingly, applying a transformation in the middle.
I have seen suggestions around a notification mechanism that may be available in the database, or building tables in which changes can be tracked (meaning doing it manually) and having a process poll them at regular intervals, or using the logs (change data capture, etc.).
I'm puzzled by all of this. I wonder if anyone could give some guidance and an explanation of the different approaches with respect to my objective. Meaning: the names of the methods and where to look.
My organization uses Postgres and Oracle databases.
I have to take relational data, transform it into RDF, store it in a triplestore, and keep the triplestore synchronized with the data in the SQL store.
please,
many thanks
PS:
A clarification of the difference between ETL and replication techniques such as change data capture, with respect to my overall objective, would be appreciated.
Again, I need to make sense of the subject and know what the methods are, so that I can start digging further myself. So far I have understood that CDC is the new way to go.
Assuming you can't use replication and need to use some kind of ETL process to extract, transform and load the changes into the destination database, you could use insert, update and delete triggers to fill a (manually created) audit table. Its columns would be GeneratedId, TableName, RowId, Action (insert, update, delete) and a boolean value that records whether the ETL process has already processed the change. Use that table to find the changed rows in your database and transport them to the destination database. Then delete the processed rows from the audit table so that it doesn't grow too big. How often you have to run the ETL process depends on the amount of changes occurring in the source database.
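To make the audit-table idea more concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database only so that it runs on its own; in Postgres or Oracle the trigger syntax would differ. The `person` table, the `change_audit` table name, the `transform` function, the `http://example.org/` URIs and the plain dictionary standing in for the triplestore are all assumptions made for illustration, not part of any particular product.

```python
import sqlite3


def setup_source(conn):
    """Create a hypothetical source table, the audit table, and the triggers."""
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);

        -- Audit table with the columns described above.
        CREATE TABLE change_audit (
            generated_id INTEGER PRIMARY KEY AUTOINCREMENT,
            table_name   TEXT    NOT NULL,
            row_id       INTEGER NOT NULL,
            action       TEXT    NOT NULL,
            processed    INTEGER NOT NULL DEFAULT 0
        );

        CREATE TRIGGER person_ins AFTER INSERT ON person BEGIN
            INSERT INTO change_audit (table_name, row_id, action)
            VALUES ('person', NEW.id, 'insert');
        END;
        CREATE TRIGGER person_upd AFTER UPDATE ON person BEGIN
            INSERT INTO change_audit (table_name, row_id, action)
            VALUES ('person', NEW.id, 'update');
        END;
        CREATE TRIGGER person_del AFTER DELETE ON person BEGIN
            INSERT INTO change_audit (table_name, row_id, action)
            VALUES ('person', OLD.id, 'delete');
        END;
    """)


def transform(row_id, name):
    """Hypothetical transformation: one relational row becomes one triple."""
    subject = f"http://example.org/person/{row_id}"
    return (subject, "http://xmlns.com/foaf/0.1/name", name)


def run_etl(conn, target):
    """Apply all unprocessed changes to the target, then purge the audit table."""
    changes = conn.execute(
        "SELECT generated_id, row_id, action FROM change_audit "
        "WHERE processed = 0 ORDER BY generated_id"
    ).fetchall()
    for generated_id, row_id, action in changes:
        if action == "delete":
            target.pop(row_id, None)
        else:
            row = conn.execute(
                "SELECT name FROM person WHERE id = ?", (row_id,)
            ).fetchone()
            # The row may already be gone if a later change deleted it.
            if row is not None:
                target[row_id] = transform(row_id, row[0])
        conn.execute(
            "UPDATE change_audit SET processed = 1 WHERE generated_id = ?",
            (generated_id,),
        )
    # Remove processed rows so the audit table does not grow too big.
    conn.execute("DELETE FROM change_audit WHERE processed = 1")
    conn.commit()
    return len(changes)


source = sqlite3.connect(":memory:")
setup_source(source)
target = {}  # stand-in for the triplestore, keyed by source row id

source.execute("INSERT INTO person (id, name) VALUES (1, 'Ada')")
source.execute("INSERT INTO person (id, name) VALUES (2, 'Bob')")
source.execute("UPDATE person SET name = 'Ada L.' WHERE id = 1")
source.execute("DELETE FROM person WHERE id = 2")

processed = run_etl(source, target)
print(processed, target)
```

In a real setup `run_etl` would be called on a schedule, and the writes to `target` would be replaced by updates against the actual triplestore. Note that this sketch reads the current state of each changed row rather than the value at the time of the change, which is usually enough for keeping a target in sync.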