database - Can data in HBase regions be arranged manually on the basis of a family:column value?
I've been working with HBase for a couple of weeks. The project is still in the design stage, with a POC ongoing. Before I ask my question, let me give a brief description of what I've inferred so far.
The basic unit of horizontal scalability in HBase is called a region. Regions are a subset of the table's data: a contiguous, sorted range of rows that are stored together. When a region becomes too large after more rows are added, it is split in two at the middle key, creating two roughly equal halves.
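A minimal sketch of the split-at-the-middle-key idea described above, in plain Python rather than against HBase itself. The `MAX_ROWS` threshold and the `insert_row` helper are hypothetical, and counting rows is a simplification: HBase decides when to split based on the size of the region's store files, not on a row count.

```python
import bisect

MAX_ROWS = 4  # hypothetical threshold, used only for this illustration


def insert_row(regions, row_key):
    """Insert row_key into the region covering it; split that region if it grows too large.

    `regions` is a list of sorted lists of row keys, ordered by key range.
    """
    # The first key of every region except the first acts as a boundary.
    boundaries = [region[0] for region in regions[1:]]
    idx = bisect.bisect_right(boundaries, row_key)
    bisect.insort(regions[idx], row_key)

    if len(regions[idx]) > MAX_ROWS:
        region = regions[idx]
        mid = len(region) // 2
        # Replace the oversized region with two halves split at the middle key.
        regions[idx:idx + 1] = [region[:mid], region[mid:]]


regions = [[]]  # a new table starts with a single empty region
for key in ["row05", "row01", "row09", "row03", "row07", "row02"]:
    insert_row(regions, key)
print(regions)
```

After the fifth insert the single region exceeds the threshold and is split, so the sixth key is routed to the first of the two resulting regions.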
The multi-map structure of an HBase table can be summarized as key -> family -> column -> timestamp -> value.
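To make that nesting concrete, here is a small sketch that models the multi-map with nested Python dictionaries. The `put` and `get_latest` helpers, the `info` family and the sample values are made up for illustration; this is not the HBase client API.

```python
def put(table, row, family, column, value, timestamp):
    """Store one versioned cell: row -> family -> column -> timestamp -> value."""
    versions = table.setdefault(row, {}).setdefault(family, {}).setdefault(column, {})
    versions[timestamp] = value


def get_latest(table, row, family, column):
    """Return the value with the highest timestamp, or None if the cell is missing."""
    versions = table.get(row, {}).get(family, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]


table = {}
put(table, "user42", "info", "role", "user", 1)
put(table, "user42", "info", "role", "admin", 2)  # newer version of the same cell
put(table, "user42", "info", "year", "1996", 1)

print(get_latest(table, "user42", "info", "role"))
```

Each write adds a new timestamped version rather than overwriting the cell, which is why the innermost level of the map is keyed by timestamp.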
Internally, HBase keeps special catalog tables named -ROOT- and .META., within which it maintains the current list, state, and location of all regions afloat on the cluster. The -ROOT- table holds the list of .META. table regions. The .META. table holds the list of all user-space regions. Entries in these tables are keyed by region name, where a region name is made up of the name of the table the region belongs to, the region's start row, its time of creation, and finally an MD5 hash of all of the former.
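As a rough illustration of how such a region name could be put together from those four parts, consider the sketch below. The `region_name` function, the separators and the sample values are assumptions made for this example; the exact byte layout HBase uses may differ.

```python
import hashlib


def region_name(table, start_row, created_ms):
    """Build a region-name-like string: table, start row, creation time, MD5 of those."""
    prefix = f"{table},{start_row},{created_ms}"
    # Hash everything that came before, as the description above says.
    digest = hashlib.md5(prefix.encode("utf-8")).hexdigest()
    return f"{prefix}.{digest}."


name = region_name("users", "1996", 1366000000000)
print(name)
```

Because the table name and start row come first, sorting these names groups the regions of one table together in start-row order.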
The number of rows that can be stored in a region depends on a threshold value defined for the region, which I believe can be set manually.
So this is what I want:
Suppose a table has userid, role and year, with, let's say, millions of tuples. I want to create two layers. In the first layer, the region nodes are differentiated by year range: let's say one region stores data for 1990 - 1995, another stores data for 1996 - 2000, and so on. The second layer is differentiated by role: for example, one region node keeps data for admins (id - 1), another for users (id - 2), and so on. Each layer has its own region server and is mapped in the META table, and the META table is managed by ZooKeeper. Refer to the figure below for further clarification:
Perhaps more than one ZooKeeper may work in sync, managed by a ZooKeeper above them.
So this is the design I'll be proposing, and I want to inquire about its feasibility.
If you create two tables, HBase will automatically take care of splitting and rebalancing if needed. If you want to manually pre-split a table, you can specify the set of key ranges you want on creation, for each table, and HBase will create one region per range. The balancer will take care of distributing the different regions to different machines. You don't need to care about ZooKeeper, -ROOT- or .META.
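A small sketch of how a set of split points maps row keys to pre-split regions, again in plain Python rather than the HBase API. It assumes, purely for illustration, a table whose row keys start with the year (for example `1996|admin|u1`); the `SPLIT_KEYS` values and both helper functions are hypothetical.

```python
import bisect

# Hypothetical split points: three split keys give four regions.
SPLIT_KEYS = ["1996", "2001", "2006"]


def region_for(row_key, split_keys=SPLIT_KEYS):
    """Return the index of the region whose [start, end) key range contains row_key."""
    return bisect.bisect_right(split_keys, row_key)


def region_range(index, split_keys=SPLIT_KEYS):
    """Return (start_key, end_key) for a region; '' marks an open-ended boundary."""
    start = split_keys[index - 1] if index > 0 else ""
    end = split_keys[index] if index < len(split_keys) else ""
    return start, end


for key in ["1990|user|u7", "1996|admin|u1", "2010|user|u9"]:
    idx = region_for(key)
    print(key, "-> region", idx, region_range(idx))
```

Since rows are sorted by key, whichever attribute leads the row key is the one that decides the region, so a second table keyed by role would need its own set of split points.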
http://blog.cloudera.com/blog/2013/04/how-scaling-really-works-in-apache-hbase/