How to filter a Spark HBase RDD and obtain results?
I am getting an RDD using Spark and HBase. I want to filter the RDD for a specific value. How can I proceed with this?
Here is what I have done so far:
val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
Now I want to use this RDD (hBaseRDD) and get a specific column's data by passing a specific parameter. How can I achieve this?
What you have:
val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
add following:
val localData = hBaseRDD.collect()   // Array of (ImmutableBytesWritable, Result) pairs
val filteredData = localData.map { case (_, result) =>
  // assuming you want the first cell only; otherwise take all of them
  result.getColumnCells(Bytes.toBytes("myColFamily"), Bytes.toBytes("myColName")).get(0)
}.filter { cell =>
  Bytes.toString(CellUtil.cloneValue(cell)).startsWith("somePrefix")
}
The above uses placeholder/dummy values and functions:

- get(0): you need to decide whether you want just the first cell or all of them
- the value conversion: you need to convert the cell value to the proper data type (Bytes.toString, Bytes.toLong, etc.)
- .startsWith(..): you need to decide how you actually want to match the data
But in any case, the above gives you the flow and an outline of how to process HBase cell data.
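One caveat: collect() pulls the entire table back to the driver, so for anything large you would run the same map and filter on the RDD itself and only collect the (much smaller) result. The per-cell logic is identical either way. As a minimal, self-contained sketch of that map-then-filter shape, with plain Scala collections standing in for the RDD, a hypothetical Cell case class standing in for org.apache.hadoop.hbase.Cell, and made-up column/prefix names:

```scala
// Stand-alone sketch: Spark and HBase are replaced by plain Scala
// collections so the shape of the logic is easy to follow and test.
// "Cell" is a made-up stand-in for org.apache.hadoop.hbase.Cell;
// column family, qualifier, and prefix names are hypothetical.
final case class Cell(family: String, qualifier: String, value: Array[Byte])

// two "rows", each holding the cells returned for one Result
val rows: Seq[Seq[Cell]] = Seq(
  Seq(Cell("myColFamily", "myColName", "somePrefix-123".getBytes("UTF-8"))),
  Seq(Cell("myColFamily", "myColName", "other-456".getBytes("UTF-8")))
)

// step 1: take the first cell of each row (the get(0) placeholder);
// step 2: keep only cells whose value starts with the prefix
val filtered = rows
  .map(cells => cells.head)
  .filter(cell => new String(cell.value, "UTF-8").startsWith("somePrefix"))

println(filtered.map(c => new String(c.value, "UTF-8")).mkString(","))
// prints somePrefix-123
```

On the cluster, the same two steps would be chained onto hBaseRDD (whose elements are (ImmutableBytesWritable, Result) pairs) before calling collect(), so the filtering runs distributed rather than on the driver.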