How to filter a Spark HBase RDD and obtain results?


I am getting an RDD using Spark and HBase. I want to filter the RDD and get a specific value out of it. How can I proceed?

Here is what I have done so far:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

Now I want to use this RDD (hBaseRDD) and get specific column data by passing a specific parameter. How can I achieve this?
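One option worth noting: the `Scan` that feeds `newAPIHadoopRDD` can also filter server-side, so only matching rows ever reach Spark. A minimal sketch, assuming an HBase 1.x-style API (the column family, qualifier, and value below are placeholders, not from the original post):

```scala
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.{CompareFilter, SingleColumnValueFilter}
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
import org.apache.hadoop.hbase.util.Bytes

val scan = new Scan()
// keep only rows whose cell in myColFamily:myColName equals "someValue"
scan.setFilter(new SingleColumnValueFilter(
  Bytes.toBytes("myColFamily"), Bytes.toBytes("myColName"),
  CompareFilter.CompareOp.EQUAL, Bytes.toBytes("someValue")))
// serialize the Scan into the configuration read by TableInputFormat
conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan))
```

With this in place, the `newAPIHadoopRDD` call from the question returns only the matching rows, which is usually cheaper than filtering in Spark afterwards.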

Start with what you have:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

add following:

val localData = hBaseRDD.collect()  // Array[(ImmutableBytesWritable, Result)]
val filteredData = localData.map { case (_, result) =>
  // assuming you want just the first cell; otherwise take all of them
  result.getColumnCells(Bytes.toBytes("myColFamily"), Bytes.toBytes("myColName")).get(0)
}.filter { cell =>
  new String(CellUtil.cloneValue(cell)).startsWith("somePrefix")
}

The above uses placeholder/dummy values and functions:

  • get(0) — decide whether you want only the first cell or all of them
  • new String(...) — convert the raw cell bytes to the proper data type for your column
  • .startsWith(..) — replace with whatever predicate matches your data

But in any case, the above gives the flow and outline of how to process HBase cell data.
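One caveat about the approach above: `collect()` pulls the entire table to the driver, which only works for small data. The same logic can run distributed by filtering the RDD itself before collecting. A sketch, using the same placeholder column names as above:

```scala
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.util.Bytes

val filteredRDD = hBaseRDD.filter { case (_, result) =>
  val cells = result.getColumnCells(Bytes.toBytes("myColFamily"), Bytes.toBytes("myColName"))
  // keep rows whose first cell value starts with the prefix;
  // rows lacking the column are dropped
  !cells.isEmpty &&
    Bytes.toString(CellUtil.cloneValue(cells.get(0))).startsWith("somePrefix")
}

// only the matching rows are brought back to the driver
val matching = filteredRDD.collect()
```

The filter runs on the executors, so only the rows that pass the predicate cross the network to the driver.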

