cql - Cassandra filtering by date with a secondary index -

- January 15, 2013

i have requirement answer following queries:

return number of new customers per quarter (up 36 months)
list new customers per quarter (up 36 months)

i've created following table in cassandra deal this:

create table first_purchase_by_shopper_date (     shop_id                     uuid,     shopper_id                  uuid,     dt_first_purchase           timestamp,      ... (some text fields)      primary key ((shop_id, shopper_id)) );

in order able answer query in cassandra, need able filter data on dt_first_purchase field.

but if add dt_first_purchase primary key, makes row non-unique shopper - , therefore multiple entries in table - ever want 1 entry per shopper.

so insert statement

insert first first_purchase_by_shopper_date (shop_id, shopper_id, dt_first_purchase, ... ) values(...) if not exists;

the if not exists @ end ensures entry written if none exists (e.g. no update performed on existing record.)

how can filter date on table - secondary index on dt_first_purchase column option - , isn't undesirable?

how can filter date on table - secondary index on dt_first_purchase column option - , isn't undesirable?

you try secondary index on dt_first_purchase (and querying range on require use of allow filtering directive). performance (especially large cluster), not recommend that.

but first , foremost, understand cassandra designed around returning data specific key on specific data partition. means best way query data date range, first partition data key makes sense model. instance, if had primary key defined this:

primary key ((shop_id), dt_first_purchase, shopper_id)

basically, record of shop (shop_id) recorded first purchase (dt_first_purchase) particular shopper (shopper_id)

with data partitioned shop (shop_id) query first purchases of new shoppers for particular shop_id this:

aploetz@cqlsh:stackoverflow> select *  first_purchase_by_shopper_date  shop_id=ce1089f6-c613-4d5b-a975-5dfd677b46f9  , dt_first_purchase >= '2014-01-01 00:00:00'  , dt_first_purchase < '2014-04-01 00:00:00';   shop_id                              | dt_first_purchase        | shopper_id                           | value --------------------------------------+--------------------------+--------------------------------------+-------  ce1089f6-c613-4d5b-a975-5dfd677b46f9 | 2014-02-12 18:33:22-0600 | a7480417-aaf8-42b1-85dd-5d9a4a30c204 | shopper1  ce1089f6-c613-4d5b-a975-5dfd677b46f9 | 2014-03-13 11:33:22-0500 | 07db2b71-2dc7-421d-bf73-82a5f6c55f89 | shopper2  (2 rows)

additionally, count number of first purchases (new shoppers) particular shop , date range, this:

aploetz@cqlsh:stackoverflow> select count(*) first_purchase_by_shopper_date shop_id=ce1089f6-c613-4d5b-a975-5dfd677b46f9  , dt_first_purchase >= '2014-01-01 00:00:00'  , dt_first_purchase < '2014-04-01 00:00:00';   count -------      2  (1 rows)

please note specific example may not work use case. take is: demonstration of how partitioning , querying work in cassandra.

for more information, check out patrick mcfadin's article on getting started time series data modeling. discusses ways solve use case similar yours.

Search This Blog

Deter

cql - Cassandra filtering by date with a secondary index -

Comments

Post a Comment

Popular posts from this blog

java - Unable to make sub reports with Jasper -

java - Plugin org.apache.maven.plugins:maven-install-plugin:2.4 or one of its dependencies could not be resolved -

Save and close a word document by giving a name in R -