cql - Cassandra filtering by date with a secondary index -
i have requirement answer following queries:
- return number of new customers per quarter (up 36 months)
- list new customers per quarter (up 36 months)
i've created following table in cassandra deal this:
create table first_purchase_by_shopper_date ( shop_id uuid, shopper_id uuid, dt_first_purchase timestamp, ... (some text fields) primary key ((shop_id, shopper_id)) );
in order able answer query in cassandra, need able filter data on dt_first_purchase field.
but if add dt_first_purchase primary key, makes row non-unique shopper - , therefore multiple entries in table - ever want 1 entry per shopper.
so insert statement
insert first first_purchase_by_shopper_date (shop_id, shopper_id, dt_first_purchase, ... ) values(...) if not exists;
the if not exists @ end ensures entry written if none exists (e.g. no update performed on existing record.)
how can filter date on table - secondary index on dt_first_purchase column option - , isn't undesirable?
how can filter date on table - secondary index on dt_first_purchase column option - , isn't undesirable?
you try secondary index on dt_first_purchase
(and querying range on require use of allow filtering
directive). performance (especially large cluster), not recommend that.
but first , foremost, understand cassandra designed around returning data specific key on specific data partition. means best way query data date range, first partition data key makes sense model. instance, if had primary key defined this:
primary key ((shop_id), dt_first_purchase, shopper_id)
basically, record of shop (shop_id) recorded first purchase (dt_first_purchase) particular shopper (shopper_id)
with data partitioned shop (shop_id) query first purchases of new shoppers for particular shop_id this:
aploetz@cqlsh:stackoverflow> select * first_purchase_by_shopper_date shop_id=ce1089f6-c613-4d5b-a975-5dfd677b46f9 , dt_first_purchase >= '2014-01-01 00:00:00' , dt_first_purchase < '2014-04-01 00:00:00'; shop_id | dt_first_purchase | shopper_id | value --------------------------------------+--------------------------+--------------------------------------+------- ce1089f6-c613-4d5b-a975-5dfd677b46f9 | 2014-02-12 18:33:22-0600 | a7480417-aaf8-42b1-85dd-5d9a4a30c204 | shopper1 ce1089f6-c613-4d5b-a975-5dfd677b46f9 | 2014-03-13 11:33:22-0500 | 07db2b71-2dc7-421d-bf73-82a5f6c55f89 | shopper2 (2 rows)
additionally, count number of first purchases (new shoppers) particular shop , date range, this:
aploetz@cqlsh:stackoverflow> select count(*) first_purchase_by_shopper_date shop_id=ce1089f6-c613-4d5b-a975-5dfd677b46f9 , dt_first_purchase >= '2014-01-01 00:00:00' , dt_first_purchase < '2014-04-01 00:00:00'; count ------- 2 (1 rows)
please note specific example may not work use case. take is: demonstration of how partitioning , querying work in cassandra.
for more information, check out patrick mcfadin's article on getting started time series data modeling. discusses ways solve use case similar yours.
Comments
Post a Comment