Create SQL transformation on AggregateCustomerPurchaseActivities
We'll be doing a series of SQL transformations. Do not apply the SQL transformations until the very end.
1. Convert amount_due
from a string to a float type and do aggregations:
This is the customers' purchase amount. After the type conversion, we can apply the SUM()
and AVERAGE()
functions. This is an example of how to do rollups on Rockset. Our SQL transformation/rollup will look like this:
Action Item: Delete all the code SQL transformation editor and apply the code snippet below. You'll notice we've applied Lines 1-5:
PREVIEW BOX: You’ll see that all the transactions are being summed and averaged. With the sample data shown below, the average spend is $70.99 and the total spend is $496.99:
2. Aggregate and convert the user’s credit card account number to a hex string representation:
To hash the user’s credit card account number, we can use the MD5() and TO_HEX() functions. MD5() converts the account number to a bytes value. From there, we can use TO_HEX() to convert the byte's value into a hex string representation.
Example: When we combine these 2 SQL functions, we’ll get a SQL that looks like this (do not type this in the SQL transformation editor):
CAST(TO_HEX(md5(account_number)) as string) as account_number_hash
A user might use multiple credit cards to make a transaction or use multiple cards to make different purchases. We can aggregate all the user’s credit card number and display only the distinct values via ARRAY_DISTINCT() and ARRAY_AGG().
Action Item: Delete all the code in the SQL transformation editor and apply the code snippet below. You'll notice we've applied the previous transformations and added Line 4:
PREVIEW BOX: You’ll notice in the preview box that ALL the distinct credit card account numbers are in one array. This is because we’re not grouping by any field. When we do a GROUP_BY
, then all the distinct credit card account number will be associated with the field we’re grouping by.
3. Group by the user’s id:
We'll cast user_id
to a string from an int because when Retool sends over the parameter for the user_id
, it’ll come in as a string. So, we can do this cast during ingest time and use ingest compute, instead of constantly converting the field to a string at query time and using query compute. Finally, we’ll group by the user's id.
Action Item: Delete all the code in the SQL transformation editor and apply the code snippet below. You'll notice we've applied the previous transformations and added Lines 5, 8-9:
PREVIEW BOX: You’ll notice that each user_id
has its distinct credit card account number hash. If a user makes a purchase with a different credit card, it’ll be in the array.
4. Create a time series bucket for each user_id transaction:
We’ll want to see how many transactions each user does daily. In addition to grouping by the user_id, we’ll need to group by a time period.
We’ll create daily time periods of all the transactions occurring via TIME_BUCKET(). We have the transaction times being ingested via event_time
. TIME_BUCKET()
accepts a timestamp object. We can convert event_time
into a timestamp object and create daily buckets. Something we could write is this (do not type this in the SQL transformation editor):
EXAMPLE:
TIME_BUCKET(DAYS(1), PARSE_TIMESTAMP_ISO8601(event_time)) as event_day
If a user has multiple transactions a day, their sum and average spend will be rolled up into one value for that day.
Action Item: Delete all the code in the SQL transformation editor and apply the code snippet below. You'll notice we've applied the previous transformations and added Line 6:
PREVIEW BOX: You’ll notice now that users’ distinct account_number_hash
, sum_amount_due
, and avg_amount_due
are being aggregated through a daily timeframe.
5. Aggregate distinct payment types:
We’ll want to see the customer’s payment types used in their transactions. We can aggregate and list the unique payment types by using ARRAY_DISTINCT() and ARRAY_AGG(). Something we could write is (do not type this in the SQL transformation editor):
EXAMPLE:
ARRAY_DISTINCT(array_agg(payment_type))as payment_type
Action Item: Delete all the code in the SQL transformation editor and apply the code snippet below. You'll notice we've applied the previous transformations and added Line 7:
PREVIEW BOX: You'll notice that there is a field called payment_type
. This is where we'll see the distinct payment types the customers used for a specific day:
6. Aggregate and flatten the customers’ purchases:
A customer may buy multiple products per transaction. We’ll want to aggregate all the products they bought during the daily timeframe. To do this, we’ll use ARRAY_AGG() and ARRAY_FLATTEN(). When we aggregate all the products they purchased via ARRAY_AGG()
, we'll have an array within an array. We can flatten this result to 1 array with ARRAY_FLATTEN()
.
EXAMPLE:
ARRAY_FLATTEN(ARRAY_AGG(purchased)) as purchased
Action Item: Delete all the code in the SQL transformation editor and apply the code snippet below. You'll notice we've applied the previous transformations and added Line 8:
PREVIEW BOX: You'll notice that there is a field called purchased
. This is where we'll see the product purchased for a specific day:
7. Go ahead and Apply the SQL transformations at the bottom of the SQL transformation page.
8. Keep the collection default settings as is. You can learn more about VI sizing and document retention by clicking on the links.
9. Create the collection at the bottom of the collection page.
NOTE: You can find us on the Rockset Community if you have questions or comments about the workshop.
Last updated