Chore: Added stage tables for portal pageviews #1267

Open
wants to merge 51 commits into
base: master
2e0cdbc
Added stage tables
munish7771 May 2, 2023
f0acfe5
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
3b05b3d
Revert "Update base_portal_prod__pageviews.sql"
munish7771 May 2, 2023
3473a19
Revert "Added stage tables"
munish7771 May 2, 2023
1ba4685
Update stage models
munish7771 May 2, 2023
d9d1475
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
8e06fc1
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
76e37d9
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
e499ede
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
8a75f26
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
024e972
Update base_portal_prod__pageviews.sql
munish7771 May 2, 2023
67d054c
Added new base models
munish7771 May 2, 2023
71c2d47
Update base_portal_prod__identifies.sql
munish7771 May 2, 2023
b681562
Update stg_portal_prod__pageviews.sql
munish7771 May 2, 2023
6a9748d
Added Intermediate model
munish7771 May 3, 2023
3c56450
Update int_portal_prod_signups.sql
munish7771 May 3, 2023
864850e
Updated intermediate model
munish7771 May 3, 2023
3a7ff3e
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
cad413a
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
fda7444
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
17a8feb
Some more changes
munish7771 May 3, 2023
c01aa85
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
ac2c852
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
280b8dc
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
7794524
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
3d0d7c1
Test changes
munish7771 May 3, 2023
e5c35df
Update int_portal_prod_signups_aggregated_to_users.sql
munish7771 May 3, 2023
5c7cb65
Added documentation
munish7771 May 3, 2023
e20fe5d
Update _portal_prod__models.yml
munish7771 May 3, 2023
82e5231
Update _int_signup__models.yml
munish7771 May 3, 2023
5838893
Update int_signups_aggregated_to_users.sql
munish7771 May 3, 2023
3e3426c
Update int_signups_aggregated_to_users.sql
munish7771 May 3, 2023
4375545
Updated base models
munish7771 May 4, 2023
0f171eb
Updated intermediate tables
munish7771 May 4, 2023
5fcf837
Update stg_portal_prod__identifies.sql
munish7771 May 4, 2023
e55f634
Update int_rudder_portal_user_mapping.sql
munish7771 May 4, 2023
fe86592
Create int_user_signup_stages.sql
munish7771 May 4, 2023
8d8052c
Update int_rudder_portal_user_mapping.sql
munish7771 May 4, 2023
14557c9
Update int_user_signup_stages.sql
munish7771 May 4, 2023
26ac99a
Update int_user_signup_stages.sql
munish7771 May 4, 2023
f64a115
Update int_rudder_portal_user_mapping.sql
munish7771 May 4, 2023
8b5aa68
Update int_user_signup_stages.sql
munish7771 May 4, 2023
1ac6047
Update int_user_signup_stages.sql
munish7771 May 4, 2023
23427ec
Update int_user_signup_stages.sql
munish7771 May 4, 2023
b618102
Update int_user_signup_stages.sql
munish7771 May 4, 2023
0243f29
Update _int_signup__models.yml
munish7771 May 4, 2023
4f1d7be
Update int_rudder_portal_user_mapping.sql
munish7771 May 4, 2023
fb51c91
Update int_rudder_portal_user_mapping.sql
munish7771 May 4, 2023
24842b5
Update int_rudder_portal_user_mapping.sql
munish7771 May 4, 2023
ddccc72
Changed event table name
munish7771 May 4, 2023
e9cbd79
Review changes
munish7771 May 5, 2023
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
version: 2
Contributor

@ifoukarakis ifoukarakis May 3, 2023

A better place for signup-related models would be under the name of the team responsible for this flow. Placing them under data_eng feels a bit strange.


models:

  - name: int_signups_aggregated_to_users
    description: User signup stages, aggregated by users.

    columns:
      - name: portal_customer_id
        description: Customer identifier that joins to customer info coming from stripe.
        tests:
ifoukarakis marked this conversation as resolved.
          - not_null
          - unique
      - name: account_created
        description: Boolean value indicating if the user created the account.
      - name: email_verified
        description: Boolean value indicating if the user verified the email address.

  - name: int_rudder_portal_user_mapping
catalintomai marked this conversation as resolved.
    description: This table provides a list of distinct user and portal customer ID pairs with a non-null value for both IDs.

    columns:
      - name: user_id
        description: User ID coming from rudder events.
        tests:
          - unique
      - name: portal_customer_id
        description: Customer ID in portal that joins to rudder user ID.
        tests:
          - unique
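
The two `unique` tests on `int_rudder_portal_user_mapping` assume a strict one-to-one mapping, while the model itself groups by the (`user_id`, `portal_customer_id`) pair. Below is a minimal sketch of an ad hoc query that could surface rows breaking that assumption. It assumes the model is built as shown in this PR. The query and the `n_portal_customer_ids` alias are illustrative and not part of the change.

```sql
-- Hypothetical ad hoc check, not part of this PR.
-- Lists every user_id that maps to more than one portal_customer_id,
-- which is the situation the `unique` test on user_id would flag.
select
    user_id,
    count(distinct portal_customer_id) as n_portal_customer_ids
from {{ ref('int_rudder_portal_user_mapping') }}
group by 1
having count(distinct portal_customer_id) > 1
```

Swapping the two columns gives the same check in the other direction (one `portal_customer_id` shared by several `user_id` values).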
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
{{
    config({
        "materialized": "table"
    })
}}

SELECT
    user_id,
    portal_customer_id,
    min(timestamp) as first_seen_at,
    max(timestamp) as last_seen_at
FROM
    {{ ref('stg_portal_prod__identifies') }}
WHERE
    portal_customer_id IS NOT NULL
Contributor

nit: use the same ordering as in the SELECT

    -- getting values as `N/A` from portal
    AND portal_customer_id NOT IN ('N/A')
    AND user_id IS NOT NULL
GROUP BY
    1,2
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
{{
    config({
        "materialized": "table"
    })
}}
WITH rudder_portal_user_mappings as (
    SELECT
        user_id,
        portal_customer_id
    FROM
        {{ ref('int_rudder_portal_user_mapping') }}
), pageview_create_workspace as (
    SELECT
        user_id,
        pageview_id,
        timestamp
    FROM
        {{ ref('stg_portal_prod__pageview_create_workspace') }}
), user_signup_stages as (
    SELECT
        portal_customer_id,
        -- Account is created when portal_customer_id exists
        true AS account_created,
        -- Email is verified and user is redirected to `pageview_create_workspace` screen.
        CASE
            WHEN pageview_create_workspace.pageview_id IS NOT NULL
                THEN TRUE
            ELSE FALSE
        END AS email_verified,
Contributor

Perhaps is_email_verified.

Contributor Author

adding the is_ prefix.

        timestamp
    FROM
        rudder_portal_user_mappings
        LEFT JOIN
            pageview_create_workspace
            ON pageview_create_workspace.user_id = rudder_portal_user_mappings.user_id
)

select
    portal_customer_id,
    account_created,
    email_verified
from user_signup_stages
qualify row_number() over (partition by portal_customer_id order by timestamp) = 1
Contributor

@ifoukarakis ifoukarakis May 8, 2023

Not very sure what the email_verified specification is.

Here's an example with two customers:

| portal_customer_id | email_verified | timestamp  |
|--------------------|----------------|------------|
| 1                  | false          | 2023-05-01 |
| 1                  | true           | 2023-05-02 |
| 1                  | false          | 2023-05-03 |
| 2                  | true           | 2023-05-02 |
| 2                  | false          | 2023-05-03 |
  • With qualify, the resulting email_verified will be false for customer 1, because the earliest row wins.
  • With e.g. group by portal_customer_id and max(email_verified), email_verified will be true.

Which one should be used?
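
The two options can be compared directly on the sample rows from the table above. This is only a sketch, assuming a Snowflake-style dialect (for `qualify` and `boolor_agg`). The `example_rows` CTE is hypothetical data copied from the table, and `boolor_agg` stands in for `max(email_verified)` to express "true if any row is true".

```sql
-- Hypothetical sample rows mirroring the table above (not real data).
with example_rows as (
    select 1 as portal_customer_id, false as email_verified, '2023-05-01'::date as timestamp
    union all select 1, true,  '2023-05-02'::date
    union all select 1, false, '2023-05-03'::date
    union all select 2, true,  '2023-05-02'::date
    union all select 2, false, '2023-05-03'::date
),

-- Option A: keep the earliest row per customer (what the model in this PR does).
first_row as (
    select portal_customer_id, email_verified
    from example_rows
    qualify row_number() over (partition by portal_customer_id order by timestamp) = 1
),

-- Option B: a customer counts as verified if any of its rows is true.
any_row as (
    select portal_customer_id, boolor_agg(email_verified) as email_verified
    from example_rows
    group by portal_customer_id
)

select
    first_row.portal_customer_id,
    first_row.email_verified as email_verified_first_row,  -- customer 1: false, customer 2: true
    any_row.email_verified   as email_verified_any_row     -- customer 1: true,  customer 2: true
from first_row
inner join any_row
    on any_row.portal_customer_id = first_row.portal_customer_id
order by 1
```

The two approaches disagree only for customer 1, whose earliest row is false while a later row is true.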

Original file line number Diff line number Diff line change
@@ -19,3 +19,30 @@ models:
      - name: received_at
        description: Timestamp registered by RudderStack when the event was ingested (received).

  - name: stg_portal_prod__identifies
    description: |
      Contains mapping from rudder user_id to portal user_id

    columns:
      - name: user_id
        description: The ID of the user that sent the event.
      - name: portal_customer_id
        description: Portal customer id for the user
      - name: context_traits_portal_customer_id
        description: Duplicate of portal_customer_id ingested via context traits, required since sometimes portal_customer_id is null
      - name: timestamp
        description: Timestamp value associated with event.

  - name: stg_portal_prod__pageview_create_workspace
    description: |
      Contains rudder data coming from `pageview_create_workspace`.

    columns:
      - name: pageview_id
        description: The pageview ID of the event.
      - name: user_id
        description: The ID of the user that sent the event.
      - name: event_table
        description: The event table that records the event request.
      - name: timestamp
        description: Timestamp value associated with event.
Original file line number Diff line number Diff line change
@@ -77,6 +77,7 @@ sources:
      - name: pageview_deployment_selection
      - name: pageview_getting_started
      - name: pageview_verify_email
        description: Page in the signup flow that user reaches after email is validated.
      - name: password_validation_error
      - name: purchase_complete_create_portal_account
      - name: purchase_fail_clicked_try_again
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
with source as (

    select * from {{ source('portal_prod', 'identifies') }}

),

renamed as (

    select
        user_id,
        coalesce(portal_customer_id, context_traits_portal_customer_id) as portal_customer_id,
        timestamp

    from source

)

select * from renamed
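
`coalesce` only skips NULLs, so a literal `N/A` in `portal_customer_id` (the value the mapping model filters out later) would take precedence over a real ID in `context_traits_portal_customer_id`. One possible variation, which is not what this PR does, wraps the first argument in `nullif`. The sketch below assumes `N/A` is purely a placeholder. The sample values (`cus_123`, `cus_456`) and the `example_rows` CTE are made up for illustration.

```sql
-- Hypothetical rows: one with the `N/A` placeholder, one with a NULL.
with example_rows as (
    select 'N/A' as portal_customer_id, 'cus_123' as context_traits_portal_customer_id
    union all select null, 'cus_456'
)

select
    -- As written in the staging model: `N/A` is not NULL, so it is kept.
    coalesce(portal_customer_id, context_traits_portal_customer_id) as as_written,
    -- Variation: turn `N/A` into NULL first so the fallback column can be used.
    coalesce(nullif(portal_customer_id, 'N/A'), context_traits_portal_customer_id) as with_nullif
from example_rows
```

For the first row, `as_written` is `N/A` while `with_nullif` is `cus_123`. For the second row both columns are `cus_456`.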
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
with source as (

    select * from {{ source('portal_prod', 'pageview_create_workspace') }}

),

renamed as (

    select
        id as pageview_id,
        user_id,
        event as event_table,
        timestamp

    from source

)

select * from renamed