Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] rework destructurizing queries #1

Open
alex-nax opened this issue Nov 25, 2022 · 0 comments
Open

[Bug] rework destructurizing queries #1

alex-nax opened this issue Nov 25, 2022 · 0 comments
Labels
bug Something isn't working

Comments

@alex-nax
Copy link
Member

-- Past Experiments
WITH __exposures0 as (
SELECT
cast('anonymous_id' as varchar) as exposure_query,
experiment_id,
experiment_id as experiment_name,
cast(variation_id as varchar) as variation_id,
cast(variation_id as varchar) as variation_name,
date_trunc('day', timestamp) as date,
count(distinct anonymous_id) as users
FROM
(
SELECT
$amplitude_id as anonymous_id,
event_time as timestamp,
event_properties: experiment_id as experiment_id,
event_properties: variation_id as variation_id,
device_family as device,
os_name as os,
country,
paying
FROM
$events
WHERE
event_type = 'Experiment Viewed'
AND $amplitude_id is not null
) e0
WHERE
timestamp > from_iso8601_timestamp('2021-10-14T12:07:40.136Z')
GROUP BY
experiment_id,
variation_id,
date_trunc('day', timestamp)
),
__exposures1 as (
SELECT
cast('user_id' as varchar) as exposure_query,
experiment_id,
experiment_id as experiment_name,
cast(variation_id as varchar) as variation_id,
cast(variation_id as varchar) as variation_name,
date_trunc('day', timestamp) as date,
count(distinct user_id) as users
FROM
(
SELECT
user_id as user_id,
event_time as timestamp,
event_properties: experiment_id as experiment_id,
event_properties: variation_id as variation_id,
device_family as device,
os_name as os,
country,
paying
FROM
$events
WHERE
event_type = 'Experiment Viewed'
AND user_id is not null
) e1
WHERE
timestamp > from_iso8601_timestamp('2021-10-14T12:07:40.136Z')
GROUP BY
experiment_id,
variation_id,
date_trunc('day', timestamp)
),
__experiments as (
SELECT
*
FROM
__exposures0
UNION ALL
SELECT
*
FROM
__exposures1
),
__userThresholds as (
SELECT
exposure_query,
experiment_id,
MIN(experiment_name) as experiment_name,
variation_id,
MIN(variation_name) as variation_name,
-- It's common for a small number of tracking events to continue coming in
-- long after an experiment ends, so limit to days with enough traffic
max(users) * 0.05 as threshold
FROM
__experiments
WHERE
-- Skip days where a variation got 5 or fewer visitors since it's probably not real traffic
users > 5
GROUP BY
exposure_query,
experiment_id,
variation_id
),
__variations as (
SELECT
d.exposure_query,
d.experiment_id,
MIN(d.experiment_name) as experiment_name,
d.variation_id,
MIN(d.variation_name) as variation_name,
MIN(d.date) as start_date,
MAX(d.date) as end_date,
SUM(d.users) as users
FROM
__experiments d
JOIN __userThresholds u ON (
d.exposure_query = u.exposure_query
AND d.experiment_id = u.experiment_id
AND d.variation_id = u.variation_id
)
WHERE
d.users > u.threshold
GROUP BY
d.exposure_query,
d.experiment_id,
d.variation_id
)
SELECT
*
FROM
__variations
WHERE
-- Skip experiments at start of date range since it's likely missing data
date_diff(
'day',
from_iso8601_timestamp('2021-10-14T12:07:40.136Z'),
start_date
) > 2
ORDER BY
experiment_id ASC,
variation_id ASC

@alex-nax alex-nax added the bug Something isn't working label Nov 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant