Handle dash for xff, and region id starting the path #712
@@ -92,6 +92,7 @@ Resources:

```js
const crypto = require('crypto');

const IPV4_MASK = /\.[0-9]{1,3}$/;

const maskIp = (ip, field) => {
  if (ip.match(IPV4_MASK)) {
    return ip.replace(IPV4_MASK, '.0');
```
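The IPv4 branch of `maskIp` zeroes the final octet by replacing whatever `IPV4_MASK` matches with `.0`. Below is a quick standalone check of that regex, as a sketch only: `maskIpv4` is a hypothetical name, and the IPv6 branch is left out because it is not visible in this hunk.

```javascript
// Copied from the diff: matches a trailing IPv4 octet such as ".137".
const IPV4_MASK = /\.[0-9]{1,3}$/;

// Hypothetical helper (not in the PR): applies only the IPv4 branch of maskIp.
const maskIpv4 = (ip) => (ip.match(IPV4_MASK) ? ip.replace(IPV4_MASK, '.0') : ip);

console.log(maskIpv4('203.0.113.137')); // 203.0.113.0
console.log(maskIpv4('2001:db8::1'));   // 2001:db8::1 (no trailing ".NNN", so unchanged)
```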
@@ -119,6 +120,16 @@ Resources:

```diff
   });
 };
 
+const findIp = (xff, ip) => {
+  if (xff === '-') {
+    return ip;
+  } else if (xff) {
+    return xff.split(',').map(s => s.trim()).filter(s => s)[0];
+  } else {
+    return ip;
+  }
+};
+
 const PODCAST_IDS = process.env.PODCAST_IDS.split(',').map(s => s.trim()).filter(s => s);
 
 const IGNORE_PATHS = ['/', '/favicon.ico', '/robots.txt'];
```
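To make the behavior of `findIp` concrete, here is the function copied from the hunk and run against a few made-up inputs (documentation-range addresses, not real log data). It assumes, as the review thread on this PR describes, that a `-` in `x-forwarded-for` means "no value".

```javascript
// findIp as added in this PR: prefer the leftmost X-Forwarded-For entry,
// but fall back to the client IP when XFF is "-" or empty/missing.
const findIp = (xff, ip) => {
  if (xff === '-') {
    return ip;
  } else if (xff) {
    return xff.split(',').map(s => s.trim()).filter(s => s)[0];
  } else {
    return ip;
  }
};

console.log(findIp('-', '198.51.100.7'));                     // 198.51.100.7
console.log(findIp('203.0.113.9, 10.0.0.1', '198.51.100.7')); // 203.0.113.9
console.log(findIp(undefined, '198.51.100.7'));               // 198.51.100.7
console.log(findIp(' , ', '198.51.100.7'));                   // undefined
```

One edge case worth knowing: an XFF value made up only of commas and whitespace yields `undefined` rather than falling back to `c-ip`, so `leftMostIp.includes(':')` would throw for such a row. If that input can occur, ending the middle branch with `|| ip` would cover it; that is a suggestion, not part of this PR.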
@@ -146,6 +157,12 @@ Resources:

```diff
 // podcast id and episode guid (only works for dovetail3-cdn requests)
```

Not directly related to this change, but for re-processing purposes it may be useful to also have this lambda log which S3 input file it's processing and how many rows it had. Just above this line somewhere:

```js
console.info(`Read ${rows.length} rows from s3://${Bucket}/${Key}`);
```

```diff
 const datas = mappedRows.filter(data => {
   const parts = data['cs-uri-stem'].split('/').filter(s => s);
+
+  // if the path starts with a region like usw2, shift that off
+  if (parts[0] && parts[0].match(/^[a-z][a-z0-9\-]+$/)) {
```

This was the other bug: we have requests with an AWS region name prefix, like …

```diff
+    parts.shift();
+  }
+
   if (parts.length === 4) {
     data['prx-podcast-id'] = parts[0];
     data['prx-episode-guid'] = parts[1];
```
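A sketch of the path handling in this hunk, pulled out into a hypothetical `parsePath` helper so it can be run on its own. The sample paths are invented, and what the third and fourth segments mean is not shown in this diff.

```javascript
// Hypothetical helper (not in the PR) mirroring the filter logic above:
// split cs-uri-stem, drop a leading region segment such as "usw2",
// then require exactly four parts (podcast id and episode guid come first).
const parsePath = (uriStem) => {
  const parts = uriStem.split('/').filter(s => s);
  if (parts[0] && parts[0].match(/^[a-z][a-z0-9\-]+$/)) {
    parts.shift();
  }
  if (parts.length === 4) {
    return { podcastId: parts[0], episodeGuid: parts[1] };
  }
  return null; // not a podcast/episode request
};

console.log(parsePath('/usw2/123/guid-1/x/audio.mp3')); // { podcastId: '123', episodeGuid: 'guid-1' }
console.log(parsePath('/123/guid-1/x/audio.mp3'));      // { podcastId: '123', episodeGuid: 'guid-1' }
console.log(parsePath('/favicon.ico'));                 // null
```

The region check relies on podcast ids not starting with a lowercase letter: any first segment matching `^[a-z][a-z0-9\-]+$` is shifted off, so a path whose podcast id began with a letter would lose that segment and then fail the four-part check.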
@@ -163,8 +180,7 @@ Resources:

```diff
 // calculate listener_ids
 datas.forEach(data => {
   // use leftmost XFF or IP
-  const xffParts = (data['x-forwarded-for'] || '').split(',').map(s => s.trim()).filter(s => s);
-  const leftMostIp = xffParts[0] || data['c-ip'];
```

This comparison was picking the dash.

```diff
+  const leftMostIp = findIp(data['x-forwarded-for'], data['c-ip']);
 
   // truncate ipv6 but not ipv4
   const truncatedIp = leftMostIp.includes(':') ? maskIp(leftMostIp, 'listener-id') : leftMostIp;
```
This is the bug that made all the hashed IPs the same: the XFF is coming through most of the time as `-`, which is not blank, but also not an IP. This dash was being used as the IP instead of the client IP.

Oof, good catch. I think I had something similar in the counts-lambda, but apparently forgot about it here.
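The thread above explains why every hashed IP came out the same. A sketch reproducing it: `oldLeftMostIp` is the removed logic, `newLeftMostIp` applies the same `-` check that `findIp` adds, and `hashIp` is a hypothetical SHA-256 stand-in, since the actual listener-id derivation is not part of this diff.

```javascript
const crypto = require('crypto');

// The lines removed in this PR: leftmost XFF entry, else the client IP.
const oldLeftMostIp = (xff, ip) => {
  const xffParts = (xff || '').split(',').map(s => s.trim()).filter(s => s);
  return xffParts[0] || ip;
};

// Same "-" check that findIp adds, layered on the old logic for comparison.
const newLeftMostIp = (xff, ip) => (xff === '-' ? ip : oldLeftMostIp(xff, ip));

// Hypothetical stand-in for the listener-id hash (not shown in the diff).
const hashIp = (ip) => crypto.createHash('sha256').update(ip).digest('hex');

// Two made-up rows from different clients, both with XFF logged as "-".
const rows = [
  { 'x-forwarded-for': '-', 'c-ip': '198.51.100.7' },
  { 'x-forwarded-for': '-', 'c-ip': '203.0.113.9' },
];

const oldHashes = new Set(rows.map(r => hashIp(oldLeftMostIp(r['x-forwarded-for'], r['c-ip']))));
const newHashes = new Set(rows.map(r => hashIp(newLeftMostIp(r['x-forwarded-for'], r['c-ip']))));

console.log(oldHashes.size); // 1: both rows hash the string "-"
console.log(newHashes.size); // 2
```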