Signal Documentation Coverage Endpoint #1584
base: dev
Conversation
Proposed SQL for this endpoint
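A rough sketch of what that SQL could look like, pieced together from the diff hunks further down in this review; only the SELECT body appears in the diff, so the `INSERT INTO coverage_crossref` wrapper and its column list are assumptions:

```sql
-- assumed wrapper: only the SELECT body below is confirmed by the diff
INSERT INTO coverage_crossref (signal_key_id, geo_key_id, min_time_value, max_time_value)
SELECT
  signal_key_id,
  geo_key_id,
  MIN(time_value) AS min_time_value,
  MAX(time_value) AS max_time_value
FROM covid.epimetric_latest
GROUP BY signal_key_id, geo_key_id;
```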
Sonar is failing on a security flaw around using HTTP instead of HTTPS, but it is for a test. This is how it's done in other tests too.
this is good stuff, it's nice and tight! I wouldn't have even thought about going with QueryBuilder and a db VIEW, but together they work well in getting the job done.

I've got a number of smallish suggestions that you can hopefully just hit "accept" for, a couple that you can ignore if you want, and a couple for removing duplicate results. For a bigger change, you should add at least one more test case (which shouldn't be that complicated if you use CovidcastTestRow).
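A minimal sketch of such an added test case, assuming the `make_default_row` helper and `_insert_rows` pattern used by other covidcast integration tests (the helper names, keyword overrides, and assertion here are assumptions, not taken from this PR):

```python
# sketch only: CovidcastTestRow.make_default_row and _insert_rows are assumed
# to behave as in other covidcast integration tests
from delphi.epidata.acquisition.covidcast.test_utils import CovidcastTestRow

def test_coverage_with_multiple_signals(self):
    # two signals covering two different geos
    self._insert_rows([
        CovidcastTestRow.make_default_row(signal="sig_a", geo_value="11111"),
        CovidcastTestRow.make_default_row(signal="sig_b", geo_value="22222"),
    ])
    main()  # repopulate coverage_crossref, as in the existing test
    results = self._make_request()
    self.assertEqual(len(results["epidata"]), 2)
```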
```sql
el.signal_key_id,
el.geo_key_id,
MIN(el.time_value) as min_time_value,
MAX(el.time_value) as max_time_value
FROM covid.epimetric_latest el
GROUP BY el.signal_key_id, el.geo_key_id;
```
you can make this less busy without the table alias:

```suggestion
signal_key_id,
geo_key_id,
MIN(time_value) AS min_time_value,
MAX(time_value) AS max_time_value
FROM covid.epimetric_latest
GROUP BY signal_key_id, geo_key_id;
```
```
@@ -561,3 +561,32 @@ def retrieve_covidcast_meta_cache(self):
    for entry in cache:
      cache_hash[(entry['data_source'], entry['signal'], entry['time_type'], entry['geo_type'])] = entry
    return cache_hash

  def compute_coverage_crossref(self):
    """Compute coverage_crossref table."""
```
"""Compute coverage_crossref table.""" | |
"""Compute coverage_crossref table, for looking up available signals per geo or vice versa.""" | |
```python
self._cursor.execute(coverage_crossref_delete_sql)
logger.info(f"coverage_crossref_delete_sql:{self._cursor.rowcount}")

self._cursor.execute(coverage_crossref_update_sql)
logger.info(f"coverage_crossref_update_sql:{self._cursor.rowcount}")
self.commit()
```
logging changes to make the value easier to extract and use in elastic, and another log message to get commit timing:

```suggestion
self._cursor.execute(coverage_crossref_delete_sql)
logger.info("coverage_crossref_delete", rows=self._cursor.rowcount)
self._cursor.execute(coverage_crossref_update_sql)
logger.info("coverage_crossref_update", rows=self._cursor.rowcount)
self.commit()
logger.info("coverage_crossref committed")
```
```
@@ -78,6 +78,22 @@ def test_update_covidcast_meta_cache_query(self):
    self.assertIn('timestamp', sql)
    self.assertIn('epidata', sql)

  def test_compute_coverage_crossref_query(self):
```
this test doesn't add a lot of value, but I can't see a good way to effectively unit test a method that's just a few static SQL statements, so you can delete this if you want (but leaving it around doesn't really hurt either) ¯\_(ツ)_/¯
```
@@ -0,0 +1,87 @@
"""Integration tests for covidcast's metadata caching."""
```
"""Integration tests for covidcast's metadata caching.""" | |
"""Integration tests for the covidcast `geo_coverage` endpoint.""" |
```python
q.apply_geo_filters("geo_type", "geo_value", geo_sets)
q.set_sort_order("source", "signal")
```
this is important to add, otherwise the response will grow ~linearly with the number of geos requested.

If we were writing straight SQL, I think using `SELECT DISTINCT` would be preferable to `GROUP BY` for clarity, though they should be virtually equivalent... but I don't think we can do that without modifying the QueryBuilder class -- this should work while using existing QB features:

```suggestion
q.group_by = fields_string  # this condenses duplicate results, similar to `SELECT DISTINCT`
```
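For illustration, the two plain-SQL forms being compared (table and column names here are stand-ins, not the endpoint's actual query):

```sql
-- both return the same de-duplicated (source, signal) pairs;
-- DISTINCT states the intent directly, GROUP BY gets there as a side effect
SELECT DISTINCT source, signal FROM coverage;
SELECT source, signal FROM coverage GROUP BY source, signal;
```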
```python
'epidata': [{'signal': 'sig', 'source': 'src'},
            {'signal': 'sig', 'source': 'src'}],
```
```suggestion
'epidata': [{'signal': 'sig', 'source': 'src'}],
```

I have a suggestion in `src/server/endpoints/covidcast.py` for doing a "group by" that should remove this duplication.
```python
# update the coverage crossref table
main()
```

```python
results = self._make_request()
```
you should do an additional non-wildcard request (or two) as well
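A sketch of what a non-wildcard follow-up could look like; the `params` argument and the geo-set syntax are assumptions, not confirmed by this PR:

```python
# hypothetical: the exact _make_request signature and geo syntax are assumed
results = self._make_request(params={"geo": "county:11111"})
self.assertEqual(len(results["epidata"]), 1)

# a geo with no rows should yield no coverage entries
results = self._make_request(params={"geo": "county:99999"})
self.assertEqual(results["epidata"], [])
```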
```python
logger.info(
    "Generated and updated covidcast metadata",
```
"Generated and updated covidcast metadata", | |
"Generated and updated covidcast geo/signal coverage", |
```python
def main(database_impl=Database):
    """Updates the table for the `coverage_crossref`."""

    logger = get_structured_logger("coverage_crossref_updater")
    start_time = time.time()
    database = database_impl()
```
you should remove this since the dependency injection doesn't actually get used elsewhere:

```suggestion
def main():
    """Updates the table for the `coverage_crossref`."""
    logger = get_structured_logger("coverage_crossref_updater")
    start_time = time.time()
    database = Database()
```
addresses issue(s) #1583
Summary:
Create a new endpoint that returns a list of signals when given a geo. There were other endpoints that did this in the past, but the tables they read from are no longer around in epi v4. Starting from scratch will be much quicker and rely less on legacy code.
Once this is merged, I will also make issues to delete the old coverage code so that we don't have duplicate endpoints.
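For a rough sense of the request/response shape (the URL and parameter are assumptions based on the `geo_coverage` name and the tests above; the `epidata` payload shape matches the test fixtures):

```
GET /covidcast/geo_coverage?geo=county:*

{"epidata": [{"source": "src", "signal": "sig"}], ...}
```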
Prerequisites:
- PR is targeted to be merged into the `dev` branch
- Branch is based off of `dev`