-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slack ts
and thread_ts
inconsistent types
#412
Comments
I can make the PR for the changes, just let me know what direction to go! |
Hey @zilto good point! Thanks a lot for the suggestion to create a PR, this would be wonderful! Just put me as a reviewer :) |
I started making these changed to open a PR, but I encountered another challenge. In a few places, we have created_at: dlt.sources.incremental[DateTime] = dlt.sources.incremental(
"ts",
initial_value=start_dt,
end_value=end_dt,
allow_external_schedulers=True,
), I envision 2 solutions:
I could implement solution 1, but would need some additional guidance for solution 2. Let me know how to proceed |
Hi @zilto!
Or make everything a timestamp, as you've suggested in 1 solution. If you have some code ready -- could you open a PR? Or if you don't have a time to continue with this, just let me know, we'll assign someone on this issue! |
dlt version
0.4.7
Source name
slack
Describe the problem
Values by
get_messages()
andget_thread_replies()
don't return the same data types for fieldts
andthread_ts
. Values are returned astimestamp
for the first andstring
for the latter.This is problematic when trying to join tables of messages and replies based on their
thread_ts
(thread id), which is a very common operation.This is because
get_messages()
passesdatetime_fields=MSG_DATETIME_FIELDS
whereasget_thread_replies()
doesn't.Expected behavior
ts
andthread_ts
should both receive the same type fromMSG_DATETIME_FIELDS
More importantly, according to Slack specs,
ts
andthread_ts
are not timestamps andstring
is actually the proper type. (see ref)Given
ts
andthread_ts
do not exactly represent a timestamp but rather are unique ids that can be sorted chronologically, I just removing them from the default values ofMSG_DATETIME_FIELDS
.This would be a breaking change for the
message
tables, but not forreplies
tables, so it would the right time to introduce the change to defaults if accepted.Steps to reproduce
dlt init slack
How you are using the source?
I run this source for fun.
Operating system
Linux
Runtime environment
Local
Python version
3.10.9
dlt destination
duckdb
Additional information
As a solution, I manually change type of
ts
andthread_ts
of messages fromtimestamp
tostring
The text was updated successfully, but these errors were encountered: