Replies: 2 comments
-
@YesOrNo828 maybe we can take a look at this topic.
-
@nicochen Thanks for your input. This topic is interesting, and I have some questions:
-
Here is a case where data throughput differs significantly between day and night. A Flink job's sink parallelism is immutable once the job has started. Taking an un-keyed table as an example, when the 'none' distribution policy is selected, all parallel writers are utilized, but this also creates the largest number of files, even when only a small amount of data arrives.
Therefore, I'd like to propose a new distribution policy that dynamically allocates writing parallelism during streaming writes according to the volume of incoming data. In this way, we can not only relieve the pressure that small files put on HDFS, but also increase the efficiency of reading and of optimization work.
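To make the idea concrete, the core of such a policy could be a function that maps observed throughput to a suggested number of active writers, clamped to the job's configured parallelism. This is only a minimal sketch of the sizing logic, not Flink or Iceberg API; the class name, the `TARGET_RECORDS_PER_TASK` tuning knob, and `suggestedParallelism` are all hypothetical names introduced here for illustration.

```java
// Hypothetical sketch: derive write parallelism from observed data volume.
// None of these names come from Flink/Iceberg; they illustrate the proposal only.
public class DynamicWriteParallelism {

    // Assumed tuning knob: roughly how many records one writer subtask
    // should handle per measurement interval before we add another writer.
    static final long TARGET_RECORDS_PER_TASK = 100_000L;

    /**
     * Suggest how many writer subtasks to keep active, given the number of
     * records observed in the last interval. The result is clamped between 1
     * (never fewer than one writer) and the job's immutable sink parallelism
     * (we can only deactivate writers, never exceed the configured maximum).
     */
    static int suggestedParallelism(long recordsPerInterval, int maxParallelism) {
        // Ceiling division: how many writers are needed at the target rate.
        long needed = (recordsPerInterval + TARGET_RECORDS_PER_TASK - 1)
                / TARGET_RECORDS_PER_TASK;
        return (int) Math.max(1, Math.min(maxParallelism, needed));
    }
}
```

At night, when `recordsPerInterval` is small, most records would be routed to a single writer, producing one file per checkpoint instead of one per subtask; during the day the policy scales back up toward the full configured parallelism.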