-
I am a data engineer, and my daily work involves a lot of feature engineering. I usually build features in Python with pandas and then translate them into streaming computations for online deployment. I am delighted to have found this project, and I am wondering whether it is possible to construct a similar syntax in SQL. That would let me compute features offline with ClickHouse for model training and use the same SQL to extract features directly for models during online deployment.
-
Hi @yfclark, thanks for starting the discussion.
Yes, one of the key strengths of Proton is that it provides SQL-based analysis for both streaming data and historical data, and can even process both in a single SQL query with CTEs, subqueries, JOINs, etc.
Many of our users and customers are in the fintech area. They need to run a so-called backtest to verify a certain pattern against massive historical data, and they also need to apply similar logic or strategy in real time so that they can identify buy/sell signals with low latency. Before using Proton, they had to use Python or another batch framework for the backtest and to refine the strategy. Once the design was confirmed, they might have to implement it again with Flink or custom C++ code to run it in real time. With Proton, they can build the same SQL for both backtesting and streaming processing. This approach is very similar to what you described. You can also check the blog post from our CTO at https://www.timeplus.com/post/real-time-machine-learning
To be more specific: for example, you have built a complex SQL query for feature engineering and saved it as a
Happy to discuss more in this thread, or in our Community Slack: https://timeplus.com/slack
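To make the "write the feature logic once, use it for both backtest and online" idea concrete, here is a minimal sketch. It is not Proton code: it uses Python's built-in sqlite3 module as a stand-in, and the `trades` schema, the `FEATURE_SQL` query, and the helper names are all hypothetical. The online path is only simulated by inserting one event at a time and re-running the query, which is polling rather than true streaming SQL.

```python
import sqlite3

# One feature definition, written once as SQL.
# Hypothetical schema: trades(ts, symbol, price).
# Features: average, max, and count over the 3 most recent prices of a symbol.
FEATURE_SQL = """
SELECT AVG(price) AS avg_price, MAX(price) AS max_price, COUNT(*) AS n
FROM (
    SELECT price FROM trades
    WHERE symbol = ? AND ts <= ?
    ORDER BY ts DESC
    LIMIT 3
)
"""

# Made-up sample data for illustration only.
HISTORY = [(1, "ABC", 10.0), (2, "ABC", 12.0), (3, "ABC", 11.0), (4, "ABC", 15.0)]


def new_db():
    """Create an empty in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (ts INTEGER, symbol TEXT, price REAL)")
    return conn


def features_at(conn, symbol, ts):
    """Run the shared feature query as of timestamp ts."""
    return conn.execute(FEATURE_SQL, (symbol, ts)).fetchone()


def backtest(rows):
    """Offline: load all history first, then evaluate the query at every timestamp."""
    conn = new_db()
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)
    return [features_at(conn, sym, ts) for ts, sym, _ in rows]


def replay_online(rows):
    """Online (simulated): insert one event at a time and query right after each."""
    conn = new_db()
    out = []
    for ts, sym, price in rows:
        conn.execute("INSERT INTO trades VALUES (?, ?, ?)", (ts, sym, price))
        out.append(features_at(conn, sym, ts))
    return out


offline = backtest(HISTORY)
online = replay_online(HISTORY)
print(offline == online)  # both paths use the same SQL text
```

Because both paths execute the same SQL text, the features used for training and for serving cannot drift apart through a hand translation from pandas to a streaming job.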