feat: minimalistic storage layer setup #11
base: main
Conversation
Signed-off-by: Yuchen Liang <[email protected]>
Demo

Imagine the following schema and query:

CREATE TABLE t1(v1 INTEGER, v2 TEXT);
CREATE TABLE t2(v1 INTEGER, v2 TEXT);

SELECT * from t1 inner join t2 on t1.v1 = t2.v1 where t1.v2 = 'foo';

The corresponding logical plan is:

- LogicalFilter (on: t1.v2 = 'foo')
  - child: LogicalJoin (inner, on: t1.v1 = t2.v1)
    - left: LogicalScan (t1)
    - right: LogicalScan (t2)

Besides adding this plan in initially, we also apply the join commutativity rule and register the equivalent join with its left and right children swapped.

$ cargo run --bin storage_demo_with_trait

After running the demo, you will have:

$ sqlite3 test_memo.db
sqlite> select l.id, l.group_id, l.created_at, desc.name from logical_exprs as l, logical_op_kinds as desc where l.logical_op_kind_id = desc.id;
┌────┬──────────┬─────────────────────┬───────────────┐
│ id │ group_id │ created_at │ name │
├────┼──────────┼─────────────────────┼───────────────┤
│ 1 │ 1 │ 2025-01-24 05:52:45 │ LogicalScan │
│ 2 │ 2 │ 2025-01-24 05:52:45 │ LogicalScan │
│ 3 │ 3 │ 2025-01-24 05:52:45 │ LogicalJoin │
│ 4 │ 3 │ 2025-01-24 05:52:45 │ LogicalJoin │
│ 5 │ 4 │ 2025-01-24 05:52:45 │ LogicalFilter │
└────┴──────────┴─────────────────────┴───────────────┘
sqlite> select * from logical_joins;
┌─────────────────┬───────────┬──────┬───────┬───────────────┐
│ logical_expr_id │ join_type │ left │ right │ join_cond │
├─────────────────┼───────────┼──────┼───────┼───────────────┤
│ 3 │ 0 │ 1 │ 2 │ t1.v1 = t2.v1 │
│ 4 │ 0 │ 2 │ 1 │ t1.v1 = t2.v1 │
└─────────────────┴───────────┴──────┴───────┴───────────────┘
sqlite> select * from logical_scans;
┌─────────────────┬────────────┐
│ logical_expr_id │ table_name │
├─────────────────┼────────────┤
│ 1 │ t1 │
│ 2 │ t2 │
└─────────────────┴────────────┘
sqlite> select * from logical_filters;
┌─────────────────┬───────┬───────────────┐
│ logical_expr_id │ child │ predicate │
├─────────────────┼───────┼───────────────┤
│ 5 │ 3 │ t1.v2 = 'foo' │
└─────────────────┴───────┴───────────────┘

It is also fine if you run the demo multiple times. The created_at timestamps won't change.
Generally LGTM with the approach -- I could see some minor problems, and we can figure them out later.
fn try_from(value: i32) -> Result<Self, Self::Error> {
    use JoinType::*;
    match value {
consider using a 3rd party crate to reduce boilerplate code
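For context, the hand-rolled conversion being reviewed typically looks like the sketch below, and crates such as num_enum (with its TryFromPrimitive derive) or strum can generate the same impl from an attribute. The variants and discriminant values here are illustrative, not necessarily the PR's exact JoinType.

```rust
// Sketch of the boilerplate the reviewer wants a crate to generate.
// JoinType's variants and discriminants are assumptions for illustration.
#[derive(Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
}

impl TryFrom<i32> for JoinType {
    type Error = String;

    fn try_from(value: i32) -> Result<Self, Self::Error> {
        use JoinType::*;
        match value {
            0 => Ok(Inner),
            1 => Ok(Left),
            2 => Ok(Right),
            other => Err(format!("unknown join type discriminant: {other}")),
        }
    }
}

fn main() {
    assert_eq!(JoinType::try_from(0), Ok(JoinType::Inner));
    assert!(JoinType::try_from(99).is_err());
}
```

With num_enum, the whole impl collapses to a `#[derive(TryFromPrimitive)]` plus `#[repr(i32)]` on the enum, which is the boilerplate reduction being suggested.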
let mut exprs = Vec::with_capacity(records.len());

for (record, name) in records {
sounds like a lot of round trips to the database
}

#[derive(Debug)]
pub struct LogicalExprWithId {
just FYI the old codebase doesn't have this and I'm unsure how useful it is to have it
fn insert_op(&self, id: LogicalExprId, storage: &mut StorageManager);

/// Gets the logical operator kind id.
fn op_kind(&self, storage: &mut StorageManager) -> LogicalOpKindId {
Do we have a plan to use the diesel async interface? It sounds like we are starting with sync first and will need a significant refactor later.
I'm looking into how diesel-async works. Would there be a lot of problems besides adding in the await points?
/// Otherwise, it inserts the logical expression into the database and returns the generated logical expression id and
/// the relational group id.
fn add(&self, storage: &mut StorageManager) -> (LogicalExprId, RelGroupId) {
    if let Some((id, rel_group_id)) = self.get_identifiers(storage) {
Do the check and insert in one SQL statement using insert on conflict?
TODO: wrap all these ops into a txn
do check and insert in one sql using insert on conflict?
Yeah, I think to do this we probably need to build a unique index on the data columns, and then we can start using upserts.
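As a sketch of what that single-statement approach could look like: with a unique index over the operator's data columns, SQLite's INSERT ... ON CONFLICT with RETURNING (available since SQLite 3.35) can hand back the existing row's id on conflict. The index name and the no-op DO UPDATE trick below are assumptions, not code from the PR; the table and column names follow the demo's logical_scans table. The statements are shown as plain strings since executing them would need a SQLite driver such as rusqlite or diesel.

```rust
// Hypothetical upsert shape for the check-and-insert discussed above.
// ON CONFLICT needs a unique constraint on its target columns, and the no-op
// DO UPDATE (rather than DO NOTHING) makes RETURNING yield a row even when
// the record already exists.
const CREATE_UNIQUE_INDEX: &str =
    "CREATE UNIQUE INDEX IF NOT EXISTS logical_scans_data
     ON logical_scans (table_name);";

const UPSERT_SCAN: &str =
    "INSERT INTO logical_scans (logical_expr_id, table_name)
     VALUES (?1, ?2)
     ON CONFLICT (table_name) DO UPDATE SET table_name = excluded.table_name
     RETURNING logical_expr_id;";

fn main() {
    // Sanity-check the statement shapes; real execution is out of scope here.
    assert!(UPSERT_SCAN.contains("ON CONFLICT"));
    assert!(UPSERT_SCAN.contains("RETURNING"));
    println!("{CREATE_UNIQUE_INDEX}\n\n{UPSERT_SCAN}");
}
```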
I would also think that an in-memory caching layer is necessary to speed things up.
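One minimal shape such a cache could take, assuming expressions can be fingerprinted into a key: check a map first and fall back to the database only on a miss. The type names and the string fingerprint below are illustrative stand-ins, not the PR's API.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the suggested in-memory caching layer.
type LogicalExprId = i64;

struct ExprCache {
    by_fingerprint: HashMap<String, LogicalExprId>,
}

impl ExprCache {
    fn new() -> Self {
        Self { by_fingerprint: HashMap::new() }
    }

    /// Returns the cached id, or runs `load` (standing in for the database
    /// round trip) and remembers its result.
    fn get_or_load(
        &mut self,
        fingerprint: &str,
        load: impl FnOnce() -> LogicalExprId,
    ) -> LogicalExprId {
        if let Some(&id) = self.by_fingerprint.get(fingerprint) {
            return id;
        }
        let id = load();
        self.by_fingerprint.insert(fingerprint.to_owned(), id);
        id
    }
}

fn main() {
    let mut cache = ExprCache::new();
    let mut db_calls = 0;
    let first = cache.get_or_load("LogicalScan(t1)", || { db_calls += 1; 1 });
    // Second lookup is served from memory; `load` never runs.
    let second = cache.get_or_load("LogicalScan(t1)", || { db_calls += 1; 99 });
    assert_eq!((first, second, db_calls), (1, 1, 1));
}
```

Invalidation is the hard part once group merging exists, which is presumably why this is deferred.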
/// Gets the logical expression id if it is already in the database.
fn id(&self, storage: &mut StorageManager) -> Option<LogicalExprId>;

fn insert_op(&self, id: LogicalExprId, storage: &mut StorageManager);
What does this function do, and is there an example using it?
This is used by add to insert an entry into the per-operator tables. This reduces the lines of code needed for each operator.
I also recommend having a single counter for both group id and expr id so that one number uniquely identifies an entity in the system. This helps with debugging, i.e., if we assign 1 as a group id, then 1 won't be used as an expression id. It is possible with Postgres's
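The single-counter idea above can be sketched in a few lines: group ids and expression ids are drawn from one shared sequence, so any given number identifies at most one entity. The names here are illustrative, not the PR's API (a SQLite-backed version would keep the sequence in the database instead of process memory).

```rust
use std::sync::atomic::{AtomicI64, Ordering};

// One shared sequence for every entity kind, so ids never collide across kinds.
static NEXT_ID: AtomicI64 = AtomicI64::new(1);

fn next_id() -> i64 {
    NEXT_ID.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let group_id = next_id();
    let expr_id = next_id();
    // A number handed out as a group id is never reused as an expression id.
    assert_ne!(group_id, expr_id);
}
```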
First pass (super high level) review:
I added some comments about module file structure
In terms of style, I think we should be very intentional about naming. Things like Expr are pretty self-explanatory, but I actually have to think hard when stumbling across something like physical_op_kinds and rel_group -- it will be even harder for someone new jumping into this codebase. Default to spelling out everything unless it becomes unwieldy (like in the case of writing out the entire Expression). Things like Rel and OpKind are, in my opinion, hard to grok.
@@ -0,0 +1,30 @@
//! The logical operator objects in the optimizer storage layer.
This should be a mod.rs file, considering that we're going to have tens of operators and the directory structure is going to be super difficult to navigate if the submodule root is located in a different place than the operator sub-submodules.
I prefer having a consistent project structure and avoiding mod.rs. With the same convention, people will know where to find submodules.
Yes I agree, and I'm saying that there is not much justification here in having a logical_operator.rs file located so far away from the logical_operator folder, especially since we're going to have a lot of modules.
See discussions:
- https://www.reddit.com/r/rust/comments/18pytwt/noob_question_foomodrs_vs_foors_foo_for_module/
- https://users.rust-lang.org/t/module-mod-rs-or-module-rs/122653
- https://internals.rust-lang.org/t/the-module-scheme-module-rs-file-module-folder-instead-of-just-module-mod-rs-introduced-by-the-2018-edition-maybe-a-little-bit-more-confusing/21977/17?u=zirconium-n
This is obviously not a black and white conversation, but in my opinion the only argument against having mod.rs is that you have multiple files named the same thing. But with the last link listed above, plus the fact that mod.rs should really only be re-exporting stuff, I don't see any good reason to follow this.
@@ -0,0 +1,30 @@
//! The physical operator objects in the optimizer storage layer.
This should also be a mod.rs file.
pub mod common;
pub mod logical_expr;
pub mod logical_operators;
pub mod physical_expr;
pub mod physical_operators;
pub mod rel_group;
This definitely needs to be in a mod.rs file.
@@ -0,0 +1,116 @@
// @generated automatically by Diesel CLI.
So I know that this is code-generated, but is it possible to put comments here anyway? The workflow that worked for the sea-orm codegen was that we added extra stuff into this file (like comments and lints), and when the codegen overwrote the extra stuff we just used git to put it back.
@@ -91,7 +91,7 @@ jobs:
uses: dtolnay/install@cargo-docs-rs
- name: cargo docs-rs
# TODO: Once we figure out the crates, rename this.
run: cargo docs-rs -p optd-tmp
run: cargo docs-rs -p optd
I still personally think that this should be called optd-storage instead of just optd, because it will become confusing.
I think last time we decided that initially we are going to have a single crate called optd. storage will be a module inside that crate.
If that's the case, then we arguably shouldn't have a workspace at all.
With what @SarveshOO7 was working on, I think it makes sense to have a separate crate for everything around the pipeline we talked about because we do not want to have to recompile datafusion every single time we make a change.
I think either we completely remove the workspace and call it optd, or we start with 2 crates (something like optd-core and optd-datafusion) and make sure we don't expand beyond 2 without good reason.
resolver = "2"

[workspace.dependencies]
anyhow = "1"
We probably want to use snafu given the service-like-library nature of this project.
#[diesel(belongs_to(RelGroup))]
#[diesel(belongs_to(LogicalOpKind))]
#[diesel(check_for_backend(diesel::sqlite::Sqlite))]
pub struct LogicalExprRecord {
I don't think we ever talked about this type; this seems to be a new table? I see that it is used in one other place in get_all_logical_exprs_in_group, but I don't immediately see the purpose of this...
This is the flat record type on the logical_exprs table. You need both this and the specific operator table to get the full LogicalExpr enum out. Subject to change, but I would argue this does not change the overall API.
If that's the case, can you make the visibility pub(super), or whatever the lowest visibility is that still gives the memo table access to it?
I think there should be a storage_manager submodule in the storage module that only contains the StorageManager type and impl.
@@ -0,0 +1,8 @@
-- The physical operator descriptor table stores all the
-- physical operators that can be used in the optimizer.
CREATE TABLE physical_op_kinds (
TBH, I think having an enum here might be more practical. It will also make all the queries simpler and save us a join. Might as well codegen everywhere, right?
This table will never change at run time.
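A minimal sketch of the enum alternative: the kind set is fixed at compile time, the column stores the discriminant directly, and the name comes from code rather than a join against a lookup table. The variants and discriminant values below are illustrative, not the PR's actual operator list.

```rust
// Hypothetical replacement for the physical_op_kinds lookup table.
#[derive(Debug, Clone, Copy)]
enum PhysicalOpKind {
    TableScan = 0,
    HashJoin = 1,
    Filter = 2,
}

impl PhysicalOpKind {
    // The human-readable name lives in code, so no join is needed to show it.
    fn name(self) -> &'static str {
        match self {
            PhysicalOpKind::TableScan => "TableScan",
            PhysicalOpKind::HashJoin => "HashJoin",
            PhysicalOpKind::Filter => "Filter",
        }
    }
}

fn main() {
    let kind = PhysicalOpKind::HashJoin;
    // The database column would simply store the discriminant, 1.
    assert_eq!((kind as i32, kind.name()), (1, "HashJoin"));
}
```

The tradeoff is that adding an operator becomes a code change plus recompile rather than a row insert, which is fine given the table never changes at run time.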
I'll experiment with that during the WE.
I agree, even though I haven't actually given the logic of this PR a review, we should be pushing as much work as we can to compile-time / codegen / preprocessing, especially since we want the foundation to be minimal at runtime so that we don't run into a wave of runtime bugs in the future
.select(logical_op_kinds::id)
.first::<LogicalOpKindId>(&mut storage.conn)?;

let logical_filter_id = logical_op_kinds::table
If we keep the table approach, I guess we could have a big function that loads all of them and caches the ids in the future.
We could also always carry the operator kind.
-- which group a logical expression belongs to.
CREATE TABLE logical_exprs (
    -- The logical expression id.
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
If we go with Chi's suggestion of having a unique ID for each object, we shouldn't use AUTOINCREMENT here.
-- The logical expression id.
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
-- The logical operator descriptor id.
logical_op_kind_id BIGINT NOT NULL,
Probably doesn't need to be BIGINT here? Or do foreign key references HAVE to be BIGINT?
@@ -0,0 +1,8 @@
-- A relational group contains a set of relational expressions
-- that are logically equivalent.
CREATE TABLE rel_groups (
Yeah, I think we should expand out the "relational" here, and also be consistent with what the migration file directory is called.
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
-- Time at which the group is created.
created_at TIMESTAMP DEFAULT (CURRENT_TIMESTAMP) NOT NULL
);
Add TODOs here saying that we will add other columns like an optional winner, potentially cost, and also other group metadata related to our union-find parent pointer idea
It seems like there might be an unnecessary level of indirection here?
For the sake of this example, suppose we only have logical expression types. My understanding is that there is a logical_exprs table that tracks expression IDs and maps them to their logical operator kind (Scan, Filter, Join). The purpose of this is so that every logical expression ID lookup can be paired with a logical operator kind, so that we know which specific table to go look at.
What if instead of having a full table that makes this mapping, we instead encode the operator kind inside of the expression ID? For example, we could have it such that the upper 16 bits of a 64-bit ID encode the operator kind, and the lower 48 bits encode the unique expression given the operator kind (we would likely not even need all 16 bits, we could probably get away with the top 8 bits even).
You could probably argue that this introduces complexity, but in my mind I feel that having extra tables that are not strictly necessary introduces even more complexity. When I was trying to read this PR I had to spend a non-trivial amount of time trying to understand the architecture, since I did not expect that specific table when we initially talked about the architecture in previous meetings.
In terms of implementation of the above proposal, yes it would be a bit ugly, but if abstracted correctly at the storage layer level, there should only really be 1 function that handles the conversion into some sort of strongly-typed struct like this:

struct ExprId {
    kind: u16, // could also encode logical/physical difference before converting into stronger types
    id: u64,   // truncated at 48 bits
}
This is opposed to having a dedicated table that makes this mapping, where every single expression lookup now has to look up an extra record on disk (and even in memory, this round trip is going to need to happen for every single lookup during optimization).
As for the group ID mapping, I think that needs to be in a separate junction table regardless for group operations. Unless this is actually supposed to be the junction table? If that's the case, then all 3 things in this table would have needed to be a composite primary key tuple.
I think there are tradeoffs to both models, but I am leaning towards removing the layer of indirection.
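The proposed packing can be sketched concretely: the upper 16 bits of a 64-bit id carry the operator kind, the lower 48 bits carry the per-kind sequence number, and unpacking recovers the kind without touching any table. The names and bit widths follow the comment above; none of this code is in the PR itself.

```rust
// Hypothetical bit-packed expression id, per the proposal above.
const ID_BITS: u32 = 48;
const ID_MASK: u64 = (1 << ID_BITS) - 1;

#[derive(Debug, PartialEq)]
struct ExprId {
    kind: u16,
    id: u64, // only the low 48 bits are significant
}

fn pack(expr: &ExprId) -> u64 {
    ((expr.kind as u64) << ID_BITS) | (expr.id & ID_MASK)
}

fn unpack(raw: u64) -> ExprId {
    ExprId {
        kind: (raw >> ID_BITS) as u16,
        id: raw & ID_MASK,
    }
}

fn main() {
    let scan = ExprId { kind: 1, id: 42 };
    let raw = pack(&scan);
    // Round-trips losslessly; the kind is recovered with a shift, not a lookup.
    assert_eq!(unpack(raw), scan);
    assert_eq!(raw >> ID_BITS, 1);
}
```

One caveat worth noting if this route is taken: SQLite's INTEGER column is a signed 64-bit value, so kinds with the top bit set would produce negative ids unless the kind space is kept below 0x8000.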
Problem
Persistent storage is at the top of the optd optimizer wishlist.
Summary of changes
Future work
This PR unlocks the opportunity for people to work on rule matching. A fully sketched-out schema will be implemented gradually over this week and next.