Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Add native support to bitstring #577

Merged
merged 38 commits into from
Feb 27, 2024

Conversation

Gigitsu
Copy link
Contributor

@Gigitsu Gigitsu commented Nov 25, 2023

This is the companion PR of elixir-ecto/ecto#4328

@greg-rychlewski
Copy link
Member

Would you be able to update mix.exs so it points to your ecto branch? Basically change this part:

{:ecto, git: "https://github.com/elixir-ecto/ecto.git"}

to

{:ecto, git: "https://github.com/gigitsu/ecto.git", branch: "bitstring branch"}

Then run mix deps.update ecto

That way we could ensure the unit tests pass :)

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 26, 2023

Done! Sort of :)

There wasn't any {:ecto, git: "https://github.com/elixir-ecto/ecto.git"} in the repo but only {:ecto, "~> 3.11.0"} ( here ). Furthermore, the bitstring branch is only on my fork so I temporarily changed the main repo and branch just to verify the tests passes.

I wasn't sure ho to do this.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 26, 2023

I believe we will have an issue casting to the bitstring type. I was playing around with it and it seems like you must specify the number of bits when casting. If you omit it in postgres, like ::bit it will silently only take one bit.

Not sure atm what the best thing to do is. The only thing I can think is creating a parameterized type for bitstrings that requires the size parameter and has underlying type like {:bitstring, n}.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 26, 2023

I believe we will have an issue casting to the bitstring type. I was playing around with it and it seems like you must specify the number of bits when casting. If you omit it in postgres, like ::bit it will silently only take one bit.

Not sure atm what the best thing to do is. The only thing I can think is forcing a size when casting, like {:bitstring, n}. Though we would have to make the schema type that was well, in case casting is done to the field type. Other than that the only thing I can think of is to disallow casting to it. Would want to see what Jose thinks before choosing an option.

Yeah you need to specify the size when casting a bit column in Postgres. I'd like to be able to specify a size when casting.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 26, 2023

Another thing I just realized we need to think about is literals. I think we always have to tag them in Ecto like with binaries: https://github.com/elixir-ecto/ecto/blob/23c40e3d9b95ac4238b5417c4e3bb7329a0cc8b0/lib/ecto/query/builder.ex#L774

And then handle them specially in the adapter like we do for binary: https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L1013

This just popped into my head though I didn't have time to play around with it yet. But we need to consider it.

This is used, for example, when you do something like this from q in query, where: q.bitstring_field == <<4::4>>

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 27, 2023

Another thing I just realized we need to think about is literals. I think we always have to tag them in Ecto like with binaries: https://github.com/elixir-ecto/ecto/blob/23c40e3d9b95ac4238b5417c4e3bb7329a0cc8b0/lib/ecto/query/builder.ex#L774

And then handle them specially in the adapter like we do for binary: https://github.com/elixir-ecto/ecto_sql/blob/master/lib/ecto/adapters/postgres/connection.ex#L1013

This just popped into my head though I didn't have time to play around with it yet. But we need to consider it.

This is used, for example, when you do something like this from q in query, where: q.bitstring_field == <<4::4>>

I attempted to tag bitstrings similarly to binaries, but I couldn't find a reliable way to differentiate between them. My only option is to check whether the size is a multiple of 8 (indicating a binary) or not (indicating a bitstring), but this appears to be a weak distinction. Do you have any ideas?

Nevertheless, I also suspect that binary literals might suffice for bitstrings as well. I'll do some integration tests to verify this.

@greg-rychlewski
Copy link
Member

I attempted to tag bitstrings similarly to binaries, but I couldn't find a reliable way to differentiate between them. My only option is to check whether the size is a multiple of 8 (indicating a binary) or not (indicating a bitstring), but this appears to be a weak distinction. Do you have any ideas?

Could you point to some of the places where you'd need to make such a distinction but not know the underlying type? I can take a look.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 27, 2023

The process to tag literal starts with:

# literals
  def escape({:<<>>, _, args} = expr, type, params_acc, vars, _env) do
    valid? = Enum.all?(args, fn
      {:"::", _, [left, _]} -> is_integer(left) or is_binary(left)
      left -> is_integer(left) or is_binary(left)
    end)

    unless valid? do
      error! "`#{Macro.to_string(expr)}` is not a valid query expression. " <>
             "Only literal binaries and strings are allowed, " <>
             "dynamic values need to be explicitly interpolated in queries with ^"
    end

    {literal(expr, type, vars), params_acc}
  end

where the operator :<<>> is matched (here)

Then literal(expr, type, vars) is called and, in turn, it calls quoted_type(expr, vars)

defp literal(value, expected, vars),
    do: do_literal(value, expected, quoted_type(value, vars))

...

def quoted_type({:<<>>, _, _}, _vars), do: :binary`

that returns :binary.

Eventually, the literal is tagged with type :binary:

defp do_literal(value, _, current) when current in [:binary],
    do: {:%, [], [Ecto.Query.Tagged, {:%{}, [], [value: value, type: current]}]}

(it happens here)

In this process, I can't find a neat way to distinguish between bitstrings and binaries, since the operator for both types is the same (:<<>>) and some bitstrings are also a valid binary (eg. is_binary(<<5>>) = true)

I need that type: current to be :bitstring if I want convert it into a Postgres bit literal here.

@greg-rychlewski
Copy link
Member

Thanks that's really helpful. I will take a closer look.

@warmwaffles
Copy link
Member

I'll need to play with this to see if bitstrings are possible in sqlite. I think I'll have to use BLOB for this. If not, I'll have to make it just raise an exception saying it is unsupported.

@greg-rychlewski
Copy link
Member

In this process, I can't find a neat way to distinguish between bitstrings and binaries, since the operator for both types is the same (:<<>>) and some bitstrings are also a valid binary (eg. is_binary(<<5>>) = true)

@Gigitsu I think the best we can do here is check is_binary on the literal and if it's true call it :binary and if not :bitstring. And if the user really wants to treat it as :bitstring they can wrap it in type/2 like type(^<<5>>, :bitstring). I'm not seeing any other way.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 28, 2023

I'll need to play with this to see if bitstrings are possible in sqlite. I think I'll have to use BLOB for this. If not, I'll have to make it just raise an exception saying it is unsupported.

@warmwaffles from a quick Google search, it seems that in SQLite bitwise operators works on numeric types and not on BLOB so I wonder if it could be the right choice to use a numeric type, otherwise raise an exception.

@Gigitsu I think the best we can do here is check is_binary on the literal and if it's true call it :binary and if not :bitstring. And if the user really wants to treat it as :bitstring they can wrap it in type/2 like type(^<<5>>, :bitstring). I'm not seeing any other way.

@greg-rychlewski I believe I'm facing my knowledge limitations on Elixir macro on this. :)

Both here

  def escape({:<<>>, _, args} = expr, type, params_acc, vars, _env) do
    valid? = Enum.all?(args, fn
      {:"::", _, [left, _]} -> is_integer(left) or is_binary(left)
      left -> is_integer(left) or is_binary(left)
    end)

    unless valid? do
      error! "`#{Macro.to_string(expr)}` is not a valid query expression. " <>
             "Only literal binaries and strings are allowed, " <>
             "dynamic values need to be explicitly interpolated in queries with ^"
    end

    {literal(expr, type, vars), params_acc}
  end

and here

  # Tagged
  def quoted_type({:<<>>, _, _}, _vars), do: :binary
  def quoted_type({:type, _, [_, type]}, _vars), do: type

the expr is an AST {eg. {:<<>>, [line: 1], [1, 2, 3]}) so I cannot use is_binary on that, it always returns false.

I need to transform the AST into actual code to test it with is_binary, don't I?

My other option is to add a case match here like this

    defp expr(%Ecto.Query.Tagged{value: binary, type: :binary}, _sources, _query)
         when is_binary(binary) do
      ["'\\x", Base.encode16(binary, case: :lower) | "'::bytea"]
    end

    defp expr(%Ecto.Query.Tagged{value: binary, type: :binary}, _sources, _query)
         when is_bitstring(binary) do
      ["'\\x", Base.encode16(binary, case: :lower) | "'::bytea"]
    end

but the tagged type will always be :binary

@greg-rychlewski
Copy link
Member

def escape({:<<>>, _, args} = expr, type, params_acc, vars, _env) do
...

I think we can leave this one as is, it should just catch that users are not supplying variables (i.e. <<var>> or <<var::1>>). It doesn't look to me like it matters if it's a binary or bitstring.

 def quoted_type({:<<>>, _, _}, _vars), do: :binary

The only way I know to check here is using Code.eval_quoted. Like this:

def quoted_type({:<<>>, _, _} = expr, _, _vars) do
  if is_binary(Code.eval_quoted(expr)) do
    :binary
  else
    :bitstring
  end
end

Though I don't know how bad this is considered and would need a second opinion. It might be ok since we know it is a literal. There might also be a better strategy than the one I'm suggesting but I need to think more and possibly get other opinions.

defp expr(%Ecto.Query.Tagged{value: binary, type: :binary}, _sources, _query)
       when is_binary(binary) do
    ["'\\x", Base.encode16(binary, case: :lower) | "'::bytea"]
  end

Yes, we will have to create a clause like this for bitstring. Because that encoding will fail for non-binaries and also that representation will not be allowed to be cast to bit types in Postgres.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 28, 2023

def quoted_type({:<<>>, _, _} = expr, _, _vars) do
  if is_binary(Code.eval_quoted(expr)) do
    :binary
  else
    :bitstring
  end
end

There's another complication with this. Our current checks don't guard against dynamic bit size. We allow stuff like this

size = 3
<<2::size(size)>>

If we go this route we also need to guard against dynamic size. Otherwise we can't determine the type and I don't think we can guarantee using Code.eval_quoted is safe.

@@ -1443,6 +1459,7 @@ if Code.ensure_loaded?(MyXQL) do
defp ecto_to_db(:bigserial, _query), do: "bigint unsigned not null auto_increment"
defp ecto_to_db(:binary_id, _query), do: "binary(16)"
defp ecto_to_db(:string, _query), do: "varchar"
defp ecto_to_db(:bitstring, _query), do: "bit"
Copy link
Member

@greg-rychlewski greg-rychlewski Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For MySQL it seems more complicated:

Assuming all the above is correct, maybe we should do this then:

  1. Add defp ecto_size_to_db(:bitstring), do: "bit".
  2. Raise here defp ecto_to_db(:bitstring)

This would mean that migrations work as long as you specify the size and casting raises. This would avoid unexpected surprises in migrations specifying size 1 silently. And since casting doesn't work it would be clearer if we make our own message rather than the user not knowing whether we just sent the wrong type to the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 28, 2023

I think we can leave this one as is, it should just catch that users are not supplying variables (i.e. <<var>> or <<var::1>>). It doesn't look to me like it matters if it's a binary or bitstring.

I totally agree.

The only way I know to check here is using Code.eval_quoted. Like this:

I don't like the idea of using Code.eval_quoted very much, I'd prefer to leave :binary for both :binary and :bitstring at this stage.

defp expr(%Ecto.Query.Tagged{value: binary, type: :binary}, _sources, _query)
       when is_binary(binary) do
    ["'\\x", Base.encode16(binary, case: :lower) | "'::bytea"]
  end

Yes, we will have to create a clause like this for bitstring. Because that encoding will fail for non-binaries and also that representation will not be allowed to be cast to bit types in Postgres.

I would differentiate :binary from :bitstring only here, adding a new clause with when is_bitstring(...) guard, leaving type: :binary in function sign.

It shouldn't be in use today because only size = 8 would work. So if we go this route we might also need to guard against dynamic size. Otherwise we can't determine the type at compile time.

Should we allow a syntax like this:

size = 3
<<2::size(^size)>>

with the ^ operator for dynamic size?

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 28, 2023

I would differentiate :binary from :bitstring only here, adding a new clause with when is_bitstring(...) guard, leaving type: :binary in function sign.

Let's get Jose's opinion before making any change to this part. Maybe what you are saying is the best way. But at the same time we'd be passing the wrong type through Ecto in the builder, the planner and out to all the adapters (not just the built-in ones). The types are used in places other than the query string building.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 29, 2023

That's ok, let's wait for Jose's opinion. In the meantime I'll ad some integration tests

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 29, 2023

The other way I can think of is we already iterate through the bitstring here:

valid? = Enum.all?(args, fn
      {:"::", _, [left, _]} -> is_integer(left) or is_binary(left)
      left -> is_integer(left) or is_binary(left)
    end)

So we could possibly determine at this point whether the sum of the sizes is divisible by 8 or not by tracking the remainder after each iteration and adding some pattern matches to size. Though we'd have to guard against variable sizes and raise the same message as if the value was variable:

unless valid? do
      error! "`#{Macro.to_string(expr)}` is not a valid query expression. " <>
             "Only literal binaries and strings are allowed, " <>
             "dynamic values need to be explicitly interpolated in queries with ^"
    end

i.e. if any part of the literal is variable, interpolate the entire thing (^<<..>>)

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 29, 2023

The other way I can think of is we already iterate through the bitstring here:

valid? = Enum.all?(args, fn
      {:"::", _, [left, _]} -> is_integer(left) or is_binary(left)
      left -> is_integer(left) or is_binary(left)
    end)

So we could also determine at this point whether the sum of the sizes is divisible by 8 or not by tracking the remainder after each iteration and adding some pattern matches to size. Though we'd have to guard against variable sizes and raise the same message as if the value was variable:

unless valid? do
      error! "`#{Macro.to_string(expr)}` is not a valid query expression. " <>
             "Only literal binaries and strings are allowed, " <>
             "dynamic values need to be explicitly interpolated in queries with ^"
    end

i.e. if any part of the literal is variable, interpolate the entire thing (^<<..>>)

Yeah I agree, I was experimenting exactly with that, but we should consider that there are different kinds of size definitions, eg <<1::1, 2::size(2), 3::bits(3), 4::integer, ...>> . I cannot find exhaustive documentation about these modifiers.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 29, 2023

but we should consider that there are different kinds of size definitions

Yeah I have the same problem. It does not feel very stable to do it in this way. Modifiers can even change in the future and Ecto misses one. If there is a safe way to use Code.eval_quoted then it's not clear to me that tracking the remainder is better. But I don't know if there is a safe way.

@greg-rychlewski
Copy link
Member

Regardless, I think we have to consider modifying the checks here:

valid? = Enum.all?(args, fn
      {:"::", _, [left, _]} -> is_integer(left) or is_binary(left)
      left -> is_integer(left) or is_binary(left)
    end)

Because this will fail for something like this << <<1::1>>::bitstring >>

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Nov 29, 2023

Could Code.eval_quoted be potentially dangerous? There could be anything in ::size(..) modifier.

@greg-rychlewski
Copy link
Member

greg-rychlewski commented Nov 29, 2023

Yeah, that's why I mentioned we'd need to block user input and only accept literals. But given the multitude of modifiers I don't know how feasible this is. And modifiers might change in the future and Ecto misses a new one. So maybe it's unacceptable no matter what. Though there might be some protection given it is inside size(...). But I don't know enough to say that.

Basically none of the options are looking too appealing right now:

  • eval_quoted
  • passing the wrong type through the builder/planner/adapters
  • tracking the size remainder
  • change the default type to :bitstring and if users want to treat it as binary they use type/2. but this would break existing code

I'm hoping by continuing to discuss/think about it a better way becomes clear.

@greg-rychlewski
Copy link
Member

Yeah I got the impression looking into the MyXQL history that this is a tough topic. For example: elixir-ecto/myxql#91.

My first impulse was to try and figure out the issue, but given the complexity maybe we go ahead with this PR for now without MySQL and then add it in later once we figure it out.

@wojtekmach
Copy link
Member

Agreed.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 19, 2024

My first impulse was to try and figure out the issue, but given the complexity maybe we go ahead with this PR for now without MySQL and then add it in later once we figure it out.

I agree too.

So my next step is to remove the MySQL implementation ad do an integration test only for Postgres, am I right?

@greg-rychlewski
Copy link
Member

@Gigitsu Yeah you can keep the test in Ecto because that's where all the other types are. But you could pull the :bitstring tests into their own test and tag them with @tag :bitstring_type. And then in ecto_sql integration test helper files for MySQL and TDS you ignore the tag.

And then last thing is to add literal support in Ecto and then catch the tagged value here in the postgres adapter.

@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 25, 2024

Hi @greg-rychlewski, I've removed everything related to bitstring from MySQL and TDS adapters.
I've also created a new schema specifically for the bitstring tests (sorry for the lack of creativity in naming this table and its fields; any suggestions are welcome) and I've isolated these tests to run only for Postgres.

Now everything seems to work. Please let me know if there is anything I forgot.

Additionally, I've updated the Earthfile to make it compatible with ARM (Apple Silicon in my case) CPUs. However, I only have experience with Docker and haven't used Earthly before, so please take a look at it.

And then last thing is to add literal support in Ecto and then catch the tagged value here in the postgres adapter.

I almost forgot this, I work on it

mix.exs Outdated Show resolved Hide resolved
Co-authored-by: José Valim <[email protected]>
@greg-rychlewski
Copy link
Member

sorry @Gigitsu could you please do mix deps.update with the latest change so that it updates mix.lock with the latest ecto gitub commit rather than hex

@greg-rychlewski greg-rychlewski merged commit 95e2fff into elixir-ecto:master Feb 27, 2024
10 checks passed
@greg-rychlewski
Copy link
Member

thanks @Gigitsu !

@Gigitsu Gigitsu deleted the feature/bitstring-support branch February 27, 2024 13:24
@Gigitsu
Copy link
Contributor Author

Gigitsu commented Feb 27, 2024

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants