Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSGO-30 DSE Search Results May Not Be Queryable #1689

Open
cardonator opened this issue Apr 7, 2023 · 10 comments
Open

CASSGO-30 DSE Search Results May Not Be Queryable #1689

cardonator opened this issue Apr 7, 2023 · 10 comments

Comments

@cardonator
Copy link

cardonator commented Apr 7, 2023

What version of Cassandra are you using?

DataStax Enterprise 5.1 / Cassandra 3.11

What version of Gocql are you using?

1.3.2

What version of Go are you using?

1.20.2

What did you do?

I am using the Search product in DSE to run solr queries for BI purposes. gocql works great for this the vast majority of the time because the queries return results that look just like the table schema, however there is a style of solr query with a pivot/facet where the response fields can differ from the table fields and gocql does not seem to be able to handle this while other Cassandra connectors can.

What did you expect to see?

For example, if I run a query like this:

select * from keyspace.mytable where solr_query = '{"q": "*:*", "fq":["account_id:(1 2 3 4)", "created:[2020-01-01T00:00:00Z TO 2020-12-31T23:59:59Z]"], "facet":{"pivot":"profile_name", "range":"created", "f.created.range.start":"2020-01-01T00:00:00Z", "f.created.range.end":"2020-12-31T23:59:59Z", "f.created.range.gap":"+1MONTH"}}';

If I run this from cqlsh or from a tool we use called SQLPad, I get back two columns in the response, facet_ranges and facet_pivot. These are json fields that have various faceted statistics in them. I tried using the dynamic fields functionality but even those weren't able to see the data returned from these fields in any way that I could parse.

What did you see instead?

The data is not returned in the gocql files in any way I can find it.

What I'm looking for is if there is something I should try, some kind of debug output I should enable, or anything else I could do that would help get support for this type of response working in gocql. I realize the maintainers don't really have access to or use DSE, but it would be very helpful for BI use cases.

@martin-sucha
Copy link
Contributor

Hi!

How does your code that calls gocql look like? What do you see in the results with gocql instead of the expected result? "The data is not returned" is not very specific. I need more information to be able to help you.

Gocql can be built with gocql_debug tag for more verbose logs, however it's possible that will not provide enough information. Can you provide packet capture showing this query execution with gocql (if the results of the query are not sensitive)? Does framer.parseResultRows see the columns in the result metadata? You could also try running a reproducer program under the debugger.

@remiphilippe
Copy link

From what I see the problem is linked to the "direct" JSON response that DSE returns. The reponse doesn't match any colums.

We get a crash when we run solr queries (panic is the same as #1460):

panic: not enough bytes in buffer to read int require 4 got: 0 [recovered]
	panic: not enough bytes in buffer to read int require 4 got: 0

I've attached the output of the debugger
image

The full buffer is:

"\x00\x00\x00\x82{\"accesslocation\":{\"gridLevel\":2,\"columns\":32,\"rows\":32,\"minX\":-180.0,\"maxX\":180.0,\"minY\":-90.0,\"maxY\":90.0,\"counts_ints2D\":null}}"

if there is a way to get the response from that buffer, that would work.

@remiphilippe
Copy link

remiphilippe commented Apr 25, 2023

A little more context, the first column readInt works and the buffer contains the value, it's the second column read that causes the crash (cols shows 23 columns total which are indeed the 23 columns from the table, but the solr query returns a single column).

A workaround that works for us, specify any column of the text data type (we use the solr_query one select "solr_query" from ...) and the query will succeed and be processed by gocql.

@martin-sucha
Copy link
Contributor

Thanks! @remiphilippe could you please provide the the Go code that calls gocql in the reproducer? Having a dump of the whole PREPARE/EXECUTE exchange from the protocol would help as well, as it is not clear to me how the result metadata looks like in this case and I don't have access to DSE to try myself.

In any case it seems that the error handling in frame parsing should be more robust, it definitely should not panic, just return an error. However, we also need to find out why gocql expects int while there is JSON blob.

cc @joao-r-reis is this issue something that Datastax would be willing to help with?

@remiphilippe
Copy link

@martin-sucha sure can do that, are you looking for outputs from delve or specific dumps? If you let me know what to dump I'll update my test code.

@joao-r-reis
Copy link
Contributor

We can't look into this at this exact moment but I'm going to add this to my list and see if I can find some free time in the near future to investigate and potentially fix the issue.

@martin-sucha
Copy link
Contributor

@remiphilippe pcap dump from Wireshark/tcpdump with the network communication would help. Please make sure there isn't any secret included, like the database password or sensitive contents of tables.

Specifically, gocql will send a PREPARE message to the server when first executing the query and then EXECUTE message to execute the prepared statement. The RESULT of the prepare message will contain description of the columns and the RESULT from the execute call will contain the values.

With the captured network packets, it should be possible to:

  • check whether the response conforms to the CQL protocol
  • create a unit test that uses TestServer to reproduce the situation without DSE.

@cardonator
Copy link
Author

Now that Apache owns this library, and DataStax is a major contributor to the open source project, is there anyone at DataStax that could help with this issue?

@lukasz-antoniak
Copy link
Member

Background
GoCQL will prepare the statement and reuse result metadata for RESULT message parsing. When we prepare CQL like:
SELECT * FROM my_table WHERE solr_query='{"q": "id:*", "facet":{"pivot":"description", "limit":"10"}, "useFieldCache":true}'
C* would return as many columns as my_table has for PREPARE request (e.g. 3). When we execute SOLR query, there is only 1 column return and this causes issue.

Workaround
When executing query use .NoSkipMetadata().

Proposed Solution
What do you think about adding ScanSolr similar to ScanCAS that could handle result metadata correctly? I have mixed feelings the reason is strong enough to have new scan function.

@joao-r-reis joao-r-reis changed the title DSE Search Results May Not Be Queryable CASSGO-30 DSE Search Results May Not Be Queryable Nov 6, 2024
@joao-r-reis
Copy link
Contributor

joao-r-reis commented Nov 6, 2024

Thanks @lukasz-antoniak for the investigation. Since this is related to gocql not prioritizing the metadata of the response over the prepared metadata this will be fixed by #1822

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants