Add support for s3 fields in discover #8609

Merged
7 commits merged on Oct 23, 2024

Member

@ps48 ps48 commented Oct 16, 2024

Description

  • Add support for s3 fields in discover

Issues Resolved

  • Added async loaders to field selectors in side bar
  • On page reload, fetch the fields for the selected datasets
  • Added support for SQL/PPL fields to OSD field types
  • Added field type conversion from SQL/PPL to OSD field types
  • Added areFieldsLoading flag to the index pattern specification to showcase fields loading in the side bar
  • Added supportsTimeFilter flag to datasetConfig and dataset meta
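For readers who want a concrete picture of the SQL/PPL to OSD field type conversion listed above, here is a minimal sketch. The `OsdFieldType` union, the `SQL_TO_OSD_TYPE` table and the `toOsdFieldType` helper are hypothetical names chosen for illustration; the mapping used in the PR may differ.

```typescript
// Hypothetical subset of OSD field types, for illustration only.
type OsdFieldType = 'string' | 'number' | 'date' | 'boolean' | 'object' | 'unknown';

// Illustrative mapping (an assumption); the table used in the PR may differ.
// A Map is used so that names like "constructor" cannot hit Object.prototype.
const SQL_TO_OSD_TYPE = new Map<string, OsdFieldType>([
  ['string', 'string'],
  ['varchar', 'string'],
  ['int', 'number'],
  ['integer', 'number'],
  ['bigint', 'number'],
  ['double', 'number'],
  ['decimal', 'number'],
  ['boolean', 'boolean'],
  ['date', 'date'],
  ['timestamp', 'date'],
  ['struct', 'object'],
]);

function toOsdFieldType(sqlType: string): OsdFieldType {
  // Normalize case and drop type parameters such as "(10,2)" or "<string>".
  const base = sqlType.trim().toLowerCase().replace(/[(<].*$/, '');
  return SQL_TO_OSD_TYPE.get(base) ?? 'unknown';
}

console.log(toOsdFieldType('DECIMAL(10,2)')); // number
console.log(toOsdFieldType('timestamp')); // date
console.log(toOsdFieldType('binary')); // unknown
```

Falling back to `'unknown'` keeps unmapped types visible in the field list instead of failing the whole conversion.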

Screenshot

Screen.Recording.2024-10-16.at.8.42.02.AM.mov

Changelog

  • fix: Discover 2.0 support for S3 fields

Check List

  • All tests pass
    • yarn test:jest
    • yarn test:jest_integration
  • New functionality includes testing.
  • New functionality has been documented.
  • Update CHANGELOG.md
  • Commits are signed per the DCO using --signoff

Contributor

❌ Empty Changelog Section

The Changelog section in your PR description is empty. Please add a valid changelog entry or entries. If you did add a changelog entry, check to make sure that it was not accidentally included inside the comment block in the Changelog section.

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 20 lines in your changes missing coverage. Please review.

Project coverage is 60.86%. Comparing base (7da5443) to head (53198c4).
Report is 4 commits behind head on main.

Files with missing lines                               Patch %  Lines
...ry/query_string/dataset_service/dataset_service.ts  41.17%   6 Missing and 4 partials ⚠️
...s/data/public/ui/dataset_selector/configurator.tsx  0.00%    4 Missing ⚠️
...gins/query_enhancements/public/datasets/s3_type.ts  90.90%   1 Missing and 2 partials ⚠️
...mon/index_patterns/index_patterns/index_pattern.ts  66.66%   1 Missing ⚠️
...pplication/components/sidebar/discover_sidebar.tsx  0.00%    0 Missing and 1 partial ⚠️
...ication/view_components/utils/use_index_pattern.ts  0.00%    1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8609   +/-   ##
=======================================
  Coverage   60.85%   60.86%           
=======================================
  Files        3793     3793           
  Lines       90368    90442   +74     
  Branches    14181    14198   +17     
=======================================
+ Hits        54997    55045   +48     
- Misses      31893    31909   +16     
- Partials     3478     3488   +10     
Flag       Coverage Δ
Linux_1    29.08% <0.00%> (-0.02%) ⬇️
Linux_2    56.38% <0.00%> (-0.02%) ⬇️
Linux_3    37.68% <34.61%> (-0.01%) ⬇️
Linux_4    29.85% <53.44%> (+0.02%) ⬆️
Windows_1  29.09% <0.00%> (-0.02%) ⬇️
Windows_2  56.33% <0.00%> (-0.02%) ⬇️
Windows_3  37.68% <34.61%> (-0.01%) ⬇️
Windows_4  29.85% <53.44%> (+0.02%) ⬆️


Member

@ashwin-pc ashwin-pc left a comment

While most of the comments are about code changes that don't affect users and can be fast-followed, there are 2 user-facing bugs that we need to address:

  1. Three requests are fired when the fields are being loaded while the query is running. I can reproduce this most easily when I've selected an S3 data source and refreshed the page. See screenshot.
Screenshot 2024-10-18 at 4 51 33 AM
  2. The available fields section doesn't stop spinning after the page is refreshed, even once the requests are successful. See second screenshot.
Screenshot 2024-10-18 at 5 23 29 AM

Collaborator

@virajsanghvi virajsanghvi left a comment

didn't finish my review as I assume there will be changes based on ashwin's comments

ps48 pushed a commit to ps48/OpenSearch-Dashboards that referenced this pull request Oct 18, 2024
@ps48 ps48 force-pushed the discover-s3-fields-updated branch from 2d19e68 to 5976444 Compare October 18, 2024 21:18
Member Author

ps48 commented Oct 18, 2024

Looks like Flint is breaking on main. I get this when running any S3-based SQL query:

server    log   [21:53:31.717] [error][plugins][queryEnhancements] Facet fetch: enhancements.sqlQuery: Error: Data Source Error: Internal Server Error
server    log   [21:53:31.717] [error][plugins][queryEnhancements] sqlSearchStrategy: {
  "error": {
    "reason": "There was internal problem at backend",
    "details": "Glue storage engine is not supported.",
    "type": "UnsupportedOperationException"
  },
  "status": 500
}

Member Author

ps48 commented Oct 18, 2024

Debugging the commits led me to 923cce8. This commit might be the root cause of the weird behavior we're seeing for Flint data sources.

ps48 pushed a commit to ps48/OpenSearch-Dashboards that referenced this pull request Oct 19, 2024
@ps48 ps48 force-pushed the discover-s3-fields-updated branch from 5976444 to 72f0b6e Compare October 19, 2024 00:01
Member Author

ps48 commented Oct 19, 2024

Current status:

  • Rebased my PR on main again - no conflicts as of now
  • Reverted the dataselector changes from 923cce8 locally to test my PR; it works
  • I am able to load and reload pages without any issue
  • For the bug Ashwin raised about 3 queries being emitted for fields: I will take this and the other non-blocking comments in a fast follow once we get this merged.

Video of the local test with the dataselector changes reverted:

flint-s3-discover.mov

@ps48 ps48 force-pushed the discover-s3-fields-updated branch from 72f0b6e to 000834f Compare October 21, 2024 22:12
@@ -28,6 +28,8 @@ export interface DataSourceMeta {
name?: string;
/** Optional session ID for faster responses when utilizing async query sources */
sessionId?: string;
/** Optional supportsTimeFilter determines if a time filter is needed */
supportsTimeFilter?: boolean;
Member

can this be controlled based on whether timeField is present in the dataset rather than additional flag?

Member Author

@ps48 ps48 Oct 22, 2024

Today, even if timeField is present, S3 data sources do not support automatic time filter injection into the query. That is where we want to go in the future, though.

Member Author

Some older conversations around this: #8337 (comment), #8337 (comment)

const temporaryIndexPattern = await this.indexPatterns?.create(spec, true);

// Load schema asynchronously if it's an async index pattern
if (asyncType && temporaryIndexPattern) {
Member

do async index patterns mean that metadata such as the field schema is not stored in the index pattern saved object?

Member Author

An async index pattern means that the fields are lazy loaded; i.e. the temporary index pattern is created with empty fields, and once the fields are loaded the index pattern gets hydrated with them.
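To make the lazy loading described above concrete, here is a minimal sketch. `LazyIndexPattern`, `Field` and `loadFields` are simplified, hypothetical stand-ins and not the real index pattern classes; only the `fieldsLoading` flag name is taken from the diff in this PR.

```typescript
// Hypothetical, simplified stand-ins for the real index pattern types.
interface Field {
  name: string;
  type: string;
}

class LazyIndexPattern {
  readonly id: string;
  fields: Field[] = [];
  fieldsLoading = false;

  constructor(id: string) {
    this.id = id;
  }

  // The pattern is usable with empty fields while the fetch is in flight.
  async loadFields(fetchFields: () => Promise<Field[]>): Promise<void> {
    this.fieldsLoading = true;
    try {
      // Hydrate the pattern once the schema arrives.
      this.fields = [...(await fetchFields())];
    } finally {
      // Clear the flag on success and on failure, so a spinner bound to it stops.
      this.fieldsLoading = false;
    }
  }
}

const pattern = new LazyIndexPattern('example-dataset');
const pending = pattern.loadFields(() =>
  Promise.resolve([{ name: 'status', type: 'string' }])
);
console.log(pattern.fieldsLoading, pattern.fields.length); // true 0
pending.then(() => console.log(pattern.fieldsLoading, pattern.fields.length)); // false 1
```

Resetting the flag in `finally` is a design choice of this sketch: a rejected fetch still propagates to the caller, but the loading state never gets stuck.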

@@ -80,6 +83,11 @@ export const Configurator = ({
setTimeFields(dateFields || []);
};

if (baseDataset?.dataSource?.meta?.supportsTimeFilter === false) {
setTimeFields([]);
Member

this [] creates a new array object every time it is called, and will trigger a re-render. It might be better to add a guard

Suggested change
- setTimeFields([]);
+ if (timeFields.length > 0) setTimeFields([]);

Member Author

Updated here: 53198c4
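The reasoning behind the guard in this thread can be sketched without React: a state setter that compares with `Object.is` treats every fresh `[]` as a change, while returning the existing empty array does not. `clearTimeFields` is a hypothetical helper written for this illustration, not code from the PR.

```typescript
// Hypothetical helper: returns the current array when it is already empty,
// so a setter that compares with Object.is sees no change.
function clearTimeFields<T>(current: T[]): T[] {
  return current.length > 0 ? [] : current;
}

const alreadyEmpty: string[] = [];
console.log(Object.is([], alreadyEmpty)); // false
console.log(Object.is(clearTimeFields(alreadyEmpty), alreadyEmpty)); // true
console.log(clearTimeFields(['timestamp']).length); // 0
```

In a component, the same idea could be written as a functional update, `setTimeFields((prev) => (prev.length > 0 ? [] : prev))`, which avoids reading `timeFields` from the enclosing closure.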

<EuiSplitPanel.Inner
className="eui-yScroll dscSideBar_fieldListContainer"
paddingSize="none"
>
- {fields.length > 0 && (
+ {(fields.length > 0 || selectedIndexPattern.fieldsLoading) && (
Member

if loading and fields list is empty, what does UI show? should it indicate that it's loading?

Member Author

If loading and fields list is empty -> The UI shows the sidebar in the loading state.

@@ -312,6 +313,7 @@ const FieldList = ({
size="xs"
className="dscSideBar_fieldGroup"
aria-label={title}
isLoading={selectedIndexPattern.fieldsLoading ?? false}
Member

Suggested change
- isLoading={selectedIndexPattern.fieldsLoading ?? false}
+ isLoading={selectedIndexPattern.fieldsLoading}

or coerce to boolean !!selectedIndexPattern.fieldsLoading?

Member Author

Coerce makes sense: 53198c4
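For context on the two options discussed in this thread, a small sketch of how `?? false` and `!!` differ. For a `boolean | undefined` flag they give the same result; they only diverge for other falsy or truthy values. The helper names `viaNullish` and `viaCoercion` are made up for this example.

```typescript
// `??` only replaces null and undefined; every other value passes through unchanged.
function viaNullish(value: unknown): unknown {
  return value ?? false;
}

// `!!` always yields a real boolean based on truthiness.
function viaCoercion(value: unknown): boolean {
  return !!value;
}

console.log(viaNullish(undefined), viaCoercion(undefined)); // false false
console.log(viaNullish(true), viaCoercion(true)); // true true
console.log(viaNullish(0), viaCoercion(0)); // 0 false
console.log(viaNullish('loading'), viaCoercion('loading')); // loading true
```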

export interface SQLQueryResponse {
status: string;
schema: Array<{ name: string; type: string }>;
datarows: Array<Array<string | null>>;
Member

should it be

Suggested change
- datarows: Array<Array<string | null>>;
+ datarows: Array<Array<unknown>>;

I think columns in the SQL response are not always strings?

Member Author

@ps48 ps48 Oct 22, 2024

Updated here: 53198c4
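A short sketch of what typing `datarows` as `unknown` means for consumers: values have to be narrowed before use. The `SQLQueryResponse` shape follows the interface quoted in this thread with the suggested change applied; `toRowObjects` and the sample data (including the `'SUCCESS'` status) are hypothetical.

```typescript
interface SQLQueryResponse {
  status: string;
  schema: Array<{ name: string; type: string }>;
  datarows: Array<Array<unknown>>;
}

// Hypothetical helper: pairs each cell with its column name from the schema.
function toRowObjects(response: SQLQueryResponse): Array<Record<string, unknown>> {
  return response.datarows.map((row) => {
    const record: Record<string, unknown> = {};
    response.schema.forEach((column, index) => {
      record[column.name] = row[index];
    });
    return record;
  });
}

// Sample data invented for this example.
const sample: SQLQueryResponse = {
  status: 'SUCCESS',
  schema: [
    { name: 'col_name', type: 'string' },
    { name: 'size', type: 'integer' },
  ],
  datarows: [
    ['a', 1],
    ['b', null],
  ],
};

const rows = toRowObjects(sample);
const size = rows[0]?.['size'];
// `unknown` forces a type check before the value can be used as a number.
const doubled = typeof size === 'number' ? size * 2 : undefined;
console.log(doubled); // 2
```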

const sessionId = (dataset.dataSource?.meta as DataStructureCustomMeta).sessionId;
const response = await http.fetch({
method: 'POST',
path: trimEnd(`${API.DATA_SOURCE.ASYNC_JOBS}`),
Member

nit.

Suggested change
- path: trimEnd(`${API.DATA_SOURCE.ASYNC_JOBS}`),
+ path: trimEnd(API.DATA_SOURCE.ASYNC_JOBS),

Member Author

@ps48 ps48 Oct 22, 2024

Updated here: 53198c4

fetchStatus: () =>
http.fetch({
method: 'GET',
path: trimEnd(`${API.DATA_SOURCE.ASYNC_JOBS}`),
Member

Suggested change
- path: trimEnd(`${API.DATA_SOURCE.ASYNC_JOBS}`),
+ path: trimEnd(API.DATA_SOURCE.ASYNC_JOBS),

Member Author

Updated here: 53198c4

Signed-off-by: Shenoy Pratik <[email protected]>
Member

@joshuali925 joshuali925 left a comment

didn't pull down to test but from code perspective lgtm

Collaborator

@virajsanghvi virajsanghvi left a comment

Don't have time to pull this down to test, but approving assuming bugs Ashwin brought up are resolved (which appear to be the case from comments)

Other comments are not blocking

.fetchFields(dataset, services)
.then((fields) => {
temporaryIndexPattern.fields.replaceAll([...fields]);
this.indexPatterns?.saveToCache(dataset.id, temporaryIndexPattern);
Collaborator

Handling of this request should ideally be tested. Can we follow up on this?

path: trimEnd(API.DATA_SOURCE.ASYNC_JOBS),
body: JSON.stringify({
lang: 'sql',
query: `DESCRIBE TABLE ${dataset.title}`,
Collaborator

does this need to be escaped in any way or can we trust dataset metadata?
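One possible answer to the escaping question, sketched as an allowlist check before the title is interpolated into the query. This is not what the PR does; `toSafeTableName`, `buildDescribeQuery` and the identifier pattern are assumptions for illustration, and the sample table name is made up.

```typescript
// Assumption: each dot-separated part may contain only letters, digits and underscores.
const IDENTIFIER_PART = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical guard that rejects anything that is not a plain dotted identifier.
function toSafeTableName(title: string): string {
  const parts = title.split('.');
  if (!parts.every((part) => IDENTIFIER_PART.test(part))) {
    throw new Error(`Unsafe table name: ${title}`);
  }
  return parts.join('.');
}

function buildDescribeQuery(title: string): string {
  return `DESCRIBE TABLE ${toSafeTableName(title)}`;
}

console.log(buildDescribeQuery('mys3.default.http_logs'));
// DESCRIBE TABLE mys3.default.http_logs
```

The allowlist is deliberately conservative and would reject legitimate names that contain other characters; quoting identifiers according to the query engine's rules is the alternative if such names need to be supported.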

Member

@ashwin-pc ashwin-pc left a comment

When I pulled it down, I was seeing issues with the state. Blocking so that we don't accidentally merge it in the meantime.

My bad. I still had the older commits when I reviewed. When I pulled down the latest changes, it works.

@ashwin-pc
Member

Screen.Recording.2024-10-22.at.6.52.21.PM.mov

Noticed one bug, but won't block it because it can come in a fast follow

@ashwin-pc ashwin-pc merged commit 12d072d into opensearch-project:main Oct 23, 2024
68 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 23, 2024
* add support for s3 fields in discover

Signed-off-by: Shenoy Pratik <[email protected]>

* Changeset file for PR #8609 created/updated

* resolve comments, make fields fetch async

Signed-off-by: Shenoy Pratik <[email protected]>

* fix unit tests

Signed-off-by: Shenoy Pratik <[email protected]>

* update services to be Partial<IDataPluginServices>

Signed-off-by: Shenoy Pratik <[email protected]>

* fix async field fetch in cachedataset

Signed-off-by: Shenoy Pratik <[email protected]>

* resolve comments

Signed-off-by: Shenoy Pratik <[email protected]>

---------

Signed-off-by: Shenoy Pratik <[email protected]>
Co-authored-by: opensearch-changeset-bot[bot] <154024398+opensearch-changeset-bot[bot]@users.noreply.github.com>
(cherry picked from commit 12d072d)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
mengweieric pushed a commit that referenced this pull request Oct 23, 2024
* add support for s3 fields in discover

* Changeset file for PR #8609 created/updated

* resolve comments, make fields fetch async

* fix unit tests

* update services to be Partial<IDataPluginServices>

* fix async field fetch in cachedataset

* resolve comments

---------

(cherry picked from commit 12d072d)

Signed-off-by: Shenoy Pratik <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: opensearch-changeset-bot[bot] <154024398+opensearch-changeset-bot[bot]@users.noreply.github.com>