You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Here is a list of improvements that can be currently be taken in the Glossaries module:
Implement server-side permissions for glossary operations
This should be the first priority. For glossaries there are no permissions decorating the service. Which means that for example if a team owns glossary1, another team could delete glossary1. This is avoided in the UI but not programmatically.
Ensure business logic is in GlossariesService and not in db.repositories
The service layer in the glossaries module offloads all the business logic to the GlossariesRepository. Instead we should execute the business logic in the service and use the repository exclusively for db operations.
Clearly split Glossaries and Catalog. At the moment in the backend there is a catalog module which is in fact handling all APIs for Glossaries which are not necessary the same. In the frontend we separate among Glossaries and Catalog.
Remove duplicated gql fields for Glossaries/Categories/Terms
Categories, Terms and Glossaries share most of their gql types and input_types. We should reduce the lines of code and re-use a single input_type and single type for them. This is also applicable to the frontend query definition.
Simplify API calls and make them more straight-forward
Secondly, the way the API calls are designed to get information from a glossary can be simplified. We currently define 3 APIs in the frontend:
getGlossary
getGlossaryTree - which in reality is getGlossary with a bunch of extra fields
listAssociations - which in reality is getGlossaryTree - which in reality is getGlossary
getGlossary is doing too many things. We should break it down by responsibility:
getGlossary - input: glossary.nodeUri, output: glossary db metadata + resolve_stats
NEW `getGlossaryTree - input: glossary.nodeUri, output: tree object of categories, terms with their parentUri
NEW listAssociations - input: glossary.nodeUri, output: paginated list of associations
Remove duplicated code
Glossaries API calls return a great number of unused fields. First of all we should remove unused fields from both backend and frontend to improve performance. I am mostly referring to the Glossary object.
The text was updated successfully, but these errors were encountered:
### Feature or Bugfix
- Feature:Testing
### Detail
Implement tests for Glossaries/Catalog as part of
#1220⚠️ To test glossary associations, we need resources that use the
glossary terms (associate a term to a resource). Possible resources
include: s3_datasets, s3_tables, s3_folders, dashboards,
redshift_datasets and redshift_tables. All of which are part of other
modules that might be enabled or disabled. In this PR we assume that the
s3_datasets module is enabled! If it was not enabled, then the glossary
tests would fail!
During the implementation several enhancement ideas came up and are
collected in #1557
Tested in local with connection to real AWS deployment:
![image](https://github.com/user-attachments/assets/d759cc3d-7e44-4057-b16b-0fc94edd2290)
### Relates
- #1220
### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).
- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
- Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
- Is injection prevented by parametrizing queries?
- Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
- Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
- Do you use a standard proven implementations?
- Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
- Have you used the least-privilege principle? How?
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
Here is a list of improvements that can be currently be taken in the Glossaries module:
Implement server-side permissions for glossary operations
This should be the first priority. For glossaries there are no permissions decorating the service. Which means that for example if a team owns glossary1, another team could delete glossary1. This is avoided in the UI but not programmatically.
Ensure business logic is in GlossariesService and not in db.repositories
The service layer in the glossaries module offloads all the business logic to the GlossariesRepository. Instead we should execute the business logic in the service and use the repository exclusively for db operations.
Clearly split Glossaries and Catalog. At the moment in the backend there is a catalog module which is in fact handling all APIs for Glossaries which are not necessary the same. In the frontend we separate among Glossaries and Catalog.
Remove duplicated gql fields for Glossaries/Categories/Terms
Categories, Terms and Glossaries share most of their gql types and input_types. We should reduce the lines of code and re-use a single input_type and single type for them. This is also applicable to the frontend query definition.
Simplify API calls and make them more straight-forward
Secondly, the way the API calls are designed to get information from a glossary can be simplified. We currently define 3 APIs in the frontend:
getGlossary
getGlossaryTree
- which in reality isgetGlossary
with a bunch of extra fieldslistAssociations
- which in reality isgetGlossaryTree
- which in reality isgetGlossary
getGlossary
is doing too many things. We should break it down by responsibility:getGlossary
- input: glossary.nodeUri, output: glossary db metadata + resolve_statslistAssociations
- input: glossary.nodeUri, output: paginated list of associationsGlossaries API calls return a great number of unused fields. First of all we should remove unused fields from both backend and frontend to improve performance. I am mostly referring to the Glossary object.
The text was updated successfully, but these errors were encountered: