Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Add one-pager for Terraform registry scrapers and registry metadata #2956

Closed
wants to merge 9 commits into from

Conversation

ulucinar
Copy link
Contributor

@ulucinar ulucinar commented Mar 10, 2022

Description of your changes

Fixes crossplane/terrajet/issues/203

With this one-pager, we would like to discuss how metadata about Terraform native resources can be scraped and processed from the Terraform registry and used in Terrajet codegen pipelines to produce example manifests or CRD documentation and in various other use cases.

We believe such metadata offers large potential to improve the user experience with the Crossplane providers generated using Terrajet as well as the quality of them (e.g., auto-generated CRD documentation).

We aim to gather community feedback and stir up some discussions to discover new use cases with such metadata in Terrajet-based providers.

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

N.A.

NOTE: We are considering to move this proposal to https://github.com/crossplane/terrajet after the review process is completed.

enhanced Terrajet codegen pipelines

- Fixes crossplane/terrajet/issues/203

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
implementations to be able to fetch metadata from different sources but the
Terrajet pipelines will always be working on a well defined format regardless
of how those metadata are scraped.
- We would like to have the scrapers run as needed, produce their output in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this specific item is leaking the proposal detail into the goal section. IMO, the goals section as a whole should be a bit higher level and allow alternative proposal to achieve the same goals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @muvaf for the detailed review.
I have restructured the Goals section and removed the proposal hints from it.

run each time with a `make generate`, just like the existing codegen pipelines
we have. This would allow us to separate the lifecycles of metadata-scraping
and code generation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Proposal section could be helpful for readers here to understand the whole solution before jumping to the metadata format which is one of the implementation details of the solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Listing what's available in TF registry as metadata information would also be very valuable for having readers think about the future expansions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a top-level proposal section giving a high-level overview and made the existing proposal sections subsections of this top-level section.
I have also extended the discussion on available metadata in the Terraform registry.


### Metadata Format
The proposed syntax for scraped metadata documents is YAML as we would also like
the metadata to be human readable, searchable and maintainable, if needed. A
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is meant by maintainable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added example maintenance tasks for the scraped metadata in the document.

`azurerm_analysis_services_server` of the native Terraform provider
[terraform-provider-azurerm] could be as follows:

```yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having an example definitely helps to understand what goes where but I wonder if we can have an API reference before the examples to show the schema of the YAML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a full example (an example that contains all keys) with comments instead of a formal notation (like BNF or OpenAPI schema) serves our purposes better here. This is not intended to be a formal specification but rather the purpose is to convey the idea.

importStatements:
- terraform import azurerm_analysis_services_server.server /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resourcegroup1/providers/Microsoft.AnalysisServices/servers/server1
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on why the chosen method is better than these alternatives? This would help us reason about them more deliberately when we want to make changes to the pipeline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clarifications on the choices we made here.

from a pointed directory in the local filesystem, which is specified as a
command-line argument, for instance.

As already indicated, if it turns out that a common registry scraper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that optionality.

none or some are not available in the corresponding metadata document.

Metadata is valuable; the scrapers should capture as much metadata as possible
and store them in the common format, even for future use cases we do not yet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit on the fence about storing everything we can because we have to have a format for each of those information that we don't use and once we do use them, we may want to change the format, i.e. where it lives in the YAML and also since we provide manual input ability, the format become one of our public APIs. So, I wonder if we should, for example, just keep the examples since that's all we use today and then once we use other information, we could add them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added example use cases for the proposed fields. In fact (apart from documentation) most have been utilized in provider-jet-azure. I think this also serves as an opportunity to show the potential we have here.

Metadata is valuable; the scrapers should capture as much metadata as possible
and store them in the common format, even for future use cases we do not yet
envision. New Terrajet pipelines can be added, or existing ones can be enhanced
to support advanced use cases. One such proposal could be to extend the CRD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Future Considerations section could work well to list all the use cases you have in mind at the moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Future Considerations section.

providers `provider-jet-aws`, `provider-jet-gcp` and `provider-jet-azure` in the
context of the corresponding [Terrajet issue #48], we have seen utility in
extracting such metadata from the Terraform registry and use it to generate
example manifests and documentation. In this document, we would like to propose:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Background section could be limited to the problem statement in general and the actual proposal could be kept for its own section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed proposal hints from the Background section as suggested.

@@ -0,0 +1,272 @@
# Metadata Extraction from Terraform Registry for Terrajet-based providers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Metadata Extraction from Terraform Registry for Terrajet-based providers
# Metadata Extraction from Terraform Registry in Terrajet

Since we're removing the difference between Terrajet vs SDK calls at repository level, I think removing Terrajet-based provider or Terrajet-based repository terms would be more future-proof for readers.

@negz
Copy link
Member

negz commented Apr 16, 2022

Would it make sense to move this design (and perhaps the Terrajet design doc too?) to a new design/ directory under https://github.com/crossplane/terrajet?

move existing proposal sections as subsections of this top-level section.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
from getting lost as scrapers are rerun, and extend the discussion
on when the proposed scrapers are run.

Signed-off-by: Alper Rifat Ulucinar <[email protected]>
@ulucinar
Copy link
Contributor Author

Hi @negz,
I think what you propose makes sense. Let me have the PR approved here (to preserve the existing context) and then I can reproduce it in https://github.com/crossplane/terrajet referring to the review here. I will also mark this PR with do-not-merge. Thank you!

@ulucinar ulucinar changed the title Add one-pager for Terraform registry scrapers and registry metadata [DO NOT MERGE] Add one-pager for Terraform registry scrapers and registry metadata Apr 22, 2022
@muvaf
Copy link
Member

muvaf commented Apr 25, 2022

Would it make sense to move this design (and perhaps the Terrajet design doc too?) to a new design/ directory under https://github.com/crossplane/terrajet?

I think having all design docs under crossplane/crossplane helps with discoverability of those docs for the community. Maybe something like crossplane/enhancements or crossplane/design-docs (similar to kubernetes) that could contain all design docs of Crossplane community? It could be a place for non-crossplane org stuff as well, like provider or other extension designs.

@negz
Copy link
Member

negz commented May 2, 2022

I think having all design docs under crossplane/crossplane helps with discoverability of those docs for the community. Maybe something like crossplane/enhancements or crossplane/design-docs (similar to kubernetes) that could contain all design docs of Crossplane community? It could be a place for non-crossplane org stuff as well, like provider or other extension designs.

I don't feel super strongly, but it does seem a little odd to me that a design pertaining completely to Terrajet is not in the Terrajet repo. We can just stick with everything going in the design folder here if folks feel differently.

@ulucinar
Copy link
Contributor Author

Closing this PR as we would like to keep code generation in terrajet only scoped to generating CRDs.

@ulucinar ulucinar closed this May 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

One Pager - Metadata Extraction from Terraform Registry for Terrajet-based providers
3 participants