Docs updates (#6264)
* Add diagrams to deploy docs
Move GC docs into their own folder

* Tidy up and restructure GC docs

* Update links from UI to point to current docs pages instead of relying on redirects

* Add content to how-to index page

* Docs nits

* Fix links

* Fix links

* Fix links

* Fix typo in output

* Use live reload in docs-serve

* Fix broken links

* Fix broken links

* Metaclient: handle initial commit (#6263)

* Add ssl optional configuration (#6268)

* Added SSL Certificate_verified_Failed

* add proxy configuration

* Fix broken link to API doc (#6271)

* Add content to index pages of /understand (#6270)

* Remove default layout metadata
- It's the only layout
- It's the default anyway
- It adds unnecessary content to the front page matter metadata for pages, making it more likely to make a mistake editing what is there

* Add content to index pages of /understand, fix a few page nits

* Always that file you forget to update 🤦🏻

* Add diagrams to deploy docs
Move GC docs into their own folder

* Reword deploy doc

* Fix links

* Reword GCP lakeFS Cloud callout

---------

Co-authored-by: Yoni <[email protected]>
Co-authored-by: iddoavn <[email protected]>
3 people authored Aug 14, 2023
1 parent 37cdba2 commit 0480105
Showing 34 changed files with 155 additions and 80 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/check-ui-links.yaml
@@ -52,7 +52,7 @@ jobs:
- name: Check Lychee output
run: |
if grep "Errors per input" /tmp/lychee/results.md; then
echo "## 🙀 Outbuond links found in the lakeFS UI that are broken" > $GITHUB_STEP_SUMMARY
echo "## 🙀 Outbound links found in the lakeFS UI that are broken" > $GITHUB_STEP_SUMMARY
echo "" >> $GITHUB_STEP_SUMMARY
while IFS= read -r line; do
2 changes: 1 addition & 1 deletion Makefile
@@ -85,7 +85,7 @@ docs/assets/js/swagger.yml: api/swagger.yml
docs: docs/assets/js/swagger.yml

docs-serve: ### Serve local docs
cd docs; bundle exec jekyll serve
cd docs; bundle exec jekyll serve --livereload

docs-serve-docker: ### Serve local docs from Docker
docker run --rm \
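With this Makefile change, serving the docs locally rebuilds and reloads pages on save. A typical session might look like the following (the port is Jekyll's default of 4000 — an assumption, since the Makefile doesn't set one):

```
# Build the swagger asset and serve the docs with live reload
make docs-serve        # runs: cd docs; bundle exec jekyll serve --livereload
# Then open http://127.0.0.1:4000/ — edits to the Markdown sources reload the page automatically
```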
[4 changed files could not be displayed — likely the new binary `.excalidraw.png` diagram images]
4 changes: 2 additions & 2 deletions docs/cloud/managed-gc.md
@@ -11,7 +11,7 @@ lakeFS Cloud
{: .label .label-green }

{: .note}
> Managed GC is only available for [lakeFS Cloud]({% link cloud/index.md %}). If you are using the self-managed lakeFS, garbage collection is [available to run manually]({% link howto/garbage-collection-index.md %}).
> Managed GC is only available for [lakeFS Cloud]({% link cloud/index.md %}). If you are using the self-managed lakeFS, garbage collection is [available to run manually]({% link howto/garbage-collection/index.md %}).
## Benefits of using managed GC
* The quick and safe way to delete your unnecessary objects
@@ -20,7 +20,7 @@ lakeFS Cloud
* Support from the Treeverse team

## How it works
Similarly to the self-managed lakeFS, managed GC uses [garbage collection rules]({% link howto/garbage-collection-index.md %}) to determine which objects to delete.
Similarly to the self-managed lakeFS, managed GC uses [garbage collection rules]({% link howto/garbage-collection/index.md %}) to determine which objects to delete.
However, it uses our super-fast and efficient engine to detect stale objects and branches (depending on your configuration) and prioritize them for deletion.

## Setting up
15 changes: 11 additions & 4 deletions docs/howto/deploy/aws.md
@@ -2,7 +2,7 @@
title: AWS
grand_parent: How-To
parent: Install lakeFS
description: This section will guide you through deploying and setting up a production-suitable lakeFS environment on AWS
description: How to deploy and set up a production-suitable lakeFS environment on AWS
redirect_from:
- /deploying-aws/index.html
- /deploying-aws/install.html
@@ -15,9 +15,16 @@ next: ["Import data into your installation", "/howto/import.html"]

# Deploy lakeFS on AWS

These instructions are for a self-managed deployment of lakeFS on AWS.<br/>
For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud/).
{: .note }
{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS on AWS.
>
> For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud).
When you deploy lakeFS on AWS, these are the options available:

![](/assets/img/deploy/deploy-on-aws.excalidraw.png)

This guide walks you through the options available and how to configure them, finishing with configuring and running lakeFS itself and creating your first repository.

{% include toc.html %}

13 changes: 7 additions & 6 deletions docs/howto/deploy/azure.md
@@ -2,7 +2,7 @@
title: Azure
grand_parent: How-To
parent: Install lakeFS
description: This section will guide you through deploying and setting up a production-suitable lakeFS environment on Microsoft Azure
description: How to deploy and set up a production-suitable lakeFS environment on Microsoft Azure
redirect_from:
- /setup/storage/blob.html
- /deploy/azure.html
@@ -11,13 +11,14 @@ next: ["Import data into your installation", "/howto/import.html"]

# Deploy lakeFS on Azure

These instructions are for a self-managed deployment of lakeFS on Azure. <br/>
For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud/).
{: .note }
{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS on Azure.
>
> For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud).
lakeFS has several dependencies for which you need to select and configure a technology or interface:
When you deploy lakeFS on Azure, these are the options available:

![](./deploy-on-azure.excalidraw.png)
![](/assets/img/deploy/deploy-on-azure.excalidraw.png)

This guide walks you through the options available and how to configure them, finishing with configuring and running lakeFS itself and creating your first repository.

14 changes: 10 additions & 4 deletions docs/howto/deploy/gcp.md
@@ -2,7 +2,7 @@
title: GCP
grand_parent: How-To
parent: Install lakeFS
description: This section will guide you through deploying and setting up a production-suitable lakeFS environment on Google Cloud Platform (GCP).
description: How to deploy and set up a production-suitable lakeFS environment on Google Cloud Platform (GCP).
redirect_from:
- /setup/storage/gcs.html
- /deploy/gcp.html
@@ -11,9 +11,15 @@ next: ["Import data into your installation", "/howto/import.html"]

# Deploy lakeFS on GCP

These instructions are for a self-managed deployment of lakeFS on GCP. <br/>
For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud/).
{: .note }

{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS on GCP.
>
> For a hosted lakeFS service with guaranteed SLAs, please [contact us]([email protected]) for details of lakeFS Cloud on GCP.
When you deploy lakeFS on GCP, these are the options available:

![](/assets/img/deploy/deploy-on-gcp.excalidraw.png)

{% include toc.html %}

2 changes: 1 addition & 1 deletion docs/howto/deploy/includes/setup.md
@@ -2,7 +2,7 @@

When you first open the lakeFS UI, you will be asked to create an initial admin user.

1. open `http://<lakefs-host>/` in your browser. If you haven't set up a load balancer, this will likely be `http://<instance ip address>:8000/`
1. Open `http://<lakefs-host>/` in your browser. If you haven't set up a load balancer, this will likely be `http://<instance ip address>:8000/`
1. On first use, you'll be redirected to the setup page:

<img src="{{ site.baseurl }}/assets/img/setup.png" alt="Create user">
22 changes: 19 additions & 3 deletions docs/howto/deploy/index.md
@@ -3,6 +3,7 @@ title: Install lakeFS
parent: How-To
description: This section will guide you through deploying and setting up a production lakeFS environment.
has_children: true
has_toc: false
nav_order: 1
redirect_from:
- /setup/
@@ -14,9 +15,24 @@ redirect_from:

# Deploy and Setup lakeFS

For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud)
{: .note }
{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS.
>
> For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud).
This section will guide you through deploying lakeFS on top of an object store. You will require a database, and can optionally configure authentication using providers specific to your deployment platform.

Which options are available depends on your deployment platform. For example, the object store available on Azure differs from that on AWS.

![](/assets/img/deploy/deploy-lakefs.excalidraw.png)

## Deployment and Setup Details

lakeFS releases include [binaries](https://github.com/treeverse/lakeFS/releases) for common operating systems, a [containerized option](https://hub.docker.com/r/treeverse/lakefs) or a [Helm chart](https://artifacthub.io/packages/helm/lakefs/lakefs).

Check out our guides for running lakeFS on [AWS]({% link howto/deploy/aws.md %}), [GCP]({% link howto/deploy/gcp.md %}) and [more]({% link howto/deploy/index.md %}}).
Check out our guides below for full deployment details:

* [AWS]( {% link howto/deploy/aws.md %})
* [Azure]( {% link howto/deploy/azure.md %})
* [GCP]( {% link howto/deploy/gcp.md %})
* [On-premises and other cloud providers]( {% link howto/deploy/onprem.md %})
12 changes: 7 additions & 5 deletions docs/howto/deploy/onprem.md
@@ -1,8 +1,8 @@
---
title: On-Premises Deployment of lakeFS
title: On-Premises
grand_parent: How-To
parent: Install lakeFS
description: This section will guide you through deploying and setting up a production-suitable lakeFS environment on-premises (or on other cloud providers)
description: How to deploy and set up a production-suitable lakeFS environment on-premises (or on other cloud providers)
redirect_from:
- /deploy/k8s.html
- /deploy/docker.html
@@ -12,10 +12,12 @@ redirect_from:
next: ["Import data into your installation", "/howto/import.html"]
---

# On-Premises deployment
# On-Premises Deployment

For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud)
{: .note }
{: .tip }
> The instructions given here are for a self-managed deployment of lakeFS.
>
> For a hosted lakeFS service with guaranteed SLAs, try [lakeFS Cloud](https://lakefs.cloud).
{% include toc.html %}

2 changes: 1 addition & 1 deletion docs/howto/deploy/upgrade.md
@@ -1,6 +1,6 @@
---
title: Upgrade lakeFS
description: A guide to upgrading lakeFS to the latest version.
description: How to upgrade lakeFS to the latest version.
grand_parent: How-To
parent: Install lakeFS
has_children: false
6 changes: 0 additions & 6 deletions docs/howto/garbage-collection-index.md

This file was deleted.

@@ -3,8 +3,10 @@ title: (deprecated) Committed Objects
description: Clean up unnecessary objects using the garbage collection feature in lakeFS.
parent: Garbage Collection
grand_parent: How-To
nav_order: 10
nav_order: 98
has_children: false
redirect:
- /howto/garbage-collection-committed.html
---

# Garbage Collection: committed objects
@@ -13,7 +15,7 @@ has_children: false
> Deprecation notice
>
> This feature will be available up to version 0.9.1 of the lakeFS metadata client. It will be discontinued in subsequent versions.
> Please visit the new [garbage collection documentation](./garbage-collection.md).
> Please visit the new [garbage collection documentation](./index.md).
By default, lakeFS keeps all your objects forever. This allows you to travel back in time to previous versions of your data.
However, sometimes you may want to hard-delete your objects - namely, delete them from the underlying storage.
@@ -1,32 +1,32 @@
---
title: Garbage Collection
description: Clean up expired objects using the garbage collection feature in lakeFS.
parent: Garbage Collection
grand_parent: How-To
nav_order: 1
has_children: false
parent: How-To
has_children: true
redirect_from:
- /reference/garbage-collection.html
- /howto/garbage-collection-index.html
- /howto/garbage-collection.html
---

# Garbage Collection

[lakeFS Cloud](https://lakefs.cloud) users enjoy a managed garbage collection service, and do not need to run this Spark program.
{: .tip }

# Garbage Collection

By default, lakeFS keeps all your objects forever. This allows you to travel back in time to previous versions of your data.
However, sometimes you may want to remove the objects from the underlying storage completely.
Reasons for this include cost-reduction and privacy policies.

The garbage collection job is a Spark program that removes the following from the underlying storage:
1. _Committed objects_ that have been deleted (or replaced) in lakeFS, and are considered expired according to [rules you define](#understanding-garbage-collection-rules).
The garbage collection (GC) job is a Spark program that removes the following from the underlying storage:
1. _Committed objects_ that have been deleted (or replaced) in lakeFS, and are considered expired according to [rules you define](#garbage-collection-rules).
2. _Uncommitted objects_ that are no longer accessible
* For example, objects deleted before ever being committed.

{% include toc.html %}

## Understanding garbage collection rules
## Garbage collection rules

{: .note }
These rules only apply to objects that have been _committed_ at some point.
@@ -53,9 +53,9 @@ In the above example, objects will be retained for 14 days after deletion by def
However, if present in the branch `main`, objects will be retained for 21 days.
Objects present _only_ in the `dev` branch will be retained for 7 days after they are deleted.
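For reference, a retention policy like the one just described can be sketched as a GC rules file (the field names follow the lakeFS GC rules JSON format as best I recall it — treat this as an illustrative sketch, not authoritative syntax):

```json
{
  "default_retention_days": 14,
  "branches": [
    {"branch_id": "main", "retention_days": 21},
    {"branch_id": "dev", "retention_days": 7}
  ]
}
```

A file like this can then be applied with, e.g., `lakectl gc set-config lakefs://example-repo --filename rules.json` (the repository name is hypothetical).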

## Configuring garbage collection rules
### How to configure garbage collection rules

To define retention rules, either use the `lakectl` command or the lakeFS web UI:
To define retention rules, use the `lakectl` command, the lakeFS web UI, or the [API](/reference/api.html#/retention/set%20garbage%20collection%20rules):

<div class="tabs">
<ul>
@@ -96,7 +96,7 @@ From the lakeFS web UI:
</div>
</div>

## Running the GC job
## How to run the garbage collection job

To run the job, use the following `spark-submit` command (or use your preferred method of running Spark programs).
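That command is collapsed in this diff view. As a rough sketch of its shape (the class name, Hadoop package coordinates, and jar path below are placeholders from memory and vary by client and Spark version — use the exact command from the rendered documentation):

```
spark-submit --class io.treeverse.clients.GarbageCollector \
  --packages org.apache.hadoop:hadoop-aws:2.7.7 \
  -c spark.hadoop.lakefs.api.url=https://<LAKEFS_ENDPOINT>/api/v1 \
  -c spark.hadoop.lakefs.api.access_key=<LAKEFS_ACCESS_KEY> \
  -c spark.hadoop.lakefs.api.secret_key=<LAKEFS_SECRET_KEY> \
  -c spark.hadoop.fs.s3a.access.key=<AWS_ACCESS_KEY> \
  -c spark.hadoop.fs.s3a.secret.key=<AWS_SECRET_KEY> \
  <path-to-lakefs-spark-client-assembly-jar> \
  <repository-name> <region>
```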

@@ -238,7 +238,7 @@ spark.hadoop.lakefs.gc.do_mark=false
spark.hadoop.lakefs.gc.mark_id=<MARK_ID> # Replace <MARK_ID> with the identifier you obtained from a previous mark-only run
```

## Considerations
## Garbage collection notes

1. In order for an object to be removed, it must not exist on the HEAD of any branch.
You should remove stale branches to prevent them from retaining old objects.
@@ -1,15 +1,23 @@
---
title: "Internals: Committed GC"
title: Internals
description: How Garbage Collection in lakeFS works
parent: Garbage Collection
grand_parent: How-To
nav_order: 30
nav_order: 1
has_children: false
redirect:
- /howto/gc-internals.html
---

## Committed GC Internals
# Committed Garbage Collection Internals

### What gets collected
{: .warning-title }
> Deprecation notice
>
> This page describes a deprecated feature. Please visit the new [garbage collection documentation](./index.html).

## What gets collected

Because each object in lakeFS may be accessible from multiple branches, it
might not be obvious which objects will be considered garbage and collected.
@@ -52,15 +60,15 @@ The garbage collection process proceeds in three main phases:
about the object, but attempting to read it via the lakeFS API or the S3
gateway will return HTTP status 410 ("Gone").

### What does _not_ get collected
## What does _not_ get collected

Some objects will _not_ be collected regardless of configured GC rules:
* Any object that is accessible from any branch's HEAD.
* Objects stored outside the repository's [storage namespace][storage-namespace].
For example, objects imported using the lakeFS import UI are not collected.
* Uncommitted objects, see [Uncommitted Garbage Collection](./garbage-collection-uncommitted.md),
* Uncommitted objects; see [Uncommitted Garbage Collection](./uncommitted.html).

### Performance
## Performance

Garbage collection reads many commits. It uses Spark to spread the load of
reading the contents of all of these commits. For very large jobs running
@@ -75,7 +83,7 @@ on very large clusters, you may want to tweak this load. To do this:

Normally this should not be needed.

### Networking
## Networking

Garbage collection communicates with the lakeFS server. Very large
repositories may require increasing a read timeout. If you run into timeout errors during communication from the Spark job to lakeFS consider increasing these timeouts: