Docs: add lakeFS Mount Use Cases and FAQs (#7860)
talSofer authored Jun 18, 2024
1 parent 2bbc93f commit 65918ac
Showing 3 changed files with 80 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/cloud/index.md
@@ -15,6 +15,7 @@ nav_order: 105
* [Managed Garbage Collection]({% link howto/garbage-collection/managed-gc.md %})
* [Private-Link]({% link howto/private-link.md %})
* [Transactional Mirroring]({% link howto/mirroring.md %})
* [lakeFS Mount]({% link reference/mount.md %})
* SOC 2 Type II Compliance

## How lakeFS Cloud interacts with your infrastructure
2 changes: 2 additions & 0 deletions docs/enterprise/index.md
@@ -18,6 +18,8 @@ It provides several benefits over the open-source version:
* [SSO]({% link reference/security/sso.md %}) - lets users sign in with existing credentials from a trusted provider, eliminating separate logins.
* [STS Auth]({% link reference/security/sts-login.md %}) - offers temporary, secure logins using an Identity Provider, simplifying user access and enhancing security.
* [Authenticate to lakeFS with AWS IAM Roles]({% link reference/security/external-principals-aws.md %}) - lets programs authenticate using AWS IAM roles instead of lakeFS credentials, granting access based on IAM policies.
* [lakeFS Mount]({% link reference/mount.md %})

* Support SLA

[Contact Sales](https://lakefs.io/contact-sales/) to get the token for Fluffy.
78 changes: 77 additions & 1 deletion docs/reference/mount.md
@@ -13,6 +13,9 @@ lakeFS Cloud
lakeFS Enterprise
{: .label .label-purple }

Private preview
{: .label .label-yellow }


Everest is a complementary binary to lakeFS that allows users to virtually mount a remote lakeFS repository onto a local directory.
Once mounted, users can access the data as if it resides on their local filesystem, using any tool, library, or framework that reads from a local filesystem.
@@ -24,6 +27,12 @@ This functionality is currently in limited support and is a Read-Only file syste

{% include toc.html %}

## Use Cases

* **Simplified Data Loading**: With lakeFS Mount, there's no need to write custom data loaders or use special SDKs. You can use your existing tools to read files directly from the filesystem.
* **Handle Large-scale Data Without Changing Work Habits**: Scale from a few local files to millions without changing your tools or workflow. Use the same code from early experimentation all the way to production.
* **Enhanced Data Loading Efficiency**: lakeFS Mount supports billions of files and offers fast data fetching, making it ideal for optimizing GPU utilization and other performance-sensitive tasks.

## Requirements

- For enterprise installations: lakeFS Version `1.25.0` or higher.
@@ -131,4 +140,71 @@ duckdb "SELECT * FROM read_parquet('pets/labels.parquet')"
everest umount "./pets"
```
[lakectl]: {% link reference/cli.md %}
## FAQs
<!-- START EXCLUDE FROM TOC -->
### How do I get started with lakeFS Mount (Everest)?
lakeFS Mount is available for lakeFS Cloud and lakeFS Enterprise customers. Once your setup is complete, [contact us]([email protected]) to access the lakeFS Mounts (beta) binary and follow the provided docs.
* Want to try lakeFS Cloud? [Sign up](https://lakefs.cloud/register) for a 30-day free trial.
* Interested in lakeFS Enterprise? [Contact sales](https://lakefs.io/contact-sales/) for a 30-day free license.
### Can I write to lakeFS using lakeFS Mount?
Currently, lakeFS Mount supports read-only file system operations. Write support is on our roadmap and will be added in the future.
### What operating systems are supported by lakeFS Mount?
lakeFS Mount supports Linux and macOS. Windows support is on the roadmap.
### How can I control access to my data when using lakeFS Mount?
You can use lakeFS’s existing [Role-Based Access Control mechanism](../reference/security/rbac.md), which includes repository and path-level policies. lakeFS Mount translates filesystem operations into lakeFS API operations and authorizes them based on these policies.
### Does data pass through the lakeFS server when using lakeFS Mount?
lakeFS Mount can use pre-signed URLs to read data directly from the underlying object store, in which case data doesn’t pass through the lakeFS server. Presigned mode is disabled by default. To enable it, use:
```shell
everest mount <lakefs_uri> <mount_directory> --presign
```
### What happens if a lakeFS branch is updated after I mount it?
lakeFS Mount points to the commit that was the HEAD commit of the branch at the time of mounting. This means the local directory reflects the branch state at the time of mounting and does not update with subsequent branch changes.
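Because the mount is pinned to a commit, one way to pick up newer commits is to unmount and mount again. The sketch below assumes that a fresh mount resolves the branch to its current HEAD, which is an inference from the behavior described above rather than something these docs state. The `remount` helper and the example URI are hypothetical; only `everest mount` and `everest umount` come from this page.

```shell
# remount URI DIR: hypothetical helper that unmounts DIR and mounts URI again.
# Assumption: a fresh mount resolves the branch to its HEAD at that moment.
remount() {
  uri="$1"
  dir="$2"
  # Release the existing mount; stop if that fails.
  everest umount "$dir" || return 1
  # Mount again at the same local directory.
  everest mount "$uri" "$dir"
}

# Example (not run here; the URI is illustrative):
#   remount "lakefs://image-repo/main/datasets/pets/" "./pets"
```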
### When are files downloaded to my local environment?
lakeFS Mount uses a lazy prefetch strategy. Files are not downloaded at mount time or during operations that only inspect file metadata (e.g., `ls`). Files are downloaded only when commands that require file access (e.g., `cat`) are used.
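A minimal sketch of that distinction, assuming a repository path is already mounted at a local directory. The `MOUNT_DIR` variable, its `./pets` default, and the `peek` function are illustrative names, not part of Everest. The listing step only inspects metadata; the read step is the kind of access that would trigger a download under the lazy strategy described above.

```shell
# Hypothetical mount point; override MOUNT_DIR to match your own mount.
MOUNT_DIR="${MOUNT_DIR:-./pets}"

# peek DIR: list entries (metadata only), then read the first bytes of one file.
peek() {
  dir="$1"
  # Metadata-only operation: listing should not download file contents.
  ls -l "$dir"
  # Pick the first regular file at the top level of the directory.
  first_file="$(find "$dir" -maxdepth 1 -type f | sort | head -n 1)"
  if [ -n "$first_file" ]; then
    # Content access: reading bytes is what requires the file data.
    head -c 64 "$first_file"
  fi
}

# Run against the mount only if the directory exists.
if [ -d "$MOUNT_DIR" ]; then
  peek "$MOUNT_DIR"
fi
```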
### What are the scale limitations of lakeFS Mount, and what are the recommended configurations for dealing with large datasets?
With lakeFS Mount, scale is limited more by the volume of data the local machine actually accesses than by the total size of the dataset under the mounted prefix. This is because lakeFS Mount downloads lazily: it fetches only the files that are accessed. Listing currently performs efficiently only for prefixes containing fewer than 8,000 objects; we are working to increase this limit.
##### Recommended Configuration
Ensure your **cache size** is large enough to accommodate the volume of files being accessed.
### I’m already using lakectl local for working with lakeFS data locally, why should I use lakeFS Mount?
While both lakectl local and lakeFS Mount enable working with lakeFS data locally, they serve different purposes:
##### Use lakectl local
* For enabling lakeFS writes with [lakectl local commit](https://docs.lakefs.io/reference/cli.html#lakectl-local-commit).
* To integrate seamlessly with [Git](https://docs.lakefs.io/integrations/git.html).
##### Use lakeFS Mount
For read-only local data access. lakeFS Mount offers several benefits over lakectl local:
* **Optimized selective data access**: The lazy prefetch strategy saves storage space and reduces latency by only fetching the required data.
* **Reduced initial latency**: Start working on your data immediately without waiting for downloads.
**Note**
Write support for lakeFS Mount is on our roadmap.
{: .note }
<!-- END EXCLUDE FROM TOC -->
