diff --git a/docs/intro/windows.rst b/docs/intro/windows.rst
index 15cb4ebf3..af9043041 100644
--- a/docs/intro/windows.rst
+++ b/docs/intro/windows.rst
@@ -32,9 +32,9 @@ Windows also has insufficient support for :term:`symlink`\ing and locking files
 In addition, Windows has a (default) `maximum path length limitation of only 260 characters `_.
 However, DataLad (or rather, :term:`git-annex`) relies on `file content hashing `_ to ensure file integrity.
 Usually, the *longer* the `hash` that is created, the more fail-safe it is.
-For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 256 characters long hash) as their default.
-As git-annex represents files with their content hash as a name, though, a secure 256 character file name is too long for Windows.
-Datasets thus adjust this default to a 128 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
+For a general idea about the length of hashes, consider that many tools including :term:`git-annex` use ``SHA256`` (a 64 characters long hash) as their default.
+As git-annex represents files with their content hash as their name, and places them into a directory of the same name, half of the total path length is already used up with a ``SHA256`` hash.
+Datasets thus adjust this default to a 32 character hash [#f2]_, but still, if you place a DataLad dataset into a deeply nested directory location, you may run into issues due to hitting the path length limit [#f3]_.
 You *can* enable long paths in recent builds of Windows 10, `but it requires some tweaking `_.
 Windows also doesn't really come with a decent :term:`terminal`.
@@ -165,7 +165,7 @@ Take a look at user forums such as `forums.linuxmint.com `_, a hash function that produces a 128 character hash value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
+.. [#f2] The path length limitation on Windows is the reason that DataLad datasets always use hashes based on `MD5 `_, a hash function that produces a 32 character hash digest value. This wouldn't be necessary on Unix-based operating systems, but is required to ensure portability of datasets to Windows computers.
 .. [#f3] The path length limitation certainly isn't only a problem for DataLad and its underlying tools. Many users run into a Path length related problems at least once, by accident. Downloading or copying files with long names into a folder that itself has a long name, for example, can become an unexpected issue (especially if you are not aware of the limit). Imagine transferring pictures from your friends camera into ``C:\Users\"Bob McBobface"\Desktop\Pictures\"Vacation Pictures"\2020\Saint-Remy-en-Bouzemont-Saint-Genest-et-Isson\"From Alice and Sasha"\Camera\`` -- those file names shouldn't be too long to fit in the limit. Likewise, when ``git clone``\ing a :term:`Git` repository that was created on a Unix computer and contains very long file names could fail.