Remove duplicates on large number of files #568
The dedupe logic is the same across the web and desktop. Moving the issue to
@im7mortal hey, we've updated the implementation to use file hashes instead of file sizes + creation times, since the former is more reliable. You might still be seeing the older flow, since we might not have computed hashes for items that were uploaded in the past. If you clear your library and re-upload, the experience will be as you expect. If you run into a crash, please share logs (Help > View logs) with [email protected], we'd love to take a look!
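For context, exact-duplicate detection of this kind can be sketched in a few lines: group files by a content hash and flag any group with more than one member. This is only an illustrative sketch over a local directory (the directory name and the choice of SHA-256 are assumptions), not Ente's actual implementation:

```ts
// Minimal sketch: group files by content hash to find exact duplicates.
// "./photos" and SHA-256 are illustrative assumptions.
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Map each file's SHA-256 digest to the list of paths that share it.
function groupByHash(dir: string): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const name of readdirSync(dir)) {
    const path = join(dir, name);
    const digest = createHash("sha256").update(readFileSync(path)).digest("hex");
    const bucket = groups.get(digest) ?? [];
    bucket.push(path);
    groups.set(digest, bucket);
  }
  return groups;
}

// Any group holding more than one path is a set of exact duplicates.
for (const [digest, paths] of groupByHash("./photos")) {
  if (paths.length > 1) console.log(digest, "->", paths);
}
```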
As I mentioned, I also used ente.io on my phone for some time, and I have other photos which I can't lose. Is there an API to update the hash? I could run a long task on my laptop.
We don't have an API to update the hash at the moment :( Could you try running the de-dupe on your mobile device? There we only process those items that have a hash available. In the future we intend to provide a way to de-dupe "similar" images, which would solve this use case, but we don't have a clear timeline for that at the moment.
I tried it overnight in the Android app. It showed me a spinner; I waited some time, then went to sleep. As I just mentioned in photos-app#1380, it wasn't clear whether ente was doing something or not. When I got up, it showed that it had found only 2 duplicates, which I had uploaded this week.
Sounds like there are only 2 photos that are exact duplicates (with a matching hash)?
Hi @vishnukvmd, @mnvr, @abhinavkgrd! I haven't tried to clean up duplicates since the time we discussed this. I saw there is an AI feature now which processes images locally. So I am wondering: why not also generate hashes for all images in the same run?
Yes. No ETA, but we've sketched out some approaches for using a form of perceptual hashing or cosine-similarity-based deduping, and we hope to get around to working on it at some point soon.
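For the curious, the perceptual-hashing idea can be illustrated with a difference hash ("dHash"): compare neighbouring pixels of a downscaled grayscale image, and measure similarity as the Hamming distance between the resulting 64-bit hashes. The sketch below assumes the image has already been decoded and resized to a 9×8 grayscale matrix (a real implementation would use an image library for that step); it is not Ente's code:

```ts
// Sketch of dHash-based near-duplicate detection.
// Input: 8 rows x 9 columns of grayscale values (0-255), already resized.

type Gray9x8 = number[][];

// 64-bit dHash: each bit records whether a pixel is brighter than its
// right-hand neighbour.
function dHash(img: Gray9x8): bigint {
  let hash = 0n;
  for (let y = 0; y < 8; y++) {
    for (let x = 0; x < 8; x++) {
      hash = (hash << 1n) | (img[y][x] > img[y][x + 1] ? 1n : 0n);
    }
  }
  return hash;
}

// Hamming distance between two hashes; a small distance means the images
// are likely near-duplicates, even if their file bytes differ.
function hamming(a: bigint, b: bigint): number {
  let diff = a ^ b;
  let count = 0;
  while (diff) {
    count += Number(diff & 1n);
    diff >>= 1n;
  }
  return count;
}
```

Unlike an exact file hash, two re-encoded or slightly resized copies of the same photo will usually land within a few bits of each other (e.g. a Hamming distance ≤ 10), which is what makes this suited to "similar" rather than identical images.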
What I understand from this comment: Ente creates a simple hash on the first upload of an image. So I was advised to erase all my photos and re-upload them to generate hashes for the old ones. I understand that, but I don't want to erase my photos and do it manually. So now: why not generate the hash during indexing and update the metadata? It would be a quick, effective solution. Compared with perceptual hashing or cosine similarity, which are more sophisticated, plain hash generation is extremely straightforward and would not take long to implement.
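The backfill being proposed would look roughly like the sketch below: while the indexer already has an item's bytes, compute the missing hash and write it back to the item's metadata. `LibraryItem`, `fetchOriginalBytes`, and `saveHash` are hypothetical names invented for illustration; as stated earlier in the thread, Ente exposes no such API today:

```ts
// Hypothetical sketch of backfilling hashes during indexing.
// None of these names correspond to a real Ente API.
import { createHash } from "node:crypto";

interface LibraryItem {
  id: string;
  hash?: string; // absent for items uploaded before hashing existed
}

async function backfillHashes(
  items: LibraryItem[],
  fetchOriginalBytes: (id: string) => Promise<Uint8Array>, // hypothetical
  saveHash: (id: string, hash: string) => Promise<void>,   // hypothetical
): Promise<void> {
  for (const item of items) {
    if (item.hash) continue; // already hashed at upload time
    const bytes = await fetchOriginalBytes(item.id);
    const digest = createHash("sha256").update(bytes).digest("hex");
    await saveHash(item.id, digest); // persist into the item's metadata
  }
}
```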
I am trying to sync my Google zip file with ente. I tried this 2 years back (ente-io/photos-web#243) and had problems.
It turned out that it created a new file every time. Now I have 2 or 3 duplicates for at least 15,000 files.
So here is how I see the possible solution:
Some more prehistory: I was trying to migrate to ente in 2021, but I couldn't because of bugs. Then I had my phone connected to ente, so I just can't remove all files and start the Google zip import from scratch.