
Deduplication via Relative Symlinks (explicit)


Norman Richter

If I understand correctly, the app currently decides whether to deduplicate via hardlinks or absolute symbolic links depending on the drives involved.
An explicit option to choose relative symbolic links for deduplication within the same drive would be nice.
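To illustrate the request, here is a minimal Python sketch of replacing one duplicate with a relative symbolic link. It does not show how TreeSize works internally; the function name `replace_with_relative_symlink` and the temporary `.dedup-tmp` suffix are made up for this example. On Windows, creating symlinks usually requires elevated rights or Developer Mode, and `os.path.relpath` raises `ValueError` when the two paths are on different drives, which matches the "same drive" restriction above.

```python
import os

def replace_with_relative_symlink(duplicate, original):
    """Replace `duplicate` with a relative symlink pointing at `original`.

    Hypothetical helper for illustration; returns the relative target
    that was stored in the link.
    """
    duplicate = os.path.abspath(duplicate)
    original = os.path.abspath(original)
    # Express the target relative to the directory that holds the link.
    target = os.path.relpath(original, start=os.path.dirname(duplicate))
    # Create the link next to the duplicate first, then swap it in, so the
    # duplicate is only replaced once the link has been created.
    tmp = duplicate + ".dedup-tmp"
    os.symlink(target, tmp)
    os.replace(tmp, duplicate)
    return target
```

Because the stored target is relative, the link keeps working if the common parent folder is moved or renamed as a whole, which an absolute symlink would not.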


Team TreeSize

We will most likely implement this feature, but due to the low number of votes this does not have a high priority and will not happen in the near future.



hmadsen

I am using TreeSize to prepare a migration of files from shared disk locations to an enterprise content management system (Laserfiche). I used the deduplication feature to create hardlinks for all duplicate files in the file location, but when I did a test migration, the hardlinks were replaced with full copies of the files they duplicated, because neither the migration software nor Windows treats a hardlink as a link. I came up with a process to replace symbolic links with persistent links in the new system (so that the change of location won't break the links), but it requires replacing all duplicate files with symbolic links to the master files. Could this functionality be added?



Team TreeSize

Merged with: Replace duplicate files with symbolic links


Team TreeSize

We could extend the dialog for replacing duplicates with links by adding an option to use symbolic links instead of hardlinks in all cases. Would this fit your needs?

While all hardlinks to a file are equal, symbolic links need one original among the duplicates to which the other links point. How would you choose this original? Would you specify a drive or path that should serve as the original / reference location?



hmadsen

Yes, it would! Being able to choose a criterion for the original (edited date, location) would be helpful, but at this point I'm mainly trying to reduce the number of copies we have throughout our file share (and going into our new system, which has no meaningful way of deduplicating files).
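The two criteria mentioned here (edited date, location) could be combined roughly as in the following Python sketch. It is only one possible rule, not a description of what TreeSize does or will do; `choose_original`, `preferred_root` and the "oldest modification time wins" tie-break are assumptions made for the example.

```python
import os

def _is_under(path, root):
    """True if `path` lies inside the folder `root` (hypothetical helper)."""
    try:
        return os.path.commonpath([root, os.path.abspath(path)]) == root
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False

def choose_original(paths, preferred_root=None):
    """Pick the duplicate that stays a real file; the others become links.

    Assumed rule: prefer files under `preferred_root` if any exist there,
    then take the one with the oldest modification time.
    """
    candidates = list(paths)
    if preferred_root is not None:
        root = os.path.abspath(preferred_root)
        inside = [p for p in candidates if _is_under(p, root)]
        if inside:
            candidates = inside
    # The path is a secondary key so the result is deterministic on ties.
    return min(candidates, key=lambda p: (os.path.getmtime(p), p))
```

If no duplicate lies under the preferred location, the sketch falls back to the date criterion alone rather than failing.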





Team TreeSize

Status changed to: Open


Team TreeSize

Status changed to: Under review


Team TreeSize

May I ask why you would prefer relative symlinks over hardlinks? I don't see any advantages.



Norman Richter

For me, relative symlinks can be more useful than hardlinks in some special scenarios.
For example, you may want to make the linking explicit, so that users recognize at first sight that they are working on duplicates (or links).
On Windows, a symlink clearly says "I am a link" and is therefore recognizable as a kind of duplicate.
Suppose you take a selection / aggregation out of a bigger collection and store it in another place on the same drive. For whoever finds the selection (consisting of symlinks), it can be an important hint that there is a bigger source and where to find it. If the selection consisted of hardlinks, this meta-information would be lost unless you search for it specifically: hardlinks look decoupled when viewed through standard Windows tools, which have no special hardlink handling.
Sometimes you simply want to keep this coupling between link and source explicit, because it carries meta-information.



hmadsen

I'll add a case here that illustrates another benefit, in a file migration context:

When we migrate files from our shared drive to our new content management system (Laserfiche), hardlinks are read as being the same as the original files, so the deduplicated files turn back into full duplicates on import into the new system.

In contrast, if we had deduplication with symbolic links, the symbolic links would register as links during migration. Using TreeSize's ability to identify the target of those symbolic links, I could produce a report, then re-map the symbolic links and replace them with true links using a workflow inside the new system. With hardlinks I can't do this, because they are indistinguishable from the master version of the file.
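The difference described above can be seen with a few lines of Python. This is a rough sketch of a link report written with the standard library, not TreeSize's own report; `symlink_report` and `hardlink_count` are hypothetical names. A symlink stores a readable target, while a hardlinked file only exposes a link count that is identical for every name of the file, so there is nothing that marks one of them as the master.

```python
import os

def symlink_report(root):
    """List (link path, stored target, resolved target) for symlinks under root."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Links to folders show up in dirnames, links to files in filenames.
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.readlink(path)
                # A relative target is resolved against the link's own folder.
                resolved = os.path.normpath(os.path.join(dirpath, target))
                rows.append((path, target, resolved))
    return sorted(rows)

def hardlink_count(path):
    """Number of names that refer to the same file data as `path`."""
    return os.lstat(path).st_nlink
```

A report like this gives each link together with its master file, which is the mapping needed to recreate the links in the target system.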