Enhance the SpaceObServer scan service to execute scans concurrently, with the goal of making scans of large file system trees faster.
Please note that we are planning to implement this in the upcoming version 7.2.
We also have many large file systems, up to 100+ TB in size with tens of millions of files, largely HPC data. I have had similar experiences to those expressed here, but I also find it difficult to scale at the database end. Adding more scanning servers seems to move the bottleneck to the database. You then need to add more database instances and/or servers, plus some logic for allocating scan locations to scanning servers and database instances. It gets messy quickly. This may not be the appropriate forum, but I would be interested in hearing how others approach these issues.
I also have a number of huge locations to scan and report on. Am I the only one who has to wait overnight to export data after the scans complete? I need a list of folders and sizes, which are already displayed in the grid, but SpaceObServer seems to want to calculate the sizes again, and that takes many hours and often crashes for me.
Really looking forward to this. We have a NAS location with well over 1 billion files; an initial scan would take months, so we had to split the location into hundreds of separate scans and deploy multiple SpaceObServer instances just so we could scan in parallel.
I would say the following things need to change:
1. Allow a single SpaceObServer instance to execute multiple parallel scans; having to deploy multiple VMs/SpaceObServer installations to do this is a waste of resources in our case.
2. Multithread the initial scans, e.g. take a directory listing at the root and then let multiple threads split off to scan each subdirectory, and so on.
3. Increase the multithreading limit for scans after the initial one; I believe the current limit is 32 threads.
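The fan-out traversal described in point 2 could be sketched roughly like this. This is a minimal Python illustration of the idea, not SpaceObServer's actual implementation; `parallel_scan` and its counters are hypothetical names, and a real scanner would persist results to the database rather than aggregate in memory.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_scan(root, max_workers=8):
    """Count files and total bytes under `root`, scanning one directory per task.

    Each worker lists a single directory; every subdirectory it finds is
    submitted back to the pool as a new task, so the traversal fans out
    across the tree instead of walking it sequentially.
    """
    totals = {"files": 0, "bytes": 0}
    lock = threading.Lock()  # protects the shared counters
    futures = []             # pending directory tasks (appended before a task finishes)

    def scan_dir(path, executor):
        files = 0
        size = 0
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        # Fan out: each subdirectory becomes its own pool task.
                        futures.append(executor.submit(scan_dir, entry.path, executor))
                    elif entry.is_file(follow_symlinks=False):
                        files += 1
                        size += entry.stat(follow_symlinks=False).st_size
        except OSError:
            pass  # unreadable directory; a real scanner would log this
        with lock:
            totals["files"] += files
            totals["bytes"] += size

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures.append(executor.submit(scan_dir, root, executor))
        # A task enqueues all of its children before it completes, so draining
        # the list until it is empty waits for the entire tree.
        while futures:
            futures.pop().result()
    return totals
```

The key property is that workers never block on other tasks (they only submit new ones), so the pool cannot deadlock, and the wait loop is safe because a directory's child tasks are always registered before its own future resolves.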