I'm running a pipeline preview about 4 times per hour on a CDAP Sandbox instance. Within about 11 hours, the CDAP Sandbox hits its open file limit of ~4k.
Each run seems to leak about 40-100 files, and most of them appear to be LevelDB files.
Fixed a resource leak in the preview feature.
Two users may be running preview independently, right? So it wouldn't be good to lower that value to 1.
The leak continues even after that limit of 10 is reached.
See the attached graph of open file usage. The first 10 bumps are large, but further preview runs still cause an increase.
It's unclear from the graph at which point we reach the limit of 10. Is it at 22:00, or does it happen much earlier than that? Any idea why the graph is flat at 3.8K after that?
One thing we can try: set PREVIEW_CACHE_SIZE to 1 and launch one preview run. Capture the lsof output of the process, then launch another preview run. Since the cache size is 1, the LevelDB data for the older run should get deleted. Take the lsof output again and check whether the file descriptors for the deleted directories are still held by the process.
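For reference, here is a rough Linux-only equivalent of that lsof check: it walks /proc/<pid>/fd and reports descriptors whose targets have been deleted. This is just a diagnostic sketch, not part of CDAP; the class name and output format are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeletedFdCheck {
  public static void main(String[] args) throws IOException {
    // Pass the sandbox PID as the first argument; defaults to the current process.
    String pid = args.length > 0 ? args[0] : "self";
    Path fdDir = Paths.get("/proc", pid, "fd");
    int deleted = 0;
    try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
      for (Path fd : fds) {
        try {
          // The kernel appends " (deleted)" to the symlink target once the file is gone.
          String target = Files.readSymbolicLink(fd).toString();
          if (target.endsWith(" (deleted)")) {
            deleted++;
            System.out.println(fd.getFileName() + " -> " + target);
          }
        } catch (IOException e) {
          // The descriptor was closed while we were scanning; skip it.
        }
      }
    }
    System.out.println(deleted + " descriptors point at deleted files");
  }
}
```

If that count grows by roughly the 40-100 files noted above after each additional preview run, the process is indeed retaining handles on the deleted LevelDB directories.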
The graph shows the runs from the beginning, so the 10th step (counting from the left) is the 10th run.
The graph is flat after it reaches 3.8K because additional preview runs fail to start (due to "Too many open files" errors).
It was mentioned that the leak is probably due to not closing the DB object in LevelDBTableService.
Yeah, that's true.
We are deleting the directory but not closing the underlying DB in LevelDBTableService, which causes the process to keep holding fds for the deleted files as well.
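To illustrate the pattern only (this is a sketch with hypothetical names, not the actual LevelDBTableService code or necessarily what the PR below does): the cached DB handle has to be closed before its directory is deleted, otherwise the process keeps descriptors open on the deleted SST and log files.

```java
// Hypothetical sketch of the close-before-delete pattern; names are illustrative.
import java.io.File;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.fusesource.leveldbjni.JniDBFactory;
import org.iq80.leveldb.DB;
import org.iq80.leveldb.Options;

public class PreviewLevelDbTables {
  // Cache of open DB handles, keyed by table name (illustrative only).
  private final Map<String, DB> openDbs = new ConcurrentHashMap<>();
  private final File baseDir;

  public PreviewLevelDbTables(File baseDir) {
    this.baseDir = baseDir;
  }

  public synchronized DB getOrOpen(String tableName) throws IOException {
    DB db = openDbs.get(tableName);
    if (db == null) {
      db = JniDBFactory.factory.open(new File(baseDir, tableName),
                                     new Options().createIfMissing(true));
      openDbs.put(tableName, db);
    }
    return db;
  }

  public synchronized void dropTable(String tableName) throws IOException {
    DB db = openDbs.remove(tableName);
    if (db != null) {
      // Without this close(), the process keeps fds on the SST/log files even
      // after the directory below is deleted, which is the leak seen in the graph.
      db.close();
    }
    deleteRecursively(new File(baseDir, tableName));
  }

  private static void deleteRecursively(File file) {
    File[] children = file.listFiles();
    if (children != null) {
      for (File child : children) {
        deleteRecursively(child);
      }
    }
    file.delete();
  }
}
```

The same principle applies however the table service caches its handles: any code path that removes a table's data directory needs to release the open handle first.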
Fix implemented in https://github.com/caskdata/cdap/pull/10535.