On rollback, the PartitionedFileSet should only drop partitions that were successfully added.
In other words, the create operation should only be recorded after the create has succeeded. Right now it is recorded before the create, in addPartition().
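Here is a minimal sketch of that ordering, written in Python for brevity. It assumes a hypothetical in-memory partition table. `PartitionTable`, `add_partition` and `created` are illustrative names, not the CDAP API. The create is recorded only after it has succeeded, so an add that fails leaves nothing behind for the rollback to undo.

```python
class PartitionAlreadyExists(Exception):
    """Raised when a partition key is already present (hypothetical)."""


class PartitionTable:
    """In-memory stand-in for the PFS partition table (hypothetical)."""

    def __init__(self):
        self.partitions = {}  # partition key -> location
        self.created = []     # keys this transaction actually created

    def add_partition(self, key, location):
        if key in self.partitions:
            # Fail before recording anything: the rollback must not
            # treat this pre-existing partition as one we created.
            raise PartitionAlreadyExists(key)
        self.partitions[key] = location
        # Record the create only after it has succeeded.
        self.created.append(key)


table = PartitionTable()
table.add_partition("2017-03-01", "/data/2017-03-01")
try:
    table.add_partition("2017-03-01", "/data/other")
except PartitionAlreadyExists:
    pass
print(table.created)  # ['2017-03-01']
```

If the `created.append` ran before the existence check, the failed second call would also be recorded. A rollback would then drop a partition this transaction never created.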
There are two aspects here: whether to delete the partition, and whether to delete the files.
The partition should only be deleted if it did not exist before. (That is, if the tx fails because addPartition() finds an existing partition, the rollback must not delete that existing partition.)
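Below is a minimal sketch of the partition rule. It again uses a hypothetical in-memory table, and `PartitionTable`, `add_partition` and `rollback` are illustrative names only. The rollback undoes only the creates that this transaction recorded. A partition that existed before the transaction therefore survives, even when the failed add for that partition is what aborted the transaction.

```python
class PartitionAlreadyExists(Exception):
    """Raised when a partition key is already present (hypothetical)."""


class PartitionTable:
    """In-memory stand-in for the PFS partition table (hypothetical)."""

    def __init__(self, existing=None):
        self.partitions = dict(existing or {})  # key -> location
        self.created = []  # keys created by the current transaction

    def add_partition(self, key, location):
        if key in self.partitions:
            raise PartitionAlreadyExists(key)
        self.partitions[key] = location
        self.created.append(key)  # recorded only after success

    def rollback(self):
        # Undo only this transaction's creates, newest first.
        for key in reversed(self.created):
            del self.partitions[key]
        self.created.clear()


# Partition "a" existed before the transaction started.
table = PartitionTable(existing={"a": "/data/a"})
try:
    table.add_partition("b", "/data/b")   # succeeds and is recorded
    table.add_partition("a", "/data/a2")  # fails: "a" already exists
except PartitionAlreadyExists:
    table.rollback()

print(sorted(table.partitions))  # ['a']
```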
For the files, there are several cases to distinguish:
A program registers an existing file system location as a new partition. If the transaction fails for any reason, the partition should be removed from Hive (it is already removed from the PFS's partition table by the tx rollback). The files must not be deleted, because they existed before.
A program wants to write data to the file system in order to create a new partition. It gets a PartitionOutput from the PFS, uses its location to write the files, and then calls addPartition(). In this case, the files should also be removed if the transaction fails (unless they existed before the getPartitionOutput() call).
A MapReduce job writes data that needs to be added as a new partition. If the MapReduce fails, the files should also be removed (unless they existed before the MapReduce ran).
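One way to apply the "unless they existed before" rule is to check whether the output location exists when it is handed out (getPartitionOutput()) or before the MapReduce starts. On rollback, the location is then deleted only if it did not exist at that earlier point. The sketch below assumes local directories and a plain dict standing in for Hive. `OutputTracker` and `rollback` are hypothetical names. It also simplifies by treating a location that already existed as entirely off-limits, even if the writer added new files to it.

```python
import shutil
import tempfile
from pathlib import Path


class OutputTracker:
    """Hypothetical helper: remembers whether an output location existed
    before the writer (a program or a MapReduce job) touched it."""

    def __init__(self, location):
        self.location = Path(location)
        # Checked up front (when the output is handed out, or before the
        # MapReduce starts), never at rollback time.
        self.existed_before = self.location.exists()

    def rollback_files(self):
        # Delete the location only if this writer created it.
        if not self.existed_before and self.location.exists():
            shutil.rmtree(self.location)


def rollback(hive, key, tracker):
    # The partition was added by this transaction, so drop it from the
    # Hive stand-in; then clean up files only where that is safe.
    hive.pop(key, None)
    tracker.rollback_files()


base = Path(tempfile.mkdtemp())
hive = {}

# Case 1: an existing location is registered as a new partition.
existing = base / "existing"
existing.mkdir()
(existing / "part-0").write_text("old data")
t_existing = OutputTracker(existing)
hive["existing"] = str(existing)

# Cases 2 and 3: a program or a MapReduce writes to a new location.
fresh = base / "fresh"
t_fresh = OutputTracker(fresh)
fresh.mkdir()
(fresh / "part-0").write_text("new data")
hive["fresh"] = str(fresh)

# The transaction fails: roll back both partitions.
rollback(hive, "existing", t_existing)
rollback(hive, "fresh", t_fresh)

survivors = (existing.exists(), fresh.exists())
print(hive, survivors)  # {} (True, False)
shutil.rmtree(base)  # clean up the demo directory
```

The important design choice is where `existed_before` is computed. If it were checked at rollback time, the writer's own files would already be there, and they would be mistaken for pre-existing data.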
Fixes an issue where a transaction failure could leave a PartitionedFileSet in an inconsistent state.
During testing I found an additional problem, which is quite complex to fix. I am considering this issue (8426) resolved anyway, because this change fixes the bug of incorrectly dropping partitions and deleting files.