How to Delete Files in HDFS
To delete a file in HDFS, you generally use the “rm” command. Here is an example:
hdfs dfs -rm hdfs://nn1/file
You can use the “-r” or “-R” flag to delete recursively, which is required when deleting directories; everything inside the directory will be removed. Here is another example:
hdfs dfs -rm -R hdfs://nn1/user/hadoop/emptydir
You can specify more than one path to delete in a single command, like this:
hdfs dfs -rm hdfs://nn1/file hdfs://nn1/user/hadoop/emptydir
If you want to delete files permanently, without sending them to the trash, you can use the “-skipTrash” option. This is useful when you are over quota and need the space freed immediately. Keep in mind that trash may not be enabled on your cluster at all, in which case “rm” deletes permanently by default. Here is an example:
hdfs dfs -rm -skipTrash hdfs://nn1/file
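Conversely, if files have already gone to the trash and you want the space back right away, you can empty the trash explicitly with the “expunge” subcommand (assuming trash is enabled on your cluster; how long trash is otherwise kept is controlled by the fs.trash.interval setting):
hdfs dfs -expunge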
You can also have the command ask for confirmation before deleting when the number of files to be deleted exceeds a threshold, defined by the hadoop.shell.delete.limit.num.files property. Enable the confirmation prompt with the “-safely” flag. Here is an example:
hdfs dfs -rm -safely hdfs://nn1/file
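If you want to see what that threshold is set to on your cluster, one way (assuming you have a configured client on your shell path) is to query the configuration value directly:
hdfs getconf -confKey hadoop.shell.delete.limit.num.files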
Delayed Freeing of Space After File Deletion on HDFS
It is worth noting that after a file has been deleted from HDFS, the space may not become free immediately. Even if you are not using the trash, and even after the trash has been emptied, the space can remain in use for some time.
One reason this happens is that the actual removal of a file's blocks takes place in the background: the blocks can be spread across many DataNodes and disks, and they are deleted asynchronously after the command returns, which can take time. If the delay in freeing space is caused by this, the space will be released on its own once the background deletion finishes.
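If you want to watch the space come back, you can check overall filesystem usage before and after the background deletion completes, for example with the “df” subcommand in human-readable form:
hdfs dfs -df -h /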