Hadoop HDFS Commands
Hadoop is a major big data platform, and one of its main components is HDFS, the Hadoop Distributed File System. We are going to try to cover all of the Hadoop HDFS commands, but we will start with the DFS commands because they are handy for day-to-day operations and many people search for them specifically.
Hadoop HDFS DFS Commands
These are the commands people look for most often. The other HDFS commands are important, but these are especially handy: they let you perform all of the normal file operations on an HDFS file system.
If you want more detail than what we provide here, you can check the Hadoop File System Shell Guide.
When using HDFS, your current working directory will be something like /user/<username>.
You can use either “hadoop fs” or “hdfs dfs”. We prefer “hdfs dfs”. We are going to use intuitive examples instead of covering every aspect of the exact command usage. A URI would normally look like hdfs://namenode1/parent/child, but you can use just /parent/child if hdfs://namenode1 is already set up in your configuration. We’re going to use hdfs://nn1 for many of our examples.
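For example, assuming fs.defaultFS points at hdfs://nn1 in your core-site.xml, these three commands are equivalent:
hdfs dfs -ls hdfs://nn1/user/hadoop
hdfs dfs -ls /user/hadoop
hadoop fs -ls /user/hadoop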
appendToFile - appends one or more local files to a single destination file
The “-” in the last example causes it to read from stdin.
hdfs dfs -appendToFile localfile hdfs://nn1/file1
hdfs dfs -appendToFile localfile1 localfile2 hdfs://nn1/file1
hdfs dfs -appendToFile - hdfs://nn1/file1
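The stdin form makes it easy to append the output of another command. A minimal sketch (the piped text is illustrative):
echo "another log line" | hdfs dfs -appendToFile - hdfs://nn1/file1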
cat - concatenates files together and prints to stdout
Use “-ignoreCrc” to disable checksum verification.
hdfs dfs -cat hdfs://nn1/file1 hdfs://nn2/file2
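For example, to print a file while skipping checksum verification:
hdfs dfs -cat -ignoreCrc hdfs://nn1/file1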
checksum - return checksum information for a file
hdfs dfs -checksum hdfs://nn1/file1
chgrp - change the group that owns the files
Use “-R” for recursive.
hdfs dfs -chgrp group1 hdfs://nn1/file1
hdfs dfs -chgrp -R group1 hdfs://nn1/file1
chmod - change permissions
Use “-R” for recursive.
You can specify ‘r’, ‘w’, and ‘x’ as you would expect. See the section on HDFS permissions below or check out the official guide for more detail.
hdfs dfs -chmod u+rwx hdfs://nn1/file1
hdfs dfs -chmod -R u+rwx hdfs://nn1/file1
hdfs dfs -chmod -R 777 hdfs://nn1/file1
chown - change the owner of the files
Use “-R” for recursive.
hdfs dfs -chown user1 hdfs://nn1/file1
hdfs dfs -chown -R user1 hdfs://nn1/file1
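The owner and group can also be changed together using the OWNER:GROUP form, for example:
hdfs dfs -chown user1:group1 hdfs://nn1/file1
hdfs dfs -chown -R user1:group1 hdfs://nn1/dir1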
copyFromLocal - copies a file, source needs to be local
hdfs dfs -copyFromLocal file1 hdfs://nn1/file1
-p | preserve permissions, ownership, access/modify times |
-f | overwrite if exists |
-l | lazy persist, forces a replication factor of 1 |
-d | skip creating temporary file |
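For example, to overwrite an existing destination while preserving the source file’s attributes (paths illustrative):
hdfs dfs -copyFromLocal -f -p file1 hdfs://nn1/file1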
copyToLocal - copies a file, destination needs to be local
hdfs dfs -copyToLocal hdfs://nn1/file1 file1
-p | preserve permissions, ownership, modification times, and access times |
-f | overwrite destination if existing |
-ignorecrc | copy even with failed CRC check |
-crc | also copy CRCs |
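For example, to overwrite an existing local copy (local path illustrative):
hdfs dfs -copyToLocal -f hdfs://nn1/file1 /tmp/file1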
count - show a count of bytes, files, and dirs under the specified paths
-h | human readable format |
-v | show a header |
-q | show quotas |
-u | limit to only usage and quotas |
-t | show usage and quota for each storage type (ignored unless -q or -u is also given) |
-x | exclude snapshots |
-e | for each file, show erasure coding policy |
hdfs dfs -count hdfs://nn1/file1 hdfs://nn2/file2
hdfs dfs -count -q -h -v hdfs://nn1/file1
cp - copy, allows multiple sources if the dest is a dir
-f | force overwrite if exists |
-p | preserve attributes |
hdfs dfs -cp hdfs://nn1/file1 hdfs://nn2/file2
hdfs dfs -cp hdfs://nn1/file1 hdfs://nn1/file2 hdfs://nn1/dir1
createSnapshot - creates a snapshot of a directory (must be snapshottable)
hdfs dfs -createSnapshot hdfs://nn1/dir
hdfs dfs -createSnapshot hdfs://nn1/dir snapshot1
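A directory must first be made snapshottable by an administrator. Assuming you have admin privileges, something like this enables snapshots on a directory:
hdfs dfsadmin -allowSnapshot /dir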
deleteSnapshot - deletes a snapshot from a directory
hdfs dfs -deleteSnapshot hdfs://nn1/dir snapshot1
renameSnapshot - rename a snapshot
hdfs dfs -renameSnapshot hdfs://nn1/dir snapshot1 snapshot2
df - show free space
Use “-h” for human readable format.
hdfs dfs -df hdfs://nn1/dir
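For example, in human readable format:
hdfs dfs -df -h hdfs://nn1/dir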
du - show directory and file sizes
-s | aggregate |
-h | human readable |
-v | show header line |
-x | exclude snapshots |
hdfs dfs -du hdfs://nn1/dir1
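For example, a human readable total for an entire directory tree:
hdfs dfs -du -s -h hdfs://nn1/dir1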
dus - show a summary of file lengths. Deprecated; use “hdfs dfs -du -s” instead.
expunge - permanently delete any files in the trash dir that are older than the retention threshold, and create a new checkpoint
Use “-immediate” to ignore fs.trash.interval and delete everything in the trash now.
hdfs dfs -expunge
When a checkpoint is created, recently deleted files (those currently in the trash) are moved under that checkpoint. The next run of expunge will permanently delete any checkpoint that is older than the value in fs.trash.interval.
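For example, to empty the trash right away, ignoring fs.trash.interval:
hdfs dfs -expunge -immediate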
find - find matching files and perform an action on them
- default path is the current directory
- default action is to print
Use “-iname” instead of “-name” for case insensitive matching.
hdfs dfs -find hdfs://nn1/ -name test -print
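For example, a case-insensitive search with a glob pattern (pattern illustrative):
hdfs dfs -find hdfs://nn1/ -iname "test*" -print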
get - copy files to the local filesystem
-p | preserve permissions, ownership, modification times, and access times |
-f | overwrite destination if existing |
-ignorecrc | copy even with failed CRC check |
-crc | also copy CRCs |
hdfs dfs -get hdfs://nn1/file1 localfile
getfacl - show ACLs for dirs and files
Use “-R” for recursive.
hdfs dfs -getfacl hdfs://nn1/file1
hdfs dfs -getfacl -R hdfs://nn1/dir1
getfattr - show extended attributes
-R | recursive |
-n name | dump this attribute value |
-d | dump all extended attribute values |
-e | encode values (“text”, “hex”, “base64”) |
hdfs dfs -getfattr -d hdfs://nn1/file1
hdfs dfs -getfattr -R -n user.myAttr hdfs://nn1/dir1
getmerge - concatenate all files from a source dir and append them to a single local destination file
-nl | add new lines between files |
-skip-empty-file | no newline for empty files |
hdfs dfs -getmerge -nl hdfs://nn1/src output.txt
hdfs dfs -getmerge -nl hdfs://nn1/src/file1 hdfs://nn1/src/file2 output.txt
head - prints first kilobyte of file
hdfs dfs -head hdfs://nn1/file1
help - gives usage info
hdfs dfs -help
ls - show files
hdfs dfs -ls hdfs://nn1/file1
hdfs dfs -ls hdfs://nn1/dir1
-h | human readable formatting |
-R | recursive |
-t | sort by most recently modified |
-S | sort by size |
-r | reverse order of sort |
-u | for sorting, use access time instead of modification time |
-C | only show paths |
-d | dirs listed as plain files |
-q | use “?” instead of non-printable chars |
-e | show erasure coding policy of files and directories only |
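These flags can be combined. For example, a recursive, human readable listing sorted by modification time:
hdfs dfs -ls -R -h -t hdfs://nn1/dir1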
lsr - recursive ls. Deprecated; use “hdfs dfs -ls -R” instead.
hdfs dfs -lsr hdfs://nn1/data
mkdir - create a directory
Use “-p” to automatically create parent dirs.
hdfs dfs -mkdir hdfs://nn1/user/hadoop/dir1
hdfs dfs -mkdir hdfs://nn1/user/hadoop/dir1 hdfs://nn2/user/hadoop/dir1
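For example, creating nested directories in one step (path illustrative):
hdfs dfs -mkdir -p hdfs://nn1/user/hadoop/dir1/dir2/dir3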
moveFromLocal - move a file (the local source is deleted after the copy), source needs to be local
hdfs dfs -moveFromLocal file1 hdfs://nn1/file1
moveToLocal - will print out the message “Not implemented yet”
hdfs dfs -moveToLocal hdfs://nn1/file1 file1
mv - moves a file, can’t move between file systems
hdfs dfs -mv hdfs://nn1/file1 hdfs://nn1/file2
hdfs dfs -mv hdfs://nn1/file1 hdfs://nn1/file2 hdfs://nn1/dir1
put - copy files from local file system to destination
Use “-“ to read from stdin.
-p | preserve permissions, ownership, access/modify times |
-f | overwrite if exists |
-l | lazy persist, forces a replication factor of 1 |
-d | skip creating temporary file |
hdfs dfs -put localfile hdfs://nn1/file1
hdfs dfs -put -f localfile1 localfile2 hdfs://nn1/dir1
hdfs dfs -put -d localfile hdfs://nn1/file1
hdfs dfs -put - hdfs://nn1/file1
rm - delete a file or move to trash if enabled
-f | no error if file doesn’t exist |
-R | recursive |
-skipTrash | bypass trash, great if you are over quota |
-safely | ask for confirmation if number of files to be deleted is over hadoop.shell.delete.limit.num.files |
hdfs dfs -rm hdfs://nn1/file
hdfs dfs -rm hdfs://nn1/file hdfs://nn1/user/hadoop/emptydir
NOTE - Trash is disabled by default. To enable it, edit core-site.xml and set a value higher than zero for the variable fs.trash.interval.
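For example, to recursively delete a directory and bypass the trash entirely (handy when you are over quota):
hdfs dfs -rm -r -skipTrash hdfs://nn1/dir1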
rmdir - delete a directory
--ignore-fail-on-non-empty | when using wildcards, don’t fail if a dir isn’t empty |
hdfs dfs -rmdir hdfs://nn1/emptydir
rmr - recursive delete. Deprecated; use “hdfs dfs -rm -r” instead.
hdfs dfs -rmr hdfs://nn1/file1
setfacl - set Access Control List (ACL)
-b | remove all except the base ACL entries |
-k | remove default ACL |
-R | recursive |
-m | modify ACL, new entries added, old entries kept |
-x | remove specified entries, keep others |
--set | completely replace the ACL; the acl_spec must include all entries; if only access or only default entries are given, the others are retained |
hdfs dfs -setfacl -m user:hadoop:rw- hdfs://nn1/file
hdfs dfs -setfacl -x user:hadoop hdfs://nn1/file
hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- hdfs://nn1/file
hdfs dfs -setfacl -R -m user:hadoop:r-x hdfs://nn1/dir
hdfs dfs -setfacl -m default:user:hadoop:r-x hdfs://nn1/dir
hdfs dfs -setfacl -b hdfs://nn1/file
hdfs dfs -setfacl -k hdfs://nn1/dir
setfattr - set extended attributes
-n name | the attribute name to set |
-v value | value to assign |
-x name | remove extended attribute |
hdfs dfs -setfattr -n user.myAttr -v myValue hdfs://nn1/file
hdfs dfs -setfattr -n user.noValue hdfs://nn1/file
hdfs dfs -setfattr -x user.myAttr hdfs://nn1/file
setrep - change the replication factor of a file; if the path is a directory, recursively change the replication factor of all files under it (EC files are ignored)
-R | no effect, accepted for backwards compatibility |
-w | wait for replication to complete, which can take a very long time |
hdfs dfs -setrep -w 3 hdfs://nn1/dir1
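Without “-w”, the command returns immediately while replication proceeds in the background:
hdfs dfs -setrep 2 hdfs://nn1/file1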
stat - show stats in specified format
hdfs dfs -stat "type:%F perm:%a %u:%g size:%b mtime:%y atime:%x name:%n" hdfs://nn1/file
%a | permissions in octal |
%A | permissions in symbolic |
%b | filesize in bytes |
%F | type |
%g | group name of owner |
%n | name |
%o | block size |
%r | replication |
%u | user name of owner |
%x, %X | access date |
%y, %Y | modification date |
%x and %y | “yyyy-MM-dd HH:mm:ss” |
%X and %Y | milliseconds since January 1, 1970 UTC |
%y | default if format not specified |
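For example, a shorter format string showing type, ownership, size, modification time, and name:
hdfs dfs -stat "%F %u:%g %b %y %n" hdfs://nn1/file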
tail - show last kilobyte of file
Use “-f” to follow. You can watch appended data as the file is written.
hdfs dfs -tail hdfs://nn1/dir1/file1
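For example, to follow a file as it is written (press Ctrl-C to stop):
hdfs dfs -tail -f hdfs://nn1/dir1/file1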
test - test file and directory attributes, returning 0 on success
hdfs dfs -test -e hdfs://nn1/file1
-d | return 0 if is directory |
-e | return 0 if exists |
-f | return 0 if is file |
-s | return 0 if not empty |
-w | return 0 if exists and you have write permission |
-r | return 0 if exists and you have read permission |
-z | return 0 if file is zero length |
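Because test reports its result through the exit code, it is handy in shell scripts. A minimal sketch (paths illustrative):
hdfs dfs -test -d hdfs://nn1/dir1 && echo "dir1 is a directory"
hdfs dfs -test -e hdfs://nn1/file1 || echo "file1 does not exist"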
text - output a zip or TextRecordInputStream file as text
hdfs dfs -text hdfs://nn1/file1
touch - update modification and access times, create empty file if it doesn’t exist
hdfs dfs -touch hdfs://nn1/file1
hdfs dfs -touch -m -t 20180809230000 hdfs://nn1/file1
hdfs dfs -touch -t 20180809230000 hdfs://nn1/file1
hdfs dfs -touch -a hdfs://nn1/file1
-a | only change access time |
-m | only change modification time |
-t | specify time stamp |
-c | don’t create if it doesn’t exist |
touchz - create zero length file, return error if a non-zero length file exists
hdfs dfs -touchz hdfs://nn1/file1
truncate - truncate all matching files to specified length
Use “-w” to wait for block recovery to complete. Check the official docs for more details on this.
hdfs dfs -truncate 55 hdfs://nn1/file1 hdfs://nn1/file2
hdfs dfs -truncate -w 127 hdfs://nn1/file1
usage - get help for a specific command
hdfs dfs -usage command
HDFS Permissions
Files don’t really need execute permission since they are never meant to be executed, and there are no setuid or setgid bits. The sticky bit does exist.
r - read a file; list the files in a directory
w - write to a file; create and delete files in a directory
x - access the children of a directory
You can also use numbers, for example “777”, “544”, etc.
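For example, “644” grants the owner read and write, and the group and everyone else read only:
hdfs dfs -chmod 644 hdfs://nn1/file1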
For more details, check the HDFS Permissions Guide
What is the difference between “hdfs dfs” and “hadoop fs”?
The “hadoop fs” command is actually a general-purpose filesystem command: it isn’t just a tool for working with HDFS and can operate on local files, S3, HDFS, and more. The “hadoop dfs” command works purely with HDFS but has been deprecated. The “hdfs dfs” command is also used exclusively for working with HDFS and is the preferred command to use.
hadoop fs | manages all filesystems (local, HDFS, S3, etc.) |
hadoop dfs | specific to HDFS, deprecated |
hdfs dfs | recommended command for HDFS |
References
- Hadoop HDFS Commands - Official Docs
- Hadoop File System Shell Guide
- Hadoop fs vs hdfs fs
- Hadoop Commands Guide
- HDFS Permissions Guide
- HDFS Architecture
- HDFS Snapshots