
Hadoop HDFS Commands

Hadoop is a major big data platform. One of the main components of Hadoop is HDFS, the Hadoop Distributed File System. We are going to try to cover all of the Hadoop HDFS commands, but we will start with the DFS commands as those are really handy for day to day operations and many people search for those specifically.

Hadoop HDFS DFS Commands

These are what many people look for the most. Other HDFS commands are important but these are especially handy. They allow you to perform all sorts of normal operations on the files that reside on an HDFS file system.

If you want more detail than what we provide here, you can check the Hadoop File System Shell Guide.

When using HDFS, your current working directory will be your home directory, something like /user/<username>. You can use both relative and absolute paths. The commands usually behave similarly to their Unix equivalents.
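
For example, assuming a home directory of /user/hadoop (illustrative), these two commands list the same directory:

hdfs dfs -ls /user/hadoop/dir1
hdfs dfs -ls dir1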

You can use either “hadoop fs” or “hdfs dfs”. We prefer “hdfs dfs”. We are going to use intuitive examples instead of covering every aspect of the exact command usage. A URI would normally look like this: hdfs://namenode1/parent/child, but you could just use /parent/child if hdfs://namenode1 is already set up in your configuration. We’re going to use hdfs://nn1 for many of our examples.

appendToFile - appends one or more local files to a single destination file

The “-“ in the last example causes it to read from stdin.


hdfs dfs -appendToFile localfile hdfs://nn1/file1
hdfs dfs -appendToFile localfile1 localfile2 hdfs://nn1/file1
hdfs dfs -appendToFile - hdfs://nn1/file1

cat - concatenates files together and prints to stdout

Use “-ignoreCrc” to disable checksum verification.


hdfs dfs -cat hdfs://nn1/file1 hdfs://nn2/file2
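
For example, to print a file while skipping checksum verification (path is illustrative):

hdfs dfs -cat -ignoreCrc hdfs://nn1/file1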

checksum - return the checksum of a file


hdfs dfs -checksum hdfs://nn1/file1

chgrp - change the group that owns the files

Use “-R” for recursive.


hdfs dfs -chgrp group1 hdfs://nn1/file1
hdfs dfs -chgrp -R group1 hdfs://nn1/file1

chmod - change permissions

Use “-R” for recursive.

You can use symbolic modes built from ‘r’, ‘w’, and ‘x’ (for example, u+rwx) or octal modes. See the section on HDFS permissions below or check out the official guide for more detail.


hdfs dfs -chmod u+rwx hdfs://nn1/file1
hdfs dfs -chmod -R u+rwx hdfs://nn1/file1
hdfs dfs -chmod -R 777 hdfs://nn1/file1

chown - change the owner of the files

Use “-R” for recursive.


hdfs dfs -chown user1 hdfs://nn1/file1
hdfs dfs -chown -R user1 hdfs://nn1/file1

copyFromLocal - copies a file, source needs to be local


hdfs dfs -copyFromLocal file1 hdfs://nn1/file1
-p preserve permissions, ownership, access/modify times
-f overwrite if exists
-l lazy persist
-d skip creating temporary file

copyToLocal - copies a file, destination needs to be local


hdfs dfs -copyToLocal hdfs://nn1/file1 file1
-p preserve permissions, ownership, modification times, and access times
-f overwrite destination if existing
-ignorecrc copy even with failed CRC check
-crc also copy CRCs

count - show a count of bytes, files, and dirs under the specified paths

-h human readable format
-v show a header
-q show quotas
-u limit to only usage and quotas
-t show usage and quota for each storage type
-x exclude snapshots
-e for each file, show erasure coding policy

hdfs dfs -count hdfs://nn1/file1 hdfs://nn2/file2
hdfs dfs -count -q -h -v hdfs://nn1/file1

cp - copy, allows multiple sources if the dest is a dir

-f force overwrite if exists
-p preserve attributes

hdfs dfs -cp  hdfs://nn1/file1 hdfs://nn2/file2
hdfs dfs -cp  hdfs://nn1/file1 hdfs://nn1/file2  hdfs://nn1/dir1

createSnapshot - creates a snapshot of a directory (must be snapshottable)


hdfs dfs -createSnapshot hdfs://nn1/dir
hdfs dfs -createSnapshot hdfs://nn1/dir snapshot1

deleteSnapshot - deletes a snapshot from a directory


hdfs dfs -deleteSnapshot hdfs://nn1/dir snapshot1

renameSnapshot - rename a snapshot


hdfs dfs -renameSnapshot hdfs://nn1/dir snapshot1 snapshot2

df - show free space

Use “-h” for human readable format.


hdfs dfs -df hdfs://nn1/dir
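
For example, to show free space in human readable units:

hdfs dfs -df -h hdfs://nn1/dir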

du - show directory and file sizes

-s aggregate
-h human readable
-v show header line
-x exclude snapshots

hdfs dfs -du hdfs://nn1/dir1
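
For example, to show a single human readable total for a directory:

hdfs dfs -du -s -h hdfs://nn1/dir1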

dus - show a summary of file lengths. Don’t use this anymore. It has been deprecated. Use ‘hdfs dfs -du -s’ instead.

expunge - Permanently delete any files in the trash dir that are older than the retention threshold. It also creates a new checkpoint.

Use “-immediate” to ignore fs.trash.interval and delete everything in the trash now.


hdfs dfs -expunge
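
For example, to delete everything in the trash right away regardless of fs.trash.interval:

hdfs dfs -expunge -immediate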

After creating a checkpoint, recently deleted files are moved under that checkpoint. The next run of expunge will permanently delete any checkpoint that is older than the value in fs.trash.interval.

find - find matching files and perform an action on them

Use “-iname” instead of “-name” for case insensitive matching.


hdfs dfs -find hdfs://nn1/ -name test -print
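
For example, a case insensitive search (the name “TEST” is illustrative):

hdfs dfs -find hdfs://nn1/ -iname TEST -print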

get - copy files to the local filesystem

-p preserve permissions, ownership, modification times, and access times
-f overwrite destination if existing
-ignorecrc copy even with failed CRC check
-crc also copy CRCs

hdfs dfs -get hdfs://nn1/file1 localfile

getfacl - show ACLs for dirs and files

Use “-R” for recursive.


hdfs dfs -getfacl hdfs://nn1/file1
hdfs dfs -getfacl -R hdfs://nn1/dir1

getfattr - show extended attributes

-R recursive
-n name dump this attribute value
-d dump all extended attribute values
-e encode values (“text”, “hex”, “base64”)

hdfs dfs -getfattr -d hdfs://nn1/file1
hdfs dfs -getfattr -R -n user.myAttr hdfs://nn1/dir1

getmerge - concatenate all files from a source dir and append them to a destination file

-nl add new lines between files
-skip-empty-file no newline for empty files

hdfs dfs -getmerge -nl hdfs://nn1/src hdfs://nn1/output
hdfs dfs -getmerge -nl hdfs://nn1/src/file1 hdfs://nn1/src/file2 hdfs://nn1/output

head - prints first kilobyte of file


hdfs dfs -head hdfs://nn1/file1

help - gives usage info


hdfs dfs -help

ls - show files


hdfs dfs -ls hdfs://nn1/file1
hdfs dfs -ls hdfs://nn1/dir1
-h human readable formatting
-R recursive
-t sort by most recently modified
-S sort by size
-r reverse order of sort
-u for sorting, use access time instead of modification time
-C only show paths
-d dirs listed as plain files
-q use “?” instead of non-printable chars
-e show erasure coding policy of files and directories only
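
For example, a recursive, human readable listing sorted by size:

hdfs dfs -ls -R -h -S hdfs://nn1/dir1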

lsr - recursive ls, deprecated, don’t use it


hdfs dfs -lsr hdfs://nn1/data

mkdir - create a directory

Use “-p” to automatically create parent dirs.


hdfs dfs -mkdir hdfs://nn1/user/hadoop/dir1
hdfs dfs -mkdir hdfs://nn1/user/hadoop/dir1 hdfs://nn2/user/hadoop/dir1
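
For example, to create nested directories in one step:

hdfs dfs -mkdir -p hdfs://nn1/user/hadoop/dir1/dir2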

moveFromLocal - moves a file (the source is deleted), source needs to be local


hdfs dfs -moveFromLocal file1 hdfs://nn1/file1

moveToLocal - will print out the message “Not implemented yet”


hdfs dfs -moveToLocal hdfs://nn1/file1 file1

mv - moves a file, can’t move between file systems


hdfs dfs -mv hdfs://nn1/file1 hdfs://nn1/file2
hdfs dfs -mv hdfs://nn1/file1 hdfs://nn1/file2 hdfs://nn1/dir1

put - copy files from local file system to destination

Use “-“ to read from stdin.

-p preserve permissions, ownership, access/modify times
-f overwrite if exists
-l lazy persist
-d skip creating temporary file

hdfs dfs -put localfile hdfs://nn1/file1
hdfs dfs -put -f localfile1 localfile2 hdfs://nn1/dir1
hdfs dfs -put -d localfile hdfs://nn1/file1
hdfs dfs -put - hdfs://nn1/file1

rm - delete a file or move to trash if enabled

-f no error if file doesn’t exist
-R recursive
-skipTrash bypass trash, great if you are over quota
-safely ask for confirmation if number of files to be deleted is over hadoop.shell.delete.limit.num.files

hdfs dfs -rm hdfs://nn1/file
hdfs dfs -rm hdfs://nn1/file hdfs://nn1/user/hadoop/emptydir
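
For example, to recursively delete a directory while bypassing the trash (path is illustrative):

hdfs dfs -rm -R -skipTrash hdfs://nn1/dir1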

NOTE - Trash is disabled by default. To enable it, edit core-site.xml and set a value higher than zero for the property fs.trash.interval.
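
A minimal sketch of the core-site.xml entry, assuming you want deleted files kept in the trash for 24 hours (the value is in minutes):

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>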

rmdir - delete a directory

--ignore-fail-on-non-empty don't fail if dir isn't empty and you're using wild cards

hdfs dfs -rmdir hdfs://nn1/emptydir
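
For example, to remove several empty directories with a wildcard without failing on the non-empty ones (paths are illustrative):

hdfs dfs -rmdir --ignore-fail-on-non-empty hdfs://nn1/dir*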

rmr - recursive delete, DEPRECATED, don’t use it


hdfs dfs -rmr hdfs://nn1/file1

setfacl - set Access Control List (ACL)

-b remove all except the base ACL entries
-k remove default ACL
-R recursive
-m modify ACL, new entries added, old entries kept
-x remove specified entries, keep others
--set completely replace the ACL, acl_spec needs all info; if either access or default entries are omitted, they are retained

hdfs dfs -setfacl -m user:hadoop:rw- hdfs://nn1/file
hdfs dfs -setfacl -x user:hadoop hdfs://nn1/file
hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- hdfs://nn1/file
hdfs dfs -setfacl -R -m user:hadoop:r-x hdfs://nn1/dir
hdfs dfs -setfacl -m default:user:hadoop:r-x hdfs://nn1/dir
hdfs dfs -setfacl -b hdfs://nn1/file
hdfs dfs -setfacl -k hdfs://nn1/dir

setfattr - set extended attributes

-n name name to assign to
-v value value to assign
-x name remove extended attribute

hdfs dfs -setfattr -n user.myAttr -v myValue hdfs://nn1/file
hdfs dfs -setfattr -n user.noValue hdfs://nn1/file
hdfs dfs -setfattr -x user.myAttr hdfs://nn1/file

setrep - change the replication factor of a file, or recursively for every file under a directory; EC files are ignored

-R no effect, for backwards compatibility
-w wait for replication, could be very long time

hdfs dfs -setrep -w 3 hdfs://nn1/dir1

stat - show stats in specified format


hdfs dfs -stat "type:%F perm:%a %u:%g size:%b mtime:%y atime:%x name:%n" hdfs://nn1/file
%a permissions in octal
%A permissions in symbolic
%b filesize in bytes
%F type
%g group name of owner
%n name
%o block size
%r replication
%u user name of owner
%x, %X access date
%y, %Y modification date
%x and %y “yyyy-MM-dd HH:mm:ss”
%X and %Y milliseconds since January 1, 1970 UTC
%y default if format not specified

tail - show last kilobyte of file

Use “-f” to follow. You can watch appended data as the file is written.


hdfs dfs -tail hdfs://nn1/dir1/file1
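
For example, to follow a file as it is written:

hdfs dfs -tail -f hdfs://nn1/dir1/file1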

test - test properties of a file or directory


hdfs dfs -test -e hdfs://nn1/file1
-d return 0 if is directory
-e return 0 if exists
-f return 0 if is file
-s return 0 if not empty
-w return 0 if exists and you have write permission
-r return 0 if exists and you have read permission
-z return 0 if file is zero length
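
The result comes back as the exit status, so from a shell you would check it like this (path is illustrative):

hdfs dfs -test -d hdfs://nn1/dir1
echo $?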

text - output a zip or TextRecordInputStream file as text


hdfs dfs -text hdfs://nn1/file1

touch - update modification and access times, create empty file if it doesn’t exist


hdfs dfs -touch hdfs://nn1/file1
hdfs dfs -touch -m -t 20180809230000 hdfs://nn1/file1
hdfs dfs -touch -t 20180809230000 hdfs://nn1/file1
hdfs dfs -touch -a hdfs://nn1/file1
-a only change access time
-m only change modification time
-t specify time stamp
-c don’t create if it doesn’t exist

touchz - create zero length file, return error if a non-zero length file exists


hdfs dfs -touchz hdfs://nn1/file1

truncate - truncate all matching files to specified length

Use “-w” to wait for block recovery to complete. Check the official doc for more details on this.


hdfs dfs -truncate 55 hdfs://nn1/file1 hdfs://nn1/file2
hdfs dfs -truncate -w 127 hdfs://nn1/file1

usage - get help for a specific command


hdfs dfs -usage command

HDFS Permissions

Files don’t really have a need for execute permission as they aren’t ever meant to be executed. There is no setuid or setgid bit. The sticky bit does exist.

r - read a file, list the files in a directory
w - write to a file, create and delete files in a directory
x - access the children of a directory

You can also use numbers, for example “777”, “544”, etc.

For more details, check the HDFS Permissions Guide

What is the difference between “hdfs dfs” and “hadoop fs”?

The “hadoop fs” command is actually more of a general purpose filesystem command. It isn’t just a tool for working with HDFS. It can be used to operate on local files, S3, HDFS, and more. The “hadoop dfs” command is used purely for working with HDFS but has been deprecated. The “hdfs dfs” command is also used exclusively for working with HDFS and is the preferred command to use.

hadoop fs manages all filesystems (local, HDFS, S3, etc.)
hadoop dfs specific to HDFS (deprecated)
hdfs dfs recommended command for HDFS
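
For example, “hadoop fs” can address the local filesystem directly with a file:// URI (a quick illustration, assuming a standard client configuration):

hadoop fs -ls file:///tmp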

References