Difference between hadoop fs -put and hadoop fs -copyFromLocal

Hadoop Problem Overview


-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why?

The same applies to -get and -copyToLocal.

Hadoop Solutions


Solution 1 - Hadoop

-copyFromLocal is similar to the -put command, except that the source is restricted to a local file reference.

So anything you can do with -copyFromLocal you can also do with -put, but not vice versa.

Similarly,

-copyToLocal is similar to the -get command, except that the destination is restricted to a local file reference.

Hence, you can use -get instead of -copyToLocal, but not the other way round.

Reference: Hadoop's documentation.

Update: for the behavior as of Oct 2015, please see Solution 3 below.

Solution 2 - Hadoop

Here is an example: suppose your HDFS contains the path /tmp/dir/abc.txt and your local disk contains the same path. The HDFS API won't know which one you mean unless you specify a scheme such as file:// or hdfs://, so it may pick the path you did not want to copy.

That is why -copyFromLocal exists: it prevents you from accidentally copying the wrong file by restricting the source parameter to the local filesystem.

-put is for more advanced users who know which scheme to put in front.

New Hadoop users are often confused about which filesystem they are currently working in and where their files actually are.
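
To make the point about schemes concrete, here is a minimal sketch that uses plain java.net.URI (not Hadoop's own path handling, which is assumed to behave similarly for this purpose). The class name SchemeDemo, its helper methods, and the NameNode host and port are hypothetical and chosen only for illustration.

```java
import java.net.URI;

class SchemeDemo {
    // Returns the scheme of a path string, or "(none)" when the path is unqualified.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    // Returns only the path component, ignoring scheme and authority.
    static String pathOf(String path) {
        return URI.create(path).getPath();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "/tmp/dir/abc.txt",                     // unqualified: could mean either filesystem
            "file:///tmp/dir/abc.txt",              // explicitly the local filesystem
            "hdfs://namenode:8020/tmp/dir/abc.txt"  // explicitly HDFS (hypothetical host and port)
        };
        for (String candidate : candidates) {
            System.out.println(schemeOf(candidate) + " | " + pathOf(candidate));
        }
    }
}
```

All three strings share the same path component; only the scheme tells them apart, which is the ambiguity described above.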

Solution 3 - Hadoop

Despite what is claimed by the documentation, as of now (Oct. 2015), both -copyFromLocal and -put are the same.

From the online help:

[cloudera@quickstart ~]$ hdfs dfs -help copyFromLocal 
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
  Identical to the -put command.

And this is confirmed by looking at the sources, where you can see that the CopyFromLocal class extends the Put class, but without adding any new behavior:

  public static class CopyFromLocal extends Put {
    public static final String NAME = "copyFromLocal";
    public static final String USAGE = Put.USAGE;
    public static final String DESCRIPTION = "Identical to the -put command.";
  }
 
  public static class CopyToLocal extends Get {
    public static final String NAME = "copyToLocal";
    public static final String USAGE = Get.USAGE;
    public static final String DESCRIPTION = "Identical to the -get command.";
  }

As you can see, exactly the same holds for get/copyToLocal.
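
The effect of such an "empty" subclass can be shown with a small self-contained sketch. The classes below are hypothetical stand-ins, not Hadoop's real Put and CopyFromLocal; the run method and its return value are invented purely to show that the subclass inherits its behavior unchanged.

```java
class AliasDemo {
    // Hypothetical stand-in for a shell command; not Hadoop's real Put class.
    public static class Put {
        public static final String NAME = "put";

        // Pretend "copy" that just describes what it would do.
        public String run(String src, String dst) {
            return "copy " + src + " -> " + dst;
        }
    }

    // Only the name constant changes; run() is inherited as-is.
    public static class CopyFromLocal extends Put {
        public static final String NAME = "copyFromLocal";
    }

    public static void main(String[] args) {
        String viaPut = new Put().run("a.txt", "/user/demo");
        String viaAlias = new CopyFromLocal().run("a.txt", "/user/demo");
        // Both calls go through the same inherited method.
        System.out.println(viaPut.equals(viaAlias));
    }
}
```

Because the subclass overrides nothing, any difference between the two commands would have to come from somewhere other than these classes.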

Solution 4 - Hadoop

  • Both are the same, except that -copyFromLocal is restricted to copying from the local filesystem, while -put can take a file from any source (another HDFS, the local filesystem, ...).

Solution 5 - Hadoop

The -put and -copyFromLocal commands work exactly the same. You cannot use the -put command to copy files from one HDFS directory to another. Let's see this with an example: say your HDFS root has two directories, named 'test1' and 'test2'. If 'test1' contains a file 'customer.txt' and you try to copy it to the 'test2' directory:

$ hadoop fs -put /test1/customer.txt /test2

This results in a 'no such file or directory' error, because 'put' looks for the file in the local file system and not in HDFS. Both commands are meant only for copying files (or directories) from the local file system to HDFS. To copy a file from one HDFS directory to another, use hadoop fs -cp instead.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | snappy | View Question on Stackoverflow
Solution 1 - Hadoop | Ozair Kafray | View Answer on Stackoverflow
Solution 2 - Hadoop | Thomas Jungblut | View Answer on Stackoverflow
Solution 3 - Hadoop | Sylvain Leroux | View Answer on Stackoverflow
Solution 4 - Hadoop | Manish Agrawal | View Answer on Stackoverflow
Solution 5 - Hadoop | Roney J | View Answer on Stackoverflow