DistCp (distributed copy) is a tool generally used for large inter- and intra-cluster copies in Hadoop.
But it can also be used to copy files from the local file system to HDFS.
To test this, I created 3,000+ files on my local file system.
My local file system: /home/rajesh/testfiles
rajesh@namenode1:~/testfiles$ ls -lrt |wc -l
3133
HDFS directory (I have not created the testfiles folder in HDFS beforehand):
rajesh@namenode1:~/testfiles$ hadoop fs -ls /user/rajesh
Found 5 items
drwx------ - rajesh hdfs 0 2016-07-18 06:59 /user/rajesh/.Trash
drwx------ - rajesh hdfs 0 2016-07-18 06:06 /user/rajesh/.staging
-rw-r--r-- 3 rajesh hdfs 428959 2016-07-05 07:54 /user/rajesh/Hadoop_Tuning_Guide-Version5.pdf
drwxr-xr-x - rajesh hdfs 0 2016-07-05 07:27 /user/rajesh/hive
Command to Copy:
hadoop distcp file:///home/rajesh/testfiles /user/rajesh
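The command above can be wrapped in a small helper. This is only a sketch: the function names local_path_to_uri and distcp_from_local are my own and not part of Hadoop. It resolves the local directory to an absolute path, prefixes it with file://, and prints the distcp command unless --run is passed. One assumption worth stating: DistCp runs as a MapReduce job, so as far as I understand the file:// path has to be readable on whichever node runs the map tasks, not only on the machine where the command is typed.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Hadoop.

# Turn a local directory into the file:// URI form used as a distcp source.
local_path_to_uri() {
    local abs
    # Resolve to an absolute path; fail if the directory does not exist.
    abs=$(cd "$1" 2>/dev/null && pwd) || return 1
    printf 'file://%s\n' "$abs"
}

# Print the distcp command by default; execute it only when --run is given.
# Usage: distcp_from_local LOCAL_DIR HDFS_TARGET [--run]
distcp_from_local() {
    local src_uri
    src_uri=$(local_path_to_uri "$1") || { echo "no such directory: $1" >&2; return 1; }
    if [ "${3:-}" = "--run" ]; then
        hadoop distcp "$src_uri" "$2"
    else
        echo "hadoop distcp $src_uri $2"
    fi
}
```

For example, distcp_from_local /home/rajesh/testfiles /user/rajesh prints the command shown above, and adding --run submits the job.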
Logs:
rajesh@namenode1:~/testfiles$ hadoop distcp file:///home/rajesh/testfiles /user/rajesh
16/07/18 07:00:50 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[file:/home/rajesh/testfiles], targetPath=/user/rajesh, targetPathExists=true, preserveRawXattrs=false}
16/07/18 07:00:52 INFO impl.TimelineClientImpl: Timeline service address: http://namenode1.rajesh.com:8188/ws/v1/timeline/
16/07/18 07:00:52 INFO client.RMProxy: Connecting to ResourceManager at namenode1.rajesh.com/192.168.0.100:8050
16/07/18 07:01:28 INFO impl.TimelineClientImpl: Timeline service address: http://namenode1.rajesh.com:8188/ws/v1/timeline/
16/07/18 07:01:28 INFO client.RMProxy: Connecting to ResourceManager at namenode1.rajesh.com/192.168.0.100:8050
16/07/18 07:01:31 INFO mapreduce.JobSubmitter: number of splits:21
16/07/18 07:01:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1468835763123_0002
16/07/18 07:01:33 INFO impl.YarnClientImpl: Submitted application application_1468835763123_0002
16/07/18 07:01:33 INFO mapreduce.Job: The url to track the job: http://namenode1.rajesh.com:8088/proxy/application_1468835763123_0002/
16/07/18 07:01:33 INFO tools.DistCp: DistCp job-id: job_1468835763123_0002
16/07/18 07:01:33 INFO mapreduce.Job: Running job: job_1468835763123_0002
16/07/18 07:01:57 INFO mapreduce.Job: Job job_1468835763123_0002 running in uber mode : false
16/07/18 07:01:57 INFO mapreduce.Job: map 0% reduce 0%
16/07/18 07:02:29 INFO mapreduce.Job: map 1% reduce 0%
16/07/18 07:02:32 INFO mapreduce.Job: map 2% reduce 0%
16/07/18 07:02:37 INFO mapreduce.Job: map 3% reduce 0%
16/07/18 07:02:38 INFO mapreduce.Job: map 4% reduce 0%
16/07/18 07:02:42 INFO mapreduce.Job: map 5% reduce 0%
16/07/18 07:02:45 INFO mapreduce.Job: map 6% reduce 0%
16/07/18 07:02:48 INFO mapreduce.Job: map 7% reduce 0%
16/07/18 07:02:51 INFO mapreduce.Job: map 8% reduce 0%
16/07/18 07:02:53 INFO mapreduce.Job: map 9% reduce 0%
16/07/18 07:02:54 INFO mapreduce.Job: map 10% reduce 0%
16/07/18 07:02:57 INFO mapreduce.Job: map 11% reduce 0%
16/07/18 07:03:00 INFO mapreduce.Job: map 12% reduce 0%
16/07/18 07:03:01 INFO mapreduce.Job: map 13% reduce 0%
16/07/18 07:03:03 INFO mapreduce.Job: map 14% reduce 0%
16/07/18 07:03:26 INFO mapreduce.Job: map 15% reduce 0%
16/07/18 07:03:31 INFO mapreduce.Job: map 16% reduce 0%
16/07/18 07:03:37 INFO mapreduce.Job: map 17% reduce 0%
16/07/18 07:03:40 INFO mapreduce.Job: map 18% reduce 0%
16/07/18 07:03:46 INFO mapreduce.Job: map 19% reduce 0%
16/07/18 07:03:49 INFO mapreduce.Job: map 20% reduce 0%
16/07/18 07:03:52 INFO mapreduce.Job: map 21% reduce 0%
16/07/18 07:03:55 INFO mapreduce.Job: map 22% reduce 0%
16/07/18 07:03:58 INFO mapreduce.Job: map 23% reduce 0%
16/07/18 07:04:01 INFO mapreduce.Job: map 24% reduce 0%
16/07/18 07:04:02 INFO mapreduce.Job: map 25% reduce 0%
16/07/18 07:04:05 INFO mapreduce.Job: map 26% reduce 0%
16/07/18 07:04:08 INFO mapreduce.Job: map 27% reduce 0%
16/07/18 07:04:10 INFO mapreduce.Job: map 28% reduce 0%
16/07/18 07:04:12 INFO mapreduce.Job: map 29% reduce 0%
16/07/18 07:04:44 INFO mapreduce.Job: map 30% reduce 0%
16/07/18 07:04:47 INFO mapreduce.Job: map 31% reduce 0%
16/07/18 07:04:50 INFO mapreduce.Job: map 32% reduce 0%
16/07/18 07:04:53 INFO mapreduce.Job: map 33% reduce 0%
16/07/18 07:04:56 INFO mapreduce.Job: map 34% reduce 0%
16/07/18 07:05:01 INFO mapreduce.Job: map 35% reduce 0%
16/07/18 07:05:02 INFO mapreduce.Job: map 36% reduce 0%
16/07/18 07:05:05 INFO mapreduce.Job: map 37% reduce 0%
16/07/18 07:05:08 INFO mapreduce.Job: map 38% reduce 0%
16/07/18 07:05:10 INFO mapreduce.Job: map 39% reduce 0%
16/07/18 07:05:13 INFO mapreduce.Job: map 40% reduce 0%
16/07/18 07:05:15 INFO mapreduce.Job: map 41% reduce 0%
16/07/18 07:05:18 INFO mapreduce.Job: map 42% reduce 0%
16/07/18 07:05:20 INFO mapreduce.Job: map 43% reduce 0%
16/07/18 07:05:51 INFO mapreduce.Job: map 44% reduce 0%
16/07/18 07:05:55 INFO mapreduce.Job: map 45% reduce 0%
16/07/18 07:05:58 INFO mapreduce.Job: map 46% reduce 0%
16/07/18 07:06:01 INFO mapreduce.Job: map 47% reduce 0%
16/07/18 07:06:04 INFO mapreduce.Job: map 48% reduce 0%
16/07/18 07:06:05 INFO mapreduce.Job: map 49% reduce 0%
16/07/18 07:06:08 INFO mapreduce.Job: map 50% reduce 0%
16/07/18 07:06:10 INFO mapreduce.Job: map 51% reduce 0%
16/07/18 07:06:13 INFO mapreduce.Job: map 52% reduce 0%
16/07/18 07:06:14 INFO mapreduce.Job: map 53% reduce 0%
16/07/18 07:06:17 INFO mapreduce.Job: map 54% reduce 0%
16/07/18 07:06:19 INFO mapreduce.Job: map 55% reduce 0%
16/07/18 07:06:22 INFO mapreduce.Job: map 56% reduce 0%
16/07/18 07:06:23 INFO mapreduce.Job: map 57% reduce 0%
16/07/18 07:06:52 INFO mapreduce.Job: map 58% reduce 0%
16/07/18 07:06:55 INFO mapreduce.Job: map 59% reduce 0%
16/07/18 07:06:59 INFO mapreduce.Job: map 60% reduce 0%
16/07/18 07:07:02 INFO mapreduce.Job: map 61% reduce 0%
16/07/18 07:07:05 INFO mapreduce.Job: map 62% reduce 0%
16/07/18 07:07:08 INFO mapreduce.Job: map 63% reduce 0%
16/07/18 07:07:11 INFO mapreduce.Job: map 64% reduce 0%
16/07/18 07:07:14 INFO mapreduce.Job: map 65% reduce 0%
16/07/18 07:07:16 INFO mapreduce.Job: map 66% reduce 0%
16/07/18 07:07:19 INFO mapreduce.Job: map 67% reduce 0%
16/07/18 07:07:20 INFO mapreduce.Job: map 68% reduce 0%
16/07/18 07:07:23 INFO mapreduce.Job: map 69% reduce 0%
16/07/18 07:07:27 INFO mapreduce.Job: map 70% reduce 0%
16/07/18 07:07:30 INFO mapreduce.Job: map 71% reduce 0%
16/07/18 07:07:50 INFO mapreduce.Job: map 72% reduce 0%
16/07/18 07:07:57 INFO mapreduce.Job: map 73% reduce 0%
16/07/18 07:08:03 INFO mapreduce.Job: map 74% reduce 0%
16/07/18 07:08:10 INFO mapreduce.Job: map 75% reduce 0%
16/07/18 07:08:13 INFO mapreduce.Job: map 76% reduce 0%
16/07/18 07:08:17 INFO mapreduce.Job: map 77% reduce 0%
16/07/18 07:08:20 INFO mapreduce.Job: map 78% reduce 0%
16/07/18 07:08:22 INFO mapreduce.Job: map 79% reduce 0%
16/07/18 07:08:25 INFO mapreduce.Job: map 80% reduce 0%
16/07/18 07:08:28 INFO mapreduce.Job: map 81% reduce 0%
16/07/18 07:08:31 INFO mapreduce.Job: map 82% reduce 0%
16/07/18 07:08:34 INFO mapreduce.Job: map 83% reduce 0%
16/07/18 07:08:39 INFO mapreduce.Job: map 84% reduce 0%
16/07/18 07:08:43 INFO mapreduce.Job: map 85% reduce 0%
16/07/18 07:08:55 INFO mapreduce.Job: map 86% reduce 0%
16/07/18 07:09:05 INFO mapreduce.Job: map 87% reduce 0%
16/07/18 07:09:12 INFO mapreduce.Job: map 88% reduce 0%
16/07/18 07:09:18 INFO mapreduce.Job: map 89% reduce 0%
16/07/18 07:09:21 INFO mapreduce.Job: map 90% reduce 0%
16/07/18 07:09:22 INFO mapreduce.Job: map 91% reduce 0%
16/07/18 07:09:24 INFO mapreduce.Job: map 92% reduce 0%
16/07/18 07:09:27 INFO mapreduce.Job: map 94% reduce 0%
16/07/18 07:09:30 INFO mapreduce.Job: map 95% reduce 0%
16/07/18 07:09:33 INFO mapreduce.Job: map 96% reduce 0%
16/07/18 07:09:34 INFO mapreduce.Job: map 97% reduce 0%
16/07/18 07:09:35 INFO mapreduce.Job: map 98% reduce 0%
16/07/18 07:09:38 INFO mapreduce.Job: map 99% reduce 0%
16/07/18 07:09:41 INFO mapreduce.Job: map 100% reduce 0%
16/07/18 07:09:43 INFO mapreduce.Job: Job job_1468835763123_0002 completed successfully
16/07/18 07:09:43 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=7228656
FILE: Number of bytes written=2873945
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=559335
HDFS: Number of bytes written=7228656
HDFS: Number of read operations=22093
HDFS: Number of large read operations=0
HDFS: Number of write operations=6308
Job Counters
Launched map tasks=21
Other local map tasks=21
Total time spent by all maps in occupied slots (ms)=1341918
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=1341918
Total vcore-seconds taken by all map tasks=1341918
Total megabyte-seconds taken by all map tasks=1374124032
Map-Reduce Framework
Map input records=3133
Map output records=0
Input split bytes=2457
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=9853
CPU time spent (ms)=240430
Physical memory (bytes) snapshot=4918603776
Virtual memory (bytes) snapshot=70847594496
Total committed heap usage (bytes)=3467640832
File Input Format Counters
Bytes Read=556878
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=7228656
BYTESEXPECTED=7228656
COPY=3133
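Validating the files copied (the earlier ls -lrt | wc -l count of 3133 includes the "total" header line, so 3132 files is the expected number):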
rajesh@namenode1:~/testfiles$ hadoop fs -ls /user/rajesh/testfiles/
Found 3132 items
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:06 /user/rajesh/testfiles/1.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:07 /user/rajesh/testfiles/10001x.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:08 /user/rajesh/testfiles/10001x_0001x.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:04 /user/rajesh/testfiles/10001x_0001x_0001x.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:07 /user/rajesh/testfiles/10001x_0001xy2z.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:08 /user/rajesh/testfiles/10001x_0001xy2z_0001x.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:08 /user/rajesh/testfiles/10001x_0001xy3z.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:08 /user/rajesh/testfiles/10001xy1000z_0001x.txt
-rw-r--r-- 3 rajesh hdfs 2308 2016-07-18 07:05 /user/rajesh/testfiles/10001xy1000z_0001x_0001x.txt
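Instead of reading through the listing, the two counts can be compared directly. The following is a minimal sketch, assuming the paths used in this post. The function names are my own; hadoop fs -count is the standard command whose output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of local files with the number of files in HDFS.
# The defaults are the paths used in this post; override them as needed.
LOCAL_DIR="${LOCAL_DIR:-/home/rajesh/testfiles}"
HDFS_DIR="${HDFS_DIR:-/user/rajesh/testfiles}"

# Count regular files directly inside a local directory.
# Unlike "ls -l | wc -l", this does not count the "total" header line.
count_local_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Read one line of "hadoop fs -count" output on stdin and print FILE_COUNT,
# which is the second column.
parse_hdfs_file_count() {
    awk '{ print $2 }'
}

# Only contact the cluster when the hadoop client and the local directory exist.
if command -v hadoop >/dev/null 2>&1 && [ -d "$LOCAL_DIR" ]; then
    local_count=$(count_local_files "$LOCAL_DIR")
    hdfs_count=$(hadoop fs -count "$HDFS_DIR" | parse_hdfs_file_count)
    if [ "$local_count" = "$hdfs_count" ]; then
        echo "OK: $local_count files on both sides"
    else
        echo "MISMATCH: local=$local_count hdfs=$hdfs_count"
    fi
fi
```

This only compares file counts. The BYTESCOPIED and BYTESEXPECTED counters in the job output are a second check that the byte totals agree.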
Side note: creating the test files
I created the 3,000+ files on my Windows machine and then copied them to the Linux box with scp.
Because they are copies of the same file made in Windows, every file name contains a blank space and the word "copy".
I used the rename command to rename them, which I found very useful for bulk rename operations like this. Note that the -n flag only prints what would be renamed; drop it to actually rename the files.
rename -n 's/ copy/xyz[001]/' *.txt
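If the Perl-style rename command is not installed, a similar bulk rename can be done with a plain bash loop. This is a sketch of an alternative, not the command I ran: the function name rename_copy_files and the replacement text passed to it are hypothetical. Like rename -n, it only prints what it would do unless told to apply the changes.

```shell
#!/usr/bin/env bash
# Sketch only: a pure-bash stand-in for the Perl rename command.
# Replaces the first " copy" in each matching .txt file name.
# Usage: rename_copy_files DIR REPLACEMENT [--apply]
rename_copy_files() {
    local dir="$1" replacement="$2" apply="${3:-}"
    local path name new_name
    for path in "$dir"/*" copy"*.txt; do
        [ -e "$path" ] || continue              # the glob matched nothing
        name=${path##*/}                        # strip the directory part
        new_name=${name/ copy/$replacement}     # first occurrence only
        if [ "$apply" = "--apply" ]; then
            mv -n -- "$path" "$dir/$new_name"   # -n: never overwrite an existing file
        else
            echo "would rename: $name -> $new_name"   # dry run, like rename -n
        fi
    done
}
```

Running rename_copy_files . _x first shows the planned renames; adding --apply performs them.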
Hi Rajesh,
Are you sure we can run distcp from a local Linux machine to HDFS? I am trying the same thing you described, but I am getting this error:
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.FileNotFoundException: File file:/home/koti.karri/Sample/file1.txt does not exist
Can you please help me find what I am doing wrong?