Hadoop YARN single node performance tuning
I have a Hadoop 2.5.2 single-node installation on an Ubuntu VM: 4 cores at 3 GHz each, 4 GB of memory. The VM is not for production, just for demo and learning.
I then wrote a very simple MapReduce application in Python and used it to process 49 XML files. These XML files are small, a few hundred lines each, so I expected the job to be quick. But to my big surprise, it took more than 20 minutes to finish (the output of the job is correct). Below are the output metrics:
14/12/15 19:37:55 info client.rmproxy: connecting resourcemanager @ /0.0.0.0:8032
14/12/15 19:37:57 info client.rmproxy: connecting resourcemanager @ /0.0.0.0:8032
14/12/15 19:38:03 info mapred.fileinputformat: total input paths process : 49
14/12/15 19:38:06 info mapreduce.jobsubmitter: number of splits:49
14/12/15 19:38:08 info mapreduce.jobsubmitter: submitting tokens job: job_1418368500264_0005
14/12/15 19:38:10 info impl.yarnclientimpl: submitted application application_1418368500264_0005
14/12/15 19:38:10 info mapreduce.job: running job: job_1418368500264_0005
14/12/15 19:38:59 info mapreduce.job: job job_1418368500264_0005 running in uber mode : false
14/12/15 19:38:59 info mapreduce.job: map 0% reduce 0%
14/12/15 19:39:42 info mapreduce.job: map 2% reduce 0%
14/12/15 19:40:05 info mapreduce.job: map 4% reduce 0%
14/12/15 19:40:28 info mapreduce.job: map 6% reduce 0%
14/12/15 19:40:49 info mapreduce.job: map 8% reduce 0%
14/12/15 19:41:10 info mapreduce.job: map 10% reduce 0%
14/12/15 19:41:29 info mapreduce.job: map 12% reduce 0%
14/12/15 19:41:50 info mapreduce.job: map 14% reduce 0%
14/12/15 19:42:08 info mapreduce.job: map 16% reduce 0%
14/12/15 19:42:28 info mapreduce.job: map 18% reduce 0%
14/12/15 19:42:49 info mapreduce.job: map 20% reduce 0%
14/12/15 19:43:08 info mapreduce.job: map 22% reduce 0%
14/12/15 19:43:28 info mapreduce.job: map 24% reduce 0%
14/12/15 19:43:48 info mapreduce.job: map 27% reduce 0%
14/12/15 19:44:09 info mapreduce.job: map 29% reduce 0%
14/12/15 19:44:29 info mapreduce.job: map 31% reduce 0%
14/12/15 19:44:49 info mapreduce.job: map 33% reduce 0%
14/12/15 19:45:09 info mapreduce.job: map 35% reduce 0%
14/12/15 19:45:28 info mapreduce.job: map 37% reduce 0%
14/12/15 19:45:49 info mapreduce.job: map 39% reduce 0%
14/12/15 19:46:09 info mapreduce.job: map 41% reduce 0%
14/12/15 19:46:29 info mapreduce.job: map 43% reduce 0%
14/12/15 19:46:49 info mapreduce.job: map 45% reduce 0%
14/12/15 19:47:09 info mapreduce.job: map 47% reduce 0%
14/12/15 19:47:29 info mapreduce.job: map 49% reduce 0%
14/12/15 19:47:49 info mapreduce.job: map 51% reduce 0%
14/12/15 19:48:08 info mapreduce.job: map 53% reduce 0%
14/12/15 19:48:28 info mapreduce.job: map 55% reduce 0%
14/12/15 19:48:48 info mapreduce.job: map 57% reduce 0%
14/12/15 19:49:09 info mapreduce.job: map 59% reduce 0%
14/12/15 19:49:29 info mapreduce.job: map 61% reduce 0%
14/12/15 19:49:55 info mapreduce.job: map 63% reduce 0%
14/12/15 19:50:23 info mapreduce.job: map 65% reduce 0%
14/12/15 19:50:53 info mapreduce.job: map 67% reduce 0%
14/12/15 19:51:22 info mapreduce.job: map 69% reduce 0%
14/12/15 19:51:50 info mapreduce.job: map 71% reduce 0%
14/12/15 19:52:18 info mapreduce.job: map 73% reduce 0%
14/12/15 19:52:48 info mapreduce.job: map 76% reduce 0%
14/12/15 19:53:18 info mapreduce.job: map 78% reduce 0%
14/12/15 19:53:48 info mapreduce.job: map 80% reduce 0%
14/12/15 19:54:18 info mapreduce.job: map 82% reduce 0%
14/12/15 19:54:48 info mapreduce.job: map 84% reduce 0%
14/12/15 19:55:19 info mapreduce.job: map 86% reduce 0%
14/12/15 19:55:48 info mapreduce.job: map 88% reduce 0%
14/12/15 19:56:16 info mapreduce.job: map 90% reduce 0%
14/12/15 19:56:44 info mapreduce.job: map 92% reduce 0%
14/12/15 19:57:14 info mapreduce.job: map 94% reduce 0%
14/12/15 19:57:45 info mapreduce.job: map 96% reduce 0%
14/12/15 19:58:15 info mapreduce.job: map 98% reduce 0%
14/12/15 19:58:46 info mapreduce.job: map 100% reduce 0%
14/12/15 19:59:20 info mapreduce.job: map 100% reduce 100%
14/12/15 19:59:28 info mapreduce.job: job job_1418368500264_0005 completed successfully
14/12/15 19:59:30 info mapreduce.job: counters: 49
file system counters
file: number of bytes read=17856
file: number of bytes written=5086434
file: number of read operations=0
file: number of large read operations=0
file: number of write operations=0
hdfs: number of bytes read=499030
hdfs: number of bytes written=10049
hdfs: number of read operations=150
hdfs: number of large read operations=0
hdfs: number of write operations=2
job counters
launched map tasks=49
launched reduce tasks=1
data-local map tasks=49
total time spent maps in occupied slots (ms)=8854232
total time spent reduces in occupied slots (ms)=284672
total time spent map tasks (ms)=1106779
total time spent reduce tasks (ms)=35584
total vcore-seconds taken map tasks=1106779
total vcore-seconds taken reduce tasks=35584
total megabyte-seconds taken map tasks=1133341696
total megabyte-seconds taken reduce tasks=36438016
map-reduce framework
map input records=9352
map output records=296
map output bytes=17258
map output materialized bytes=18144
input split bytes=6772
combine input records=0
combine output records=0
reduce input groups=53
reduce shuffle bytes=18144
reduce input records=296
reduce output records=52
spilled records=592
shuffled maps =49
failed shuffles=0
merged map outputs=49
gc time elapsed (ms)=33590
cpu time spent (ms)=191390
physical memory (bytes) snapshot=13738057728
virtual memory (bytes) snapshot=66425016320
total committed heap usage (bytes)=10799808512
shuffle errors
bad_id=0
connection=0
io_error=0
wrong_length=0
wrong_map=0
wrong_reduce=0
file input format counters
bytes read=492258
file output format counters
bytes written=10049
14/12/15 19:59:30 info streaming.streamjob: output directory: /data_output/sb50projs_1_output
As a Hadoop newbie, this performance looks crazily unreasonable to me, and I have several questions:
- How do I configure Hadoop/YARN/MapReduce to make the whole environment more convenient for trial usage?
I understand Hadoop is designed for huge data volumes and big files. For a trial environment where my files are small and the data is limited, which default configuration items should I change? I have already changed "dfs.blocksize" in hdfs-site.xml to a smaller value to match the small files, but it brought no big enhancement. I know there are JVM configuration items in yarn-site.xml and mapred-site.xml, but I am not sure how to adjust them. An illustrative sketch of the kind of blocksize change I mean follows below.
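(An illustrative sketch only, not my exact values; note that dfs.blocksize only affects files written after the change:)

<!-- hdfs-site.xml: lower the block size for tiny input files (illustrative value, in bytes) -->
<property>
  <name>dfs.blocksize</name>
  <value>1048576</value>
</property>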
- How do I read the Hadoop logs?
Under the logs folder there are separate log files for the NodeManager/ResourceManager/NameNode/DataNode. I tried to read these files to understand how the 20 minutes were spent during the process, but it's not easy for a newbie like me. I wonder whether there is a tool/UI that could help me analyze the logs.
- Basic performance tuning tools
Actually I have googled around for this question and got a bunch of names: Ganglia/Nagios/Vaidya/Ambari. I want to know which tool is best for analysing the issue "why did such a simple job take 20 minutes?".
- Big number of Hadoop processes
Even when there is no job running on Hadoop, I found around 100 Hadoop processes on the VM, as below (I am using htop and sorting the result by memory). Is this normal for Hadoop, or is my environment configured incorrectly? A small cross-check I ran is sketched below.
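(Side note on how I counted: htop shows each thread as a separate row by default (the H key toggles this), so a handful of Hadoop JVMs can appear as ~100 entries. A minimal way to list the actual daemon JVMs is the jps tool that ships with the JDK:)

$ jps
# On a single-node setup I would expect roughly these daemons (PIDs omitted):
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
#   JobHistoryServer   (only if the history server was started)
#   Jps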
- You don't have to change anything.
The default configuration is meant for small environments. You may change it as you grow the environment, and there are a lot of params and a lot of time to spend on fine tuning.
But I admit your configuration is smaller than the usual ones used for tests.
- The logs you have to read aren't the services' ones but the jobs' ones. You can find them in /var/log/hadoop-yarn/containers/
If you want a better view of MR, use the web interface on http://127.0.0.1:8088/ . You will see the job's progression in real time.
IMO, basic tuning = use the Hadoop web interfaces. There are plenty available natively. A couple of command-line ways to get at the same information are sketched below.
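(Two command-line alternatives, assuming a stock Hadoop 2.x setup: yarn logs only works once log aggregation is enabled with yarn.log-aggregation-enable=true, and the ResourceManager exposes a read-only REST API on the same port as its web UI:)

# Dump the aggregated container logs of a finished job
$ yarn logs -applicationId application_1418368500264_0005

# List applications, their state and elapsed time, as JSON
$ curl http://127.0.0.1:8088/ws/v1/cluster/apps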
I think you have found your problem there. It can be normal, or not.
But quickly: YARN launches MR tasks using the available memory:
- The available memory is set in yarn-site.xml: yarn.nodemanager.resource.memory-mb (default 8 GiB).
- The memory for a task is defined in mapred-site.xml or in the task's properties: mapreduce.map.memory.mb (default 1536 MiB).
So, on a 4 GiB VM the defaults overcommit the memory. You should:
- Change the memory available to the NodeManager (to 3 GiB, in order to leave 1 GiB for the system).
- Change the memory available to the Hadoop services (-Xmx in hadoop-env.sh, yarn-env.sh) so that system + each Hadoop service (NameNode / DataNode / ResourceManager / NodeManager) stays under 1 GiB.
- Change the memory for map tasks (512 MiB?). The lower it is, the more tasks can be executed at the same time.
- Change yarn.scheduler.minimum-allocation-mb to 512 in yarn-site.xml to allow mappers with less than 1 GiB of memory (a config sketch of these changes follows below).
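(A minimal sketch of those suggestions, assuming a stock Apache Hadoop 2.5 layout; the values are the ones proposed above, not tested settings, and mapreduce.map.java.opts is added as the usual companion heap option, kept a bit below the container size:)

<!-- yarn-site.xml -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>   <!-- 3 GiB for containers, leaving ~1 GiB for the OS and daemons -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>    <!-- allow containers smaller than 1 GiB -->
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>    <!-- smaller containers => more maps can run at the same time -->
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx410m</value>   <!-- JVM heap kept below the 512 MiB container limit -->
</property>

# hadoop-env.sh / yarn-env.sh: cap the daemon heaps (values are in MB)
export HADOOP_HEAPSIZE=256
export YARN_HEAPSIZE=256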
I hope this helps you.