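How to Retrieve Statistics from a Hadoop Cluster
There are several ways to retrieve metrics (statistics) from a Hadoop cluster on an external server. This article explains how to enable the metrics servlet.
Append the following settings to /etc/hadoop/conf/hadoop-metrics.properties on every node of the Hadoop cluster (if a parameter is already defined, replace it).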
dfs.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
dfs.period=10
mapred.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
mapred.period=10
jvm.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
jvm.period=10
rpc.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
rpc.period=10
ugi.class=org.apache.hadoop.metrics.spi.NullContext
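The "append, or replace if already defined" step can be scripted when there are many nodes to update. The following is a minimal sketch, not part of any Hadoop tooling: merge_properties is a hypothetical helper, and it assumes the file contains only simple key=value lines and # comments (it does not handle line continuations or the other separators that the Java properties format allows).

```python
def merge_properties(existing_text, updates):
    """Return properties text with `updates` applied.

    Keys that are already defined are replaced in place;
    keys that are not yet defined are appended at the end.
    """
    seen = set()
    out = []
    for line in existing_text.splitlines():
        stripped = line.strip()
        # Only plain "key=value" lines are considered; comments pass through.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in updates:
                out.append("%s=%s" % (key, updates[key]))
                seen.add(key)
                continue
        out.append(line)
    # Append every setting that was not present in the original text.
    for key, value in updates.items():
        if key not in seen:
            out.append("%s=%s" % (key, value))
    return "\n".join(out) + "\n"


# Two of the settings listed above, applied to a hypothetical existing file.
updates = {
    "dfs.class": "org.apache.hadoop.metrics.spi.NoEmitMetricsContext",
    "dfs.period": "10",
}
existing = "# old setting\ndfs.class=org.apache.hadoop.metrics.spi.NullContext\n"
print(merge_properties(existing, updates))
```

Writing the result back to the file and distributing it to each node is left out on purpose, since that depends on how the cluster is managed.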
Then restart the Hadoop-related services across the whole cluster.
After the restart, the metrics can be retrieved from outside over HTTP. The output looks like the following (this example is from a TaskTracker on CDH 4.2).
$ curl http://example.com:50060/metrics
jvm
  metrics
    {hostName=tasktest,processName=TaskTracker,sessionId=}:
      gcCount=358419
      gcTimeMillis=685173
      logError=0
      logFatal=0
      logInfo=12325
      logWarn=690
      maxMemoryM=888.9375
      memHeapCommittedM=7.6875
      memHeapUsedM=5.921051
      memNonHeapCommittedM=32.0
      memNonHeapUsedM=31.877632
      threadsBlocked=0
      threadsNew=0
      threadsRunnable=6
      threadsTerminated=0
      threadsTimedWaiting=9
      threadsWaiting=36
mapred
  shuffleOutput
    {hostName=tasktest,sessionId=}:
      shuffle_exceptions_caught=0
      shuffle_failed_outputs=0
      shuffle_handler_busy_percent=0.0
      shuffle_output_bytes=188662242
      shuffle_success_outputs=929
  tasktracker
    {hostName=tasktest,sessionId=}:
      failedDirs=0
      mapTaskSlots=12
      maps_running=0
      reduceTaskSlots=8
      reduces_running=0
      tasks_completed=670
      tasks_failed_ping=0
      tasks_failed_timeout=0
This plain-text form is somewhat awkward to parse, so the same data can also be retrieved as JSON.
$ curl 'http://tasktest:50060/metrics?format=json'
{"jvm":{"metrics":[[{"hostName":"tasktest","processName":"TaskTracker","sessionId":""},{"gcCount":358436,"gcTimeMillis":685198,"logError":0,"logFatal":0,"logInfo":12325,"logWarn":690,"maxMemoryM":888.9375,"memHeapCommittedM":7.625,"memHeapUsedM":7.1313553,"memNonHeapCommittedM":32.0,"memNonHeapUsedM":31.877632,"threadsBlocked":0,"threadsNew":0,"threadsRunnable":6,"threadsTerminated":0,"threadsTimedWaiting":9,"threadsWaiting":36}]]},"mapred":{"shuffleOutput":[[{"hostName":"tasktest","sessionId":""},{"shuffle_exceptions_caught":0,"shuffle_failed_outputs":0,"shuffle_handler_busy_percent":0.0,"shuffle_output_bytes":188662242,"shuffle_success_outputs":929}]],"tasktracker":[[{"hostName":"tasktest","sessionId":""},{"failedDirs":0,"mapTaskSlots":12,"maps_running":0,"reduceTaskSlots":8,"reduces_running":0,"tasks_completed":670,"tasks_failed_ping":0,"tasks_failed_timeout":0}]]}}
In JSON form, the output is also easy to handle from Perl, PHP, and similar languages.
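As an illustration of consuming the JSON output, here is a minimal Python sketch. It assumes the document has the shape shown in the example above (context, then record, then a list of [tags, values] pairs). fetch_metrics and flatten_metrics are hypothetical helper names, and the sample literal is a trimmed copy of the TaskTracker output above.

```python
import json
from urllib.request import urlopen


def fetch_metrics(base_url, timeout=5.0):
    """Fetch <base_url>/metrics?format=json and return the parsed document."""
    with urlopen(base_url + "/metrics?format=json", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def flatten_metrics(doc):
    """Yield (dotted_name, tags, value) for every metric in the document.

    The dotted name is built as <context>.<record>.<metric>,
    for example mapred.tasktracker.maps_running.
    """
    for context, records in doc.items():
        for record, entries in records.items():
            # Each entry is a two-element list: [tags, values].
            for tags, values in entries:
                for name, value in values.items():
                    yield "%s.%s.%s" % (context, record, name), tags, value


# Trimmed copy of the JSON example above, used instead of a live request.
sample = json.loads(
    '{"mapred":{"tasktracker":[[{"hostName":"tasktest","sessionId":""},'
    '{"mapTaskSlots":12,"maps_running":0,"tasks_completed":670}]]}}'
)
for name, tags, value in flatten_metrics(sample):
    print(name, tags["hostName"], value)
```

Against a live daemon, the same loop would run over fetch_metrics("http://tasktest:50060"), with the host and port taken from the curl example above.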
Values Available from Hadoop Metrics
Hadoop metrics expose the following values.
namenode
dfs.namenode.AddBlockOps
dfs.namenode.GetBlockLocations
dfs.namenode.GetListingOps
dfs.namenode.JournalTransactionsBatchedInSync
dfs.namenode.Transactions_avg_time
dfs.namenode.Transactions_num_ops
dfs.namenode.FilesRenamed
dfs.namenode.CreateFileOps
dfs.namenode.DeleteFileOps
dfs.namenode.FileInfoOps
dfs.namenode.FilesAppended
dfs.namenode.FilesCreated
dfs.namenode.FilesInGetListingOps
dfs.namenode.Syncs_avg_time
dfs.namenode.Syncs_num_ops
dfs.namenode.fsImageLoadTime
dfs.namenode.blockReport_avg_time
dfs.namenode.blockReport_num_ops
HDFS
dfs.FSDirectory.files_deleted
dfs.FSNamesystem.BlockCapacity
dfs.FSNamesystem.BlocksTotal
dfs.FSNamesystem.CapacityRemainingGB
dfs.FSNamesystem.CapacityTotalGB
dfs.FSNamesystem.CapacityUsedGB
dfs.FSNamesystem.CorruptBlocks
dfs.FSNamesystem.ExcessBlocks
dfs.FSNamesystem.FilesTotal
dfs.FSNamesystem.MissingBlocks
dfs.FSNamesystem.PendingDeletionBlocks
dfs.FSNamesystem.PendingReplicationBlocks
dfs.FSNamesystem.ScheduledReplicationBlocks
dfs.FSNamesystem.TotalLoad
dfs.FSNamesystem.UnderReplicatedBlocks
DataNode
dfs.datanode.blockChecksumOp_avg_time
dfs.datanode.blockChecksumOp_num_ops
dfs.datanode.blockReports_avg_time
dfs.datanode.blockReports_num_ops
dfs.datanode.block_verification_failures
dfs.datanode.blocks_read
dfs.datanode.blocks_removed
dfs.datanode.blocks_replicated
dfs.datanode.blocks_verified
dfs.datanode.blocks_written
dfs.datanode.bytes_read
dfs.datanode.bytes_written
dfs.datanode.copyBlockOp_avg_time
dfs.datanode.copyBlockOp_num_ops
dfs.datanode.heartBeats_avg_time
dfs.datanode.heartBeats_num_ops
dfs.datanode.readBlockOp_avg_time
dfs.datanode.readBlockOp_num_ops
dfs.datanode.reads_from_local_client
dfs.datanode.reads_from_remote_client
dfs.datanode.replaceBlockOp_avg_time
dfs.datanode.replaceBlockOp_num_ops
dfs.datanode.volumeFailures
dfs.datanode.writeBlockOp_avg_time
dfs.datanode.writeBlockOp_num_ops
dfs.datanode.writes_from_local_client
dfs.datanode.writes_from_remote_client
mapred.jobtracker
mapred.jobtracker.blacklisted_maps
mapred.jobtracker.blacklisted_reduces
mapred.jobtracker.heartbeats
mapred.jobtracker.jobs_completed
mapred.jobtracker.jobs_failed
mapred.jobtracker.jobs_killed
mapred.jobtracker.jobs_preparing
mapred.jobtracker.jobs_running
mapred.jobtracker.jobs_submitted
mapred.jobtracker.map_slots
mapred.jobtracker.maps_completed
mapred.jobtracker.maps_failed
mapred.jobtracker.maps_killed
mapred.jobtracker.maps_launched
mapred.jobtracker.occupied_map_slots
mapred.jobtracker.occupied_reduce_slots
mapred.jobtracker.reduce_slots
mapred.jobtracker.reduces_completed
mapred.jobtracker.reduces_failed
mapred.jobtracker.reduces_killed
mapred.jobtracker.reduces_launched
mapred.jobtracker.reserved_map_slots
mapred.jobtracker.reserved_reduce_slots
mapred.jobtracker.running_maps
mapred.jobtracker.running_reduces
mapred.jobtracker.trackers
mapred.jobtracker.trackers_blacklisted
mapred.jobtracker.trackers_decommissioned
mapred.jobtracker.waiting_maps
mapred.jobtracker.waiting_reduces
mapred.tasktracker
mapred.tasktracker.mapTaskSlots
mapred.tasktracker.maps_running
mapred.tasktracker.reduceTaskSlots
mapred.tasktracker.reduces_running
mapred.tasktracker.tasks_completed
mapred.tasktracker.tasks_failed_ping
mapred.tasktracker.tasks_failed_timeout
mapred.shuffleOutput
mapred.shuffleOutput.shuffle_failed_outputs
mapred.shuffleOutput.shuffle_handler_busy_percent
mapred.shuffleOutput.shuffle_output_bytes
mapred.shuffleOutput.shuffle_success_outputs
jvm.metrics namenode
jvm.NameNode.metrics.gcCount
jvm.NameNode.metrics.gcTimeMillis
jvm.NameNode.metrics.logError
jvm.NameNode.metrics.logFatal
jvm.NameNode.metrics.logInfo
jvm.NameNode.metrics.logWarn
jvm.NameNode.metrics.maxMemoryM
jvm.NameNode.metrics.memHeapCommittedM
jvm.NameNode.metrics.memHeapUsedM
jvm.NameNode.metrics.memNonHeapCommittedM
jvm.NameNode.metrics.memNonHeapUsedM
jvm.NameNode.metrics.threadsBlocked
jvm.NameNode.metrics.threadsNew
jvm.NameNode.metrics.threadsRunnable
jvm.NameNode.metrics.threadsTerminated
jvm.NameNode.metrics.threadsTimedWaiting
jvm.NameNode.metrics.threadsWaiting
jvm.metrics datanode
jvm.DataNode.metrics.gcCount
jvm.DataNode.metrics.gcTimeMillis
jvm.DataNode.metrics.logError
jvm.DataNode.metrics.logFatal
jvm.DataNode.metrics.logInfo
jvm.DataNode.metrics.logWarn
jvm.DataNode.metrics.maxMemoryM
jvm.DataNode.metrics.memHeapCommittedM
jvm.DataNode.metrics.memHeapUsedM
jvm.DataNode.metrics.memNonHeapCommittedM
jvm.DataNode.metrics.memNonHeapUsedM
jvm.DataNode.metrics.threadsBlocked
jvm.DataNode.metrics.threadsNew
jvm.DataNode.metrics.threadsRunnable
jvm.DataNode.metrics.threadsTerminated
jvm.DataNode.metrics.threadsTimedWaiting
jvm.DataNode.metrics.threadsWaiting
jvm.metrics JobTracker
jvm.JobTracker.metrics.gcCount
jvm.JobTracker.metrics.gcTimeMillis
jvm.JobTracker.metrics.logError
jvm.JobTracker.metrics.logFatal
jvm.JobTracker.metrics.logInfo
jvm.JobTracker.metrics.logWarn
jvm.JobTracker.metrics.maxMemoryM
jvm.JobTracker.metrics.memHeapCommittedM
jvm.JobTracker.metrics.memHeapUsedM
jvm.JobTracker.metrics.memNonHeapCommittedM
jvm.JobTracker.metrics.memNonHeapUsedM
jvm.JobTracker.metrics.threadsBlocked
jvm.JobTracker.metrics.threadsNew
jvm.JobTracker.metrics.threadsRunnable
jvm.JobTracker.metrics.threadsTerminated
jvm.JobTracker.metrics.threadsTimedWaiting
jvm.JobTracker.metrics.threadsWaiting
jvm.metrics TaskTracker
jvm.TaskTracker.metrics.gcCount
jvm.TaskTracker.metrics.gcTimeMillis
jvm.TaskTracker.metrics.logError
jvm.TaskTracker.metrics.logFatal
jvm.TaskTracker.metrics.logInfo
jvm.TaskTracker.metrics.logWarn
jvm.TaskTracker.metrics.maxMemoryM
jvm.TaskTracker.metrics.memHeapCommittedM
jvm.TaskTracker.metrics.memHeapUsedM
jvm.TaskTracker.metrics.memNonHeapCommittedM
jvm.TaskTracker.metrics.memNonHeapUsedM
jvm.TaskTracker.metrics.threadsBlocked
jvm.TaskTracker.metrics.threadsNew
jvm.TaskTracker.metrics.threadsRunnable
jvm.TaskTracker.metrics.threadsTerminated
jvm.TaskTracker.metrics.threadsTimedWaiting
jvm.TaskTracker.metrics.threadsWaiting
jvm.metrics Secondary Namenode
jvm.SecondaryNameNode.metrics.gcCount
jvm.SecondaryNameNode.metrics.gcTimeMillis
jvm.SecondaryNameNode.metrics.logError
jvm.SecondaryNameNode.metrics.logFatal
jvm.SecondaryNameNode.metrics.logInfo
jvm.SecondaryNameNode.metrics.logWarn
jvm.SecondaryNameNode.metrics.maxMemoryM
jvm.SecondaryNameNode.metrics.memHeapCommittedM
jvm.SecondaryNameNode.metrics.memHeapUsedM
jvm.SecondaryNameNode.metrics.memNonHeapCommittedM
jvm.SecondaryNameNode.metrics.memNonHeapUsedM
jvm.SecondaryNameNode.metrics.threadsBlocked
jvm.SecondaryNameNode.metrics.threadsNew
jvm.SecondaryNameNode.metrics.threadsRunnable
jvm.SecondaryNameNode.metrics.threadsTerminated
jvm.SecondaryNameNode.metrics.threadsTimedWaiting
jvm.SecondaryNameNode.metrics.threadsWaiting
rpc.metrics Namenode
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
rpc.metrics Datanode
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
rpc.metrics JobTracker
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes (number of bytes sent)
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
rpc.metrics TaskTracker
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
rpc.metrics detail
rpc.detailed-metrics.blockReceived_num_ops
rpc.detailed-metrics.canCommit_avg_time
rpc.detailed-metrics.canCommit_num_ops
rpc.detailed-metrics.commitPending_avg_time
rpc.detailed-metrics.commitPending_num_ops
rpc.detailed-metrics.done_avg_time
rpc.detailed-metrics.done_num_ops
rpc.detailed-metrics.getMapCompletionEvents_avg_time
rpc.detailed-metrics.getMapCompletionEvents_num_ops
rpc.detailed-metrics.getProtocolVersion_avg_time
rpc.detailed-metrics.getProtocolVersion_num_ops
rpc.detailed-metrics.getSystemDir_num_ops
rpc.detailed-metrics.getTask_avg_time
rpc.detailed-metrics.getTask_num_ops
rpc.detailed-metrics.ping_avg_time
rpc.detailed-metrics.ping_num_ops
rpc.detailed-metrics.rollFsImage_num_ops
rpc.detailed-metrics.statusUpdate_avg_time
rpc.detailed-metrics.statusUpdate_num_ops
Retrieving HBase Metrics
HBase metrics can be retrieved in the same way. Append the following to /etc/hbase/conf/hadoop-metrics.properties.
# Configuration of the "hbase" context for null
hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
hbase.period=60
# Configuration of the "jvm" context for null
jvm.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
jvm.period=60
# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
rpc.period=60
With this in place, HBase statistics such as the following can be retrieved as well.
hbase.regionserver.blockCacheCount
hbase.regionserver.blockCacheFree
hbase.regionserver.blockCacheHitRatio
hbase.regionserver.blockCacheSize
hbase.regionserver.fsReadLatency_avg_time
hbase.regionserver.fsReadLatency_num_ops
hbase.regionserver.fsSyncLatency_avg_time
hbase.regionserver.fsSyncLatency_num_ops
hbase.regionserver.fsWriteLatency_avg_time
hbase.regionserver.fsWriteLatency_num_ops
hbase.regionserver.memstoreSizeMB
hbase.regionserver.regions
hbase.regionserver.requests
hbase.regionserver.storeFileIndexSizeMB
hbase.regionserver.stores
References: "【 #hadoopAC11jp 】10日目・HBaseのメトリクス取得でハマった件とJSONでのメトリクス取得", "Hadoop メトリクス一覧と統計情報の出力", "HBaseのメトリックス"