Getting metrics from a Hadoop cluster

How to obtain Hadoop cluster statistics

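There are several ways to retrieve metrics (statistics) from a Hadoop cluster from an external server. This article explains how to enable the metrics servlet.

On each node of the Hadoop cluster, add the following settings to /etc/hadoop/conf/hadoop-metrics.properties (if a parameter is already defined, replace it).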

dfs.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
dfs.period=10
mapred.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
mapred.period=10
jvm.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
jvm.period=10
rpc.class=org.apache.hadoop.metrics.spi.NoEmitMetricsContext
rpc.period=10
ugi.class=org.apache.hadoop.metrics.spi.NullContext
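
The "add, or replace if already defined" step can be scripted. The following is a minimal sketch rather than an official tool: merge_properties and METRICS_SETTINGS are hypothetical names, and the parser only understands simple key=value lines, not the full Java properties syntax.

```python
# Hypothetical helper: merge the settings above into the text of
# hadoop-metrics.properties, replacing keys that are already defined.
# Assumption: every setting is written as a plain "key=value" line.

METRICS_SETTINGS = {
    "dfs.class": "org.apache.hadoop.metrics.spi.NoEmitMetricsContext",
    "dfs.period": "10",
    "mapred.class": "org.apache.hadoop.metrics.spi.NoEmitMetricsContext",
    "mapred.period": "10",
    "jvm.class": "org.apache.hadoop.metrics.spi.NoEmitMetricsContext",
    "jvm.period": "10",
    "rpc.class": "org.apache.hadoop.metrics.spi.NoEmitMetricsContext",
    "rpc.period": "10",
    "ugi.class": "org.apache.hadoop.metrics.spi.NullContext",
}


def merge_properties(existing_text, settings):
    """Return new file text with `settings` applied on top of `existing_text`."""
    seen = set()
    out = []
    for line in existing_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        if stripped and not is_comment and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in settings:
                # Replace the first occurrence, drop any later duplicates.
                if key not in seen:
                    out.append(f"{key}={settings[key]}")
                    seen.add(key)
                continue
        out.append(line)
    # Append the settings that were not present in the file at all.
    for key, value in settings.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    old = "# existing file\ndfs.class=org.apache.hadoop.metrics.spi.NullContext\n"
    print(merge_properties(old, METRICS_SETTINGS), end="")
```

Reading the real file, writing the result back, and distributing it to every node are left out on purpose, since they depend on how the cluster is managed.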

Then restart the Hadoop-related services across the whole cluster.

After the restart, the metrics can be retrieved from outside over HTTP. The output looks like the following (this example is from a TaskTracker on CDH4.2).

$ curl http://example.com:50060/metrics
jvm
  metrics
    {hostName=tasktest,processName=TaskTracker,sessionId=}:
      gcCount=358419
      gcTimeMillis=685173
      logError=0
      logFatal=0
      logInfo=12325
      logWarn=690
      maxMemoryM=888.9375
      memHeapCommittedM=7.6875
      memHeapUsedM=5.921051
      memNonHeapCommittedM=32.0
      memNonHeapUsedM=31.877632
      threadsBlocked=0
      threadsNew=0
      threadsRunnable=6
      threadsTerminated=0
      threadsTimedWaiting=9
      threadsWaiting=36
mapred
  shuffleOutput
    {hostName=tasktest,sessionId=}:
      shuffle_exceptions_caught=0
      shuffle_failed_outputs=0
      shuffle_handler_busy_percent=0.0
      shuffle_output_bytes=188662242
      shuffle_success_outputs=929
  tasktracker
    {hostName=tasktest,sessionId=}:
      failedDirs=0
      mapTaskSlots=12
      maps_running=0
      reduceTaskSlots=8
      reduces_running=0
      tasks_completed=670
      tasks_failed_ping=0
      tasks_failed_timeout=0

This plain-text output is somewhat awkward to parse, so the metrics can also be retrieved as JSON.

$ curl 'http://tasktest:50060/metrics?format=json'
{"jvm":{"metrics":[[{"hostName":"tasktest","processName":"TaskTracker","sessionId":""},{"gcCount":358436,"gcTimeMillis":685198,"logError":0,"logFatal":0,"logInfo":12325,"logWarn":690,"maxMemoryM":888.9375,"memHeapCommittedM":7.625,"memHeapUsedM":7.1313553,"memNonHeapCommittedM":32.0,"memNonHeapUsedM":31.877632,"threadsBlocked":0,"threadsNew":0,"threadsRunnable":6,"threadsTerminated":0,"threadsTimedWaiting":9,"threadsWaiting":36}]]},"mapred":{"shuffleOutput":[[{"hostName":"tasktest","sessionId":""},{"shuffle_exceptions_caught":0,"shuffle_failed_outputs":0,"shuffle_handler_busy_percent":0.0,"shuffle_output_bytes":188662242,"shuffle_success_outputs":929}]],"tasktracker":[[{"hostName":"tasktest","sessionId":""},{"failedDirs":0,"mapTaskSlots":12,"maps_running":0,"reduceTaskSlots":8,"reduces_running":0,"tasks_completed":670,"tasks_failed_ping":0,"tasks_failed_timeout":0}]]}}

With JSON, the data is also easy to handle from Perl, PHP, and similar languages.
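
As an illustration of consuming the JSON output, here is a minimal Python sketch. It assumes the response always has the shape shown above (context, then record, then a list of [tags, values] pairs). The function names fetch_metrics and flatten_metrics are hypothetical, and SAMPLE is a trimmed copy of the TaskTracker response.

```python
import json
import urllib.request


def fetch_metrics(base_url, timeout=10):
    """Fetch <base_url>/metrics?format=json and decode the JSON body."""
    url = f"{base_url}/metrics?format=json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flatten_metrics(payload):
    """Turn {context: {record: [[tags, values], ...]}} into dotted names.

    Tags (hostName, sessionId, ...) are ignored, so if a record has several
    tag sets, later ones overwrite earlier ones in this simple sketch.
    """
    flat = {}
    for context, records in payload.items():
        for record, entries in records.items():
            for _tags, values in entries:
                for name, value in values.items():
                    flat[f"{context}.{record}.{name}"] = value
    return flat


# Trimmed copy of the TaskTracker response shown above.
SAMPLE = (
    '{"mapred":{"tasktracker":[[{"hostName":"tasktest","sessionId":""},'
    '{"mapTaskSlots":12,"maps_running":0,"reduceTaskSlots":8,'
    '"reduces_running":0,"tasks_completed":670}]]}}'
)

if __name__ == "__main__":
    # Against a live daemon: flatten_metrics(fetch_metrics("http://tasktest:50060"))
    for name, value in sorted(flatten_metrics(json.loads(SAMPLE)).items()):
        print(name, value)
```

The dotted names produced this way (for example mapred.tasktracker.maps_running) line up with the context.record.metric naming used for most of the metrics in this article.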

Values available through Hadoop metrics

Hadoop metrics expose the following values.

namenode

dfs.namenode.AddBlockOps
dfs.namenode.GetBlockLocations
dfs.namenode.GetListingOps
dfs.namenode.JournalTransactionsBatchedInSync
dfs.namenode.Transactions_avg_time
dfs.namenode.Transactions_num_ops
dfs.namenode.FilesRenamed
dfs.namenode.CreateFileOps
dfs.namenode.DeleteFileOps
dfs.namenode.FileInfoOps
dfs.namenode.FilesAppended
dfs.namenode.FilesCreated
dfs.namenode.FilesInGetListingOps
dfs.namenode.Syncs_avg_time
dfs.namenode.Syncs_num_ops 
dfs.namenode.fsImageLoadTime
dfs.namenode.blockReport_avg_time
dfs.namenode.blockReport_num_ops

HDFS

dfs.FSDirectory.files_deleted
dfs.FSNamesystem.BlockCapacity
dfs.FSNamesystem.BlocksTotal
dfs.FSNamesystem.CapacityRemainingGB
dfs.FSNamesystem.CapacityTotalGB
dfs.FSNamesystem.CapacityUsedGB 
dfs.FSNamesystem.CorruptBlocks 
dfs.FSNamesystem.ExcessBlocks
dfs.FSNamesystem.FilesTotal 
dfs.FSNamesystem.MissingBlocks
dfs.FSNamesystem.PendingDeletionBlocks
dfs.FSNamesystem.PendingReplicationBlocks
dfs.FSNamesystem.ScheduledReplicationBlocks
dfs.FSNamesystem.TotalLoad
dfs.FSNamesystem.UnderReplicatedBlocks

DataNode

dfs.datanode.blockChecksumOp_avg_time
dfs.datanode.blockChecksumOp_num_ops
dfs.datanode.blockReports_avg_time
dfs.datanode.blockReports_num_ops
dfs.datanode.block_verification_failures
dfs.datanode.blocks_read
dfs.datanode.blocks_removed
dfs.datanode.blocks_replicated
dfs.datanode.blocks_verified
dfs.datanode.blocks_written
dfs.datanode.bytes_read
dfs.datanode.bytes_written
dfs.datanode.copyBlockOp_avg_time
dfs.datanode.copyBlockOp_num_ops
dfs.datanode.heartBeats_avg_time
dfs.datanode.heartBeats_num_ops 
dfs.datanode.readBlockOp_avg_time 
dfs.datanode.readBlockOp_num_ops
dfs.datanode.reads_from_local_client 
dfs.datanode.reads_from_remote_client 
dfs.datanode.replaceBlockOp_avg_time
dfs.datanode.replaceBlockOp_num_ops
dfs.datanode.volumeFailures
dfs.datanode.writeBlockOp_avg_time 
dfs.datanode.writeBlockOp_num_ops
dfs.datanode.writes_from_local_client
dfs.datanode.writes_from_remote_client

mapred.jobtracker

mapred.jobtracker.blacklisted_maps
mapred.jobtracker.blacklisted_reduces
mapred.jobtracker.heartbeats 
mapred.jobtracker.jobs_completed 
mapred.jobtracker.jobs_failed
mapred.jobtracker.jobs_killed
mapred.jobtracker.jobs_preparing
mapred.jobtracker.jobs_running
mapred.jobtracker.jobs_submitted
mapred.jobtracker.map_slots
mapred.jobtracker.maps_completed
mapred.jobtracker.maps_failed
mapred.jobtracker.maps_killed
mapred.jobtracker.maps_launched
mapred.jobtracker.occupied_map_slots 
mapred.jobtracker.occupied_reduce_slots
mapred.jobtracker.reduce_slots
mapred.jobtracker.reduces_completed 
mapred.jobtracker.reduces_failed
mapred.jobtracker.reduces_killed
mapred.jobtracker.reduces_launched
mapred.jobtracker.reserved_map_slots
mapred.jobtracker.reserved_reduce_slots
mapred.jobtracker.running_maps 
mapred.jobtracker.running_reduces
mapred.jobtracker.trackers
mapred.jobtracker.trackers_blacklisted
mapred.jobtracker.trackers_decommissioned
mapred.jobtracker.waiting_maps
mapred.jobtracker.waiting_reduces

mapred.tasktracker

mapred.tasktracker.mapTaskSlots 
mapred.tasktracker.maps_running
mapred.tasktracker.reduceTaskSlots
mapred.tasktracker.reduces_running 
mapred.tasktracker.tasks_completed
mapred.tasktracker.tasks_failed_ping
mapred.tasktracker.tasks_failed_timeout
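
One possible use of the tasktracker values is to derive slot utilization. The sketch below assumes that maps_running and reduces_running count tasks currently occupying the slots reported by mapTaskSlots and reduceTaskSlots. slot_utilization is a hypothetical helper name, and its input is the values dictionary of the tasktracker record from the JSON output.

```python
def slot_utilization(tasktracker):
    """Return (map_ratio, reduce_ratio): running tasks divided by slots."""

    def ratio(running, slots):
        # A node with zero configured slots is reported as 0.0 utilization.
        return running / slots if slots else 0.0

    return (
        ratio(tasktracker["maps_running"], tasktracker["mapTaskSlots"]),
        ratio(tasktracker["reduces_running"], tasktracker["reduceTaskSlots"]),
    )


if __name__ == "__main__":
    # Illustrative values, not taken from a real cluster.
    record = {
        "mapTaskSlots": 12,
        "maps_running": 3,
        "reduceTaskSlots": 8,
        "reduces_running": 2,
    }
    print(slot_utilization(record))
```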

mapred.shuffleOutput

mapred.shuffleOutput.shuffle_failed_outputs
mapred.shuffleOutput.shuffle_handler_busy_percent
mapred.shuffleOutput.shuffle_output_bytes
mapred.shuffleOutput.shuffle_success_outputs

jvm.metrics namenode

jvm.NameNode.metrics.gcCount 
jvm.NameNode.metrics.gcTimeMillis
jvm.NameNode.metrics.logError
jvm.NameNode.metrics.logFatal
jvm.NameNode.metrics.logInfo
jvm.NameNode.metrics.logWarn
jvm.NameNode.metrics.maxMemoryM
jvm.NameNode.metrics.memHeapCommittedM
jvm.NameNode.metrics.memHeapUsedM 
jvm.NameNode.metrics.memNonHeapCommittedM
jvm.NameNode.metrics.memNonHeapUsedM
jvm.NameNode.metrics.threadsBlocked
jvm.NameNode.metrics.threadsNew
jvm.NameNode.metrics.threadsRunnable
jvm.NameNode.metrics.threadsTerminated
jvm.NameNode.metrics.threadsTimedWaiting
jvm.NameNode.metrics.threadsWaiting

jvm.metrics datanode

jvm.DataNode.metrics.gcCount 
jvm.DataNode.metrics.gcTimeMillis
jvm.DataNode.metrics.logError 
jvm.DataNode.metrics.logFatal
jvm.DataNode.metrics.logInfo
jvm.DataNode.metrics.logWarn
jvm.DataNode.metrics.maxMemoryM
jvm.DataNode.metrics.memHeapCommittedM
jvm.DataNode.metrics.memHeapUsedM 
jvm.DataNode.metrics.memNonHeapCommittedM
jvm.DataNode.metrics.memNonHeapUsedM
jvm.DataNode.metrics.threadsBlocked
jvm.DataNode.metrics.threadsNew
jvm.DataNode.metrics.threadsRunnable 
jvm.DataNode.metrics.threadsTerminated
jvm.DataNode.metrics.threadsTimedWaiting
jvm.DataNode.metrics.threadsWaiting

jvm.metrics JobTracker

jvm.JobTracker.metrics.gcCount 
jvm.JobTracker.metrics.gcTimeMillis
jvm.JobTracker.metrics.logError 
jvm.JobTracker.metrics.logFatal
jvm.JobTracker.metrics.logInfo
jvm.JobTracker.metrics.logWarn
jvm.JobTracker.metrics.maxMemoryM
jvm.JobTracker.metrics.memHeapCommittedM
jvm.JobTracker.metrics.memHeapUsedM 
jvm.JobTracker.metrics.memNonHeapCommittedM
jvm.JobTracker.metrics.memNonHeapUsedM
jvm.JobTracker.metrics.threadsBlocked
jvm.JobTracker.metrics.threadsNew
jvm.JobTracker.metrics.threadsRunnable 
jvm.JobTracker.metrics.threadsTerminated
jvm.JobTracker.metrics.threadsTimedWaiting
jvm.JobTracker.metrics.threadsWaiting

jvm.metrics TaskTracker

jvm.TaskTracker.metrics.gcCount 
jvm.TaskTracker.metrics.gcTimeMillis
jvm.TaskTracker.metrics.logError 
jvm.TaskTracker.metrics.logFatal
jvm.TaskTracker.metrics.logInfo
jvm.TaskTracker.metrics.logWarn
jvm.TaskTracker.metrics.maxMemoryM
jvm.TaskTracker.metrics.memHeapCommittedM
jvm.TaskTracker.metrics.memHeapUsedM 
jvm.TaskTracker.metrics.memNonHeapCommittedM
jvm.TaskTracker.metrics.memNonHeapUsedM
jvm.TaskTracker.metrics.threadsBlocked
jvm.TaskTracker.metrics.threadsNew
jvm.TaskTracker.metrics.threadsRunnable 
jvm.TaskTracker.metrics.threadsTerminated
jvm.TaskTracker.metrics.threadsTimedWaiting
jvm.TaskTracker.metrics.threadsWaiting

jvm.metrics Secondary Namenode

jvm.SecondaryNameNode.metrics.gcCount 
jvm.SecondaryNameNode.metrics.gcTimeMillis
jvm.SecondaryNameNode.metrics.logError 
jvm.SecondaryNameNode.metrics.logFatal
jvm.SecondaryNameNode.metrics.logInfo
jvm.SecondaryNameNode.metrics.logWarn
jvm.SecondaryNameNode.metrics.maxMemoryM
jvm.SecondaryNameNode.metrics.memHeapCommittedM
jvm.SecondaryNameNode.metrics.memHeapUsedM
jvm.SecondaryNameNode.metrics.memNonHeapCommittedM
jvm.SecondaryNameNode.metrics.memNonHeapUsedM
jvm.SecondaryNameNode.metrics.threadsBlocked
jvm.SecondaryNameNode.metrics.threadsNew
jvm.SecondaryNameNode.metrics.threadsRunnable 
jvm.SecondaryNameNode.metrics.threadsTerminated
jvm.SecondaryNameNode.metrics.threadsTimedWaiting
jvm.SecondaryNameNode.metrics.threadsWaiting
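
The two TaskTracker samples earlier in this article show gcCount and gcTimeMillis growing between requests, which suggests they are cumulative counters. Under that assumption, the difference between two samples gives the GC activity in the interval. The following sketch uses a hypothetical helper, gc_delta, and the values from those two samples.

```python
def gc_delta(earlier, later):
    """Return (gc_runs, gc_millis, avg_millis_per_gc) between two samples.

    Assumption: gcCount and gcTimeMillis only ever increase while the
    daemon is running. A restart would reset them and make the deltas
    negative, which this sketch does not handle specially.
    """
    runs = later["gcCount"] - earlier["gcCount"]
    millis = later["gcTimeMillis"] - earlier["gcTimeMillis"]
    avg = millis / runs if runs > 0 else 0.0
    return runs, millis, avg


if __name__ == "__main__":
    # Values copied from the plain-text and JSON samples in this article.
    first = {"gcCount": 358419, "gcTimeMillis": 685173}
    second = {"gcCount": 358436, "gcTimeMillis": 685198}
    runs, millis, avg = gc_delta(first, second)
    print(f"{runs} collections, {millis} ms total, {avg:.2f} ms each")
```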

rpc.metrics Namenode

rpc.metrics.NumOpenConnections 
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes 
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses 
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses

rpc.metrics Datanode

rpc.metrics.NumOpenConnections 
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes 
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses 
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses

rpc.metrics JobTracker

rpc.metrics.NumOpenConnections 
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes (bytes sent)
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses 
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses

rpc.metrics TaskTracker

rpc.metrics.NumOpenConnections 
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes 
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses

rpc.metrics detail

rpc.detailed-metrics.blockReceived_num_ops
rpc.detailed-metrics.canCommit_avg_time
rpc.detailed-metrics.canCommit_num_ops
rpc.detailed-metrics.commitPending_avg_time
rpc.detailed-metrics.commitPending_num_ops
rpc.detailed-metrics.done_avg_time
rpc.detailed-metrics.done_num_ops
rpc.detailed-metrics.getMapCompletionEvents_avg_time
rpc.detailed-metrics.getMapCompletionEvents_num_ops
rpc.detailed-metrics.getProtocolVersion_avg_time 
rpc.detailed-metrics.getProtocolVersion_num_ops
rpc.detailed-metrics.getSystemDir_num_ops 
rpc.detailed-metrics.getTask_avg_time
rpc.detailed-metrics.getTask_num_ops
rpc.detailed-metrics.ping_avg_time
rpc.detailed-metrics.ping_num_ops
rpc.detailed-metrics.rollFsImage_num_ops
rpc.detailed-metrics.statusUpdate_avg_time
rpc.detailed-metrics.statusUpdate_num_ops

Getting HBase metrics

HBase metrics can be retrieved in the same way. Add the following to /etc/hbase/conf/hadoop-metrics.properties.

# Configuration of the "hbase" context for null
hbase.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
hbase.period=60

# Configuration of the "jvm" context for null
jvm.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
jvm.period=60

# Configuration of the "rpc" context for null
rpc.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
rpc.period=60

This makes the following HBase statistics available as well.

hbase.regionserver.blockCacheCount
hbase.regionserver.blockCacheFree
hbase.regionserver.blockCacheHitRatio
hbase.regionserver.blockCacheSize
hbase.regionserver.fsReadLatency_avg_time
hbase.regionserver.fsReadLatency_num_ops
hbase.regionserver.fsSyncLatency_avg_time
hbase.regionserver.fsSyncLatency_num_ops
hbase.regionserver.fsWriteLatency_avg_time
hbase.regionserver.fsWriteLatency_num_ops
hbase.regionserver.memstoreSizeMB
hbase.regionserver.regions
hbase.regionserver.requests
hbase.regionserver.storeFileIndexSizeMB
hbase.regionserver.stores
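
As one example of using these values, the block cache fill level can be estimated from blockCacheSize and blockCacheFree. This is only a sketch under assumptions: both values are taken to be in the same unit, and their sum is taken to be the total cache capacity. block_cache_usage is a hypothetical helper name.

```python
def block_cache_usage(regionserver):
    """Return the used fraction of the block cache, between 0.0 and 1.0.

    Assumption: blockCacheSize (used) and blockCacheFree share one unit,
    and used + free equals the configured cache capacity.
    """
    used = regionserver["blockCacheSize"]
    free = regionserver["blockCacheFree"]
    total = used + free
    return used / total if total else 0.0


if __name__ == "__main__":
    # Illustrative values, not taken from a real region server.
    print(block_cache_usage({"blockCacheSize": 300, "blockCacheFree": 100}))
```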

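References:
【 #hadoopAC11jp 】10日目・HBaseのメトリクス取得でハマった件とJSONでのメトリクス取得 (Day 10: getting stuck retrieving HBase metrics, and retrieving metrics as JSON)
Hadoop メトリクス一覧と統計情報の出力 (Hadoop metrics list and statistics output)
HBaseのメトリックス (HBase metrics)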