Environment Installation

[TOC]

Version Selection

  • flume-ng-1.6.0-cdh5.8.0.tar.gz

  • hadoop-2.6.0-cdh5.8.0.tar.gz

  • hbase-1.2.0-cdh5.8.0.tar.gz

  • hbase-solr-1.5-cdh5.8.0.tar.gz

  • hive-1.1.0-cdh5.8.0.tar.gz

  • hue-3.9.0-cdh5.8.0.tar.gz

  • oozie-4.1.0-cdh5.8.0.tar.gz

  • pig-0.12.0-cdh5.8.0.tar.gz

  • solr-4.10.3-cdh5.8.0.tar.gz

  • spark-1.6.0-cdh5.8.0.tar.gz

  • sqoop-1.4.6-cdh5.8.0.tar.gz

  • sqoop2-1.99.5-cdh5.8.0.tar.gz

  • zookeeper-3.4.5-cdh5.8.0.tar.gz

Preparation

IP              Hostname                ZK/JournalNode    HDFS                YARN              HBase          Hive          Role
10.19.138.198   thadoop-uelrcx-host1    -                 namenode            resourcemanager   hmaster        hiveserver2   master
10.19.134.88    thadoop-uelrcx-host2    zk, journalnode   namenode/datanode   nodemanager       regionserver   -             worker
10.19.164.182   thadoop-uelrcx-host3    zk, journalnode   datanode            nodemanager       regionserver   -             worker
10.19.78.105    thadoop-uelrcx-host4    zk, journalnode   datanode            nodemanager       regionserver   -             worker
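
Every node must be able to resolve every hostname above. A minimal sketch, assuming resolution is done through /etc/hosts on each machine (a DNS entry works equally well):

# /etc/hosts — identical on all four nodes (illustrative)
10.19.138.198   thadoop-uelrcx-host1
10.19.134.88    thadoop-uelrcx-host2
10.19.164.182   thadoop-uelrcx-host3
10.19.78.105    thadoop-uelrcx-host4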

Set up passwordless SSH login
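
A minimal sketch, assuming the cluster runs as the hadoop user (the user implied by dfs.ha.fencing.ssh.private-key-files below) and that thadoop-uelrcx-host1 drives the other nodes:

# Generate a key pair on thadoop-uelrcx-host1, then copy the public key
# to every node, including host1 itself (adjust the user if yours differs).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in thadoop-uelrcx-host{1,2,3,4}; do
  ssh-copy-id hadoop@$host
done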

ZooKeeper Cluster Installation and Configuration

Rename the sample ZooKeeper configuration file

mv zoo_sample.cfg zoo.cfg

Set the ZooKeeper parameters

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
server.2=thadoop-uelrcx-host2:2888:3888
server.3=thadoop-uelrcx-host3:2888:3888
server.4=thadoop-uelrcx-host4:2888:3888

Copy the ZooKeeper installation to thadoop-uelrcx-host2, thadoop-uelrcx-host3, and thadoop-uelrcx-host4.

On each of the three machines, create a myid file under /tmp/zookeeper containing 2, 3, and 4 respectively, matching the server.N entries above; for example:
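
# on thadoop-uelrcx-host2 (use 3 and 4 on host3 and host4)
mkdir -p /tmp/zookeeper
echo 2 > /tmp/zookeeper/myid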

Start each ZooKeeper instance

bin/zkServer.sh start

Verify that it started successfully

bin/zkServer.sh status
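
On a healthy ensemble, one node reports leader and the others follower, roughly:

$ bin/zkServer.sh status
...
Mode: follower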

Connect to the ZooKeeper service

bin/zkCli.sh -server *********

Hadoop Installation

Edit core-site.xml (properties are listed here as name = value, with the original descriptions in parentheses)

fs.defaultFS = hdfs://thadoopcluster
io.file.buffer.size = 131072
ha.zookeeper.quorum = thadoop-uelrcx-host2:2181,thadoop-uelrcx-host3:2181,thadoop-uelrcx-host4:2181
hadoop.tmp.dir = /data/hadoop/tmp    (A base for other temporary directories.)
hadoop.proxyuser.hadoop.hosts = *
hadoop.proxyuser.hadoop.groups = *

Edit hdfs-site.xml

dfs.nameservices = thadoopcluster
dfs.ha.namenodes.thadoopcluster = nn1,nn2
dfs.namenode.rpc-address.thadoopcluster.nn1 = thadoop-uelrcx-host1:9000
dfs.namenode.rpc-address.thadoopcluster.nn2 = thadoop-uelrcx-host2:9000
dfs.namenode.http-address.thadoopcluster.nn1 = thadoop-uelrcx-host1:50070
dfs.namenode.http-address.thadoopcluster.nn2 = thadoop-uelrcx-host2:50070
dfs.namenode.shared.edits.dir = qjournal://thadoop-uelrcx-host2:8485;thadoop-uelrcx-host3:8485;thadoop-uelrcx-host4:8485/thadoopcluster
dfs.client.failover.proxy.provider.thadoopcluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.fencing.methods = sshfence
dfs.ha.fencing.ssh.private-key-files = /home/hadoop/.ssh/id_rsa
dfs.journalnode.edits.dir = /data/hadoop/journal/node/local/data
dfs.ha.automatic-failover.enabled = true
dfs.namenode.name.dir = /data/hadoop/namenode
dfs.datanode.data.dir = /data/hadoop/datanode
dfs.replication = 3
dfs.webhdfs.enabled = true

Edit mapred-site.xml

mapreduce.framework.name = yarn

Edit yarn-site.xml

yarn.resourcemanager.connect.retry-interval.ms = 2000
yarn.resourcemanager.ha.enabled = true
yarn.resourcemanager.ha.rm-ids = rm1,rm2
yarn.resourcemanager.ha.automatic-failover.enabled = true
yarn.resourcemanager.hostname.rm1 = thadoop-uelrcx-host1
yarn.resourcemanager.hostname.rm2 = thadoop-uelrcx-host2
yarn.resourcemanager.ha.id = rm1    (If we want to launch more than one RM in a single node, we need this configuration)
yarn.resourcemanager.recovery.enabled = true
yarn.resourcemanager.zk-state-store.address = thadoop-uelrcx-host2:2181,thadoop-uelrcx-host3:2181,thadoop-uelrcx-host4:2181
yarn.resourcemanager.store.class = org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
yarn.resourcemanager.zk-address = thadoop-uelrcx-host2:2181,thadoop-uelrcx-host3:2181,thadoop-uelrcx-host4:2181
yarn.resourcemanager.cluster-id = thadoopcluster-yarn
yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms = 5000
yarn.resourcemanager.address.rm1 = thadoop-uelrcx-host1:8132
yarn.resourcemanager.scheduler.address.rm1 = thadoop-uelrcx-host1:8130
yarn.resourcemanager.webapp.address.rm1 = thadoop-uelrcx-host1:8188
yarn.resourcemanager.resource-tracker.address.rm1 = thadoop-uelrcx-host1:8131
yarn.resourcemanager.admin.address.rm1 = thadoop-uelrcx-host1:8033
yarn.resourcemanager.ha.admin.address.rm1 = thadoop-uelrcx-host1:23142
yarn.resourcemanager.address.rm2 = thadoop-uelrcx-host2:8132
yarn.resourcemanager.scheduler.address.rm2 = thadoop-uelrcx-host2:8130
yarn.resourcemanager.webapp.address.rm2 = thadoop-uelrcx-host2:8188
yarn.resourcemanager.resource-tracker.address.rm2 = thadoop-uelrcx-host2:8131
yarn.resourcemanager.admin.address.rm2 = thadoop-uelrcx-host2:8033
yarn.resourcemanager.ha.admin.address.rm2 = thadoop-uelrcx-host2:23142
yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.nodemanager.aux-services.mapreduce.shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
yarn.nodemanager.local-dirs = /data/hadoop/yarn/local
yarn.nodemanager.log-dirs = /data/hadoop/log
mapreduce.shuffle.port = 23080
yarn.client.failover-proxy-provider = org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider
yarn.resourcemanager.ha.automatic-failover.zk-base-path = /yarn-leader-election    (Optional setting. The default value is /yarn-leader-election)

Edit slaves

thadoop-uelrcx-host2
thadoop-uelrcx-host3
thadoop-uelrcx-host4

Distribute the Hadoop installation files to all four servers, for example as sketched below.
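
A minimal sketch, assuming the installation lives under /data/hadoop on every node (the path used throughout this guide) and that host1 already has it:

for host in thadoop-uelrcx-host2 thadoop-uelrcx-host3 thadoop-uelrcx-host4; do
  rsync -az /data/hadoop/hadoop-2.6.0-cdh5.8.0 $host:/data/hadoop/
done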

Start the journalnodes

Run on thadoop-uelrcx-host1:

sbin/hadoop-daemons.sh start journalnode

Or log in to thadoop-uelrcx-host2, thadoop-uelrcx-host3, and thadoop-uelrcx-host4 individually and run on each:

sbin/hadoop-daemon.sh start journalnode

Check with jps that a JournalNode process is running on each of those nodes.
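
The output should include a JournalNode entry, roughly like this (PIDs are illustrative; QuorumPeerMain is the ZooKeeper process also running on these hosts):

$ jps
3042 JournalNode
2871 QuorumPeerMain
3120 Jps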

Format HDFS

Run on thadoop-uelrcx-host1:

bin/hdfs namenode -format

Start the namenode

sbin/hadoop-daemon.sh start namenode

Run the following on thadoop-uelrcx-host2 to sync the metadata to the standby namenode:

bin/hdfs namenode -bootstrapStandby

Format the HA state in ZooKeeper

bin/hdfs zkfc -formatZK

Start HDFS

Run the following on thadoop-uelrcx-host1 to start DFS:

sbin/start-dfs.sh
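
Once HDFS is up, the HA state can be checked with hdfs haadmin; one namenode should report active and the other standby:

bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2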

Start YARN

Run the following on thadoop-uelrcx-host1 to start YARN:

sbin/start-yarn.sh
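
Note that start-yarn.sh only starts a ResourceManager on the local machine; in this HA setup the second ResourceManager typically has to be started on thadoop-uelrcx-host2 by hand, after which yarn rmadmin can confirm the HA state:

# on thadoop-uelrcx-host2
sbin/yarn-daemon.sh start resourcemanager

# from either host: one RM should report active, the other standby
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2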

HDFS on Multi-Homed Networks

Hadoop is now often deployed in multi-homed network environments: the cluster communicates internally over internal IPs, while external clients reach it via external IPs. This arrangement has several advantages:

  • Security: isolating the internal cluster network from the externally facing network keeps the data safe.

  • Performance: the internal network can use high-bandwidth links such as fiber, broadband, or gigabit Ethernet.

  • Failover/Redundancy: with multiple networks, a node can tolerate the failure of a single network adapter.

Hadoop configuration changes for multi-homed environments

Ensuring HDFS Daemons Bind All Interfaces

By default, an HDFS endpoint may be specified as either a hostname or an IP address; in either case the HDFS daemon binds to a single IP, so it is unreachable from the other networks. The fix in a multi-homed environment is to force the daemons to bind to the wildcard address 0.0.0.0 (the port portion of the original address setting is left unchanged).

dfs.namenode.rpc-bind-host = 0.0.0.0
    The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.

dfs.namenode.servicerpc-bind-host = 0.0.0.0
    The actual address the service RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.servicerpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.

dfs.namenode.http-bind-host = 0.0.0.0
    The actual address the HTTP server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.http-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node HTTP server listen on all interfaces by setting it to 0.0.0.0.

dfs.namenode.https-bind-host = 0.0.0.0
    The actual address the HTTPS server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.https-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node HTTPS server listen on all interfaces by setting it to 0.0.0.0.

Clients use Hostnames when connecting to DataNodes

By default, HDFS clients connect to datanodes using the IP addresses handed out by the namenode, but those addresses may not be reachable from the client. The fix is to have clients address datanodes by hostname, resolved via DNS, instead.

dfs.client.use.datanode.hostname = true
    Whether clients should use datanode hostnames when connecting to datanodes.

DataNodes use HostNames when connecting to other DataNodes

In rare cases, a datanode's internal IP may not be reachable from other datanodes; in that case, configure datanodes to address each other by hostname, resolved via DNS.

dfs.datanode.use.datanode.hostname = true
    Whether datanodes should use datanode hostnames when connecting to other datanodes for data transfer.

Ensuring YARN Daemons Bind All Interfaces

yarn.nodemanager.bind-host = 0.0.0.0
yarn.timeline-service.bind-host = 0.0.0.0
yarn.resourcemanager.bind-host = 0.0.0.0

HBase Installation and Configuration

Disable the ZooKeeper bundled with HBase

Edit conf/hbase-env.sh:

export HBASE_MANAGES_ZK=false

Edit hbase-site.xml

hbase.rootdir = hdfs://thadoop-uelrcx-host1:9000/hbase
hbase.cluster.distributed = true
hbase.zookeeper.quorum = thadoop-uelrcx-host2,thadoop-uelrcx-host3,thadoop-uelrcx-host4
hbase.zookeeper.property.dataDir = /data/hadoop/log/hbase/zookeeper

Edit regionservers

thadoop-uelrcx-host2
thadoop-uelrcx-host3
thadoop-uelrcx-host4

Distribute the HBase installation directory to all four machines (e.g. with rsync, as in the Hadoop step above).

Start HBase on thadoop-uelrcx-host1:

bin/start-hbase.sh
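
A quick sanity check from the HBase shell (the exact numbers depend on the cluster):

$ bin/hbase shell
hbase(main):001:0> status
3 servers, 0 dead, ... average load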

Hive Installation and Configuration

Edit hive-env.sh and set HADOOP_HOME:

HADOOP_HOME=/data/hadoop/hadoop-2.6.0-cdh5.8.0

Edit hive-site.xml

javax.jdo.option.ConnectionURL = jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&autoReconnect=true&characterEncoding=UTF-8
    (the URL of the MySQL database)
hive.jobname.length = 30
javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName = hive
javax.jdo.option.ConnectionPassword = hive
datanucleus.autoCreateSchema = false
datanucleus.fixedDatastore = true
datanucleus.autoStartMechanism = SchemaTable
hive.metastore.warehouse.dir = /user/hive/warehouse
hive.support.concurrency = true
hive.zookeeper.quorum = thadoop-uelrcx-host2,thadoop-uelrcx-host3,thadoop-uelrcx-host4
hive.zookeeper.client.port = 2181
hive.server2.thrift.port = 10000
hive.aux.jars.path = file:///data/hadoop/hive-1.1.0-cdh5.8.0/lib/hive-json-serde.jar,file:///data/hadoop/hive-1.1.0-cdh5.8.0/lib/hive-contrib.jar,file:///data/hadoop/hive-1.1.0-cdh5.8.0/lib/hive-serde.jar
hbase.zookeeper.quorum = thadoop-uelrcx-host2,thadoop-uelrcx-host3,thadoop-uelrcx-host4

Add the required jars (typically into $HIVE_HOME/lib; see the sketch after this list):

  • mysql-connector-java-3.1.14-bin.jar

  • hbase-client-1.2.0-cdh5.8.0.jar, hbase-common-1.2.0-cdh5.8.0.jar, hbase-hadoop-compat-1.2.0-cdh5.8.0.jar, hbase-hadoop2-compat-1.2.0-cdh5.8.0.jar

  • netty-all-4.0.23.Final.jar

  • metrics-core-2.2.0.jar
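
A sketch of copying them in, assuming the HBase jars are taken from the HBase distribution already on the machine (the MySQL connector has to be obtained separately):

cp /data/hadoop/hbase-1.2.0-cdh5.8.0/lib/hbase-client-1.2.0-cdh5.8.0.jar \
   /data/hadoop/hbase-1.2.0-cdh5.8.0/lib/hbase-common-1.2.0-cdh5.8.0.jar \
   /data/hadoop/hbase-1.2.0-cdh5.8.0/lib/hbase-hadoop-compat-1.2.0-cdh5.8.0.jar \
   /data/hadoop/hbase-1.2.0-cdh5.8.0/lib/hbase-hadoop2-compat-1.2.0-cdh5.8.0.jar \
   /data/hadoop/hive-1.1.0-cdh5.8.0/lib/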

Start hiveserver2

  1. Initialize the metastore schema with schematool:

    $HIVE_HOME/bin/schematool -dbType mysql -initSchema

  2. Start hiveserver2:

    $HIVE_HOME/bin/hiveserver2

  3. Connect to Hive with beeline:

    $HIVE_HOME/bin/beeline -u jdbc:hive2://$HS2_HOST:$HS2_PORT
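
For this cluster that would be, for example (hiveserver2 runs on thadoop-uelrcx-host1 and hive.server2.thrift.port is set to 10000 above):

$HIVE_HOME/bin/beeline -u jdbc:hive2://thadoop-uelrcx-host1:10000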

Spark Installation and Configuration

Spark on YARN

Edit spark-env.sh and set:

export JAVA_HOME=/opt/jdk1.7.0_79
export HADOOP_DIR=/data/hadoop/hadoop-2.6.0-cdh5.8.0
export HADOOP_CONF_DIR=/data/hadoop/hadoop-2.6.0-cdh5.8.0/etc/hadoop
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/common/*:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/common/lib/*:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/yarn/*:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/yarn/lib/*:/data/hadoop/spark-1.6.0-cdh5.8.0/lib/*:/data/hadoop/hive-1.1.0-cdh5.8.0/lib/*:/data/hadoop/hbase-1.2.0-cdh5.8.0/lib/*

Jobs are then submitted directly to YARN, as sketched below.
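
A minimal sketch using the SparkPi example bundled with the distribution (the exact examples jar name varies by build, hence the glob):

bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples-*.jar 100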

Spark Standalone

Edit slaves

thadoop-uelrcx-host2
thadoop-uelrcx-host3
thadoop-uelrcx-host4

Start the Spark cluster

sbin/start-all.sh
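
With the default ports, the master web UI is then reachable on port 8080 of the node where start-all.sh was run. A shell can attach to the cluster like this (a sketch, assuming the master runs on thadoop-uelrcx-host1 with the default port 7077):

bin/spark-shell --master spark://thadoop-uelrcx-host1:7077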
