Environment Setup
[TOC]
Version Selection
flume-ng-1.6.0-cdh5.8.0.tar.gz
hadoop-2.6.0-cdh5.8.0.tar.gz
hbase-1.2.0-cdh5.8.0.tar.gz
hbase-solr-1.5-cdh5.8.0.tar.gz
hive-1.1.0-cdh5.8.0.tar.gz
hue-3.9.0-cdh5.8.0.tar.gz
oozie-4.1.0-cdh5.8.0.tar.gz
pig-0.12.0-cdh5.8.0.tar.gz
solr-4.10.3-cdh5.8.0.tar.gz
spark-1.6.0-cdh5.8.0.tar.gz
sqoop-1.4.6-cdh5.8.0.tar.gz
sqoop2-1.99.5-cdh5.8.0.tar.gz
zookeeper-3.4.5-cdh5.8.0.tar.gz
Preparation
10.19.138.198 thadoop-uelrcx-host1 namenode resourcemanager hmaster hiveserver2 master
10.19.134.88 thadoop-uelrcx-host2 zk journalnode namenode/datanode nodemanager regionserver worker
10.19.164.182 thadoop-uelrcx-host3 zk journalnode datanode nodemanager regionserver worker
10.19.78.105 thadoop-uelrcx-host4 zk journalnode datanode nodemanager regionserver worker
Set up passwordless SSH login
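One way to set this up, sketched below: generate a key pair on thadoop-uelrcx-host1 and push the public key to every node. The user name `hadoop` is an assumption; substitute the account the cluster runs under.

```shell
# Run on thadoop-uelrcx-host1. Generates a passphrase-less key and
# appends the public key to authorized_keys on every node (including host1).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in thadoop-uelrcx-host1 thadoop-uelrcx-host2 \
            thadoop-uelrcx-host3 thadoop-uelrcx-host4; do
  ssh-copy-id "hadoop@$host"
done
```

Afterwards, `ssh hadoop@thadoop-uelrcx-host2` from host1 should log in without prompting for a password.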
ZooKeeper Cluster Installation and Configuration
Set up the ZooKeeper configuration file
mv zoo_sample.cfg zoo.cfg
Edit the ZooKeeper parameters
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
server.2=thadoop-uelrcx-host2:2888:3888
server.3=thadoop-uelrcx-host3:2888:3888
server.4=thadoop-uelrcx-host4:2888:3888
Copy the ZooKeeper directory to thadoop-uelrcx-host2, thadoop-uelrcx-host3, and thadoop-uelrcx-host4
On each of the three machines, create a myid file under /tmp/zookeeper containing 2, 3, and 4 respectively
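The myid step above can be scripted; the ids must match the `server.N` entries in zoo.cfg. A sketch, assuming passwordless SSH is already in place:

```shell
# host:id pairs taken from the server.2/3/4 lines in zoo.cfg
for pair in thadoop-uelrcx-host2:2 thadoop-uelrcx-host3:3 thadoop-uelrcx-host4:4; do
  host=${pair%%:*}   # hostname before the colon
  id=${pair##*:}     # server id after the colon
  ssh "$host" "mkdir -p /tmp/zookeeper && echo $id > /tmp/zookeeper/myid"
done
```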
Start the ZooKeeper instance on each machine
bin/zkServer.sh start
Verify that it started successfully
bin/zkServer.sh status
Connect to the ZooKeeper service
bin/zkCli.sh -server *********
Hadoop Installation
Edit core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://thadoopcluster</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>thadoop-uelrcx-host2:2181,thadoop-uelrcx-host3:2181,thadoop-uelrcx-host4:2181</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
Edit hdfs-site.xml
<property>
  <name>dfs.nameservices</name>
  <value>thadoopcluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.thadoopcluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.thadoopcluster.nn1</name>
  <value>thadoop-uelrcx-host1:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.thadoopcluster.nn2</name>
  <value>thadoop-uelrcx-host2:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.thadoopcluster.nn1</name>
  <value>thadoop-uelrcx-host1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.thadoopcluster.nn2</name>
  <value>thadoop-uelrcx-host2:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://thadoop-uelrcx-host2:8485;thadoop-uelrcx-host3:8485;thadoop-uelrcx-host4:8485/thadoopcluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.thadoopcluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journal/node/local/data</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/hadoop/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hadoop/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
Edit mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Edit yarn-site.xml
<property>
  <name>yarn.resourcemanager.connect.retry-interval.ms</name>
  <value>2000</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>thadoop-uelrcx-host1</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>thadoop-uelrcx-host2</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.id</name>
  <value>rm1</value>
  <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-state-store.address</name>
  <value>thadoop-uelrcx-host2:2181,thadoop-uelrcx-host3:2181,thadoop-uelrcx-host4:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>thadoop-uelrcx-host2:2181,thadoop-uelrcx-host3:2181,thadoop-uelrcx-host4:2181</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>thadoopcluster-yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
  <value>5000</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm1</name>
  <value>thadoop-uelrcx-host1:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm1</name>
  <value>thadoop-uelrcx-host1:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>thadoop-uelrcx-host1:8188</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
  <value>thadoop-uelrcx-host1:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm1</name>
  <value>thadoop-uelrcx-host1:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.admin.address.rm1</name>
  <value>thadoop-uelrcx-host1:23142</value>
</property>
<property>
  <name>yarn.resourcemanager.address.rm2</name>
  <value>thadoop-uelrcx-host2:8132</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address.rm2</name>
  <value>thadoop-uelrcx-host2:8130</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>thadoop-uelrcx-host2:8188</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
  <value>thadoop-uelrcx-host2:8131</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address.rm2</name>
  <value>thadoop-uelrcx-host2:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.admin.address.rm2</name>
  <value>thadoop-uelrcx-host2:23142</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/hadoop/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data/hadoop/log</value>
</property>
<property>
  <name>mapreduce.shuffle.port</name>
  <value>23080</value>
</property>
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
  <value>/yarn-leader-election</value>
  <description>Optional setting. The default value is /yarn-leader-election</description>
</property>
Edit slaves
thadoop-uelrcx-host2
thadoop-uelrcx-host3
thadoop-uelrcx-host4
Distribute the Hadoop installation files to all four servers
Start the journalnodes
On thadoop-uelrcx-host1, run
sbin/hadoop-daemons.sh start journalnode
Or log in to thadoop-uelrcx-host2, thadoop-uelrcx-host3, and thadoop-uelrcx-host4 and run on each
sbin/hadoop-daemon.sh start journalnode
Check with jps that a JournalNode process is running
Format HDFS
On thadoop-uelrcx-host1, run
bin/hadoop namenode -format
Start the namenode
sbin/hadoop-daemon.sh start namenode
On thadoop-uelrcx-host2, run the following command to bootstrap the standby namenode with the synced metadata
bin/hdfs namenode -bootstrapStandby
Format the HA state in ZooKeeper
bin/hdfs zkfc -formatZK
Start HDFS
On thadoop-uelrcx-host1, run the following command to start dfs
sbin/start-dfs.sh
Start YARN
On thadoop-uelrcx-host1, run the following command to start yarn
sbin/start-yarn.sh
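Once both start scripts have run, the cluster state can be sanity-checked from the Hadoop home directory. A rough health check (the nn1/nn2 and rm1 ids come from the hdfs-site.xml and yarn-site.xml above):

```shell
jps                                    # expect NameNode/ResourceManager on host1,
                                       # DataNode/NodeManager/JournalNode on the workers
bin/hdfs haadmin -getServiceState nn1  # prints "active" or "standby"
bin/hdfs haadmin -getServiceState nn2  # the other namenode should hold the opposite state
bin/hdfs dfsadmin -report              # datanode count and capacity
bin/yarn rmadmin -getServiceState rm1  # ResourceManager HA state
```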
HDFS Support for Multihomed Networks
Hadoop frequently runs in multihomed network environments: nodes communicate with each other over an internal network, while clients outside the cluster access it through external IPs. This arrangement has several advantages:
Security: isolating intra-cluster traffic from external traffic protects the data
Performance: the internal network can use high-bandwidth links such as fiber or gigabit Ethernet
Failover/Redundancy: nodes can fall back to another network adapter when one network fails
Hadoop configuration changes for multihomed networks
Ensuring HDFS Daemons Bind All Interfaces
By default, HDFS endpoints can be specified as either hostnames or IP addresses. In either case, each HDFS daemon binds to a single address, leaving it unreachable from the other networks. The fix in a multihomed environment is to force the daemons to bind to the wildcard address 0.0.0.0 (the port portion is left unchanged).
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
  <description>The actual address the RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.rpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.</description>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
  <description>The actual address the service RPC server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.servicerpc-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node listen on all interfaces by setting it to 0.0.0.0.</description>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
  <description>The actual address the HTTP server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.http-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node HTTP server listen on all interfaces by setting it to 0.0.0.0.</description>
</property>
<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
  <description>The actual address the HTTPS server will bind to. If this optional address is set, it overrides only the hostname portion of dfs.namenode.https-address. It can also be specified per name node or name service for HA/Federation. This is useful for making the name node HTTPS server listen on all interfaces by setting it to 0.0.0.0.</description>
</property>
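After restarting the namenode, one quick way to confirm the wildcard binding took effect (ports 9000 and 50070 per the hdfs-site.xml above):

```shell
# Listening sockets shown as 0.0.0.0:9000 / 0.0.0.0:50070 indicate the
# daemon is bound on all interfaces rather than a single IP.
netstat -tln | grep -E ':(9000|50070)'
```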
Clients use Hostnames when connecting to DataNodes
By default an HDFS client connects to datanodes using the IP addresses supplied by the namenode, but those IPs may be unreachable from the client. The solution is to have clients connect to datanodes by hostname, resolved through DNS.
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether clients should use datanode hostnames when connecting to datanodes.</description>
</property>
DataNodes use HostNames when connecting to other DataNodes
In some setups a datanode cannot reach other datanodes by IP. In that case, configure datanodes to connect to one another by hostname, resolved through DNS.
<property>
  <name>dfs.datanode.use.datanode.hostname</name>
  <value>true</value>
  <description>Whether datanodes should use datanode hostnames when connecting to other datanodes for data transfer.</description>
</property>
Ensuring YARN Daemons Bind All Interfaces
<property>
  <name>yarn.nodemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.timeline-service.bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>yarn.resourcemanager.bind-host</name>
  <value>0.0.0.0</value>
</property>
HBase Installation and Configuration
Disable HBase's bundled ZooKeeper
Edit conf/hbase-env.sh
export HBASE_MANAGES_ZK=false
Edit hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://thadoop-uelrcx-host1:9000/hbase</value>
  <!-- Note: with HDFS HA enabled as configured above, pointing at the
       nameservice (hdfs://thadoopcluster/hbase) avoids depending on a
       single namenode. -->
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>thadoop-uelrcx-host2,thadoop-uelrcx-host3,thadoop-uelrcx-host4</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/data/hadoop/log/hbase/zookeeper</value>
</property>
Edit regionservers
thadoop-uelrcx-host2
thadoop-uelrcx-host3
thadoop-uelrcx-host4
Distribute the HBase installation directory to all four machines
Start HBase on thadoop-uelrcx-host1
bin/start-hbase.sh
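A quick smoke test from host1: pipe a command into the HBase shell and check that the master and the three region servers are up.

```shell
# "status" summarizes live/dead servers; with the layout above it should
# report 1 active master and 3 region servers.
echo "status" | bin/hbase shell
```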
Hive Installation and Configuration
Edit hive-env.sh and set HADOOP_HOME
HADOOP_HOME=/Users/junjie.cheng/Developers/hadoop-2.6.0-cdh5.8.0
Configure hive-site.xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;autoReconnect=true&amp;characterEncoding=UTF-8</value>
  <description>the URL of the MySQL database</description>
</property>
<property>
  <name>hive.jobname.length</name>
  <value>30</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>thadoop-uelrcx-host2,thadoop-uelrcx-host3,thadoop-uelrcx-host4</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///data/hadoop/hive-1.1.0-cdh5.8.0/lib/hive-json-serde.jar,file:///data/hadoop/hive-1.1.0-cdh5.8.0/lib/hive-contrib.jar,file:///data/hadoop/hive-1.1.0-cdh5.8.0/lib/hive-serde.jar</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>thadoop-uelrcx-host2,thadoop-uelrcx-host3,thadoop-uelrcx-host4</value>
</property>
Add the required jar files
mysql-connector-java-3.1.14-bin.jar
hbase-client-1.2.0-cdh5.8.0.jar, hbase-common-1.2.0-cdh5.8.0.jar, hbase-hadoop-compat-1.2.0-cdh5.8.0.jar, hbase-hadoop2-compat-1.2.0-cdh5.8.0.jar
netty-all-4.0.23.Final.jar
metrics-core-2.2.0.jar
Start hiveserver2
Initialize the metastore schema with schematool
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
Start hiveserver2
$HIVE_HOME/bin/hiveserver2
Connect to Hive via beeline
$HIVE_HOME/bin/beeline -u jdbc:hive2://$HS2_HOST:$HS2_PORT
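Filling in the host and port from this guide's configuration (thrift port 10000 on host1; the `-n hadoop` login user is an assumption), a quick sanity check looks like:

```shell
# Connect over JDBC and run a trivial query; "default" should appear
# in the database list on a fresh install.
$HIVE_HOME/bin/beeline -u jdbc:hive2://thadoop-uelrcx-host1:10000 -n hadoop \
  -e "SHOW DATABASES;"
```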
Spark Installation and Configuration
spark on yarn
Edit spark-env.sh and set
export JAVA_HOME=/opt/jdk1.7.0_79
export HADOOP_DIR=/data/hadoop/hadoop-2.6.0-cdh5.8.0
export HADOOP_CONF_DIR=/data/hadoop/hadoop-2.6.0-cdh5.8.0/etc/hadoop
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/common/*:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/common/lib/*:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/yarn/*:/data/hadoop/hadoop-2.6.0-cdh5.8.0/share/hadoop/yarn/lib/*:/data/hadoop/spark-1.6.0-cdh5.8.0/lib/*:/data/hadoop/hive-1.1.0-cdh5.8.0/lib/*:/data/hadoop/hbase-1.2.0-cdh5.8.0/lib/*
Jobs can then be submitted directly to YARN for execution
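For example, submitting the SparkPi sample that ships with Spark (the examples jar path under `lib/` follows the CDH 5.8 tarball layout; adjust if it differs):

```shell
# Runs the Pi estimator on YARN in cluster mode with 100 partitions.
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  lib/spark-examples*.jar 100
```

The job should then be visible in the ResourceManager web UI at thadoop-uelrcx-host1:8188.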
spark standalone
Edit slaves
thadoop-uelrcx-host2
thadoop-uelrcx-host3
thadoop-uelrcx-host4
Start the Spark cluster
sbin/start-all.sh
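In standalone mode, jobs go to the Spark master instead of YARN. A sketch, assuming the master runs on thadoop-uelrcx-host1 with the default port 7077:

```shell
# Submit the bundled SparkPi example to the standalone master.
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://thadoop-uelrcx-host1:7077 \
  lib/spark-examples*.jar 100
```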