Log - 201906

Unknown

2019-06-17 San Francisco

How to install Hadoop on macOS
  • Install Hadoop via brew
brew search hadoop
brew install hadoop
  • Go to the Hadoop configuration directory (/usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop):
cd /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop
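The version segment in that path changes with every brew upgrade. A sketch of deriving the config directory instead of hard-coding it (the fixed 3.1.2 path is kept only as a fallback for machines without brew):

```shell
# Derive the Hadoop config directory from brew's prefix; fall back to the
# fixed Cellar path when brew (or the hadoop formula) is unavailable.
HADOOP_CONF_DIR="$(brew --prefix hadoop 2>/dev/null || echo /usr/local/Cellar/hadoop/3.1.2)/libexec/etc/hadoop"
echo "$HADOOP_CONF_DIR"
```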
  • Edit hadoop-env.sh:
sudo code hadoop-env.sh

Change from

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export JAVA_HOME="$(/usr/libexec/java_home)"

to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home"
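Hard-coding jdk1.8.0_192 breaks on the next JDK update. On macOS, /usr/libexec/java_home can resolve the path for a given version; a sketch, keeping the fixed path as a fallback:

```shell
# Prefer the macOS java_home helper; fall back to the fixed JDK path.
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8 2>/dev/null \
  || echo /Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home)"
echo "$JAVA_HOME"
```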
  • Then configure the HDFS address and port number: open core-site.xml and put the following content inside the configuration tag:
sudo code core-site.xml
<configuration>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
		<description>A base for other temporary directories.</description>
	</property>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:8020</value>
	</property>
</configuration>
  • Configure MapReduce to use YARN: first copy mapred-site.xml.template to mapred-site.xml, then open mapred-site.xml and add
sudo code mapred-site.xml
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.env</name>
		<value>HADOOP_MAPRED_HOME=/Users/Masser/hadoop</value>
	</property>
	<property>
		<name>mapreduce.map.env</name>
		<value>HADOOP_MAPRED_HOME=/Users/Masser/hadoop</value>
	</property>
	<property>
		<name>mapreduce.reduce.env</name>
		<value>HADOOP_MAPRED_HOME=/Users/Masser/hadoop</value>
	</property>
</configuration>
  • Set the HDFS replication factor. The default is 3; for a single-node setup change it to 1. Open hdfs-site.xml and add
sudo code hdfs-site.xml
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>
  • To work around org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist, modify yarn-site.xml:
sudo code yarn-site.xml
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
</configuration>
  • Set up passphrase-less SSH and authorize the generated key:
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
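To see what those two commands do without touching your real ~/.ssh, the same steps can be run against a scratch directory (TEST_SSH_DIR is illustrative only; swap in ~/.ssh for real use):

```shell
# Generate a passphrase-less key pair and authorize it, in a throwaway dir.
TEST_SSH_DIR="$(mktemp -d)"
ssh-keygen -q -t rsa -P '' -f "$TEST_SSH_DIR/id_rsa"
cat "$TEST_SSH_DIR/id_rsa.pub" >> "$TEST_SSH_DIR/authorized_keys"
chmod 600 "$TEST_SSH_DIR/authorized_keys"
echo "key authorized in $TEST_SSH_DIR"
```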

  • Finally, enable remote login:
sudo systemsetup -setremotelogin on
sudo systemsetup -getremotelogin
  • Test SSH to localhost; it should not prompt for a password:
ssh localhost
  • Format the distributed file system before starting the Hadoop daemons, so that we can put our data sources into HDFS when running MapReduce jobs:
hdfs namenode -format
  • We need aliases to start and stop the Hadoop daemons. Edit ~/.zshrc and add
alias hstart="/usr/local/Cellar/hadoop/3.1.2/sbin/start-all.sh"
alias hstop="/usr/local/Cellar/hadoop/3.1.2/sbin/stop-all.sh"

source ~/.zshrc

  • Start and stop Hadoop using the aliases:
hstart
hstop
How to install Hive on macOS
  • Check status
which hadoop
brew info hive|grep hive:
  • Install Hive
export HIVE_VERSION=3.1.1

DOTFILES_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
echo "Setting up from $DOTFILES_DIR ..."

[ ! -d /usr/local/Cellar/hive/$HIVE_VERSION/ ] && brew install hive
[ ! -d /usr/local/Cellar/hive/$HIVE_VERSION/ ] && echo "Can not find Hive installation in /usr/local/Cellar/hive/$HIVE_VERSION, run 'brew install hive' first."

cd /usr/local/Cellar/hive/$HIVE_VERSION/
export HIVE_HOME=$(pwd)

hdfs dfs -mkdir -p /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse

mkdir -p hcatalog/var/log
touch hcatalog/var/log/hcat.out

ln -f -s $DOTFILES_DIR/hive/conf/hive-site.xml $HIVE_HOME/libexec/conf/hive-site.xml

schematool -initSchema -dbType derby

echo "Done."
  • Execute the following commands to add the Hive variables and open hive-site.xml:
## hive variables
export HIVE_HOME=/usr/local/Cellar/hive/3.1.1
code $HIVE_HOME/libexec/conf/hive-site.xml

Change the hive-site.xml as:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/Cellar/hive/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
</configuration>
  • Create and load the test data:
code /tmp/physicists.csv
# Albert,Einstein
# Marie,Curie
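The two commented lines above are the intended file contents; a non-interactive way to create the same file is:

```shell
# Write the sample data without opening an editor.
cat > /tmp/physicists.csv <<'EOF'
Albert,Einstein
Marie,Curie
EOF
```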

hadoop fs -mkdir /user/data
hadoop fs -put /tmp/physicists.csv /user/data
hadoop fs -ls /user/data
  • Run the hive shell and test it:
CREATE EXTERNAL TABLE physicists(first string, last string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/data';
SELECT * FROM physicists;
  • Switch to MySQL to store the metadata. Go to the MySQL site and download the latest JDBC connector (sign-up required).
tar zxvf ~/Downloads/mysql-connector-java-8.0.16.tar.gz
sudo cp mysql-connector-java-8.0.16/mysql-connector-java-8.0.16.jar $HIVE_HOME/libexec/lib/

mysql
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,ALTER,CREATE ON metastore.* TO 'hiveuser'@'localhost';
mysql> flush privileges;
mysql> quit;

schematool -initSchema -dbType mysql
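For the schematool command above to reach MySQL, hive-site.xml also has to be switched from Derby to the new database. A sketch, assuming the hiveuser/password account created above (Connector/J 8 uses the com.mysql.cj.jdbc.Driver class):

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>
```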
How to install Spark on macOS
  • Refer to the installation instructions to install Spark.
  • Link the previously installed Hive with Spark (Spark also ships its own "standalone" Hive):
ln -s $HIVE_HOME/libexec/conf/hive-site.xml $SPARK_PATH/libexec/conf/hive-site.xml

2019-06-17 San Francisco

  • Game ID card: 司空坚 235402 19940712 1617 23 male