Log - 201906

Unknown

2019-06-17 San Francisco

How to install Hadoop on macOS
  • Install Hadoop via brew
brew search hadoop
brew install hadoop
  • Go to the Hadoop configuration directory (/usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop):
cd /usr/local/Cellar/hadoop/3.1.2/libexec/etc/hadoop
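The version segment in that path changes with every brew upgrade. A sketch of deriving the config directory instead of hard-coding it (the fixed 3.1.2 path is kept only as a fallback for machines without brew):

```shell
# Derive the Hadoop config directory from brew's prefix; fall back to the
# fixed Cellar path when brew (or the hadoop formula) is unavailable.
HADOOP_CONF_DIR="$(brew --prefix hadoop 2>/dev/null || echo /usr/local/Cellar/hadoop/3.1.2)/libexec/etc/hadoop"
echo "$HADOOP_CONF_DIR"
```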
  • Edit hadoop-env.sh:
sudo code hadoop-env.sh

Change from

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
export JAVA_HOME="$(/usr/libexec/java_home)"

to

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home"
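Hard-coding jdk1.8.0_192 breaks on the next JDK update. On macOS, /usr/libexec/java_home can resolve the path for a given version; a sketch, keeping the fixed path as a fallback:

```shell
# Prefer the macOS java_home helper; fall back to the fixed JDK path.
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8 2>/dev/null \
  || echo /Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home)"
echo "$JAVA_HOME"
```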
  • Then configure the HDFS address and port number: open core-site.xml and put the following content inside the configuration tag:
sudo code core-site.xml
<configuration>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
		<description>A base for other temporary directories.</description>
	</property>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:8020</value>
	</property>
</configuration>
  • Configure MapReduce to use YARN: first copy mapred-site.xml.template to mapred-site.xml, then open mapred-site.xml and add
sudo code mapred-site.xml
<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.env</name>
		<value>HADOOP_MAPRED_HOME=/Users/Masser/hadoop</value>
	</property>
	<property>
		<name>mapreduce.map.env</name>
		<value>HADOOP_MAPRED_HOME=/Users/Masser/hadoop</value>
	</property>
	<property>
		<name>mapreduce.reduce.env</name>
		<value>HADOOP_MAPRED_HOME=/Users/Masser/hadoop</value>
	</property>
</configuration>
  • Set the HDFS replication factor. The default is 3; for a single-node setup change it to 1. Open hdfs-site.xml and add
sudo code hdfs-site.xml
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>
  • To work around org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist, modify yarn-site.xml:
sudo code yarn-site.xml
<configuration>
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
</configuration>
  • Set up passphrase-less SSH and authorize the generated key:
ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
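To see what those two commands do without touching your real ~/.ssh, the same steps can be run against a scratch directory (TEST_SSH_DIR is illustrative only; swap in ~/.ssh for real use):

```shell
# Generate a passphrase-less key pair and authorize it, in a throwaway dir.
TEST_SSH_DIR="$(mktemp -d)"
ssh-keygen -q -t rsa -P '' -f "$TEST_SSH_DIR/id_rsa"
cat "$TEST_SSH_DIR/id_rsa.pub" >> "$TEST_SSH_DIR/authorized_keys"
chmod 600 "$TEST_SSH_DIR/authorized_keys"
echo "key authorized in $TEST_SSH_DIR"
```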

  • Finally, enable remote login:
sudo systemsetup -setremotelogin on
sudo systemsetup -getremotelogin
  • Test SSH to localhost; it should not prompt for a password:
ssh localhost
  • Format the distributed file system before starting the Hadoop daemons, so that we can put our data sources into HDFS when running MapReduce jobs:
hdfs namenode -format
  • We need aliases to start and stop the Hadoop daemons. Edit ~/.zshrc and add
alias hstart="/usr/local/Cellar/hadoop/3.1.2/sbin/start-all.sh"
alias hstop="/usr/local/Cellar/hadoop/3.1.2/sbin/stop-all.sh"

source ~/.zshrc

  • Start and stop Hadoop using the aliases:
hstart
hstop
How to install Hive on macOS
  • Check status
which hadoop
brew info hive|grep hive:
  • Install Hive
export HIVE_VERSION=3.1.1

DOTFILES_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/.." && pwd )"
echo "Setting up from $DOTFILES_DIR ..."

[ ! -d /usr/local/Cellar/hive/$HIVE_VERSION/ ] && brew install hive
[ ! -d /usr/local/Cellar/hive/$HIVE_VERSION/ ] && echo "Can not find Hive installation in /usr/local/Cellar/hive/$HIVE_VERSION, run 'brew install hive' first."

cd /usr/local/Cellar/hive/$HIVE_VERSION/
export HIVE_HOME=$(pwd)

hdfs dfs -mkdir -p /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse

mkdir -p hcatalog/var/log
touch hcatalog/var/log/hcat.out

ln -f -s $DOTFILES_DIR/hive/conf/hive-site.xml $HIVE_HOME/libexec/conf/hive-site.xml

schematool -initSchema -dbType derby

echo "Done."
  • Execute the following commands to add the Hive variables and open hive-site.xml:
## hive variables
export HIVE_HOME=/usr/local/Cellar/hive/3.1.1
code $HIVE_HOME/libexec/conf/hive-site.xml

Change the hive-site.xml as:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/usr/local/Cellar/hive/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
</configuration>
  • Create and load the test data:
code /tmp/physicists.csv
# Albert,Einstein
# Marie,Curie
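The two commented lines above are the intended file contents; a non-interactive way to create the same file is:

```shell
# Write the sample data without opening an editor.
cat > /tmp/physicists.csv <<'EOF'
Albert,Einstein
Marie,Curie
EOF
```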

hadoop fs -mkdir /user/data
hadoop fs -put /tmp/physicists.csv /user/data
hadoop fs -ls /user/data
  • Run the hive shell and test it:
CREATE EXTERNAL TABLE physicists(first string, last string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/data';
SELECT * FROM physicists;
  • Switch to MySQL to store the metadata. Go to the MySQL site and download the latest JDBC connector (sign-up required).
tar zxvf ~/Downloads/mysql-connector-java-8.0.16.tar.gz
sudo cp mysql-connector-java-8.0.16/mysql-connector-java-8.0.16.jar $HIVE_HOME/libexec/lib/

mysql
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,ALTER,CREATE ON metastore.* TO 'hiveuser'@'localhost';
mysql> flush privileges;
mysql> quit;

schematool -initSchema -dbType mysql
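For the schematool command above to reach MySQL, hive-site.xml also has to be switched from Derby to the new database. A sketch, assuming the hiveuser/password account created above (Connector/J 8 uses the com.mysql.cj.jdbc.Driver class):

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>
```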
How to install Spark on macOS
  • Refer to the installation instructions to install Spark.
  • Link the previously installed Hive with Spark (Spark also ships its own "standalone" Hive):
ln -s $HIVE_HOME/libexec/conf/hive-site.xml $SPARK_PATH/libexec/conf/hive-site.xml

2019-06-17 San Francisco

  • Game ID card: 司空坚 235402 19940712 1617 23 male