Commit f13caf70 authored by Ioannis Tsafaras's avatar Ioannis Tsafaras

Merge remote-tracking branch 'upstream/devel' into batch-hashtag-wordcount-timestamp

parents 3cac2d93 a1264e98
@@ -60,3 +60,4 @@ docs/_build/
target/
ansible/hosts
MANIFEST
# Lambda Cluster Ansible Setup
## Things to do before deployment
- Install python on every node.
- Add the public key of the machine running the playbooks to all the nodes.
- Create a private network among the machines.
- Modify the ansible/hosts file to include the master-node and all the slave nodes.
- Modify the ansible/host_vars/master-node file; concretely, change the internal_ip variable.
- For each slave in ansible/hosts, create a file in ansible/host_vars named after that slave (the name of the slave in ansible/hosts should be the name of the file in ansible/host_vars). Inside
the file, define the internal_ip variable and the id variable. Each slave must have a unique id.
- Make sure that the firewall of the master-node is off.
- Make sure that the values in vars/main.yml in each role are correctly defined. For example,
if you have 2 slaves, you should create at most 2 TaskManagers in Flink, not 3.
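The per-slave host_vars files described above could look like the following (an illustration; the IP and id are examples, so use the addresses of your own private network):

```yaml
# ansible/host_vars/slave-1 (hypothetical example file)
internal_ip: "192.168.0.3"
id: 1
```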
## Prerequisites
- The playbooks are deployed against Debian 8.0 nodes.
- `python` must be installed on the target nodes.
- The Ansible version used is `1.9.1`.
- Currently, the playbooks are run from an external machine to set up both the master and the slave nodes. In a future version, they will be run from the master node to set up
both the master and the slaves.
## Playbooks and Roles
There are four (4) roles and five (5) playbooks. These are:
- common role, run from the common playbook.
- apache-hadoop role, run from the apache-hadoop playbook.
- apache-kafka role, run from the apache-kafka playbook.
- apache-flink role, run from the apache-flink playbook.
- cluster-install playbook, which runs all the roles in the above sequence.
## VM packages and variable setup
Contains the Ansible playbook for the installation of required packages and variables. The play is split into three (3) tasks:
- install packages and run basic commands on the newly created VMs.
- fetch the public ssh key from the master.
- distribute the public key to all slave nodes.
### How to deploy
```bash
$ ansible-playbook -v playbooks/install.yml
```
## Role Explanation
### common
- Installs all the packages that are needed for the cluster to run.
- Creates the needed environment variables.
- Configures the /etc/hosts file.
## Hadoop services (HDFS & YARN) deployment
Contains the Ansible playbook for the deployment of the Hadoop services required for Flink (the HDFS and YARN services). The play is split into five (5) tasks:
- install (downloads and untars Hadoop into /usr/local, creates a softlink at /usr/local/hadoop)
- config (creates and copies the appropriate Hadoop configuration, using the master and slaves defined in the inventory)
- hdfs_format (initial format of HDFS)
- hdfs_dirs (creates the appropriate HDFS directories, currently for user root)
- start (starts the HDFS & YARN daemons on the cluster nodes)
### How to deploy
```bash
$ ansible-playbook -v playbooks/hadoop.yml
```
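As an illustration, the hdfs_dirs step could boil down to a single task like the following (a hypothetical sketch; the actual tasks/hadoop/hdfs_dirs.yml may differ):

```yaml
# Hypothetical sketch of tasks/hadoop/hdfs_dirs.yml.
- name: Create the HDFS home directory for user root.
  shell: /usr/local/hadoop/bin/hdfs dfs -mkdir -p /user/root
  tags:
    - mkdir
```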
### apache-hadoop
- Downloads and installs Apache Hadoop.
- Formats HDFS.
- Starts HDFS.
- Creates the required directories on HDFS.
- Starts Yarn.
### apache-kafka
- Downloads and installs Apache Kafka.
- Starts Apache Zookeeper on the master node.
- Starts an Apache Kafka server on every node.
- Creates the needed input and output topics.
### apache-flink
- Downloads and installs Apache Flink on the master node.
- Starts an Apache Flink Yarn session.
## Apache Flink deployment
Contains the Ansible playbook for the deployment of Apache Flink. The playbook is split into five (5) tasks:
- Download Apache Flink, Yarn version (downloads Apache Flink into /root).
- Uncompress Apache Flink (uncompresses Apache Flink into /usr/local).
- Create a softlink for Apache Flink (creates the /usr/local/flink softlink).
- Configure Apache Flink (copies pre-created Apache Flink configuration files into /usr/local/flink/conf).
- Start Apache Flink (starts an Apache Yarn session with 2 TaskManagers and 512 MB of RAM each).
Apache Flink needs to be installed only on the master node. Information about the architecture of the cluster (number of slaves, etc.) is obtained through Apache Yarn.
### How to deploy
```bash
$ ansible-playbook -v playbooks/apache-flink/flink-install.yml
```
## Apache Kafka deployment
Contains the Ansible playbook for the deployment of Apache Kafka. The playbook is split into eleven (11) tasks:
- Download Apache Kafka (downloads Apache Kafka into /root).
- Uncompress Apache Kafka (uncompresses Apache Kafka into /usr/local).
- Create a softlink for Apache Kafka (creates the /usr/local/kafka softlink).
- Configure Apache Kafka (copies pre-created Apache Kafka configuration files to /usr/local/kafka/config).
- Start the Apache Zookeeper server (a prerequisite for the Apache Kafka server).
- Wait for Apache Zookeeper to become available.
- Start the Apache Kafka server.
- Wait for the Apache Kafka server to become available.
- Create the Apache Kafka input topic (named "input", to store input data).
- Create the Apache Kafka batch output topic (named "batch-output", to store the output data of the batch job).
- Create the Apache Kafka stream output topic (named "stream-output", to store the output data of the stream job).
### How to deploy
```bash
$ ansible-playbook -v playbooks/apache-kafka/kafka-install.yml
```
## How to deploy
You can deploy the whole cluster by running the cluster-install playbook:
```bash
$ ansible-playbook playbooks/cluster/cluster-install.yml -i hosts
```
or you can run a single playbook, e.g.:
```bash
$ ansible-playbook playbooks/apache-hadoop/hadoop-install.yml -i hosts
```
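The topic-creation tasks mentioned above could be sketched roughly as follows (a hypothetical example; the replication and partition counts and the Zookeeper address are assumptions, not the actual role files):

```yaml
# Hypothetical sketch of a topic-creation task (run on the master,
# where the Zookeeper server listens on port 2181).
- name: Create Apache Kafka input topic.
  shell: /usr/local/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic input
  tags:
    - create-topics
```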
export JAVA_HOME=/usr/
export HADOOP_HOME=/usr/local/hadoop
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export FLINK_HOME=/usr/local/flink
export KAFKA_HOME=/usr/local/kafka
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin:/usr/local/hadoop/sbin:/usr/local/flink/bin"
internal_ip: "192.168.0.3"
internal_ip: "192.168.0.2"
## Apache Kafka variables.
id: 1
internal_ip: "192.168.0.4"
## Apache Kafka variables.
id: 2
[master]
master-node
[slaves]
slave-1
slave-2
---
- hosts: master
user: root
tasks:
- name: Download Apache Flink, Yarn version.
get_url: url=http://apache.tsl.gr/flink/flink-0.8.1/flink-0.8.1-bin-hadoop2-yarn.tgz dest=/root/flink-0.8.1-bin-hadoop2-yarn.tgz
tags:
- download
- name: Uncompress Apache Flink.
unarchive: src=/root/flink-0.8.1-bin-hadoop2-yarn.tgz dest=/usr/local copy=no
tags:
- uncompress
- name: Create softlink for Apache Flink.
file: src=/usr/local/flink-yarn-0.8.1 dest=/usr/local/flink state=link
tags:
- uncompress
- name: Configure Apache Flink.
copy: src=../files/usr/local/flink/conf/flink-conf.yaml dest=/usr/local/flink/conf/flink-conf.yaml owner=root group=root mode=0644
tags:
- configure
- name: Start Apache Flink.
shell: /usr/local/flink/bin/yarn-session.sh -n 2 -tm 512
async: 31536000 # Stay alive for a year(1 year = 31536000 seconds).
poll: 0
tags:
- start
roles:
- ../roles/apache-flink
---
- hosts: all
user: root
roles:
- ../roles/apache-hadoop
---
- hosts: all
user: root
tasks:
- name: Download Apache Kafka.
get_url: url=http://mirrors.myaegean.gr/apache/kafka/0.8.2.1/kafka_2.10-0.8.2.1.tgz dest=/root/kafka_2.10-0.8.2.1.tgz
tags:
- download
- name: Uncompress Apache Kafka.
unarchive: src=/root/kafka_2.10-0.8.2.1.tgz dest=/usr/local copy=no
tags:
- uncompress
- name: Create softlink for Apache Kafka.
file: src=/usr/local/kafka_2.10-0.8.2.1 dest=/usr/local/kafka state=link
tags:
- uncompress
- name: Configure Apache kafka.
copy: src=../files/usr/local/kafka/config/server.properties dest=/usr/local/kafka/config/server.properties owner=root group=root mode=0644
tags:
- configure-kafka
- name: Start Apache Zookeeper server.
shell: /usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
async: 31536000 # Stay alive for a year(1 year = 31536000 seconds).
poll: 0
tags:
- start-zookeeper
- name: Wait for Apache Zookeeper to become available.
wait_for: port=2181
tags:
- start-zookeeper
- name: Start Apache Kafka server.
shell: /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
async: 31536000 # Stay alive for a year(1 year = 31536000 seconds).
poll: 0
tags:
- start-kafka
- name: Wait for Apache Kafka server to become available.
wait_for: port=9092
tags:
- start-kafka
roles:
- ../roles/apache-kafka
---
- hosts: all
user: root
roles:
- ../roles/common
- ../roles/apache-hadoop
- ../roles/apache-kafka
- ../roles/apache-flink
---
- hosts: all
user: root
roles:
- ../roles/common
../files/
---
- hosts: all
user: root
tasks:
- include: ../tasks/hadoop/install.yml tags=install
- include: ../tasks/hadoop/config.yml tags=config
- hosts: master
user: root
tasks:
- include: ../tasks/hadoop/hdfs_format.yml tags=format
- include: ../tasks/hadoop/start.yml tags=start
- include: ../tasks/hadoop/hdfs_dirs.yml tags=mkdir
---
- hosts: vms
user: root
tasks:
# aptitude packages
- include: ../tasks/aptitude/upgrade.yml tags=install_scripts
- include: ../tasks/aptitude/java.yml tags=install_scripts
- include: ../tasks/aptitude/vim.yml tags=install_scripts
- include: ../tasks/aptitude/python.yml tags=install_scripts
# copy /etc/hosts file
- include: ../tasks/install/hosts.yml tags=install_scripts
# copy /etc/environment file
- include: ../tasks/install/environment.yml tags=install_scripts
# run ssh-keygen
- include: ../tasks/ssh/ssh-keygen.yml tags=install_scripts
- hosts: master
user: root
tasks:
# copy ssh public key of master
- include: ../tasks/ssh/fetch-key.yml
- hosts: vms
user: root
tasks:
# distribute ssh key to all nodes
- include: ../tasks/ssh/authorized-key.yml
../roles/
---
- name: Download Apache Flink, Yarn version.
get_url: url="{{ mirror_url }}/flink-{{ version }}/flink-{{ version }}-{{ version_for }}.tgz" dest="{{ download_path }}/flink-{{ version }}-{{ version_for }}.tgz"
tags:
- download
- name: Uncompress Apache Flink.
unarchive: src="{{ download_path }}/flink-{{ version }}-{{ version_for }}.tgz" dest="{{ installation_path }}" copy=no
tags:
- uncompress
- name: Create softlink for Apache Flink.
file: src="{{ installation_path }}/flink-{{ version }}" dest="{{ installation_path }}/flink" state=link
tags:
- uncompress
- name: Configure Apache Flink.
template: src=flink-conf.j2 dest="{{ installation_path }}/flink/conf/flink-conf.yaml" owner=root group=root mode=0644
tags:
- configure
- name: Start Apache Flink.
shell: "{{ installation_path }}/flink/bin/yarn-session.sh -n {{ number_of_taskmanagers }} -tm {{ ram_per_task_manager }}"
async: 31536000 # Stay alive for a year(1 year = 31536000 seconds).
poll: 0
tags:
- start
@@ -21,17 +21,17 @@
# Common
#==============================================================================
jobmanager.rpc.address: {{ internal_ip }}
jobmanager.rpc.port: 6123
jobmanager.heap.mb: {{ jobmanager_heap_mb }}
taskmanager.heap.mb: {{ taskmanager_heap_mb }}
taskmanager.numberOfTaskSlots: {{ taskmanager_numberOfTaskSlots }}
parallelization.degree.default: {{ parallelization_degree_default }}
#==============================================================================
# Web Frontend
@@ -43,7 +43,7 @@ webclient.port: 8080
#==============================================================================
# Advanced
#==============================================================================
# The number of buffers for the network stack.
#
---
mirror_url: "http://mirrors.myaegean.gr/apache/flink"
version: "0.9.0"
version_for: "bin-hadoop27"
download_path: "/root"
installation_path: "/usr/local"
jobmanager_heap_mb: 256
taskmanager_heap_mb: 512
taskmanager_numberOfTaskSlots: 2
parallelization_degree_default: 2
number_of_taskmanagers: 2
ram_per_task_manager: 768
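With these defaults, the templated download task above resolves as follows (shown for illustration only):

```yaml
# url:  http://mirrors.myaegean.gr/apache/flink/flink-0.9.0/flink-0.9.0-bin-hadoop27.tgz
# dest: /root/flink-0.9.0-bin-hadoop27.tgz
```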
---
- name: Include common tasks.
include: setup.yml
tags:
- setup
- name: Include tasks for master.
include: master.yml
when: "'master' in group_names"
tags:
- master-install