# Ansible Cluster Setup


## Things to do before deployment

- Install Python on every target node.
- Add the public SSH key of the machine running the playbooks to all the nodes.
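
As a minimal sketch of the key-distribution step, the snippet below prints one `ssh-copy-id` command per node so the commands can be reviewed before they are run. The node names, the `root` login and the key path are assumptions for illustration; they are not taken from the playbooks.

```bash
# Hypothetical node names; replace with the hosts of your cluster.
NODES="${NODES:-master slave1 slave2}"
# Assumed location of the public key on the machine running the playbooks.
PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"

key_copy_cmd() {
  # $1 = node name; prints the command that installs the key for root (assumed user).
  printf 'ssh-copy-id -i %s root@%s\n' "$PUBKEY" "$1"
}

# Print one command per node.
for node in $NODES; do
  key_copy_cmd "$node"
done
```

Once the printed commands look right, they can be piped to `sh` to execute them.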


## Prerequisites

- Deploy against Debian 8.0 nodes.
- Make sure `python` is installed on the target nodes.
- The Ansible version used is `1.9.1`.
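
The version prerequisite can be checked on the machine that runs the playbooks. The sketch below assumes the `ansible --version` output of the 1.x series, whose first line looks like `ansible 1.9.1`; newer releases print a different first line, so the parsing would need adjusting there. The helper name `parse_ansible_version` is made up for this example.

```bash
# Expected version, taken from the prerequisites above.
EXPECTED_ANSIBLE_VERSION="1.9.1"

parse_ansible_version() {
  # Reads `ansible --version` output on stdin and prints the second field
  # of the first line. Assumes the 1.x format, e.g. "ansible 1.9.1".
  awk 'NR == 1 { print $2 }'
}

if command -v ansible >/dev/null 2>&1; then
  found="$(ansible --version | parse_ansible_version)"
  if [ "$found" = "$EXPECTED_ANSIBLE_VERSION" ]; then
    echo "ansible $found: OK"
  else
    echo "ansible $found found, but $EXPECTED_ANSIBLE_VERSION is expected"
  fi
else
  echo "ansible is not installed on this machine"
fi
```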


## VM packages and variable setup

Contains the Ansible playbook that installs the required packages and sets up the variables. The play is split into three (3) tasks:

- Install packages and run basic commands on the newly created VMs.
- Fetch the public SSH key from the master.
- Distribute the public key to all slave nodes.

Currently, the playbooks are run from an external node and deploy both master and slave nodes. In a future version, they will run from the master node to deploy master and slave nodes.
	
### How to deploy

```bash
$ ansible-playbook -v playbooks/install.yml
```
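
The playbooks need an inventory that lists the master and slave nodes. The sketch below writes a small example inventory; the file name, group names and host names are hypothetical and may differ from what the playbooks in this repository expect. `ansible_ssh_user` is the connection variable used by Ansible 1.x.

```bash
# Hypothetical inventory path; adjust to your setup.
INVENTORY="${INVENTORY:-./hosts.example}"

# Group and host names below are assumptions for illustration.
cat > "$INVENTORY" <<'EOF'
[master]
master-node ansible_ssh_user=root

[slaves]
slave-node-1 ansible_ssh_user=root
slave-node-2 ansible_ssh_user=root
EOF

echo "wrote $INVENTORY"
```

The file can then be passed explicitly with `ansible-playbook -i ./hosts.example -v playbooks/install.yml`.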


## Hadoop services (HDFS & YARN) deployment

Contains the Ansible playbook that deploys the Hadoop services required for Flink (HDFS and YARN). The play is split into five (5) tasks:

- install (downloads and untars Hadoop into `/usr/local` and creates the `/usr/local/hadoop` softlink)
- config (creates and copies the appropriate Hadoop configuration, using the master and slaves defined in the inventory)
- hdfs_format (performs the initial format of HDFS)
- hdfs_dirs (creates the appropriate HDFS directories, currently for user root)
- start (starts the HDFS and YARN daemons on the cluster nodes)

Currently, the playbooks are run from an external node and deploy both master and slave nodes. In a future version, they will run from the master node to deploy the slave nodes.

### How to deploy

```bash
$ ansible-playbook -v playbooks/hadoop.yml
```


## Apache Flink deployment

Contains the Ansible playbook that deploys Apache Flink. The playbook is split into five (5) tasks:

- Download Apache Flink, YARN version (downloads Apache Flink into `/root`).
- Uncompress Apache Flink (uncompresses Apache Flink into `/usr/local`).
- Create softlink for Apache Flink (creates the `/usr/local/flink` softlink).
- Configure Apache Flink (copies pre-created Apache Flink configuration files into `/usr/local/flink/conf`).
- Start Apache Flink (starts a YARN session with 2 TaskManagers and 512 MB of RAM each).

Apache Flink needs to be installed only on the master node. Information about the architecture of the cluster (number of slaves, etc.) is obtained through YARN.
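
For reference, the last task (a YARN session with 2 TaskManagers and 512 MB each) roughly corresponds to the command printed by the sketch below. It assumes the `yarn-session.sh` script of older Flink releases, which accepted `-n` (number of TaskManagers) and `-tm` (memory per TaskManager in MB); the exact command used by the playbook may differ, and newer Flink releases changed these options.

```bash
# Install location from the softlink task above.
FLINK_HOME="${FLINK_HOME:-/usr/local/flink}"

yarn_session_cmd() {
  # $1 = number of TaskManagers, $2 = memory per TaskManager in MB.
  # Assumes the -n / -tm options of older Flink releases.
  printf '%s/bin/yarn-session.sh -n %s -tm %s\n' "$FLINK_HOME" "$1" "$2"
}

# Values taken from the task description: 2 TaskManagers, 512 MB each.
yarn_session_cmd 2 512
```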

### How to deploy

```bash
$ ansible-playbook -v playbooks/apache-flink/flink-install.yml
```


## Apache Kafka deployment

Contains the Ansible playbook that deploys Apache Kafka. The playbook is split into eleven (11) tasks:

- Download Apache Kafka (downloads Apache Kafka into `/root`).
- Uncompress Apache Kafka (uncompresses Apache Kafka into `/usr/local`).
- Create softlink for Apache Kafka (creates the `/usr/local/kafka` softlink).
- Configure Apache Kafka (copies pre-created Apache Kafka configuration files to `/usr/local/kafka/config`).
- Start Apache Zookeeper server (starts an Apache Zookeeper server, which is a prerequisite for the Apache Kafka server).
- Wait for Apache Zookeeper to become available.
- Start Apache Kafka server (starts an Apache Kafka server).
- Wait for Apache Kafka server to become available.
- Create Apache Kafka input topic (creates an Apache Kafka topic, named "input", to store input data).
- Create Apache Kafka batch output topic (creates an Apache Kafka topic, named "batch-output", to store the output data of the batch job).
- Create Apache Kafka stream output topic (creates an Apache Kafka topic, named "stream-output", to store the output data of the stream job).

Currently, the playbooks are run from an external node and deploy both master and slave nodes. In a future version, they will run from the master node to deploy the slave nodes.
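
The three topic-creation tasks can be reproduced by hand with `kafka-topics.sh`. The sketch below prints the commands for review rather than running them. The Zookeeper address, the replication factor and the partition count are assumptions, not values read from the playbook, and the `--zookeeper` option applies to the older Kafka releases that rely on a Zookeeper server as described above.

```bash
# Install location from the softlink task above.
KAFKA_HOME="${KAFKA_HOME:-/usr/local/kafka}"
# Assumed Zookeeper address (default client port on the same host).
ZOOKEEPER="${ZOOKEEPER:-localhost:2181}"

topic_create_cmd() {
  # $1 = topic name; replication factor and partitions of 1 are assumptions.
  printf '%s/bin/kafka-topics.sh --create --zookeeper %s --replication-factor 1 --partitions 1 --topic %s\n' \
    "$KAFKA_HOME" "$ZOOKEEPER" "$1"
}

# Topic names as listed in the tasks above.
for topic in input batch-output stream-output; do
  topic_create_cmd "$topic"
done
```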

### How to deploy

```bash
$ ansible-playbook -v playbooks/apache-kafka/kafka-install.yml
```