Installing Greenplum CE (Community Edition) on VirtualBox

Hi again! This time I’m showing how to install and configure Greenplum CE on a single node. You will also be able to download the .ova file, so you can import the appliance and have everything already installed on CentOS 6.5 64-bit.

First of all, a little bit about Greenplum 🙂

* Greenplum Database is a massively parallel processing (MPP) database server based on PostgreSQL open-source technology. MPP (also known as a shared nothing architecture) refers to systems with two or more processors which cooperate to carry out an operation – each processor with its own memory, operating system and disks.
* Greenplum Database is an array of individual databases based upon PostgreSQL 8.2 working together to present a single database image.
* Greenplum Database stores and processes large amounts of data by distributing the data and processing workload across several servers or hosts.
* The master is the entry point to the Greenplum Database system and coordinates its work with the other database instances in the system, called segments. The segments are where the data is stored and where the majority of query processing takes place.

— Learn more here:
http://www.technology-mania.com/2011/04/understanding-greenplum.html
http://www.greenplumdba.com/greenplum-database

IMPORTANT: If you are a lazy DBA and you don’t want to do the installation, download the VirtualBox appliance we made for you 🙂 with Greenplum ready to use. The root password is qwe123 😉
Link: Greenplum Virtual Box appliance

Now that we are on the same page, let’s install Greenplum CE on our VirtualBox machine.

Right now I’m using VirtualBox 4.3.6 with CentOS 6.5 64-bit already installed. That part is not covered in this how-to 🙂

1) Once we have the OS ready, we need to install the YUM packages for XFS support:

 yum install xfsprogs xfsdump 

2) We need another disk for the data storage. Add a new disk to the virtual machine and then create the XFS filesystem on it:

 mkfs -t xfs /dev/sdb 

3) Add a line to /etc/fstab to mount the filesystem on boot:

 vi /etc/fstab 

/dev/sdb /greenplum xfs noatime 0 0

4) Now we create the Greenplum data store and mount the partition on the /greenplum mount point.

 mkdir /greenplum
 mount /greenplum 
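
Before moving on, it is worth confirming that the new disk really ended up mounted as XFS. Here is a minimal sketch, assuming the /greenplum mount point used above. The helper name fs_type_of is made up for this example and just reads a mounts table in /proc/mounts format:

```shell
#!/bin/bash
# fs_type_of MOUNTPOINT [MOUNTS_FILE]
# Prints the filesystem type recorded for MOUNTPOINT in a /proc/mounts-style
# table (fields: device mountpoint fstype options dump pass).
# Hypothetical helper, not part of Greenplum or CentOS.
fs_type_of() {
    local mountpoint="$1"
    local mounts_file="${2:-/proc/mounts}"
    # Keep the last matching entry; fail if the mount point is not listed
    awk -v mp="$mountpoint" '
        $2 == mp { fstype = $3 }
        END { if (fstype != "") print fstype; else exit 1 }
    ' "$mounts_file"
}

# Usage, after "mount /greenplum":
#   [ "$(fs_type_of /greenplum)" = "xfs" ] && echo "XFS is mounted" || echo "not mounted as XFS"
```

If the check fails, have another look at the /etc/fstab line before going any further.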

5) Before running the installation we need to tweak some system parameters in /etc/sysctl.conf :

 vi /etc/sysctl.conf 

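# xfs_mount_options = rw,noatime,inode64,allocsize=16m  (not a sysctl key: these are XFS mount options and belong in /etc/fstab, so leave this line commented out here)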
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
vm.overcommit_memory = 2
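
The new values are only loaded after running sysctl -p as root (or after a reboot). To see whether the running kernel agrees with the file, something like the sketch below can help. The function name check_sysctl is an assumption of this example, and it simply maps each key to its file under /proc/sys:

```shell
#!/bin/bash
# check_sysctl CONF_FILE [PROC_ROOT]
# For every "key = value" line in CONF_FILE, compare the value with the one
# found under PROC_ROOT (default /proc/sys) and print every difference.
# Returns 0 only when all keys match. Hypothetical helper for this how-to.
check_sysctl() {
    local conf_file="$1"
    local proc_root="${2:-/proc/sys}"
    local status=0 key value want have path
    while IFS='=' read -r key value; do
        # Drop whitespace around the key; skip blank lines and comments
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        case "$key" in ''|'#'*) continue ;; esac
        # Collapse runs of spaces/tabs so "250  512000" equals "250 512000"
        want=$(echo $value)
        # kernel.shmmax -> $proc_root/kernel/shmmax
        path="$proc_root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            # Not present or not readable by the current user
            echo "MISSING  $key"
            status=1
            continue
        fi
        have=$(echo $(cat "$path"))
        if [ "$want" != "$have" ]; then
            echo "MISMATCH $key: want '$want', have '$have'"
            status=1
        fi
    done < "$conf_file"
    return $status
}

# Usage (as root):
#   sysctl -p
#   check_sysctl /etc/sysctl.conf && echo "kernel parameters match"
```

A key reported as MISSING usually means the name is misspelled or is not a kernel parameter at all.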

6) We also raise the open files and process limits in /etc/security/limits.conf:

 vi /etc/security/limits.conf 

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072
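
These limits only apply to new login sessions, so log out and back in before checking them. A small sketch for that check follows; limit_ok is a hypothetical helper name. On CentOS 6 a file under /etc/security/limits.d (typically 90-nproc.conf) can override the nproc soft limit, so this is worth a look:

```shell
#!/bin/bash
# limit_ok ACTUAL REQUIRED
# Succeeds when ACTUAL is "unlimited" or a number >= REQUIRED.
# Hypothetical helper, meant to be fed with the output of ulimit.
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

# Usage in a fresh login session:
#   limit_ok "$(ulimit -n)" 65536  || echo "nofile is too low: $(ulimit -n)"
#   limit_ok "$(ulimit -u)" 131072 || echo "nproc is too low: $(ulimit -u)"
```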

7) We set the disk access policy for the Linux disk I/O scheduler:

 echo deadline > /sys/block/sdb/queue/scheduler 

8) Finally we set the read-ahead value (blockdev) of the disk device to 16384, and we are done with the OS tweaking!

 /sbin/blockdev --setra 16384 /dev/sdb 
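
Keep in mind that the scheduler and read-ahead settings from steps 7 and 8 are lost on reboot. One common way to keep them is to add both commands to /etc/rc.local. The sketch below does that without duplicating lines if you run it twice; append_once is a name invented for this example, and using /etc/rc.local is an assumption about how you want to persist the settings:

```shell
#!/bin/bash
# append_once LINE FILE
# Appends LINE to FILE unless an identical line is already present.
# Hypothetical helper; it only edits the file you point it at.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Usage (as root):
#   append_once 'echo deadline > /sys/block/sdb/queue/scheduler' /etc/rc.local
#   append_once '/sbin/blockdev --setra 16384 /dev/sdb' /etc/rc.local
```

To check the current values, cat /sys/block/sdb/queue/scheduler shows the active scheduler in square brackets and /sbin/blockdev --getra /dev/sdb prints the read-ahead value.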

9) You can get the installation file from the Pivotal site (http://www.gopivotal.com/products/pivotal-greenplum-database). You will get a zip file. Unzip it and then run the binary. After the installation is done, you will find the DB installed in /usr/local/greenplum-db

 unzip /root/greenplum-db-4.2.2.4-build-1-CE-RHEL5-x86_64.zip
 /bin/bash greenplum-db-4.2.2.4-build-1-CE-RHEL5-x86_64.bin 

10) Now we need to create the OS user to admin the Greenplum DB, and of course, chown the master and segments directories to the user.

 useradd gpadmin
 passwd gpadmin
 mkdir -p /greenplum/master
 mkdir -p /greenplum/segment1
 mkdir -p /greenplum/segment2
 chown -R gpadmin:gpadmin /greenplum 

11) From now on, everything you have to do will be done as the gpadmin user, so: sudo su - gpadmin 🙂

12) We set the environment for gpadmin by copying the following to the .bashrc file:

 vi /home/gpadmin/.bashrc 

##########################################################################

GPHOME=/usr/local/greenplum-db-4.2.2.4

# Replace with symlink path if it is present and correct
if [ -h ${GPHOME}/../greenplum-db ]; then
GPHOME_BY_SYMLINK=`(cd ${GPHOME}/../greenplum-db/ && pwd -P)`
if [ x"${GPHOME_BY_SYMLINK}" = x"${GPHOME}" ]; then
GPHOME=`(cd ${GPHOME}/../greenplum-db/ && pwd -L)`/.
fi
unset GPHOME_BY_SYMLINK
fi
PATH=$GPHOME/bin:$GPHOME/ext/python/bin:$PATH
LD_LIBRARY_PATH=$GPHOME/lib:$GPHOME/ext/python/lib:$LD_LIBRARY_PATH
PYTHONPATH=$GPHOME/lib/python
PYTHONHOME=$GPHOME/ext/python
OPENSSL_CONF=$GPHOME/etc/openssl.cnf
export GPHOME
export PATH
export LD_LIBRARY_PATH
export PYTHONPATH
export PYTHONHOME
export OPENSSL_CONF

##########################################################################
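
As a shorter alternative, the block above looks like the contents of the greenplum_path.sh file that the installer places in the Greenplum home directory, so .bashrc can load that file instead of carrying a copy. A minimal sketch, assuming the default /usr/local/greenplum-db symlink; GP_ENV_FILE is just a variable name chosen here:

```shell
# Load the environment file shipped with Greenplum instead of pasting it.
# The path below is an assumption based on the default install location.
GP_ENV_FILE=/usr/local/greenplum-db/greenplum_path.sh
if [ -f "$GP_ENV_FILE" ]; then
    . "$GP_ENV_FILE"
else
    # Warn instead of failing, so the login shell still starts
    echo "greenplum_path.sh not found at $GP_ENV_FILE" >&2
fi
```

Either way, after logging in again as gpadmin, which gpinitsystem should print a path under the Greenplum home.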

13) We need to create the single_host_file and add the localhost entry, in order to create and exchange the SSH keys:

 vi ~/single_host_file 

localhost

 gpssh-exkeys -f ~/single_host_file 
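
Once the keys are exchanged, gpadmin should be able to SSH to every host in the file without typing a password. A quick way to confirm this is sketched below. The function name check_hosts and the SSH_CMD override (handy for a dry run) are assumptions of this example, not Greenplum tools:

```shell
#!/bin/bash
# check_hosts HOSTS_FILE
# Tries a non-interactive SSH login to every host listed in HOSTS_FILE
# (one per line) and reports the hosts that still need a password.
# SSH_CMD may be set to another command for a dry run; default is ssh.
check_hosts() {
    local hosts_file="$1"
    local host status=0
    while read -r host; do
        [ -n "$host" ] || continue
        # -n keeps ssh from swallowing the rest of the hosts file;
        # BatchMode makes ssh fail instead of prompting for a password
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            status=1
        fi
    done < "$hosts_file"
    return $status
}

# Usage as gpadmin:
#   check_hosts ~/single_host_file
```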

14) We need to create a configuration file to be used by the init process:

 vi ~/gp_init_config 

#####################################################################

ARRAY_NAME="Greenplum"
MACHINE_LIST_FILE=/home/gpadmin/single_host_file
SEG_PREFIX=gp
PORT_BASE=50000
declare -a DATA_DIRECTORY=(/greenplum/segment1 /greenplum/segment2)
MASTER_HOSTNAME=localhost
MASTER_DIRECTORY=/greenplum/master
MASTER_PORT=5432
ENCODING=UNICODE

#####################################################################

15) That’s all! Now initialize the database system, which also starts it up!

 gpinitsystem -c ~/gp_init_config 

16) Is it working? Test it!

 psql template1 
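
Getting a psql prompt only proves that the master is answering. To also check the segments, we can ask the gp_segment_configuration catalog table how many primaries are up. The sketch below counts them from psql output; segments_up is a helper name invented here:

```shell
#!/bin/bash
# segments_up
# Reads "content|status" rows on stdin (unaligned psql output) and prints
# how many segments (content >= 0, so the master at -1 is skipped) have
# status "u" (up). Hypothetical helper for this how-to.
segments_up() {
    awk -F'|' '$1 >= 0 && $2 == "u" { n++ } END { print n + 0 }'
}

# Usage as gpadmin:
#   psql -At -d template1 \
#     -c "SELECT content, status FROM gp_segment_configuration WHERE role = 'p'" \
#     | segments_up
```

With the two data directories declared in gp_init_config, this should print 2.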

— Sources and help:
http://www.greenplumdba.com/installing-the-greenplum-database
http://blog.2ndquadrant.com/installing_greenplum_sne_ec2/
