How to Set Up a Cassandra Cluster

Walker Rowe

Here we show how to set up a Cassandra cluster. We will use two machines, 172.31.47.43 and 172.31.46.15. First, open these firewall ports on both:

7000
7001
7199
9042
9160
9142
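
Once Cassandra is installed and running, a quick way to confirm that these ports are reachable from the other machine is a small TCP check. The sketch below is only an illustration: the names `CASSANDRA_PORTS`, `port_is_open` and `closed_ports` are our own, not part of Cassandra. A connection succeeds only if the firewall allows it and something is listening, so a port that Cassandra has not opened on that interface will be reported as closed even when the firewall rule is correct.

```python
import socket

# The ports listed above; adjust if your installation uses different ones.
CASSANDRA_PORTS = [7000, 7001, 7199, 9042, 9160, 9142]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def closed_ports(host, ports=CASSANDRA_PORTS, timeout=2.0):
    """Return the ports on host that did not accept a TCP connection."""
    return [port for port in ports if not port_is_open(host, port, timeout)]
```

For example, running `closed_ports("172.31.47.43")` from 172.31.46.15 returns the list of ports that could not be reached from that machine.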

Then follow this document to install Cassandra and get familiar with its basic concepts. Make sure to install Cassandra on each node.

Configure Cluster Settings

There is no central master in a Cassandra cluster. Instead, you make each node aware of the others and they work together.

First, edit /etc/cassandra/cassandra.yaml on both machines and set the values as shown below. Don’t change the cluster name yet; we will do that later.

    • seeds: set the IP address of one machine to be the seed. Not every machine needs to be a seed. Seeds are the nodes that a Cassandra node contacts at startup to find the other nodes in the cluster. In cassandra.yaml this setting sits under seed_provider.
    • listen_address: the IP address that Cassandra on this machine listens on for traffic from the other nodes.
    • endpoint_snitch: this determines where to route data and send replicas. We use the default, SimpleSnitch, below. Several other snitches are rack-aware, meaning they avoid putting a replica on the same physical rack as another copy; if all copies sat on one rack and the whole rack failed, the data could be lost. There is even one (Ec2Snitch) designed for Amazon EC2 that can spread data across Amazon availability zones.

machine 172.31.46.15 settings:

endpoint_snitch: SimpleSnitch
- seeds: "172.31.47.43"
listen_address: 172.31.46.15

machine 172.31.47.43 settings:

endpoint_snitch: SimpleSnitch
- seeds: "172.31.47.43"
listen_address: 172.31.47.43
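
The pattern in these settings is that seeds and endpoint_snitch are identical on every machine, while listen_address is each machine's own IP. A minimal sketch of that rule, assuming a hypothetical helper named `node_settings` (it only builds the values; it does not read or write cassandra.yaml):

```python
def node_settings(node_ips, seed_ips, snitch="SimpleSnitch"):
    """Return {node_ip: settings} for every node in the cluster.

    seeds and endpoint_snitch are shared by all nodes;
    listen_address is each node's own IP address.
    """
    unknown = [ip for ip in seed_ips if ip not in node_ips]
    if unknown:
        raise ValueError(f"seed(s) not in the cluster: {unknown}")

    seeds = ",".join(seed_ips)  # the seeds value is a comma-separated string
    return {
        ip: {
            "endpoint_snitch": snitch,
            "seeds": seeds,
            "listen_address": ip,
        }
        for ip in node_ips
    }
```

Calling `node_settings(["172.31.46.15", "172.31.47.43"], ["172.31.47.43"])` gives the two sets of values listed above, and the same call extends to a larger cluster by adding IP addresses to the first list.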

Now run on both machines:

sudo service cassandra start

Wait a few seconds for the nodes to discover each other, then run on both machines:

nodetool status

It should show both nodes:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.46.15  245.99 KiB  256          100.0%            fb1d89bb-cbe2-488f-b2e7-da145bd2dde7  rack1
UN  172.31.47.43  196.01 KiB  256          100.0%            472fd4f0-9bb3-48a3-a933-9c9b07f7a9f6  rack1
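
If you want to script this check rather than read the output by eye, the status column can be parsed. The sketch below assumes the output layout shown above (a two-letter code such as UN followed by the address); `parse_nodetool_status` and `nodes_not_up_normal` are names we made up for illustration.

```python
import re

# A node line starts with Up/Down (U or D) followed by the state
# (N, L, J or M), then whitespace and the node address.
NODE_LINE = re.compile(r"^\s*([UD][NLJM])\s+(\S+)\s")


def parse_nodetool_status(output):
    """Map each node address to its two-letter status code, e.g. 'UN'."""
    nodes = {}
    for line in output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            status, address = match.groups()
            nodes[address] = status
    return nodes


def nodes_not_up_normal(output, expected_addresses):
    """Return the expected addresses that are missing or not in state UN."""
    nodes = parse_nodetool_status(output)
    return [ip for ip in expected_addresses if nodes.get(ip) != "UN"]
```

You could feed it the text returned by `subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout`; an empty list from `nodes_not_up_normal` means every expected node is up and in the normal state.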

If you get any kind of error message, look in /var/log/cassandra/system.log.

Now let’s change the name of the cluster from the default. Run cqlsh and then paste in the CQL below. Cassandra does not replicate this system change across the cluster, so you have to run this on both machines.

UPDATE system.local SET cluster_name = 'Walker Cluster' where key='local';

Now edit /etc/cassandra/cassandra.yaml and change the cluster name to whatever you want. It should be the same on both machines:

cluster_name: 'Walker Cluster'

Then:

sudo service cassandra restart

Run this check again:

nodetool status 
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.31.46.15  312.4 KiB  256          100.0%            fb1d89bb-cbe2-488f-b2e7-da145bd2dde7  rack1
UN  172.31.47.43  294.71 KiB  256          100.0%            472fd4f0-9bb3-48a3-a933-9c9b07f7a9f6  rack1

Now, following the instructions from our introduction to Cassandra, let’s create some data. We will see that data entered on one node is replicated to the other. Paste these CQL commands into cqlsh:

-- With only two nodes, a replication_factor of 3 cannot be fully met;
-- each node holds at most one copy of a row.
CREATE KEYSPACE Library
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

CREATE TABLE Library.book (
    ISBN text,
    copy int,
    title text,
    PRIMARY KEY (ISBN, copy)
);

CREATE TABLE Library.patron (
    ssn int PRIMARY KEY,
    checkedOut set<text>  -- element type assumed to be text
);

INSERT INTO Library.book (ISBN, copy, title) VALUES ('1234', 1, 'Bible');
INSERT INTO Library.book (ISBN, copy, title) VALUES ('1234', 2, 'Bible');
INSERT INTO Library.book (ISBN, copy, title) VALUES ('1234', 3, 'Bible');
INSERT INTO Library.book (ISBN, copy, title) VALUES ('5678', 1, 'Koran');
INSERT INTO Library.book (ISBN, copy, title) VALUES ('5678', 2, 'Koran');

Then log on to the other machine and verify that the data has been copied there:

select * from Library.book;
isbn | copy | title
------+------+-------
5678 |    1 | Koran
5678 |    2 | Koran
1234 |    1 | Bible
1234 |    2 | Bible
1234 |    3 | Bible
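
All five rows appear on the second machine because, with SimpleStrategy and two nodes, every row ends up with a copy on both. The sketch below is a simplified model of that placement, not Cassandra's actual code: it ignores virtual nodes and token hashing, and treats the ring as an ordered list of node addresses. The function name `simple_strategy_replicas` is our own.

```python
def simple_strategy_replicas(ring, primary_index, replication_factor):
    """Pick replica nodes by walking the ring clockwise from the primary.

    ring is the list of node addresses in ring order. A node is never
    chosen twice, so the result has at most len(ring) entries even when
    replication_factor is larger.
    """
    replicas = []
    for step in range(len(ring)):
        if len(replicas) >= replication_factor:
            break
        node = ring[(primary_index + step) % len(ring)]
        if node not in replicas:
            replicas.append(node)
    return replicas
```

With `ring = ["172.31.46.15", "172.31.47.43"]` and a replication factor of 3, the function returns both nodes whichever one is the primary, which matches the 100.0% shown under Owns (effective) for each node in the nodetool status output.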

These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

About the author

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming.