How I learned to appreciate MongoDB

Last week our company organized a session about NoSQL in general and MongoDB in particular, as you can read here. I presented the MongoDB part, and I would like to take you along on my journey toward actually appreciating the NoSQL database world (and MongoDB in particular).

Coming from the RDBMS world, I’m used to ACID. Business models reside within the database, data never gets lost, transactions are isolated, you’ve got your read consistency, strictly defined tables, foreign keys et cetera. MongoDB, on the other hand, I saw as an unstructured pile of data: no such thing as transactions, only ‘eventually consistent’ (that sounds like a leap of faith), no joins and hence no foreign keys… You get the picture.

I am a DBA. My major concern is to have all data available or at least recoverable, no matter what. But the world is changing. More and more, developers look at the database as a storage engine; business logic is programmed in the application. We DBAs try to teach them to use the database better, but is that really necessary? There’s no law…

The world is changing fast and so are businesses. It’s not unusual to deploy a new release every night. If developers need to redesign the database every few days, then maybe the structure of data is not that important. If we collect the number of hits on our website, is it a disaster if out of 10,000 hits we occasionally miss 1?

It takes a lot of discussion and ‘yes, but…’ for this other view of data to finally settle in. At least, for me it did. What finally won me over was an online course at MongoDB University that sort of eased the pain. Because once you let go of the ACID model, you gain a lot of flexibility in terms of database design and infrastructure design. Scaling out becomes a very easy operation, for instance. Resilience against hardware failure is a piece of cake. And due to the lack of RDBMS legacy, the engine can focus almost entirely on reading and writing data, which leads to lightning-fast performance.

In the next paragraphs I will show some examples of the resilience and general behaviour of MongoDB, loosely compared to Oracle. It is hands-on, so I will also get you started, as minimally as possible, with MongoDB in general.

I will not go into the way you read and write data. Only some actions will be shown that are needed for the examples. But in general:

db = database and can be compared to a schema in Oracle.

A db contains collections, which can be compared to tables.

A collection contains documents, which can be compared to rows.

Joins are not possible, so all data you need should be in the same collection.

Documents consist of key:value pairs, as many as you like within one document.

So, you can have the database ‘trades’ with collection ‘customers’ with documents like

{"name":"Larry","company":"Oracle","hobby":"sailing"}
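
To make the 'no joins' point a bit more concrete, here is a minimal sketch in plain JavaScript, the language of the mongo shell. The trades array and its fields are my own assumptions for illustration; they are not part of the example above. Data that would sit in a child table in an RDBMS is simply embedded in the document.

```javascript
// A customer document as a plain JavaScript object, the same shape the
// mongo shell works with. The "trades" array is a hypothetical addition:
// what would be a child table in an RDBMS is embedded in the document.
const customer = {
  name: "Larry",
  company: "Oracle",
  hobby: "sailing",
  trades: [
    { symbol: "AAA", quantity: 100 },
    { symbol: "BBB", quantity: 25 }
  ]
};

// No join needed: everything about the customer travels in one document.
function totalQuantity(doc) {
  return doc.trades.reduce((sum, trade) => sum + trade.quantity, 0);
}

console.log(totalQuantity(customer)); // 125
```

The price is the one mentioned throughout this post: the same data may be stored more than once, in exchange for reading everything in one go.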

Getting started.

Go to the MongoDB site here and download the appropriate version. The entire hands-on can be run on your typical Windows or Linux laptop or in VirtualBox. The installation is very easy; no instructions needed on my part.

On Windows the software will be installed in C:\Program Files\MongoDB\Server\3.4\bin. The version can change over time, of course.

Add that to your path in your environment variables.

On Linux, add the path where you unpacked the tarball, followed by bin, to your $PATH.

Starting an instance is also very easy. Open a command box, create a directory \data\db1 and start the instance with

mongod --dbpath \data\db1

On Windows you should leave this box open. When you close it, the instance shuts down. It would be better to create a service, but for this demonstration this will do.

Stop the database by pressing ^C.

On Linux you can fork the process so you don’t have to open a terminal for every instance:

mongod --dbpath /data/db1 --fork --logpath a.log

End it by killing the process.

From now on I will continue in Windows; Linux users can follow the instructions with minor adjustments, like / instead of \.

Also make sure to use a different logpath for each instance.

Resilience against hardware failure.

In Oracle we have 2 kinds of resilience against hardware failure. Instance failure can be mitigated by RAC, storage failure by Data Guard. Besides, if a single instance crashes you can recover all data, provided you have a decent backup strategy.

MongoDB uses a different approach called replica sets. Each instance (or member, as it’s called) has its own storage and can be replicated to another instance (or many), each with its own storage too. Only one instance can both read and write: the primary instance. The others only allow you to read data.

In production, using a replica set is a no-brainer: should a single instance fail, you can’t recover all data like in Oracle, no matter how often you make backups.

All instances vote who will be the primary. This can be manipulated by setting the priority parameter. I will not go into that here but just demonstrate a simple replica set.

Open a command box and type:

mkdir \data\db1\r1
mkdir \data\db1\r2
mkdir \data\db1\r3
mongod --smallfiles --oplogSize 50 --port 27001 --dbpath \data\db1\r1 --replSet r

Leave it open and open a second Command box and type:

mongod --smallfiles --oplogSize 50 --port 27002 --dbpath \data\db1\r2 --replSet r

Leave it open and open a third Command box and type:

mongod --smallfiles --oplogSize 50 --port 27003 --dbpath \data\db1\r3 --replSet r

Open a fourth command box. This will be used to actually talk to the database using the mongo shell. We will then initiate the replica set.

mongo --port 27003
rs.initiate(
           { _id:'r',
             members:[
                     { _id:1, host:'localhost:27001' },
                     { _id:2, host:'localhost:27002', "arbiterOnly" : true },
                     { _id:3, host:'localhost:27003' }
                     ]
           }
)
rs.status()

I introduced a special member, the Arbiter. This is a member without data; it only helps to get an odd number of members, which is necessary to always get a majority of votes when it comes to choosing the Primary member.
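
Why an odd number helps can be shown with a little arithmetic. The sketch below is plain JavaScript, not a MongoDB API; the function names are my own. It assumes that electing a Primary takes a strict majority of all voting members in the set, whether they are up or not.

```javascript
// Votes needed to elect a Primary: a strict majority of all voting
// members in the set, counting members that are currently down as well.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many members may fail while the rest can still elect a Primary.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(n + " members: majority " + majority(n) +
              ", can lose " + faultTolerance(n));
}
```

With 2 members the set can not lose a single one, and going from 3 to 4 members does not buy any extra tolerance. That is why a cheap, data-less Arbiter is a sensible third member in this demonstration.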

In the output you can see that we have 3 members: a Secondary on port 27001, an Arbiter on port 27002 and a Primary on port 27003. You can also see by the prompt that we are connected to the Primary.

We will now create a collection called ‘simple’ and insert some data. Also, the writeConcern phrase makes sure data is written to at least 2 members. If there are more members they will be ‘eventually consistent’, meaning that they will synchronize but not immediately.

db.simple.insert( { _id : 1 }, { writeConcern : { w : 2 } } )
db.simple.insert( { _id : 2 }, { writeConcern : { w : 2 } } )
db.simple.insert( { _id : 3 }, { writeConcern : { w : 2 } } )

Go to your secondary member and try to read the data. This involves giving yourself permission to read from the secondary, as I’ll show:

exit
mongo --port 27001
r:SECONDARY> db.simple.find()
Error: error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
r:SECONDARY> rs.slaveOk()
r:SECONDARY> db.simple.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }

This looks okay. Not featured here: if I stop the Secondary, add data on the Primary and restart the Secondary, it synchronizes as expected. Just one thing: while the Secondary is down, the writeConcern for 2 members can not be used, since only 1 data-bearing member is left (the Arbiter holds no data).
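
The reasoning behind that last remark can be sketched in a few lines of plain JavaScript. This is a hypothetical model of the replica set 'r', not MongoDB code: the member list and the canAcknowledge function are assumptions of mine, made up for illustration.

```javascript
// Hypothetical model of replica set "r" while the Secondary is stopped.
// An Arbiter votes but stores no data, so it can never acknowledge a write.
const members = [
  { host: "localhost:27001", arbiter: false, up: false }, // stopped Secondary
  { host: "localhost:27002", arbiter: true,  up: true  }, // Arbiter
  { host: "localhost:27003", arbiter: false, up: true  }  // Primary
];

// A write with { w: n } needs n data-bearing members that are up.
function canAcknowledge(w, set) {
  const dataBearingUp = set.filter(m => m.up && !m.arbiter).length;
  return dataBearingUp >= w;
}

console.log(canAcknowledge(1, members)); // true
console.log(canAcknowledge(2, members)); // false
```

As far as I know, a write with w : 2 in this situation is not rejected but keeps waiting for the second acknowledgement, unless you also specify a wtimeout in the writeConcern.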

Now it becomes interesting. I’ll stop the Secondary, write some data on the Primary, stop the Primary and start the Secondary. Would the data written whilst the Secondary was down still be visible? If not, would it be recoverable?

r:SECONDARY> exit
bye

Go to your first box and stop the Secondary with ^C.

Go to the mongo shell box, connect to port 27003 (the Primary) and add some more data:

mongo --port 27003
MongoDB shell version: 3.0.14
connecting to: 127.0.0.1:27003/test
r:PRIMARY> db.simple.insert( { _id : 4 } )
WriteResult({ "nInserted" : 1 })
r:PRIMARY> db.simple.insert( { _id : 5 } )
WriteResult({ "nInserted" : 1 })
r:PRIMARY> db.simple.insert( { _id : 6 } )
WriteResult({ "nInserted" : 1 })
r:PRIMARY> db.simple.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 4 }
{ "_id" : 5 }
{ "_id" : 6 }

r:PRIMARY> exit
bye

Now stop the Primary in your 3rd box with ^C and restart the Secondary in your 1st box. Then go to the mongo shell box and connect to port 27001:

mongo --port 27001
MongoDB shell version: 3.0.14
connecting to: 127.0.0.1:27001/test
r:PRIMARY> rs.status()
{
    "set" : "r",
    "date" : ISODate("2017-03-20T19:12:43.425Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 1,
            "name" : "localhost:27001",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 25,
            "optime" : Timestamp(1490035617, 1),
            "optimeDate" : ISODate("2017-03-20T18:46:57Z"),
            "electionTime" : Timestamp(1490037141, 1),
            "electionDate" : ISODate("2017-03-20T19:12:21Z"),
            "configVersion" : 1,
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "localhost:27002",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 24,
            "lastHeartbeat" : ISODate("2017-03-20T19:12:43.354Z"),
            "lastHeartbeatRecv" : ISODate("2017-03-20T19:12:43.167Z"),
            "pingMs" : 0,
            "configVersion" : 1
        },
        {
            "_id" : 3,
            "name" : "localhost:27003",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-03-20T19:12:43.354Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "configVersion" : -1
        }
    ],
    "ok" : 1
}
r:PRIMARY> db.simple.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
r:PRIMARY> db.simple.insert( { _id : 7 } )
WriteResult({ "nInserted" : 1 })
r:PRIMARY> db.simple.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 7 }
r:PRIMARY>

So, member 1 has now become the Primary, but we hit data loss: it never had a chance to synchronize documents 4, 5 and 6, and the members do not share any storage to read them from.

What would happen if we restart the 3rd member? After all, that one does have the lost data stored, somewhere.

Start up the 3rd member (in the 3rd box).

In the output you will see that it transitions to Secondary and performs a rollback: the lost data is actually rolled back. And the good news: it is stored. Under its data directory \data\db1\r3 it created a directory called rollback, which contains a .bson file. This file can be examined and/or imported into the database, as I’ll show.

Go to the 4th box and exit the mongo shell. Then:

cd \data\db1\r3\rollback
C:\data\db1\r3\rollback>bsondump test.simple.2017-03-20T19-32-31.0.bson
{"_id":4.0}
{"_id":5.0}
{"_id":6.0}
2017-03-20T20:45:06.412+0100 3 objects found
C:\data\db1\r3\rollback>mongorestore --port 27001 --db test --collection simple test.simple.2017-03-20T19-32-31.0.bson
2017-03-20T20:47:59.880+0100 checking for collection data in test.simple.2017-03-20T19-32-31.0.bson
2017-03-20T20:47:59.886+0100 restoring test.simple from file test.simple.2017-03-20T19-32-31.0.bson
2017-03-20T20:48:00.463+0100 no indexes to restore
2017-03-20T20:48:00.463+0100 finished restoring test.simple (3 documents)
2017-03-20T20:48:00.463+0100 done

C:\data\db1\r3\rollback>mongo --port 27001
MongoDB shell version: 3.0.14
connecting to: 127.0.0.1:27001/test
r:PRIMARY> db.simple.find()
{ "_id" : 1 }
{ "_id" : 2 }
{ "_id" : 3 }
{ "_id" : 7 }
{ "_id" : 4 }
{ "_id" : 5 }
{ "_id" : 6 }
r:PRIMARY>
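
What happened during that rollback can be mimicked with a heavily simplified model in plain JavaScript. It is a sketch under my own assumptions: each member's data is reduced to an ordered list of _id values, whereas the real thing compares oplog entries. The function names are made up.

```javascript
// Simplified model: a member's data is just an ordered list of _id values.
const newPrimary = [1, 2, 3, 7];        // port 27001 after the failover
const oldPrimary = [1, 2, 3, 4, 5, 6];  // port 27003 before it was restarted

// Length of the common prefix: the last point both members agree on.
function commonPoint(a, b) {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Everything the rejoining member wrote after the common point is set
// aside (rolled back); after that it copies the data of the new Primary.
function rejoin(rejoining, primary) {
  const point = commonPoint(rejoining, primary);
  return {
    rolledBack: rejoining.slice(point),
    data: primary.slice()
  };
}

const result = rejoin(oldPrimary, newPrimary);
console.log(result.rolledBack); // [ 4, 5, 6 ]
console.log(result.data);       // [ 1, 2, 3, 7 ]
```

The rolledBack list corresponds to the contents of the .bson file in the rollback directory. Putting it back, as done with mongorestore above, remains a manual decision.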

Okay, Oracle would have done everything by itself. But at what cost? It needs to maintain redo logs and archived redo logs. It only has 1 member to query, the Primary database. Yes, you can have a read-only Standby database with Active Data Guard since 11g, but that’s a licensed option. It’s robust, nevertheless. I only want to say that the alternative is different but not all bad. Not at all.

Scaling out, also known as sharding

In the previous paragraph we covered HA and recoverability. Now let’s have a look at scaling out, best compared to RAC.

RAC enables you to add CPU power and memory to a database. It also enables you to distribute different kinds of workloads over different machines, for instance reporting on one node and OLTP on another node. That distribution can be compared to smart use of the replica sets explained above.

Adding CPU power and memory is something else. MongoDB heavily relies on memory to perform. And they made it very easy for you to add more nodes to your cluster. This is done by sharding.

Sharding can best be described as range-based partitioning. Sharding is done on a per-collection basis. A shard cluster consists of 1 (!) or more nodes; it automatically partitions a collection and distributes it evenly over all cluster members.
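
The idea of range-based partitioning can be sketched in plain JavaScript. Everything in this block is an assumption for illustration: the shard names, the chunk boundaries and the shardFor function are made up, and a real cluster keeps this bookkeeping itself.

```javascript
// Hypothetical chunk table: each chunk covers shard key values in the
// range [min, max) and lives on exactly one shard.
const chunks = [
  { min: -Infinity, max: 1000,     shard: "shardA" },
  { min: 1000,      max: 2000,     shard: "shardB" },
  { min: 2000,      max: Infinity, shard: "shardA" }
];

// Find the shard that owns a given shard key value.
function shardFor(key) {
  const chunk = chunks.find(c => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(shardFor(42));   // shardA
console.log(shardFor(1500)); // shardB
console.log(shardFor(2000)); // shardA
```

Because the ranges together cover every possible key value, each document has exactly one home, and a query on the shard key only has to visit the shard that owns the range.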

Let’s have a closer look.

First of all, each mongod needs to know it is part of a shard cluster. That is accomplished with the --shardsvr startup parameter. It is also very wise to explicitly declare the port number with the --port parameter. Finally, it needs its own storage: the --dbpath parameter. Example:

mongod  --shardsvr --port 27020 --dbpath \data\db1\s1
mongod  --shardsvr --port 27021 --dbpath \data\db1\s2

Next we need a config server. This is also a mongod, but instead of data, it has a special database that contains all information there is to know about the cluster, especially which members are known and the relation between the members and the partitions (the shards, in Mongo language).

As of MongoDB 3.4, config servers need to be in a replica set instead of standalone. For demonstration or development, it is allowed to have a set of only 1 member.

In a production environment you typically create 3 config servers; for now we’ll create just one:

mongod --configsvr --replSet c --dbpath \data\db1\conf --port 27019

Start the mongo shell in another command box, connected to the config server on port 27019, so we can configure the replica set “c”:

rs.initiate(
           { _id: "c",
             configsvr: true,
             members: [
                      { _id : 0, host : "localhost:27019" }
                      ]
           }
)

Finally we need at least one mongos, which is a routing service and serves as the front end to which the users connect. The mongos has no persistent data; it reads the config server and distributes client requests over the shards.

It needs to know where to find the config server so we tell it with a parameter configReplSetName/hostname:port:

mongos --configdb c/localhost:27019

We can now open a mongo shell. It will by default connect to port 27017 and, lo and behold, a mongos listens on port 27017 by default. Since everything runs on the same host, connecting is very easy.

In the shell we will add shard servers to the cluster. Next we will enable sharding for a specific database.

mongo
mongos> sh.addShard( "localhost:27020")
mongos> sh.addShard( "localhost:27021")
mongos> sh.enableSharding("test")

The only thing we have done so far is enable sharding for a db. But nothing is sharded yet. For that to happen we need to decide which collection(s) will be sharded and on what key. The collection needs to have an index on the shard key. After that, nothing more needs to be done: the cluster partitions and distributes the data by itself.
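
As a sketch of that last step: assuming we pick a collection called simple in the test database and use _id as the shard key (both choices are mine, purely for illustration), the commands in the mongos shell would look roughly like this. Every collection has an index on _id, so no extra index has to be created for this particular key.

```javascript
// Run in a mongo shell that is connected to the mongos.
// Assumption: shard the collection test.simple on its _id field.
sh.shardCollection("test.simple", { _id: 1 })

// Show the shards, the sharded collections and how the chunks are spread.
sh.status()
```

For any other shard key you would first create an index on that key, for instance with db.simple.createIndex().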

So what did I learn?
Sure, you lack the robustness of an RDBMS. Sure, you can not join, and sure, you therefore store way more bytes than usual. But it’s fast, it’s easy and it serves many purposes. And last but not least, it takes some serious out-of-the-box thinking for a DBA to actually appreciate this new world: you have to let go of some fundamental principles on which your world was based for the last ten, twenty or more years.

And finally a disclaimer: these examples have been oversimplified. In the real world you’d use many hosts. You’d use 3 config servers, many mongos instances and, of course, a replicated shard cluster.
Apart from that, there are many ways to make the behaviour more sophisticated and robust. Check out the official documentation; it’s quite good in my opinion and challenges you to do many experiments.
