Machine Learning & Big Data Blog

MongoDB Overview: Getting Started with MongoDB

4 minute read
Walker Rowe

Here we provide an overview of the MongoDB database. In subsequent posts we will give more in depth examples of how to use MongoDB.

First, MongoDB is a noSQL big data database. It fits the definition of big data, because it scales (i.e., can be made larger) simply by adding more servers to a distributed system. And it is does not require any schema, like an RDBMS database, such as Oracle.

MongoDB data records are stored in JSON (JavaScript Object Notation) format, which is self-describing, meaning the metadata (i.e., schema) is stored together with the data.

(This article is part of our MongoDB Guide. Use the right-hand menu to navigate.)

Command Line Shell

Mongo has an interactive command shell. JavaScript programmers will love this because the syntax is JavaScript. To open the shell you simply type:

mongo

Concepts

MongoDB records are called documents. Each MongoDB database (You can have many.) includes collections, which are a set of JSON documents. Each collection and document has an ObjectID created by MongoDB or supplied by the programmer.

To illustrate, suppose we have one database called products.

We could have two collections to contain all products grouping them by where they are sold:

Collection Documents and fields
Products sold in Europe EAN (European barcode), weight in kilos
Products sold in the USA UPC (American barcode), weight in pounds

Data storage is cheap and memory and CPU costs more. So, some big data databases, like Cassandra and MongoDB, throw out the idea of a normalized database, which is one of the key principles of an RDBMS database.

For example, with Oracle you would have a product category in a product record. The product category table contains fields common to all of those products. Each product record points to a product category record, so that such common data is not stored more than once:

Product table
Fields: product number, product category

 

Product Category table
Fields: product category, weight, color

But then you have to do a join operation if you want to know the color or weight of a product. But a join is a computationally expensive operation. That takes time. MongoDB would store the data like this:

Product documents
Fields: product number, product category, weight, color

RDBMS programmers say that creates duplication and wastes space. MongoDB programmers would say “yes,” but speed is more important than storage.

In other words, MongoDB records might look like this:

Product 1 Category boy’s diapers small Color blue
Product 2 Category boy’s diapers large Color blue
Product 3 Category girl’s diapers small Color pink

Obviously. When you know the category you know the color.

We will illustrate that by creating the products database and adding some products there. Paste these commands into the MongoDB shell.

First create the products database.

use products
switched to db products

Then these two collections:

> db.createCollection("boyDiapers")
{ "ok" : 1 }
> db.createCollection("girlDiapers")
{ "ok" : 1 }
>

Then add some data:

db.boyDiapers.insert([
{
size: 1,
color: 'blue',
brand: 'toddler tyke',
}
])
db.girlDiapers.insert([
{
size: 1,
color: 'pink',
brand: 'little angel',
}
])

Notice two things. First, we use the format db.(collection).insert to add the document. Second, we use the brackets [], which indicates an array, do that we can add more than one document at a time.

Now create some more data so that we can query for data:

db.boyDiapers.insert([
{
size: 2,
color: 'white',
brand: 'boy large white'
}
])
db.girlDiapers.insert([
{
size: 2,
color: 'while',
brand: 'girl large'
}
])

Selecting Data

If you use find with no arguments it lists all documents. Use pretty to display the results in easy-to-read indented JSON format:

> db.girlDiapers.find().pretty()
{
"_id" : ObjectId("59d1e9d5ccf50b62c5a7af55"),
"size" : 1,
"color" : "pink",
"brand" : "little angel"
}
{
"_id" : ObjectId("59d1f022ccf50b62c5a7af57"),
"size" : 1,
"color" : "while",
"brand" : "girl large"
}
{
"_id" : ObjectId("59d1f565ccf50b62c5a7af59"),
"size" : 2,
"color" : "while",
"brand" : "girl large"
}

Find all girl diapers of size 2 add arguments to the find statement:

db.girlDiapers.find({"size":2})
{ "_id" : ObjectId("59d1f565ccf50b62c5a7af59"), "size" : 2, "color" : "while", "brand" : "girl large" }

Now, you could not search both boy’s and girl’s diapers collections at the same time. MongoDB does not do that. Instead you have to program that in your application that you would code using some driver (See below).

Normalized Documents

We just said that in MongoDB there is no normalization because storage is cheap and computational power expensive. But you can create normalize documents.

For example we can create a sales record for each size 2 girl large document like this with the diaper field pointing to the diaper object. That might make more sense in this case as you would not want the diaper collection to grow many times larger each time you make a sale.

db.girlDiapers.insert([
{ "diaper" : ObjectId("59d1f565ccf50b62c5a7af59"),
"price" : 45.2,
"quanity" : 10,
"sku" : "case"
}
])

MongoDB Drivers

Of course, you probably would not use the command line shell for an application. Instead you would write a program to interact with MongoDB using any of the many drivers available. There are drivers for C++, C#, Java, Node.JS, Scala, Python, and more.

For example, to use Python:

sudo pip install pymongo

Then to query for size 2 diapers across the boy and girl collections:

from pymongo import MongoClient
client = MongoClient()
db = client.products
x=db.collection_names()
for i in range(len(x)):
c=x[i]
d = db.get_collection(c)
for e in d.find({"size": 2}):
print(e)

Outputs:

{'size': 2.0, 'brand': 'boy large white', 'color': 'white', '_id': ObjectId('59d1f564ccf50b62c5a7af58')}
{'size': 2.0, 'brand': 'girl large', 'color': 'while', '_id': ObjectId('59d1f565ccf50b62c5a7af59')}

In the next post we will get into some more advanced MongoDB topics.

Free e-book: The Beginner’s Guide to MongoDB

MongoDB is the most popular NoSQL database today and with good reason. This e-book is a general overview of MongoDB, providing a basic understanding of the database.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

See an error or have a suggestion? Please let us know by emailing blogs@bmc.com.

Business, Faster than Humanly Possible

BMC empowers 86% of the Forbes Global 50 to accelerate business value faster than humanly possible. Our industry-leading portfolio unlocks human and machine potential to drive business growth, innovation, and sustainable success. BMC does this in a simple and optimized way by connecting people, systems, and data that power the world’s largest organizations so they can seize a competitive advantage.
Learn more about BMC ›

About the author

Walker Rowe

Walker Rowe is an American freelancer tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming. You can find Walker here and here.