Machine Learning & Big Data Blog

Matplotlib Logarithmic Scale

4 minute read
Walker Rowe

In this article, we’ll explain how to use the logarithmic scale in Matplotlib.

The logarithmic scale is useful for plotting data that includes very small numbers and very large numbers because the scale plots the data so you can see all the numbers easily, without the small numbers squeezed too closely.

(This article is part of our Data Visualization Guide. Use the right-hand menu to navigate.)

Logarithms

First, let’s review a little high school math. A logarithm is a way to make a large number appear small by looking at it as a power of 10. There are other logarithm bases besides 10, like the natural logarithm used in mathematics, which is given by the constant e=2.718…. But, for our purposes, we will use base 10 logarithms.

In short:

log10x = y means 10 raised to power y equals x, i.e., 10 ** y = x. So log10100=2 because 10**2 = 100.

The logarithmic scale in Matplotlib

A two-dimensional chart in Matplotlib has a yscale and xscale. The scale means the graduations or tick marks along an axis. They can be any of:

  • matplotlib.scale.LinearScale—These are just numbers, like 1, 2, 3.
  • matplotlib.scale.LogScale—These are powers of 10. You could use any base, like 2 or the natural logarithm value, which is given by the number e. Using different bases would narrow or widen the spacing of the plotted elements, making visibility easier.
  • matplotlib.scale.SymmetricalLogScale and matplotlib.scale.LogitScale—These are used for numbers less than 1, in particular very small numbers whose logarithms are very large negative numbers.

Using the logarithmic scale

Let’s plot the revenue of some big companies and some small ones.

Amazon, Alphabet (Google), and Intel are many times larger than the small companies Pete’s, Clock, and Buckey’s BBQ (that we made up). The difference between them relative to Amazon is enormous, so the plot of each smaller company lies on the same vertical line 0 on the linear scale.

Matplotlib picks the scale for the axes if you do not set it explicitly. Here we did not. The little le11 notation at the bottom means that the xaxis is in scientific notation, which in this case means revenue is shown a multiple of 10**11 ($100 billion). Amazon’s revenue of $232,887,000,000 is 2.325*(10**11). While Pete’s $600,000, as a power of 11, is 0.000006*(10**11). This explains why even Pepsi, a large company, is close to 0 on the chart below as well.

So, we fix that issue in the next graph by using the logarithmic scale. See below for an explanation of the code.

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = { 
"Pete's":           600000,
"Clock":           1600000,
"Buckey's BBQ":    2600000,
"Pepsi":        6466000000,
"Intel" :      70848000000,	
"Alphabet":    136819000000,
"Amazon":      232887000000	
}
df = pd.DataFrame.from_dict(data,orient='index',columns=['Revenue'])
dg =pd.to_numeric(df['Revenue'])
dc = pd.Series(dg.index.values.tolist()).to_frame('Company') 
dat = df.assign(Company=dc.values)
data = dat.sort_values(by=['Revenue'])
plt.scatter(data['Revenue'],data['Company'])
plt.grid()
plt.show()

Here, everything is the same except we included plt.xscale(“log”). Now we can more easily see the values since they are powers of 10. So, Pete’s small $600,000 revenue is easy to see as it 6*(10**6). And Amazon’s is roughly 100,000 (10**5) times as large.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data = { 
"Pete's":           600000,
"Clock":           1600000,
"Buckey's BBQ":    2600000,
"Pepsi":        6466000000,
"Intel" :      70848000000,	
"Alphabet":    136819000000,
"Amazon":      232887000000	
}
df = pd.DataFrame.from_dict(data,orient='index',columns=['Revenue'])
dg =pd.to_numeric(df['Revenue'])
dc = pd.Series(dg.index.values.tolist()).to_frame('Company') 
dat = df.assign(Company=dc.values)
data = dat.sort_values(by=['Revenue'])
plt.scatter(data['Revenue'],data['Company'])
plt.grid()
plt.xscale("log")
plt.show()

Now you can more clearly see each company’s size by its revenue

Explaining the code

Some of this code may be difficult to understand without a familiarity with data science because it uses Pandas and NumPy. This is ironic, as Pandas was created particularly to make working with table-type data easier.

The first step takes the data we have created as a dictionary and converts it to a Pandas dataframe. The index for this data will be the company name. We said orient=’index’ that means take the first entry as the index value. Then we give it a column name with columns=[‘Revenue’]. The company name has no column name because it’s not a column; it’s a row index.

df = pd.DataFrame.from_dict(data,orient='index',columns=['Revenue'])

Next, we must convert the revenue strings to numbers. Otherwise they will be sorted by matplotlib as letters.

dg =pd.to_numeric(df['Revenue'])

Now we want to create a series, which is a dataframe with only one column. The values will be the index of the previous dataframe. That’s how we get the companies listed as a column of data, named Company.

dc = pd.Series(dg.index.values.tolist()).to_frame('Company')

Next, we add the values of the series we just created as another column in the dat dataframe.

dat = df.assign(Company=dc.values)

Then we plot the scatter chart giving it dataframes for the x and y values. Each has the same shape, (7,), which you can check with data[‘Revenue’].shape. Shape is a sometimes difficult NumPy concept. It basically means the dimension of the array. In an xy plot they must be the same.

plt.scatter(data['Revenue'],data['Company'])

The rest is self-explanatory:

plt.grid()
plt.xscale("log")
plt.show()

Learn ML with our free downloadable guide

This e-book teaches machine learning in the simplest way possible. This book is for managers, programmers, directors – and anyone else who wants to learn machine learning. We start with very basic stats and algebra and build upon that.


These postings are my own and do not necessarily represent BMC's position, strategies, or opinion.

See an error or have a suggestion? Please let us know by emailing blogs@bmc.com.

Business, Faster than Humanly Possible

BMC empowers 86% of the Forbes Global 50 to accelerate business value faster than humanly possible. Our industry-leading portfolio unlocks human and machine potential to drive business growth, innovation, and sustainable success. BMC does this in a simple and optimized way by connecting people, systems, and data that power the world’s largest organizations so they can seize a competitive advantage.
Learn more about BMC ›

About the author

Walker Rowe

Walker Rowe is an American freelancer tech writer and programmer living in Cyprus. He writes tutorials on analytics and big data and specializes in documenting SDKs and APIs. He is the founder of the Hypatia Academy Cyprus, an online school to teach secondary school children programming. You can find Walker here and here.