Plotting

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can I plot my data?

  • How can I save my plot for publishing?

Objectives
  • Create a time series plot showing a single data set.

  • Create a scatter plot showing the relationship between two data sets.

matplotlib is the most widely used scientific plotting library in Python.

%matplotlib inline
import matplotlib.pyplot as plt

plt.rc('text', usetex=False)
plt.style.use('ggplot')
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')

Simple Position-Time Plot

Plot data directly from a Pandas dataframe.

import pandas

data_all = pandas.read_csv('data/jarvis_all.csv', index_col='formula')
eps = data_all.loc['Si', 'epsx':'epsz']
eps.plot(kind="bar");

Bar plot of the epsx to epsz values for Si

Select and transform data, then plot it.

eps.T.plot(kind="bar")
plt.ylabel('epsX value');

Bar plot of the transposed Si data
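
To see what the transpose changes in a bar plot, here is a minimal sketch on a small made-up table. The name `toy` and its values are hypothetical and are not taken from jarvis_all.csv. pandas draws one group of bars per index entry and one bar series per column, so transposing swaps the two roles.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas

# Hypothetical values, for illustration only
toy = pandas.DataFrame(
    {"epsx": [1.0, 2.0], "epsy": [1.5, 2.5], "epsz": [2.0, 3.0]},
    index=["A", "B"],
)

# Groups on the x axis: A, B; one bar series per column (epsx, epsy, epsz)
ax_rows = toy.plot(kind="bar")

# After transposing: groups are epsx, epsy, epsz; one bar series per row (A, B)
ax_cols = toy.T.plot(kind="bar")
ax_cols.set_ylabel("eps value")
```

In a notebook, each call produces its own figure, so the two layouts can be compared side by side.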

Data can also be plotted by calling the matplotlib plot function directly.

# Get formation energy data from the dataframe
form_enp = data_all.form_enp.sort_values()
plt.plot(form_enp.values, 'g--')

Sorted formation energies drawn as a green dashed line

Can plot many sets of data together.

data_sort = data_all.sort_values('op_gap')

plt.plot(data_sort.op_gap.values, 'g-', lw=2, label=r'op_gap')
plt.plot(data_sort.mbj_gap.values, 'b.', ms=1, label=r'mbj_gap')
plt.legend(loc='upper left')
plt.xlabel('Sorted Index')
plt.ylabel('Band gap (eV)');

Sorted op_gap and mbj_gap band gaps plotted together
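
The same kind of plot can also be built through an explicit Axes object rather than the global plt functions. The sketch below uses two short made-up lists (`gap_a` and `gap_b` are hypothetical names and values) so that it runs without the data file. Each call to plot adds one line to the same axes, and the label arguments feed the legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical band gaps in eV, for illustration only
gap_a = [0.0, 0.5, 1.1, 2.3, 4.0]
gap_b = [0.1, 0.9, 1.6, 3.0, 5.2]

fig, ax = plt.subplots()
ax.plot(gap_a, "g-", lw=2, label="gap_a")  # solid green line, width 2
ax.plot(gap_b, "b.", ms=6, label="gap_b")  # blue point markers, no connecting line
ax.set_xlabel("Sorted Index")
ax.set_ylabel("Band gap (eV)")
legend = ax.legend(loc="upper left")
```

Working with `ax` directly makes it explicit which axes each data set is drawn on, which helps once a figure has more than one panel.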

lim = [0, 15]
plt.scatter(data_all.mbj_gap, data_all.op_gap, s=1.5, c='g')
plt.xlim(lim)
plt.ylim(lim)
plt.plot(lim, lim, 'k--')
plt.gca().set_aspect('equal')
plt.xlabel('mbj (eV)')
plt.ylabel('op (eV)');

Scatter plot of op_gap against mbj_gap using plt.scatter

Bandgap versus Bulk Modulus

Make a scatter plot of the bandgap versus the bulk modulus (either op_gap or mbj_gap versus either kv or gv). Color the points based on whether the material is a metal (bandgap < 0.05), a semiconductor (0.05 <= bandgap <= 3) or an insulator (bandgap > 3). To do this, create an additional column in the dataframe for the color, then use dataframe.plot with the arguments kind='scatter', s=1.5 and color=dataframe['color']. The s argument controls the size of the points in the scatter plot. The color argument colors the points based on string values in a series or dataframe column (e.g. r for red or g for green).

Solution

df = pandas.read_csv('data/jarvis_all.csv')

df.loc[:, 'color'] = 'r'                  # insulator by default
df.loc[df.op_gap <= 3, 'color'] = 'g'     # semiconductor
df.loc[df.op_gap < .05, 'color'] = 'b'    # metal

df.plot('op_gap', 'kv', kind='scatter', s=1.5, color=df['color'])
plt.xlim([-0.2, 6]);

Band gap scatter plot colored by material class
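
The color column can also be built in a single step. The sketch below is one possible alternative, not the lesson's own solution: it uses numpy.select on a small made-up table (`df_toy` and its values are hypothetical) with the thresholds from the exercise text. Conditions are checked in order and the first match wins, so the semiconductor test only needs the upper bound.

```python
import numpy as np
import pandas

# Hypothetical band gaps in eV, for illustration only
df_toy = pandas.DataFrame({"op_gap": [0.0, 0.05, 1.0, 3.0, 5.0]})
gap = df_toy["op_gap"].to_numpy()

df_toy["color"] = np.select(
    [gap < 0.05, gap <= 3],  # metal first, then semiconductor
    ["b", "g"],              # blue for metals, green for semiconductors
    default="r",             # everything else is an insulator (red)
)
colors = df_toy["color"].tolist()
```

The resulting column can be passed to dataframe.plot through the color argument in the same way as in the solution above.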

Saving your plot to a file

If you are satisfied with the plot you see, you may want to save it to a file, perhaps to include it in a publication. The savefig function in the matplotlib.pyplot module accomplishes this. Calling this function, e.g. with

plt.savefig('my_figure.png')

will save the current figure to the file my_figure.png. The file format will automatically be deduced from the file name extension (other formats are pdf, ps, eps and svg).
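
As a minimal sketch of how the extension selects the format, the loop below saves one figure three times. The temporary output directory is an assumption made so the example does not write into the working directory. The dpi and bbox_inches arguments are optional savefig parameters: dpi sets the resolution of raster output such as png, and bbox_inches='tight' trims surrounding whitespace.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 100, 200, 300])
ax.set_xlabel("Time (hr)")
ax.set_ylabel("Position (km)")

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for ext in ("png", "pdf", "svg"):
    path = os.path.join(out_dir, "my_figure." + ext)
    # The format is deduced from the extension of the file name
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved.append(path)
```

Vector formats such as pdf and svg scale without loss, which is often what a journal asks for; png is convenient for slides and web pages.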

Note that functions in plt refer to a global figure variable. After a figure has been displayed to the screen (e.g. with plt.show), matplotlib makes this variable refer to a new, empty figure. Therefore, make sure you call plt.savefig before the plot is displayed to the screen; otherwise you may end up with a file containing an empty plot.
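
One way to avoid depending on the global figure is to keep your own reference to it. The sketch below is an illustration under stated assumptions: it simulates the situation by opening a second, empty figure (which becomes the current one) instead of calling plt.show, and it writes to a temporary path chosen for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()  # keep our own reference to the figure
ax.plot([0, 1, 2, 3], [0, 100, 200, 300])
ax.set_xlabel("Time (hr)")
ax.set_ylabel("Position (km)")

other = plt.figure()  # a new, empty figure is now the "current" one

path = os.path.join(tempfile.mkdtemp(), "my_figure.png")  # hypothetical output path
fig.savefig(path)  # saves the plotted figure, not the empty current one
```

Calling plt.savefig at this point would act on the empty figure, whereas fig.savefig always acts on the figure that `fig` refers to.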

When using dataframes, data is often generated and plotted to screen in one line, so it may not be obvious where a call to plt.savefig would fit. One way to save the figure to a file in that case is to

  • save a reference to the current figure in a local variable (with plt.gcf)
  • call the savefig method on that variable.

eps.plot(kind='bar')
fig = plt.gcf() # get current figure, after plotting
fig.savefig('my_figure.png')
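
Another option, sketched here on a small made-up table (`toy` and its values are hypothetical), is to use the return value of the plot call. DataFrame.plot returns the matplotlib Axes it drew on, and the figure can be reached from there without consulting the global state at all.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas

# Hypothetical values, for illustration only
toy = pandas.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])

ax = toy.plot(kind="bar")  # returns the Axes that holds the bars
fig = ax.get_figure()      # the figure that owns those axes

path = os.path.join(tempfile.mkdtemp(), "my_figure.png")  # hypothetical output path
fig.savefig(path)
```

This keeps plotting and saving tied to the same objects, so the result does not depend on which figure happens to be current.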

Key Points

  • matplotlib is the most widely used scientific plotting library in Python.

  • Plot data directly from a Pandas dataframe.

  • Select and transform data, then plot it.

  • Many styles of plot are available: see the Python Graph Gallery for more options.

  • Can plot many sets of data together.