Data Sandbox
Data Sandbox Playing is the most natural way to learn. These are the artifacts in my sandbox playground.

Marvel Cinematic Data Visualization With Plot.ly

I’m a huge sucker for Marvel cinematic and in this article I will do a fun exercise with building a simple interactive 3D network graph based on the relationship between Marvel characters. I will be using one of my favourite plotting libraries in Python, Plot.ly. Plot.ly is very easy to use and the way graphs are constructed is very intuitive. The dataset can be found on my GitHub or at the following link: https://www.kaggle.com/csanhueza/the-marvel-universe-social-network

Setting up and exploring the data

import plotly.plotly as py
from plotly.graph_objs import *
import plotly.offline as offline
import pandas as pd

Key thing to remember with Plot.ly is if you want to build graphs locally on you computer using Jupyter notebooks, you need to initiate offline notebook mode

offline.init_notebook_mode()

The hero-network.csv dataset contains two columns, hero1 and hero2 and represents a connection between the two characters.

heros = pd.read_csv('hero-network.csv')
heros.describe()
hero1 hero2
count 574467 574467
unique 6211 6173
top CAPTAIN AMERICA CAPTAIN AMERICA
freq 8149 8350

Some quick data cleaning to remove empty spaces

for i in range(0,2):
    heros[heros.columns[i]] = heros[heros.columns[i]].map(lambda x: x.rstrip())

If you are playing around with the same code and want to explore additional characters, you can use the following line to explore what type of characters are included in the dataset

#heros[heros['hero1'].str.contains('DAREDEVIL')]['hero1'].unique()

For this exercise, I’m interested in looking at the connection between some of the marvel characters and villains that we’ve see in theatres!

avengers_name = ['HULK/DR. ROBERT BRUC', 'BLACK WIDOW/NATASHA', 'CAPTAIN AMERICA', 'IRON MAN/TONY STARK', 
                 'WAR MACHINE II/PARNE', 'HAWKEYE | MUTANT X-V', 'FALCON/SAM WILSON', 
                 'SCARLET WITCH/WANDA', 'VISION', 'ANT-MAN II/SCOTT HAR', 'SPIDER-MAN/PETER PAR', 
                 "BLACK PANTHER/T'CHAL", 'DR. STRANGE/STEPHEN', 'THOR IV/DARGO', 'FURY, COL. NICHOLAS', 
                 'QUICKSILVER/PIETRO M'
                ]

villain_name = [ 'BUCKY/BUCKY BARNES', 'MALEKITH/MALCOLM KEI', 'THANOS', 'ULTRON',
                 'LOKI [ASGARDIAN]', 'BARON MORDO/KARL MOR', 'DORMAMMU', 'RED SKULL/JOHANN SCH']

all = []
all.extend(avengers_name)
all.extend(villain_name)

First lets just explore The Avenger’s social circle

We do a quick count on how many relationships each avengers have and how many comic books they appeared in

avenger_info = {'hero': [], 'buddies': []}
for i in avengers_name:
    avenger_info['hero'].append(i)
    avenger_info['buddies'].append(len(heros[heros['hero1']==i]))
avenger_df = pd.DataFrame(avenger_info, columns = ['hero', 'buddies', 'comics'])

Building a grouped bar chart in Plotly is very simple. Each set of bar is treated as seperate data, we define the x and y values and the aesthetics for each group of bars. Then we define the layout like axes, chart title etc. Lastly we pass data and layout into Plot.ly’s figure function to build the graph

buddies = Bar(
    x = avenger_df['hero'],
    y = avenger_df['buddies'],
    name = 'buddies',
    marker=dict(
        color='rgba(234, 35, 40, 0.7)',
        line=dict(
            color='rgba(234, 35, 40, 1.0)',
            width=1
        )
    )
)

data = [buddies]
layout = Layout(
    barmode = 'group',
    bargroupgap=0.1,
    title =  'Avengers - Buddies'
)

fig = Figure(data = data, layout = layout)
offline.iplot(fig)

We can see our fellow Captain Steve Rogers is quite popular along with friendly neighborhood Spiderman and Mr. Tony Stark. One weird data point here is Thor who in my mind is should be quite popular… This might be because there are several Thor characters in the dataset, representing different Thors from different universes, as well as the comic book universe being different from cinematics.

Now lets build the network graph for our Avengers and villains

First off, there are a whole lot of characters in our dataset, lets just keep our avengers and villains

def keep_avengers(x, characters):
    if ((x['hero1'] in characters) and (x['hero2'] in characters)):
        return 1
    else:
        return 0
heros['keep'] = heros.apply(lambda x: keep_avengers(x, all), axis=1)
heros = heros[heros['keep']==1].drop('keep', axis=1).reset_index().drop('index', axis=1)

Next we use the igraph libary which is a library for high-performance graph generation and analysis. For more information on installation, visit http://igraph.org/python/

import igraph as ig

One of the requirement to build this network graph is to express the source and destination nodes as integer values. So we start off with encoding our heros into numbers

heros.head(3)
hero1 hero2
0 SPIDER-MAN/PETER PAR HULK/DR. ROBERT BRUC
1 QUICKSILVER/PIETRO M SCARLET WITCH/WANDA
2 QUICKSILVER/PIETRO M IRON MAN/TONY STARK
Edges = []
mapper = {}
character_group = []

unique_heros = pd.concat([heros['hero1'], heros['hero2']]).unique()
num_unique_heros = len(unique_heros)

for i in range(num_unique_heros):
    mapper[unique_heros[i]] = i
heros['hero1_node'] = heros['hero1'].map(lambda x: mapper[x])
heros['hero2_node'] = heros['hero2'].map(lambda x: mapper[x])
heros.head(3)
hero1 hero2 hero1_node hero2_node
0 SPIDER-MAN/PETER PAR HULK/DR. ROBERT BRUC 0 3
1 QUICKSILVER/PIETRO M SCARLET WITCH/WANDA 1 4
2 QUICKSILVER/PIETRO M IRON MAN/TONY STARK 1 2

Next I’d like to seperate our good guys from the bad guys visually, so let’s group them up.

for j in unique_heros:
    character_group.append(1) if (j in avengers_name) else character_group.append(0)

Now were ready to build our nodes and links. We start off with links by putting our (source, destination) for every node into Edges and pass Edges into our igraph. The layout function here defines the overall structure of our network graph and we use ‘sphere’. There are other options like

  • gfr, grid_fr, grid_fruchterman_reingold: grid-based Fruchterman-Reingold layout

  • kk, kamada_kawai: Kamada-Kawai layout

  • kk_3d, kk3d, kamada_kawai_3d: 3D Kamada-Kawai layout

Play around with some of these to see the different structures.

for i in range(len(heros)):
    Edges.append((heros['hero1_node'][i], heros['hero2_node'][i]))
G = ig.Graph(Edges, directed=False)
layt=G.layout('sphere', dim=3)

This part looks a little intimidating and complicated but its not so bad. Layout function helps us define the layout of the network graph, what we are doing here is populating our X, Y, Z coordinates for each node and edge to be placed into our 3D space

Xn=[layt[k][0] for k in range(num_unique_heros)]# x-coordinates of nodes
Yn=[layt[k][1] for k in range(num_unique_heros)]# y-coordinates
Zn=[layt[k][2] for k in range(num_unique_heros)]# z-coordinates
Xe=[]
Ye=[]
Ze=[]
for e in Edges:
    Xe+=[layt[e[0]][0],layt[e[1]][0], None]# x-coordinates of edge ends
    Ye+=[layt[e[0]][1],layt[e[1]][1], None]
    Ze+=[layt[e[0]][2],layt[e[1]][2], None]

Same as how we built the bar chart, we define our data & layout and pass into the Figure function. Instead of bar chart, we are using Scatter3d for our data

trace1=Scatter3d(x=Xe,
               y=Ye,
               z=Ze,
               mode='lines',
               line=Line(color='rgb(125,125,125)', width=1),
               hoverinfo='none'         
            )

trace2=Scatter3d(x=Xn,
               y=Yn,
               z=Zn,
               mode='markers',
               name='actors',
               marker=Marker(symbol='dot',
                             size=13,
                             color=character_group,
                             colorscale=[
                                [0, 'black'],
                                [1, 'red']],
                             line=Line(color='rgb(50,50,50)', width=0.5),
                             opacity = 0.8
                             ),
               text=unique_heros,
               hoverinfo='text'  
            )
axis=dict(showbackground=False,
          showline=False,
          zeroline=False,
          showgrid=False,
          showticklabels=False,
          title='',
          showspikes = False
          )
layout = Layout(
         title="Marvel Cinematic - Social Network",
         width=1000,
         height=700,
         showlegend=False,
         scene=Scene(
         xaxis=XAxis(axis),
         yaxis=YAxis(axis),
         zaxis=ZAxis(axis)
        ),
        hovermode='closest'   
)
data=Data([trace1, trace2])
fig=Figure(data=data, layout=layout)
offline.iplot(fig)

comments powered by Disqus