This is Part 4 of a 4 part series on content analysis and Tableau. Things will make more sense if you start with the first part.
Please note that this is a work in progress and I am more amateur than expert. I welcome questions, comments, and corrections.
Now that your database is ready, it’s time to start crunching numbers! In
this section I will ignore most of the fancy statistical analyses available to you; instead we’ll just
focus on making a simple visualization: how to display the frequency of a given
activity over time using a program called Tableau. Tableau is a professional data visualization tool that has an intuitive
user interface. It is also free for students, which is helpful.
Many people have helped me on the technical side of things. Robin Weiss at
the University of Chicago helped visualize my coffeehouse data, putting me on this bumpy road of data visualization. Matt Francis (@Matt_Francis) made some very beautiful
visualizations of my Christmas data. Most importantly for this project,
Benjamin Young on the Tableau forums very helpfully walked me through a number of
methodological difficulties. I would not be here without their help.
First, you need to tell Tableau how to read your data. When you open up
Tableau, click on the big orange ‘connect to data’ button on the upper left.
Now that your data is loaded in, you should see a number of frames. To the
left, the ‘sheets’ frame shows a list of the ‘fields’ from your database (in database terms, these are its tables or sheets). The top frame, where it says ‘Drag sheets here’,
is where your fields will go. The bottom frame will show you how your data
looks once it is loaded in. If you have ‘flat’ data, that is, a database with
only a single field, you will see only a single item in the sheets frame. If
you have a relational database, you will see a ton of different fields. Go
ahead and start dragging fields from the left frame into the top frame (if you have multiple fields). You can mess up. Tableau is very forgiving: if you make a mistake, you can
simply hit the UNDO keystroke combo of your choice, and your work will be
restored.
Here’s what my project looks like after dragging two fields into the main frame:
Notice a few things here. ‘YEAR’ and ‘CODE’ are connected by a little Venn
diagram symbol. This means that there is a join between them; in other words,
the sheet is reading the different fields as connected. Imagine this is the
‘year’ field:
Instance | Year
3 | 1741
4 | 1690
And here is the ‘Code’ field:
Instance | Code
3 | DINE
4 | NO
When you tell Tableau to join the two fields on ‘instance’, you combine the
two fields thusly:
Instance | Code | Year
3 | DINE | 1741
4 | NO | 1690
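If it helps to see the join spelled out, here is a minimal sketch in Python of what joining on ‘instance’ amounts to. This is only an illustration of the idea, not what Tableau runs internally; the variable names `years`, `codes`, and `joined` are my own.

```python
# Two tiny tables keyed by instance number, copied from the example above.
years = {3: 1741, 4: 1690}
codes = {3: "DINE", 4: "NO"}

# Join on 'instance': for every instance found in both tables,
# build one combined row holding the code and the year.
joined = [
    {"instance": i, "code": codes[i], "year": years[i]}
    for i in sorted(years.keys() & codes.keys())
]

for row in joined:
    print(row["instance"], row["code"], row["year"])
```

Each instance number acts as the key that lines the two tables up, which is all a join really does.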
Another thing to notice is that these data are displayed in a spreadsheet in the bottom pane. This is a great place to check whether Tableau is reading your data the way you think Tableau should be reading your data.
Now, not every field is going to be connected with every other field. Try to load in a field that is not connected to the fields you already have on screen. Uh oh!
You've made Tableau confused. It’s asking you what you want it to do. How do you want the fields to line up? What do you join on? You don’t actually want to join these two fields (probably) so just undo.
Now load in your data correctly. Here’s what mine looks like.
There’s one more thing to do before your data is ready to go. Remember
those little Venn diagrams connecting your different fields? Take another look
at them. The inner part is shaded, which means that the join is an ‘inner join’:
you are only loading the records which appear in both fields. This is a little tricky to
understand, so let’s take an imaginary example of two fields. One shows the
colors of fruits, the other the colors of elements:
Fruit | Color
Apple | Green
Apple | Red
Banana | Yellow
And here’s another:
Color | Element
Red | Fire
Blue | Water
An inner join would produce this record:
Fruit | Color | Element
Apple | Red | Fire
Only the records which are present in both fields are pulled.
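The fruit and element example can be sketched in a few lines of Python. This is just an illustration of the matching rule under my own naming (`inner_join` is a hypothetical helper, not a Tableau function).

```python
# The two example tables, as lists of rows.
fruits = [("Apple", "Green"), ("Apple", "Red"), ("Banana", "Yellow")]
elements = [("Red", "Fire"), ("Blue", "Water")]

def inner_join(left, right):
    """Pair each (fruit, color) row with every (color, element) row sharing its color."""
    return [
        (fruit, color, element)
        for fruit, color in left
        for right_color, element in right
        if color == right_color
    ]

inner = inner_join(fruits, elements)
print(inner)  # [('Apple', 'Red', 'Fire')]
```

Green, Yellow, and Blue each appear in only one table, so those rows drop out entirely.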
To load in all your data, you likely want to tell Tableau to make ‘left’
joins. This means that you want it to include all of the records in the
left-hand field, even if there is no corresponding record in the right-hand
field. Do this by clicking on the Venn diagram and selecting ‘left.’ If Tableau
is confused about which fields you want it to join by, just tell it manually by
selecting from the dropdown menu.
You will notice that the left-hand circle of your Venn diagram symbol is
now fully shaded.
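For comparison, here is a rough Python sketch of a left join on the same fruit and element example. The helper name `left_join` is my own invention, and I use `None` to stand in for the empty value (Tableau displays these as Null).

```python
# The two example tables, as lists of rows.
fruits = [("Apple", "Green"), ("Apple", "Red"), ("Banana", "Yellow")]
elements = [("Red", "Fire"), ("Blue", "Water")]

def left_join(left, right):
    """Keep every left-hand row; fill in None where the right side has no match."""
    rows = []
    for fruit, color in left:
        matches = [element for right_color, element in right if right_color == color]
        if matches:
            rows.extend((fruit, color, element) for element in matches)
        else:
            rows.append((fruit, color, None))
    return rows

for row in left_join(fruits, elements):
    print(row)
```

All three fruit rows survive this time; only the red apple picks up an element, and Blue/Water is still left out because it has no partner on the left-hand side.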
Congratulations! Your data is now ready to go. Now for some fun. Click ‘go
to workbook.’ Here’s what you’ll see.
Let’s orient you to the Tableau interface. It looks simple, but there is a
TON going on here. The far left frame is partitioned into four sections. The
first is ‘Data’: the data you are working with. The second is your ‘Dimensions.’
Think of these as the data Tableau thinks looks vaguely X-axis-ish. Next is
‘Measures.’ These are the data Tableau thinks looks Y-axis-ish. You can turn
a measure into a dimension and a dimension into a measure simply by dragging
a field from one section into the other. Be careful with this, though: check that Tableau is actually
doing what you expect it to do. The fourth section is something you
probably don’t have yet. It’s called a parameter, and it acts as a kind of variable
you can fiddle with. We'll make some of those later.
To the right is a big frame with a lot of different panes. The most important
for now are in the center. Take a look at the ‘drop field here’ bit and the
‘columns’ and ‘rows’ bit. Our job right now is to get our data here.
When you’re dealing with straightforward numeric data, this is really easy. Simply drag and drop. Try it
with some of your numeric data. Works beautifully! You can select the kind of
visualization you want by selecting the ‘show me’ tab in the upper right hand
corner. Tableau will show you the visualizations available for the data you’re
selecting. This is where playing around with Tableau really begins to feel fun.
Numeric data may work fine, but content analysis doesn’t count numbers, it
counts concepts, which are expressed in ‘strings.’ A string is what computer
science people call ‘text.’ Try to
display your ‘string’ data, that is, your codes, and see what happens.
Chaos!
This is obviously not what you want to do. To make Tableau visualize our
data, we are going to need to do something a little bit more complicated than
drag and drop. DO NOT FEAR! This is relatively simple with help. And after you
do it once, you’ll be able to do it a thousand times without breaking a sweat.
First off, we want to make sure that Tableau understands that our ‘YEAR’
field is actually a date. Tableau wants the date to include the month and day
and my data only includes the year. We'll have to fix that. (You won't need to do this if your date information includes days and months.)
Right click on ‘Year’ and go to ‘Create Calculated Field.’ This tells
Tableau that you want to make a new field out of an old one.
Now enter in “date("01/01/" + str(Year))” where ‘Year’ is the
same name as whatever field contains your date information.
This tells Tableau: return the date 01/01/YEAR. Give your date field an evocative
name, like ‘date’: anything that will help you remember
what it is you are calculating. Do not call it ‘Calculated field 1’ because
then you’re going to be left with a dozen things called 'calculated fields' and you'll feel like a teenager with a messy room looking for the cereal bowl into which he threw his car keys last night. Right
click on this new calculated field to make sure it is being read as a ‘date’ and not as anything else weird.
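If the calculated field feels mysterious, the same idea looks like this in Python. It is only a sketch of the logic (turn a bare year into January 1 of that year); `year_to_date` is a name I made up for illustration.

```python
from datetime import date

def year_to_date(year):
    """Turn a bare year, such as 1741, into a full date: January 1 of that year."""
    return date(int(year), 1, 1)

print(year_to_date(1741).isoformat())  # 1741-01-01
```

The month and day are placeholders. They exist only so that the value counts as a real date that can later be grouped by year.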
Now drag this new Date field into the ‘columns’ section. Click on the
chevron next to the name, and select ‘year’ from the dropdown menu. (Or
whatever unit you want to organize stuff by.) This tells Tableau to group the data by years, not by months or days.
Next we want to make a parameter which will allow us to select the code we
want to analyze by. Right click on CODE in the dimensions pane (or whatever you called the field which stores your codes) and select ‘create parameter.’ You
will now have an option to play around with which codes you want to display at
any given time. Name this parameter something evocative; ‘Code Parameter’ would be nice. Right
click in the main window, and select the parameter you just created to make a new window where you can choose a value for it.
Now you have a drop-down menu from which you can select your chosen code!
Shiny!
Now we want to tell Tableau to count our string data. We’re going to create
a calculated field from our ‘codes’ field. Do this by right clicking it. Now
paste in this, replacing my nomenclature with yours:
countd(if [Code]=[Christmas Activity] then [Instance] end) / countd([Instance])
So replace [Code] with the field where you stored your codes, [Christmas Activity] with the parameter where you choose codes, and [Instance] with the unique identifier you gave to every observation.
This tells Tableau: count the distinct number of instances where the code field is equal to the Christmas Activity I’ve selected from my dropdown menu, then divide this by the total number of distinct instances. (This gives us the instances mentioning the chosen Christmas activity as a share of all instances for each year.) Name this calculated field something descriptive; mine is called ‘% of diaries mentioning activity.’ Now drag it to ‘rows.’
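Here is a rough Python sketch of what that formula computes once the date is on ‘columns’, so that each year is counted separately. The sample rows are made up for illustration (including the code "CHURCH"), and `share_by_year` is a hypothetical helper of mine, not anything built into Tableau.

```python
from collections import defaultdict

# Hypothetical joined rows: (instance, year, code).
# One instance can carry several codes, which is why we count DISTINCT instances.
rows = [
    (1, 1690, "DINE"),
    (1, 1690, "CHURCH"),
    (2, 1690, "NO"),
    (3, 1741, "DINE"),
    (4, 1741, "DINE"),
]

def share_by_year(rows, selected_code):
    """For each year: distinct instances with the selected code / all distinct instances."""
    all_instances = defaultdict(set)
    matching = defaultdict(set)
    for instance, year, code in rows:
        all_instances[year].add(instance)
        if code == selected_code:
            matching[year].add(instance)
    return {
        year: len(matching[year]) / len(all_instances[year])
        for year in sorted(all_instances)
    }

shares = share_by_year(rows, "DINE")
print(shares)  # {1690: 0.5, 1741: 1.0}
```

Instance 1 appears twice in 1690 but is only counted once in the denominator, which is the whole point of using countd rather than a plain count.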
You can now play around with the code you’re looking at by selecting it
from your drop-down menu. Don’t you feel like some kind of data superhero? Ready to solve crimes and correct injustice?
You may notice that my graph is really noisy. To deal with this noise, I
smoothed my data by creating a moving average. Right click on the ‘codes’
bubble just below ‘DATE’ and select ‘create table calculation’ and then select
‘moving calculation’ and ‘average’ and choose how many years you want this
moving average to include.
Now your data should be smoother.
Beautiful!
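In case you are wondering what a moving average does to the numbers, here is a small Python sketch. I am assuming a trailing window (each year averaged with the years just before it); the window you pick in Tableau may be set up differently, and the sample values are made up.

```python
def moving_average(values, window):
    """Average each value with up to (window - 1) values that come before it."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # early points have fewer neighbours
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up yearly shares, jumping around the way noisy data does.
yearly_share = [0.5, 1.0, 0.0, 0.5]
print(moving_average(yearly_share, 2))  # [0.5, 0.75, 0.5, 0.25]
```

The bigger the window, the smoother the line, at the cost of blurring real year-to-year changes.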
Try to cut your data by some of your demographics. In my example, I want to
look at my data by gender. This is incredibly easy in Tableau. I find my gender dimension,
and I simply drag it onto the graph. Now the graph is split out by gender. You can even play with the colors Tableau chooses for each gender.
And now you’re done! If you’ve followed all these steps, you too have your
very own content analysis project and you’re in a good enough place where you
can play around with Tableau to your heart's content.
That’s it! Get analyzing! And get writing!