Lab 07: Qualitative Data Analysis
Learning Goals
- Learn how to collaboratively perform a thematic analysis on interview transcripts.
- Understand and implement steps for open/axial coding, codebook development, and inter-coder reliability testing.
- Gain practical experience using coding as a tool for identifying patterns in human-robot interaction (HRI) research data.
Working in Groups
For this lab, you will work in groups of ~3 students. Each group will turn in ONE set of deliverables.
Lab 7 Deliverables & Submission
Lab 7 introduces the basics of thematic analysis, one kind of qualitative data analysis, using an approach adapted from Richards & Hemphill (2018). You will work in groups to analyze anonymized interview transcripts (lab_07_interview_data, available from this Google Drive link), identify recurring themes, develop and refine a shared codebook, and apply that codebook to additional data. You will also calculate inter-coder reliability and write up your results as if you were writing a research paper.
You will be asked to submit:
- The Lab 7 Qualitative Data Analysis Worksheet - the main worksheet for this lab, where you will report:
  - Your codebook
  - Your data analysis approach + inter-rater reliability score
  - A writeup of the results of your thematic analysis, similar to what we might see in the "Results" section of an HRI research paper
- A spreadsheet of your thematic analysis/coding for each anonymized interview transcript - include the themes coded for each transcript
To receive credit for this lab, one member of your group will need to submit your completed Lab 7 Qualitative Data Analysis Worksheet to Canvas by Friday, May 9 at 6:00pm.
Lab 7 HRI Study, Data, and Your Goal for this Lab
During this lab, you'll be analyzing interview data from the same HRI study we examined during Lab 6. In case it's helpful, here's the study overview again, so you can remind yourself about the study hypotheses, methods, and measures.
To access the data file for Lab 7, lab_07_interview_data, you can download it using this Google Drive link.
Your goal for this lab is to conduct a thematic analysis on the interview transcripts and report your data analysis methods and findings. The outcome of this lab will be a written report that resembles the qualitative "Results" sections of the papers we've read in class.
Steps for Thematic Analysis
Phase One: Preparing for the Analysis
This phase involves understanding the context and goals of your analysis. Your goals for this analysis include:
- Collecting evidence for or against the experiment hypotheses (for review, you can look over the study overview again)
- Understanding participants' perceptions of the overall experience
- Understanding participants' perceptions of the robot
You are not required to pursue all three of these goals; they are meant to serve as starting points and guidance for your analysis.
Phase Two: Open and Axial Coding
- Step 1: Each team member reads the first 6 transcripts (representing one pair of participants from each condition) and identifies initial themes and subthemes. Some example themes and subthemes could be:
  - Example theme: overall opinion of the robot
    - Example subtheme: positive
    - Example subtheme: negative
    - Example subtheme: neutral
  - Example theme: perspective on collaborating with the other human participant
    - Example subtheme: other participant was too dominant
    - Example subtheme: equal and enjoyable collaboration
    - Example subtheme: other participant was disengaged
- Step 2: As a team, discuss and refine your themes iteratively. For the purposes of this lab, please select 2 themes to code, each of which can have 2-4 subthemes. Your group should review at least 12 transcripts (2 pairs of participants from each of the 3 conditions: positive-positive - PP, negative-negative - NN, positive-negative - PN) to refine your themes. Richards & Hemphill (2018) recommend reviewing around 30% of the data during this phase for a full-fledged thematic analysis.
Phase Three: Preliminary Codebook
- Step 1: Create a preliminary codebook based on the discussion above (put your codebook in your Lab 7 Qualitative Data Analysis Worksheet).
- Step 2: The team reviews the draft together. [Optional] You may invite an external researcher familiar with the study (but not part of the coding process) to review it as well.
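One way to keep a draft codebook consistent is to store it as structured data and check it against the limits set for this lab (2 themes, each with 2-4 subthemes). The sketch below is only an illustration: the theme names come from the examples above, while the definitions, the `CODEBOOK` layout, and the `check_codebook` helper are hypothetical and not part of the lab materials.

```python
# Hypothetical codebook layout: theme -> {subtheme: inclusion criteria}.
# The definitions are illustrative placeholders, not results of a real analysis.
CODEBOOK = {
    "overall opinion of the robot": {
        "positive": "Participant describes the robot favorably.",
        "negative": "Participant describes the robot unfavorably.",
        "neutral": "Participant expresses no clear opinion either way.",
    },
    "perspective on collaborating with the other participant": {
        "too dominant": "Participant says the partner controlled the task.",
        "equal and enjoyable": "Participant describes balanced, pleasant teamwork.",
        "disengaged": "Participant says the partner contributed little.",
    },
}


def check_codebook(codebook, n_themes=2, min_sub=2, max_sub=4):
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if len(codebook) != n_themes:
        problems.append(f"expected {n_themes} themes, found {len(codebook)}")
    for theme, subthemes in codebook.items():
        if not min_sub <= len(subthemes) <= max_sub:
            problems.append(f"'{theme}' has {len(subthemes)} subthemes")
        for name, definition in subthemes.items():
            if not definition.strip():
                problems.append(f"'{theme}' / '{name}' has no definition")
    return problems


problems = check_codebook(CODEBOOK)
print(problems if problems else "codebook fits the lab's limits")
```

Writing an explicit definition for every subtheme makes it easier for coders to apply the codebook the same way during pilot testing.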
Phase Four: Pilot Testing the Codebook
- Step 1: All team members independently code the same 2–3 new transcripts using the codebook (e.g., in separate tabs of a Google spreadsheet).
- Step 2: Discuss discrepancies and revise the codebook accordingly until your team is confident in it.
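To prepare for the discrepancy discussion, it can help to list every transcript and theme on which the coders disagreed. The sketch below assumes each coder's pilot codes are stored as a dictionary keyed by (transcript ID, theme); the coder names, transcript IDs, and the `find_discrepancies` helper are hypothetical.

```python
def find_discrepancies(codes_by_coder):
    """List the coded units on which the coders did not all agree.

    codes_by_coder maps coder name -> {(transcript_id, theme): subtheme}.
    A unit that a coder left uncoded shows up as None for that coder.
    """
    units = set()
    for codes in codes_by_coder.values():
        units.update(codes)
    discrepancies = {}
    for unit in sorted(units):
        labels = {coder: codes.get(unit) for coder, codes in codes_by_coder.items()}
        if len(set(labels.values())) > 1:
            discrepancies[unit] = labels
    return discrepancies


# Made-up pilot codes for two transcripts and one theme.
pilot_codes = {
    "coder_a": {("T07", "robot opinion"): "positive", ("T08", "robot opinion"): "neutral"},
    "coder_b": {("T07", "robot opinion"): "positive", ("T08", "robot opinion"): "negative"},
    "coder_c": {("T07", "robot opinion"): "positive", ("T08", "robot opinion"): "neutral"},
}

to_discuss = find_discrepancies(pilot_codes)
for (transcript_id, theme), labels in to_discuss.items():
    print(transcript_id, theme, labels)
```

Each listed unit is a candidate for tightening the wording of the corresponding codebook entry.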
Phase Five: Final Coding and Inter-Coder Reliability
- Step 1: Perform full coding on the dataset using either consensus coding or split coding. For HRI research, we typically use split coding. This means that all team members code an overlap set (typically consisting of ~10% of the data). Once sufficient inter-rater reliability is achieved on the overlap set (see Step 2), the team members divide up the rest of the data so that each remaining transcript is coded by only one team member.
- Step 2: Calculate inter-coder reliability on the transcripts coded by all team members, using the appropriate metric below:
Choosing an Inter-Coder Reliability Metric:
- 2 coders, mutually exclusive themes: Use Cohen’s Kappa
- 2+ coders, mutually exclusive themes: Use Krippendorff’s Alpha or Fleiss’ Kappa
- 2 coders, non-mutually exclusive themes: Use Cohen’s Kappa per theme (binary coding)
- 2+ coders, non-mutually exclusive themes: Use Krippendorff’s Alpha (nominal)
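To make the two-coder case concrete, here is a minimal pure-Python sketch of Cohen's Kappa, computed as (observed agreement - chance agreement) / (1 - chance agreement). It is not the course's cohen_kappa.py script, and the labels are made up for illustration. The same function covers the "per theme (binary coding)" case when each label is 1 (theme present) or 0 (theme absent).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who each assigned one label per item."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: based on how often each coder used each label.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1.0 - expected)


# Made-up codes: mutually exclusive subthemes for 10 transcripts.
coder_a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "pos", "neu", "pos"]
coder_b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neu", "pos"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 3))  # observed 0.80, chance 0.38, so kappa is about 0.677

# Made-up binary codes for one non-mutually-exclusive theme (1 = present).
theme_a = [1, 0, 1, 1, 0, 0]
theme_b = [1, 0, 0, 1, 0, 0]
binary_kappa = cohen_kappa(theme_a, theme_b)
```

If scikit-learn is installed, `sklearn.metrics.cohen_kappa_score(coder_a, coder_b)` should give the same value and is a convenient cross-check.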
You can download cohen_kappa.py, krippendorff_alpha.py, and fleiss_kappa.py from the Lab 7 GitHub repository. You may need to install dependencies via pip install scikit-learn statsmodels krippendorff.
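The split-coding procedure from Step 1 can be sketched as follows: pick an overlap set that every team member codes, then deal the remaining transcripts out so each is coded by exactly one person. The transcript IDs, the count of 30 transcripts, the coder names, and the `split_coding_plan` helper are all assumptions made for illustration.

```python
import random


def split_coding_plan(transcript_ids, coders, overlap_fraction=0.10, seed=0):
    """Return (overlap_set, plan), where plan maps coder -> transcripts to code."""
    ids = list(transcript_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the plan reproducible
    n_overlap = max(1, round(len(ids) * overlap_fraction))
    overlap, rest = ids[:n_overlap], ids[n_overlap:]
    # Every coder codes the overlap set ...
    plan = {coder: list(overlap) for coder in coders}
    # ... and the remaining transcripts are dealt out round-robin.
    for i, transcript_id in enumerate(rest):
        plan[coders[i % len(coders)]].append(transcript_id)
    return overlap, plan


# Hypothetical dataset of 30 transcripts and a 3-person team.
transcripts = [f"T{i:02d}" for i in range(1, 31)]
team = ["coder_a", "coder_b", "coder_c"]
overlap, plan = split_coding_plan(transcripts, team)
print("overlap set:", overlap)
for coder, assigned in plan.items():
    print(coder, len(assigned), "transcripts")
```

Only the overlap set feeds the reliability calculation; the split of the remaining transcripts should happen after the team is satisfied with that score.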
Phase Six: Interpreting and Writing Results
This phase involves drawing conclusions based on your thematic analysis and writing up your results. Rather than prescribing how to write these sections, we recommend that you review good examples of thematic analyses from published HRI papers and emulate the best of what you see from them:
Tips & Resources
- Use color coding or margin notes while reviewing transcripts to help identify themes.
- When building the codebook, be specific about the criteria for each theme.
- Consider using a spreadsheet to compare coded transcripts side-by-side.
- Python scripts to calculate inter-rater reliability will be provided.
- Refer to the original article for examples and guidance: Richards & Hemphill (2018).
Extra Challenge
- Create a visualization that compares how themes shift across the different experimental conditions.
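As a possible starting point for this comparison, the sketch below tallies how often each subtheme was coded within each condition and prints a simple text bar chart. The coded rows are made up, and the helper names are hypothetical; the same shares could be passed to a plotting library instead.

```python
from collections import Counter, defaultdict


def subtheme_share_by_condition(coded_rows):
    """coded_rows: (condition, subtheme) pairs, one per coded transcript."""
    counts = defaultdict(Counter)
    for condition, subtheme in coded_rows:
        counts[condition][subtheme] += 1
    shares = {}
    for condition, counter in counts.items():
        total = sum(counter.values())
        shares[condition] = {s: c / total for s, c in counter.items()}
    return shares


def text_bar_chart(shares, width=20):
    """Render the per-condition shares as rows of '#' characters."""
    lines = []
    for condition in sorted(shares):
        lines.append(condition)
        for subtheme, share in sorted(shares[condition].items()):
            bar = "#" * round(share * width)
            lines.append(f"  {subtheme:<10}{bar} {share:.0%}")
    return "\n".join(lines)


# Made-up codes for the "overall opinion of the robot" theme.
rows = [
    ("PP", "positive"), ("PP", "positive"), ("PP", "positive"), ("PP", "neutral"),
    ("NN", "negative"), ("NN", "negative"), ("NN", "neutral"), ("NN", "positive"),
    ("PN", "positive"), ("PN", "negative"), ("PN", "neutral"), ("PN", "neutral"),
]
shares = subtheme_share_by_condition(rows)
print(text_bar_chart(shares))
```

Using shares rather than raw counts keeps the conditions comparable even if they contain different numbers of transcripts.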