
Hey, I’m Conor. I’m a data scientist and writer living and working in New York.


Practical Psychology for Data Scientists

You aren’t as logical as you think. None of us are. We are susceptible to cognitive biases each and every day. If you have had the pleasure of reading Thinking, Fast and Slow by Daniel Kahneman, then you are more than familiar with this reality. We are imperfect creatures and the world is an imperfect place.

Given the experimental and research-oriented nature of data science, we are more susceptible to these biases than most. We represent the medium between data and insight. We translate strings, ints, and floats into actionable advice, models, and visualizations. 

This role comes with great responsibility. When our thinking is hijacked by cognitive bias, bad things happen. Ill-informed decisions get made and incorrect information is spread. In this post, we’ll learn to avoid these scenarios by addressing common biases, where they might come up, and how to recognize them.

Confirmation Bias

Arguably the poster child for cognitive biases, confirmation bias is the idea that we favor things that confirm our existing beliefs. This is for good reason. Confirmation bias is everywhere. We want to be right. Hell, it’s uncomfortable being wrong. Can you really blame our minds for trying to trick us from time to time?

This is especially dangerous for data scientists. Consider the last time you performed some sort of analysis. If the project was of any interest to you, then you probably went into it with some hypothesis, idea, or favorable outcome that you thought would happen. This is where bias comes into play. 

You are more likely to interpret your findings as supporting evidence for your prior beliefs. We can’t completely avoid this, but if you are aware of it, you can recognize when confirmation bias is likely and calibrate your thoughts accordingly.

Advice: Write down any hypotheses, ideas, and beliefs prior to conducting your analysis. Afterward, revisit this note and double-check your insights to make sure they are sound.

Narrative Fallacy

We love a good story. At least Tyrion thought so. Data science and analysis are no different. Narrative fallacy is the idea that we want to connect the dots to weave a story, even when there isn’t one to tell. We desperately want reality to fit our model of the world, but it often doesn’t.

Let’s think about an example. You’re a data scientist for a consumer internet company and the number of daily active users grew, but the average time spent on the product went down. How would you interpret this? 

One might report this with a story along the lines of “we brought in low-intent users that engaged less with the product.” This seems reasonable, but do we know this for sure? Without more research to validate this claim, we are simply connecting the dots in a way that seems most plausible. This isn’t always a bad thing; stories are great tools to convey insights, but they should be used wisely and intentionally.
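One quick way to put that story to the test is to split the metric by cohort. Here is a minimal sketch, assuming you can label each user as "existing" or "new"; the function name and the numbers are made up for illustration. If existing users are also spending less time, the low-intent-newcomer story can’t be the whole picture.

```python
from statistics import mean


def average_minutes_by_cohort(rows):
    """Average time spent per cohort, given (cohort, minutes) pairs."""
    grouped = {}
    for cohort, minutes in rows:
        grouped.setdefault(cohort, []).append(minutes)
    return {cohort: mean(values) for cohort, values in grouped.items()}


# Hypothetical data: everyone active last week counts as "existing".
last_week = [("existing", 30), ("existing", 34), ("existing", 32)]
this_week = [
    ("existing", 25), ("existing", 27), ("existing", 26),
    ("new", 24), ("new", 28),
]

before = average_minutes_by_cohort(last_week)
after = average_minutes_by_cohort(this_week)

# If existing users dropped too, new low-intent users alone
# do not explain the decline in average time spent.
existing_dropped = after["existing"] < before["existing"]
print(before, after, existing_dropped)
```

In this toy data the existing cohort falls on its own, which is exactly the kind of detail a tidy story would have hidden.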

Advice: Is the signal that you’re seeing sufficiently strong to suggest that it isn’t due to noise? If so, you might already have a causal story developed. That’s okay, but take a second to list out other possible causes based on the data that you have. Realize that your story is one of many possible interpretations.
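If you want a rough answer to the noise question, a permutation test is one option. The sketch below is my own illustration, not a prescribed method: the function name, the sample values, and the choice of a two-sided test on the difference in means are all assumptions. It shuffles the group labels many times and counts how often chance alone produces a gap at least as large as the one you observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between label and value.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical minutes-per-user samples for two weeks.
last_week = [31, 28, 35, 30, 29, 33, 32, 27, 34, 30]
this_week = [29, 27, 33, 30, 26, 31, 30, 28, 32, 29]

p_value = permutation_p_value(last_week, this_week)
print(f"p-value: {p_value:.3f}")
```

A large p-value means the change is easy to produce by chance, so any story built on top of it deserves extra suspicion.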

Anchoring Effect

In short, the anchoring effect states that the first thing you judge influences your judgment of what follows. This is why you are offered a more expensive product first when buying something. You anchor to that expensive price point, so when you see something cheaper, it doesn’t seem so bad.

You can see how this would affect data scientists when we have to make comparisons. If you are looking at the impact of some new independent features, how you interpret the first feature will impact how you think about the second feature. 

This is important to keep in mind when you are comparing two groups or using things as a benchmark. 

Advice: Set your anchor beforehand. If you think that an impactful lift for either feature would be ~2%, then write that down before your analysis. This allows you to come back to your unbiased benchmark after your judgment has been altered.
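Writing the anchor down can be as simple as a note, but you can also bake it into the analysis itself. The following is a minimal sketch under my own assumptions: the `Benchmark` class, the metric name, and the control and treatment numbers are all hypothetical. The benchmark is frozen, so it can’t quietly drift once the results start coming in, and every feature is judged against the same number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """A threshold recorded before looking at any results."""
    metric: str
    min_meaningful_lift: float  # relative lift, e.g. 0.02 for 2%


def relative_lift(control, treatment):
    """Relative change of treatment over control."""
    return (treatment - control) / control


def meets_benchmark(benchmark, control, treatment):
    """Judge a result against the pre-set anchor, not the previous result."""
    return relative_lift(control, treatment) >= benchmark.min_meaningful_lift


# Set the anchor first: a 2% lift counts as impactful.
benchmark = Benchmark(metric="conversion_rate", min_meaningful_lift=0.02)

# Hypothetical results for two independent features.
feature_a_ok = meets_benchmark(benchmark, control=0.100, treatment=0.105)
feature_b_ok = meets_benchmark(benchmark, control=0.100, treatment=0.101)
print(feature_a_ok, feature_b_ok)
```

Here the second feature falls short of the benchmark on its own merits, not because it looked small next to the first one.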

Availability Bias

Our thinking is largely based on our past experience. This isn’t groundbreaking information, but availability bias takes it a step further. Availability bias says that we don’t just lean on our experiences to make decisions; we also favor the experiences that come to mind more easily.

This is apparent when we are developing hypotheses. If you recently read a paper about deep learning techniques, you’ll be more inclined to go that route when brainstorming for your next model. If you did an important project in one domain, then you’ll interpret the next project in that context.

The first thing you think of will be the one that comes to mind most easily. This doesn’t necessarily mean it’s right.

Advice: Utilize your network when making decisions and brainstorming hypotheses. Allow others to bring different perspectives. Bonus points if they are on different teams, working on different problems.

Halo Effect

The halo effect describes the phenomenon of taking one small part of something and applying it to the whole. This cognitive bias is often referred to in a social context when it comes to first impressions. I don’t have to tell you that handshakes and introductions matter.

As data scientists, we often work on projects with stakeholders in order to drive impact. While we would like to think otherwise, the game of influence is real. Doing good work is one thing, but effectively getting others to rally behind your work is another ballgame. It’s a tough nut to crack, but the halo effect is a strong starting point. Start out your project on the right foot and watch things grow from there. 

Advice: Most projects begin with some sort of brief or meeting. During this time, take the lead. This can be difficult as an introvert, but explaining your vision for the project through an optimistic lens goes a long way for developing enthusiasm and organizational support moving forward.

Sunk Cost Fallacy

A personal favorite of mine, the sunk cost fallacy hits us when managing timelines and projects. It occurs when you irrationally cling to things that have already cost you something.

We often continue working on a project simply because we have already put time into it, even if it isn’t the most impactful thing that we could be doing. It’s not hard to realize this. You probably already have a gut feeling if it’s happening. The tricky part is making the call.

Quitting a project and moving on is a tough pill to swallow. While other biases on this list are definitely sneakier, sunk cost fallacy might be the most difficult to deal with head-on.

Advice: As difficult as it is, focus on future results. The past is the past. It can’t be changed and shouldn’t be driving your decision. As the project stands right now, is it the best time investment towards future results? If the answer is no, think long and hard about continuing the journey.

Curse of Knowledge

Communication isn’t easy. It’s even less easy when the curse of knowledge gets ahold of us. The curse of knowledge is when you know something, and presume it’s obvious to everyone. For data scientists, this can be a serious Achilles’ heel.

We are technical by nature. We do lots of analysis. We dig into lots of hypotheses. Slowly, we use data to build up a picture of what’s really going on behind the scenes.

Where this often goes wrong is when we share these insights. We fail to remember that others haven’t done the due diligence that we have. They haven’t developed the same understanding of the problem. They haven’t dug into all of the hypotheses. So when we communicate results, they come off as difficult to interpret or overly complex.

Advice: Put yourself in the shoes of your stakeholders. Does this make sense to someone who doesn’t have as deep an understanding of the problem? Is your presentation or document overly complex? Communicate as clearly and concisely as possible. Focus on actionable insights first, then go deeper if need be.

Information Bias

As data scientists, we crave information. It’s at the core of what we do. Our attention to detail here is both one of our greatest assets and one of our biggest pitfalls. If you aren’t familiar, information bias is the tendency to continue to seek out information when it doesn’t affect action.

Analysis paralysis is an unavoidable plague. At a certain point, even though it’s important to explore, the diminishing gains of looking further into a problem aren’t going to directly affect the greater outcome of the project. In these cases, your time is best spent elsewhere. 

Advice: Start with the minimum viable analysis. Do as little as possible to get a baseline of information and understanding. Present that to your stakeholder and get their feedback. Surprisingly often, it will be enough.

Summary

There’s something both discomforting and fascinating about the realization that we aren’t quite as in control of our decisions and thinking as we believe. With the depth of accessible information in the world, clear and relatively unbiased thinking has turned into an often overlooked superpower.

Throughout this post, I laid out some practices that I’ve found helpful. These can roughly be broken down into the following points:

  • Write things down beforehand

  • Talk to others with diverse perspectives

  • Ask yourself hard questions

  • Put yourself in the shoes of stakeholders

These four things will take you a long way. Writing things down beforehand will keep you honest. Talking to others will help you check your cognitive blind spots. Hard questions will make you face hard decisions head-on. Empathy for stakeholders will help you produce more meaningful work.

Cognitive biases are unavoidable, but this doesn’t mean we have to sit idly while they take the wheel. With the right systems in place, we can take steps to tame them. Go forth and conquer.

Conor Dewey