Is tagging / categorization / coding appropriate for analyzing customer feedback?

over 5 years ago by Ryan Stuart • 3 min read

Recently we wrote about why the second question on a standard NPS survey might be the most valuable question. This doesn’t stop at NPS surveys. Open-ended questions are a fantastic way to reduce the length of a survey and increase response rates, but like other types of unstructured data they also present a problem. Structured questions usually return a finite set of responses (a rating, yes/no, or similar) and are therefore easy to summarise. Open-ended responses are much harder to interpret. How do you take 10,000 responses to an open-ended question and understand them in a reasonable amount of time?

Enter manual coding...

The first approach to solving this problem, and arguably still the most popular, is manual coding. This is the process of assigning zero or more codes from a set of codes (a code-frame) to each customer verbatim. But what is a code? This is Johnny Saldaña's definition:

“A code in qualitative inquiry is most often a word or short phrase that symbolically assigns a summative, salient, essence-capturing, and/or evocative attribute for a portion of language-based data.”

If this made your eyes glaze over, perhaps an example will help. Let’s say we receive a piece of feedback from a customer as follows:

The staff are really friendly and always happy to help, but the prices are a bit too expensive for me.

We might code this verbatim with price and staff. The beauty of this approach is that it adds structure to unstructured data. We can now use standard quantitative techniques to analyse this qualitative data, but there are some fairly significant drawbacks to consider…
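To make the idea concrete, here is a minimal sketch of what applying a code-frame looks like once it is written down as rules. Manual coding is done by a person reading each verbatim, so the keyword matching below is only a stand-in for that judgment. The code-frame, its keywords, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical code-frame: each code maps to keywords that trigger it.
# A human coder would use judgment here; keywords are only a stand-in.
CODE_FRAME = {
    "price": {"price", "prices", "expensive", "cheap", "cost"},
    "staff": {"staff", "employee", "employees", "service"},
}


def assign_codes(verbatim, code_frame=CODE_FRAME):
    """Return the sorted list of codes whose keywords appear in the verbatim."""
    words = {w.strip(".,!?;:").lower() for w in verbatim.split()}
    return sorted(code for code, keywords in code_frame.items() if words & keywords)


def code_frequencies(verbatims, code_frame=CODE_FRAME):
    """Return the share of verbatims that received each code."""
    total = len(verbatims)
    if total == 0:
        return {code: 0.0 for code in code_frame}
    counts = Counter(code for v in verbatims for code in assign_codes(v, code_frame))
    return {code: counts[code] / total for code in code_frame}
```

Calling `assign_codes` on the example verbatim above returns `['price', 'staff']`, and `code_frequencies` turns a list of verbatims into the kind of per-code percentages that standard quantitative analysis works with.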

Drawbacks of coding

You don’t have to go too far past the surface of coding to run into some issues. Consider the following:

How many codes do you have? Is a price code enough in the above example? Or do we need a price expensive code as well? Is that a sub-code of price?
Do you need to update the set of codes over time? Will customers consistently use the same language? Will your organisation always want to track the same set of codes?
How hard is it to update a set of codes? Is it a simple process or a complex one? What resources does it require?

These are just the obvious concerns. Humans are bad at agreeing on a set of codes for a piece of text (see inter-coder reliability), and manual coding of data, where a human reviews text and assigns it one or more codes, is expensive (especially if you want high inter-coder reliability). So you have to wonder: is there a better way?
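Inter-coder reliability can be put into numbers. One common measure is Cohen's kappa, which compares how often two coders agree with how often they would agree by chance. The sketch below assumes two coders have each marked whether a single code (say, price) applies to the same list of verbatims. The function name and the sample labels are made up for illustration.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items where both coders gave the same label.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: probability both pick the same label independently.
    labels = set(coder_a) | set(coder_b)
    expected = sum((coder_a.count(l) / n) * (coder_b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # Both coders used one identical label throughout.
    return (observed - expected) / (1 - expected)


# 1 = "price code applied", 0 = "not applied", for four verbatims.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Here the coders agree on three of four verbatims (75%), but chance alone would give 50% agreement, so kappa comes out at 0.5. Raw agreement tends to flatter the coders, which is why chance-corrected measures are usually preferred.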

More generally, is transforming unstructured qualitative data into structured quantitative data, just so that we can use traditional analysis techniques, even the right approach? Taking our example above, if we stick with the price code, what does knowing that customers mention price 25% of the time tell me? Is that true understanding? What are they saying about price?

True understanding requires human intervention

Open-ended customer feedback is arguably the most valuable type of data available to any organisation. Having your customers' thoughts on your product or service is essentially the holy grail. Reducing that data to a set of codes and analysing it as if it were any other type of structured data is the reason most organisations are missing a massive opportunity to understand and empathize with their customers. As Dale Carnegie puts it:

When dealing with people, remember you are not dealing with creatures of logic, but creatures of emotion.

When humans communicate with each other, they do it through language. They express themselves in a way that often creates an emotional response in the other party. That phenomenon of feeling an emotion in response to something someone has said or done is known as empathy. Humans don’t take what someone has said in a conversation, code it in their brain, then track the trends of those codes over time. That’s not how we function.

Goodbye categories, hello topics

It’s time to move beyond categorization/tagging. Think of it like this: categorization is you going to look for what you think your customers are going to be talking about. Wouldn’t it be better to actually look at everything your customers are saying — not just your preconceived ideas — to find actionable insights? This is a classic Search vs Discovery problem. People have traditionally gone with search because discovery on large open-ended data was unfeasible. Not any more.
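The difference between search and discovery can be illustrated with a deliberately crude sketch. Instead of starting from a predefined code-frame, it counts which terms actually occur across the verbatims and surfaces the most common ones. This is not Kapiche's algorithm, only a simple term-frequency stand-in. The stopword list, the function name, and the sample feedback are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this example.
STOPWORDS = {
    "the", "a", "an", "and", "but", "are", "is", "was", "to",
    "for", "of", "too", "me", "i", "it", "very",
}


def discover_terms(verbatims, top_n=5):
    """Return the terms mentioned in the most verbatims, with no code-frame."""
    counts = Counter()
    for verbatim in verbatims:
        # Use a set so each verbatim counts a term at most once.
        words = {w.strip(".,!?;:").lower() for w in verbatim.split()}
        counts.update(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(top_n)


feedback = [
    "The parking is terrible",
    "Parking was too expensive",
    "Friendly staff",
]
top_terms = discover_terms(feedback, top_n=1)
```

With this sample, the top term is `parking`, a theme that a code-frame built around price and staff would never have gone looking for. Real discovery tools go far beyond word counts, but the direction of travel is the same: the themes come from the data, not from a list written in advance.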

At Kapiche we’ve built our core language modelling algorithm to overcome the limitations of categorization by enabling a discovery approach to open-ended feedback. Our product gives users the ability to sift through data more efficiently to discover actionable insights. There are three key things that move our language modelling algorithm beyond just being a set of categories:

Language models emerge directly from the raw data, rather than being prefiltered through the minds of the coders and a predefined set of categories. When you have the data, you’re only a couple of minutes away from insight.
The model is dynamic and editable, allowing a human to make the adjustments needed to highlight what matters for their specific question. Without on-the-fly editing you’re shackled to a fixed view of the world, which may not be relevant for your data or your organisation.
Understandable visualizations show what the language model means, with rich contextual representations driven by the words your customers are actually using.

With technology and tools like these, companies of the future shouldn’t bind themselves to a categorical view of the world — they should focus on truly understanding their customers.