
The smallness of big data


Welcome to the Nerditorium, where we’ll be publishing ebooks, white papers, articles and general brain froth that comes out of our Wordnerds work.

Turning Qual into Quant

There are, of course, thousands of big data companies. They do amazing things, create gorgeous visualisations, and organise the numbers in extraordinary and beautiful new ways.

You can probably feel the “but” coming.

The reason we call big data small is that it concentrates on numbers, when 80% of all data that exists is in the form of unstructured text.

Think of the information that lives in that 80%. The questions it could answer. The problems it could solve. The difference it could make to your organisation. If only we could organise it the way we do the numbers.

So why don’t we? Well, lots of reasons, but long story short: language is a nightmare.

It’s vast, nebulous, loud, confusing, sarcastic, diverse, surprising, colloquial, fluid, shrtnd and yoof, bruv. Spelling is hit and miss. Data scientists are generally focused on quantitative data, and this could not be more qual.

Some not-so-awesome solutions

So how does big data interact with text data? There are four main approaches:

1. Ignore the text data altogether. The simplest solution. Not the most effective.

2. Count the number of times a word is used. Remember that time you got actionable insight out of a word cloud? No, us neither. That’s because the meaning of words comes from their context. Almost every word has more than one meaning, so a word stripped of its context tells you nothing.

3. Sentiment analysis. This one is a little cleverer but has the same problem. Are the words in a sentence happy words or sad words? It depends entirely on how they’re arranged.

4. Try to fit people’s thoughts into a finite number of options. You know the kind of thing. Drop-down menus, multiple-choice questions, choose-your-opinion-from-a-pre-approved-list. When companies can’t get the meaning from unstructured data, they force user opinions into a structure. Which means those users are no longer telling you what they really think.
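
To make approaches 2 and 3 a bit more concrete, here is a minimal Python sketch. It is not how any particular product works: the tiny word lists and the two example sentences are invented purely for illustration, and real sentiment tools are a good deal more sophisticated than this. It just shows the underlying problem, which is that two sentences with opposite meanings can contain exactly the same words.

```python
from collections import Counter
import re

# Hypothetical mini-lexicon, invented for this illustration only.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Approach 2: count how often each word appears, ignoring word order."""
    return Counter(tokenize(text))


def naive_sentiment(text):
    """Approach 3, naive version: positive word count minus negative word count."""
    tokens = tokenize(text)
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


# Two made-up reviews with opposite meanings but identical vocabulary.
a = "The service was good, not bad at all."
b = "The service was bad, not good at all."

print(word_counts(a) == word_counts(b))        # True
print(naive_sentiment(a), naive_sentiment(b))  # 0 0
```

Both reviews produce the same word counts and the same score, even though one customer is delighted and the other is not. The difference lives entirely in how the words are arranged.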




So it’s hopeless. Right?

You’d have to be pretty brave or pretty stupid to take on this problem. We’re at least one of the two.

Here are three things we’ve learned about approaching big problems.

1. Find the intersections between your interests, individually and as a group. No one person in our organisation could have come up with the solution to these problems.

2. Let go of your darlings. We all came to this problem with different pre-existing ideas. There is received wisdom on approaching this problem in development, corpus linguistics, machine learning and social listening. The solution lay somewhere in the middle of these disciplines, and everybody had to let go of things they’d always believed.

3. Ask for help. We have received incredible support from all kinds of organisations. We have also been extremely lucky to have been supported by two universities, Sunderland and Durham, who have been instrumental in rapidly increasing our AI capabilities. Sunderland Software City have been incredible, offering advice, support, and opportunities at every turn. Our first customer, Nissan, and our former sister company Daykin & Storey have also been amazing in developing our work. There are all kinds of help and support out there, and you know what they say about shy bairns.

What do you do with a ton of text data? Let us tell you...


Get in touch

Contact a Word Nerd today to see how we can help