Making Sense of Online Reviews: A Machine Learning Approach: An Abstract

Document Type

Book Contribution

Publication Date



It is estimated that 80% of companies’ data is unstructured. Unstructured data, or data that does not come in a predefined, numeric format, continues to grow at a rapid pace. Images, text, videos and voice are all examples of unstructured data. Companies can use this type of data to gain novel insights that are unavailable through more easily managed, structured data. Unstructured data, however, creates a challenge since it often requires substantial coding before an analysis can be performed. The purpose of this study is to describe the steps and introduce the computational methods that can be adopted to further explore unstructured online reviews. The unstructured nature of online reviews requires extensive text analytics processing. This study introduces methods for text analytics, including tokenization at the sentence level, lemmatization or stemming to reduce the inflectional forms of the words appearing in the text, and the ‘bag of n-grams’ approach. It also introduces lexicon-based feature engineering and methods for developing new lexicons that capture theoretically established constructs and relationships specific to the domain of study. The numeric features generated in this processing are then analyzed using machine learning algorithms. The same process can be applied to other unstructured data, such as dyadic information exchange among customer service, salespeople, customers and channel members. As a few examples, though not a comprehensive list, companies can apply results from unstructured data analysis to examine a variety of outcomes related to customer decisions, managing channels and mitigating potential crisis situations. Understanding interdisciplinary methods of analyzing unstructured data is critical as the availability of this type of data continues to grow, and it enables researchers to develop theoretical contributions within the marketing discipline.
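The preprocessing steps named in the abstract (sentence-level tokenization, stemming, bag of n-grams, and a lexicon-based feature) can be illustrated with a minimal Python sketch. This is not the study's implementation: the regex sentence splitter, the `crude_stem` suffix stripper, the `POSITIVE_LEXICON` word list, and all function names are hypothetical stand-ins chosen so the example runs with the standard library alone. A real analysis would more likely use an established NLP library and a validated, domain-specific lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon; a real study would use a validated,
# domain-specific lexicon instead.
POSITIVE_LEXICON = {"great", "love", "friendly"}


def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def crude_stem(token):
    """Rough suffix stripping, a stand-in for a real stemmer or lemmatizer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(review, max_n=2):
    """Turn one review into numeric features: n-gram counts plus a lexicon count."""
    features = Counter()
    for sentence in split_sentences(review):
        tokens = tokenize(sentence)
        stems = [crude_stem(t) for t in tokens]
        # Bag of n-grams, built per sentence so n-grams never span a boundary.
        for n in range(1, max_n + 1):
            features.update(ngrams(stems, n))
        # Lexicon-based feature: count of unstemmed tokens found in the lexicon.
        features["lexicon_positive"] += sum(t in POSITIVE_LEXICON for t in tokens)
    return features


review = "The staff was friendly. I loved the rooms!"
features = featurize(review)
```

The resulting `features` counter is the kind of numeric representation that could be assembled into a feature matrix and passed to a machine learning algorithm. Matching the lexicon on unstemmed tokens is one design choice among several; matching on stems or lemmas would catch more inflected forms at the cost of some false matches.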