How It Works

The core logic of Linguistic Inquiry and Word Count (LIWC) comes from decades of scientific research demonstrating that people’s language can provide extremely rich insights into their psychological states, including their emotions, thinking styles, and social concerns. Sometimes these insights are fairly obvious and straightforward. For example, if someone is using a lot of words like happy, excited, and elated, they are probably feeling happy, and we can use this information to reliably estimate their current emotional state. Often, however, the relationships between verbal behavior and psychology are much less obvious. For example, people who are more confident and higher in social standing tend to use "you" words at relatively high rates and "me" words at relatively low rates. Here, too, decades of empirical research, particularly research using LIWC as a scientific instrument, provide us with specialized ways of understanding, explaining, and quantifying psychological, social, and behavioral phenomena.

LIWC-22 comes with over 100 built-in dictionaries created to capture people’s social and psychological states. Each dictionary consists of a list of words, word stems, emoticons, and other specific verbal constructions that have been identified as reflecting a psychological category of interest. For instance, the "cognitive processes" dictionary includes over 1,000 entries that reflect when a person is actively processing information, both in general and in more specific ways. The "affiliation" dictionary includes over 350 entries that reflect a person’s need to connect with others, such as "community" and "together."
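To make the idea of dictionary entries concrete, here is a minimal Python sketch of how a word or word stem might be matched against a token. It is an illustration only, not LIWC's actual implementation: the function names are hypothetical, the convention that a trailing `*` marks a stem is an assumption of this sketch, and the tiny entry list is a stand-in for the real affiliation dictionary ("friend*" is an invented example entry).

```python
def entry_matches(entry: str, token: str) -> bool:
    """Return True if a dictionary entry matches a single token.

    Assumption of this sketch: an entry ending in '*' is a word stem that
    matches any token starting with that stem; every other entry must
    match the token exactly. Matching ignores letter case.
    """
    entry = entry.lower()
    token = token.lower()
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def in_category(token: str, entries: list[str]) -> bool:
    """Return True if any entry in the category matches the token."""
    return any(entry_matches(entry, token) for entry in entries)


# Hypothetical toy entries, NOT the real LIWC-22 affiliation dictionary.
affiliation_entries = ["community", "together", "friend*"]

print(in_category("Together", affiliation_entries))    # exact match, case-insensitive
print(in_category("friendship", affiliation_entries))  # matched through the stem entry
print(in_category("fried", affiliation_entries))       # matches no entry
```

A stem entry lets one line in the dictionary cover many inflected forms, which is why a category's entry count can be much smaller than the number of distinct words it captures.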

LIWC reads a given text, compares each word in the text to the list of dictionary words, and calculates the percentage of total words in the text that match each of the dictionary categories. For example, if LIWC analyzed a single speech containing 1,000 words using the built-in LIWC-22 dictionary, it might find that 50 of those words are related to positive emotions and 10 are related to affiliation. LIWC would convert these counts to percentages: 5.0% positive emotion and 1.0% affiliation.
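The counting step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the LIWC-22 software itself: the tokenizer is a basic regular expression, the function names are hypothetical, and the two small word sets are toy stand-ins for the real dictionaries.

```python
import re

# Hypothetical toy dictionary for illustration; the real LIWC-22
# dictionaries are far larger.
TOY_DICTIONARY = {
    "positive_emotion": {"happy", "excited", "elated"},
    "affiliation": {"community", "together"},
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens.

    Assumption: runs of letters and apostrophes count as words.
    """
    return re.findall(r"[a-z']+", text.lower())


def category_percentages(text: str, dictionary: dict[str, set[str]]) -> dict[str, float]:
    """Return, for each category, the percentage of all tokens that match it."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        # An empty text has no meaningful percentages; report zeros.
        return {category: 0.0 for category in dictionary}
    return {
        category: 100.0 * sum(1 for token in tokens if token in words) / total
        for category, words in dictionary.items()
    }


sample = "I am happy and excited to be together again today"
scores = category_percentages(sample, TOY_DICTIONARY)
for category, pct in scores.items():
    print(f"{category}: {pct:.1f}%")
```

The sample sentence has 10 tokens, 2 of which are in the toy positive emotion set and 1 in the toy affiliation set, so the scores are 20.0% and 10.0%. The same arithmetic gives the figures in the speech example: 50 matches out of 1,000 words is 5.0%, and 10 out of 1,000 is 1.0%.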

Note that many LIWC-22 categories are organized in a hierarchical structure. All anger words, by definition, are categorized as negative emotion words, which are in turn categorized as emotion words. Also note that the same word may be categorized in multiple dictionaries. For instance, the word "celebrate" is in both the positive emotion and achievement dictionaries.
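One way to picture the hierarchy and the overlap between dictionaries is the following Python sketch. The data structures and category labels are hypothetical and chosen only for illustration. The sketch follows the examples in the text ("mad" as an anger word, "celebrate" in both positive emotion and achievement) and additionally assumes that positive emotion sits under the same general emotion category as negative emotion.

```python
# Hypothetical child -> parent links between categories.
PARENT = {
    "anger": "negative_emotion",
    "negative_emotion": "emotion",
    "positive_emotion": "emotion",  # assumption of this sketch
}

# Hypothetical toy assignments of words to their most specific categories.
WORD_CATEGORIES = {
    "mad": {"anger"},
    "celebrate": {"positive_emotion", "achievement"},
}


def expand_categories(word: str) -> set[str]:
    """Return every category a word counts toward, including all ancestors."""
    found: set[str] = set()
    for category in WORD_CATEGORIES.get(word.lower(), set()):
        # Walk up the hierarchy until there is no parent left or we reach
        # a category that has already been recorded.
        while category is not None and category not in found:
            found.add(category)
            category = PARENT.get(category)
    return found


print(sorted(expand_categories("mad")))
print(sorted(expand_categories("celebrate")))
```

In this sketch a single occurrence of "mad" adds to three scores (anger, negative emotion, and emotion), while "celebrate" adds to positive emotion, its parent emotion category, and achievement. Because of this overlap, category percentages are not expected to sum to 100%.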

Don’t forget that LIWC, like all text analysis tools, is a relatively crude instrument. It can sometimes make errors in identifying and counting individual words. Consider the word "mad," which is counted in the anger dictionary. Usually, today, the word "mad" does reflect some degree of anger. Sometimes, however, it expresses joy ("he’s mad for her") or mental instability ("mad as a hatter"). Fortunately, this is seldom a problem because LIWC takes advantage of probabilistic models of language use. Yes, in a given sentence, the word "mad" might be used to express positive emotion. However, an author who is actually experiencing positive emotion will generally use more than one positive emotion word and few, if any, other anger words, which should result in a high positive emotion score and a low anger score. An important thing to remember is that the more words you analyze, the more trustworthy the results. A text of 10,000 words yields far more reliable results than one of 100 words. Any text with fewer than 25 to 50 words should be looked at with a certain degree of skepticism.

Helpful References

  • Boyd, R. L., & Schwartz, H. A. (2021). Natural language analysis and the psychology of verbal behavior: The past, present, and future states of the field. Journal of Language and Social Psychology, 40(1), 21–41. https://doi.org/10.1177/0261927X20967028
  • Boyd, R. L., & Pennebaker, J. W. (2016). A way with words: Using language for psychological science in the modern era. In C. Dimofte, C. P. Haugtvedt, & R. F. Yalch (Eds.), Consumer psychology in a social media world (pp. 222–236). Routledge/Taylor & Francis Group.
  • Gottschalk, L. A. (1997). The unobtrusive measurement of psychological states and traits. In C. W. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts (pp. 117–129). Erlbaum.
  • Kennedy, B., Ashokkumar, A., Boyd, R. L., & Dehghani, M. (2022). Text analysis for psychology: Methods, principles, and practices. In M. Dehghani & R. L. Boyd (Eds.), The handbook of language analysis in psychology. Guilford Press.
  • Mehl, M. R. (2006). Quantitative text analysis. In M. Eid & E. Diener (Eds.), Handbook of multimethod measurement in psychology (pp. 141–156). American Psychological Association. https://doi.org/10.1037/11383-011
  • Pennebaker, J. W. (2011). The secret life of pronouns: What our words say about us. Bloomsbury.
  • Stone, P. J., Dunphy, D. C., Smith, M. S., & Ogilvie, D. M. (1966). The General Inquirer: A computer approach to content analysis. M.I.T. Press.
  • Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. https://doi.org/10.1177/0261927X09351676