Regional Variation of Slang

Regional Variation of Slang via Computational Methods

I discovered the Slang Metric, a numerical formula that predicts whether or not a lexeme (word) is slang. This research was part of my senior thesis for Linguistics at Grinnell College.

This is notable because defining slang in a scientific manner has been very difficult.

Assuming you have a comprehensive dataset of lexeme usage across various regions, the Slang Metric is simply the coefficient of variation of a lexeme's normalized frequencies across those regions.

In other words:

  1. We define $\text{lexeme}_i$ as the absolute count of instances of $\text{lexeme}$ in the $i^{th}$ region.

  2. We define $N(\text{lexeme})$ as the normalized counts of instances of lexemes.

  3. $\sigma$ is the standard deviation of the word's usage frequency across different regions.

  4. $\mu$ is the mean (average) usage frequency of the word across these regions.

  5. $\text{SlangMetric}(\text{lexeme}) = \frac{\sigma_{N(\text{lexeme})}}{\mu_{N(\text{lexeme})}}$

If $\text{SlangMetric}(\text{lexeme}) \geq 0.1$, your lexeme is slang!

Beautiful, right?!
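To make the formula concrete, here is a minimal Python sketch of the steps above. It is an illustration under stated assumptions, not the thesis code. The post does not say how counts are normalized or which standard deviation is used, so this sketch assumes normalization means dividing each region's count by that region's total token count, and it uses the population standard deviation. The function names, region names, and counts are all hypothetical.

```python
from statistics import mean, pstdev

SLANG_THRESHOLD = 0.1  # threshold stated in the post


def normalized_frequencies(counts, region_totals):
    """Per-region relative frequency of one lexeme.

    ASSUMPTION: normalization = raw count / total tokens in that region.
    The post does not specify the normalization it uses.
    """
    return [counts[region] / region_totals[region] for region in counts]


def slang_metric(counts, region_totals):
    """Coefficient of variation (sigma / mu) of the normalized frequencies.

    ASSUMPTION: sigma is the population standard deviation (pstdev).
    """
    freqs = normalized_frequencies(counts, region_totals)
    mu = mean(freqs)
    if mu == 0:
        raise ValueError("lexeme never occurs, so the metric is undefined")
    return pstdev(freqs) / mu


def is_slang(counts, region_totals, threshold=SLANG_THRESHOLD):
    """Apply the post's rule: a metric at or above the threshold means slang."""
    return slang_metric(counts, region_totals) >= threshold


# Hypothetical toy data: three regions with 1,000 tokens each.
region_totals = {"north": 1000, "south": 1000, "west": 1000}
uniform_counts = {"north": 50, "south": 50, "west": 50}  # used evenly everywhere
regional_counts = {"north": 0, "south": 0, "west": 30}   # used in one region only

print(slang_metric(uniform_counts, region_totals))
print(slang_metric(regional_counts, region_totals))
print(is_slang(regional_counts, region_totals))
```

In this toy data, the evenly used lexeme has zero spread across regions, so its metric is 0 and it falls below the threshold. The lexeme confined to one region has a standard deviation larger than its mean, so it is flagged as slang.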