Regional Variation of Slang via Computational Methods

I discovered the Slang Metric, a numerical formula that predicts whether or not a lexeme (word) is slang. This research was part of my senior thesis for Linguistics at Grinnell College.

This is notable because defining slang in a scientific manner has proven very difficult.

Assuming you have a comprehensive dataset of lexeme usage across various regions, the Slang Metric is simply the coefficient of variation of a lexeme's normalized frequencies across those regions.

In other words:

We define $\text{lexeme}_i$ as the absolute count of instances of $\text{lexeme}$ in the $i^{th}$ region.
We define $N(\text{lexeme})$ as the normalized counts of $\text{lexeme}$, one value per region.
$\sigma_{N(\text{lexeme})}$ is the standard deviation of those normalized frequencies across regions.
$\mu_{N(\text{lexeme})}$ is their mean (average) across the same regions.
$\text{SlangMetric}(\text{lexeme}) = \frac{\sigma_{N(\text{lexeme})}}{\mu_{N(\text{lexeme})}}$

If $\text{SlangMetric}(\text{lexeme}) \geq 0.1$, your lexeme is slang!
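
Here is a minimal Python sketch of the calculation, not the code from the thesis. It rests on two assumptions that the definitions above leave open: normalization means dividing each region's raw count by that region's total token count, and $\sigma$ is the population standard deviation. The function names and the example counts are hypothetical.

```python
from statistics import mean, pstdev

SLANG_THRESHOLD = 0.1  # the cutoff stated in the text


def normalized_frequencies(counts, region_totals):
    """Normalize raw counts per region.

    Assumption: normalization = raw count / total tokens in that region.
    """
    if len(counts) != len(region_totals):
        raise ValueError("need one total per region")
    return [count / total for count, total in zip(counts, region_totals)]


def slang_metric(counts, region_totals):
    """Coefficient of variation (sigma / mu) of the normalized frequencies."""
    freqs = normalized_frequencies(counts, region_totals)
    mu = mean(freqs)
    if mu == 0:
        raise ValueError("lexeme never occurs, so the metric is undefined")
    # Assumption: population standard deviation (pstdev), not sample (stdev).
    return pstdev(freqs) / mu


def is_slang(counts, region_totals, threshold=SLANG_THRESHOLD):
    """Apply the >= 0.1 rule from the text."""
    return slang_metric(counts, region_totals) >= threshold


# Hypothetical data: three regions with 10,000 tokens each.
totals = [10_000, 10_000, 10_000]
evenly_used = [100, 100, 100]   # same frequency everywhere
regional = [300, 10, 5]         # concentrated in one region

print(slang_metric(evenly_used, totals), is_slang(evenly_used, totals))
print(slang_metric(regional, totals), is_slang(regional, totals))
```

A lexeme used at the same rate everywhere scores 0 and falls below the threshold, while one concentrated in a single region scores well above it.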

Beautiful, right?!