The word “data” shows up throughout statistics. There are many different classifications of data. Data can be quantitative or qualitative, discrete or continuous. Although the word is common, it is frequently misused. The primary problem stems from uncertainty about whether the word data is singular or plural.
If data is a singular word, then what is the plural of data? This question is actually the wrong one to ask. This is because the word data is already plural. The real question we should ask is, “What is the singular form of the word data?” The answer to this question is “datum.”
The reason that data is plural is an interesting one. To explain why, we will need to go a little deeper into the world of dead languages.
A Little Bit of Latin
We begin with the history of the word datum, which comes from Latin. Datum is a noun, and in Latin it means “something given.” It is a neuter noun of the second declension. Nouns of this type have a singular form ending in -um and a plural form ending in -a. Although this may seem strange, it is comparable to a familiar rule in English: most singular nouns are made plural by adding "s," or perhaps "es," to the end of the word.
What all this Latin grammar means is that the plural of datum is data. So it is correct to speak of one datum and several data.
Data and Datum
Although some treat the word data as a collective noun referring to a collection of information, most writing in statistics recognizes the origin of the word. A single piece of information is a datum; more than one are data. Because data is a plural word, it is correct to speak and write about “these data” rather than “this data.” Along the same lines, we would say "the data are… " rather than "the data is… "
One way to dodge this issue is to consider all of the data as a set. Then we can talk about a singular set of data.
Spot the Examples of Misuse
A brief quiz may further help to sort out the correct way to use the term data. Below are five statements. Determine which two are incorrect.
- The data set was used by everyone in the statistics class.
- The data was used by everyone in the statistics class.
- The data were used by everyone in the statistics class.
- The data set were used by everyone in the statistics class.
- The data from the set were used by everyone in the statistics class.
Statement #2 does not treat data as a plural, and so it is incorrect. Statement #4 incorrectly treats the word set as a plural, whereas it is singular. The rest of the statements are correct. Statement #5 is somewhat tricky because the word set is part of the prepositional phrase "from the set." The subject of the sentence is still data, so the plural verb "were" is correct.
Grammar and Statistics
There are not many places where the topics of grammar and statistics intersect, but this is an important one. With a little practice, it becomes easy to use the words data and datum correctly.