A codebook is a technical description of the data that were collected for a particular purpose. It describes how the data are arranged in the computer file or files, what the various numbers and letters mean, and any special instructions on how to use the data properly. Like any other kind of “book,” some codebooks are better than others. The best codebooks have:
R comes with many built-in data sets. To see the 100+ data sets that come with R, just type data() in your console and you’ll see a list of them.
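The same listing can also be captured as an object rather than read in a viewer. The following is a minimal sketch, assuming you only want the data sets in the base datasets package; the names available and catalog are illustrative and not part of the course material.

```r
# Ask for the data sets in the base 'datasets' package.
# The result has a $results character matrix with the columns
# Package, LibPath, Item, and Title.
available <- data(package = "datasets")$results

# Keep just the name and the short title of each entry.
catalog <- as.data.frame(available[, c("Item", "Title")],
                         stringsAsFactors = FALSE)

nrow(catalog)     # number of entries listed by this R installation
head(catalog, 5)  # first few names and titles
```

The exact count depends on your R version, so treat the number you see as specific to your installation.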
For any of these built-in data sets you will find the “codebook,” the technical description of the data, by typing ? and then the name of the data set. This will bring up the “codebook” in your Help console. For instance, ?mtcars will provide you with the technical information regarding the mtcars built-in data set.
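As a short sketch of how you might read the codebook alongside the data itself, the lines below open the help page for mtcars and then inspect the data so the two can be compared. Pairing the help page with str() is a suggestion here, not a step the course prescribes.

```r
# Open the documentation ("codebook") for mtcars.
# help(mtcars) is equivalent to typing ?mtcars.
help(mtcars)

# Inspect the data itself: each variable's type and first few values.
str(mtcars)

# Check that the size matches what the help page describes.
dim(mtcars)  # 32 rows (cars) and 11 columns (variables)
```

Reading the variable descriptions on the help page next to the str() output is a quick way to confirm what each column, such as mpg or cyl, actually measures.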
Getting the codebook for data that you are importing is a little more difficult. If you are using organizational data at your employer, this will likely require you to request the codebook from your database engineers. This seemingly simple task will surprise you by illustrating how few people truly understand the technical details underlying organizational data. If you are using online data, which is the emphasis in this course, you may need to do some searching to locate the codebook. Sometimes codebooks are obviously and explicitly linked on the website; other times you have to do some digging to find them. Some examples of codebooks follow:
The important thing to remember is that you need to identify the documentation that explicitly tells you about the data you are working with. If you cannot find it, state in your analysis what you take the data to mean, and note that ambiguity may exist because no codebook could be identified. With your final project, I expect you to explain the meaning of the source data you analyze and provide a link to the codebook.
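When no codebook can be found, one way to state the implied meaning of the data is to write a small working data dictionary of your own. The sketch below is only an illustration: the survey data frame, its columns, and the meanings attached to them are all hypothetical, standing in for whatever data you import.

```r
# Hypothetical imported data; in practice this would come from
# read.csv() or a similar import function.
survey <- data.frame(
  id     = 1:3,
  q1     = c(4, 2, 5),
  region = c("N", "S", "N"),
  stringsAsFactors = FALSE
)

# Working data dictionary: one row per variable.
# Every meaning is flagged as assumed because no codebook was located.
dictionary <- data.frame(
  variable = names(survey),
  class    = unname(vapply(survey, function(x) class(x)[1], character(1))),
  meaning  = c("Respondent identifier (assumed)",
               "Response to question 1 on a 1-5 scale (assumed)",
               "Region code: N = north, S = south (assumed)"),
  source   = "No codebook located; meaning inferred from the data",
  stringsAsFactors = FALSE
)

print(dictionary)
```

Including a table like this in your analysis makes your assumptions explicit and shows the reader exactly where ambiguity remains.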