Just the cover of the book

Book Review: Data Management in Large-Scale Education Research

Nov 12, 2024

What idiot promised to write a book review on the same weekend spent frantically packing for the moving truck crew, then cleaning up after a decade of living in a home? This very exhausted idiot.

A while ago, the author of today's book under review, Crystal Lewis, reached out and asked me to share a link to this book (many thanks for bringing it to my attention!). But as I scanned it, I became convinced that, despite the title saying it's for "large-scale education research", this is a good book for those of us practitioners who collect data even if we have nothing to do with research, let alone education.

So first, a link to the book. You can read the web version here:

Welcome | Data Management in Large-Scale Education Research
This is the in-progress version of Data Management in Large-Scale Education Research.

Copies of the book can be purchased from CRC Press, Amazon, or Barnes and Noble. (I just copied the links verbatim from the book's website and I do not get any sort of compensation if you do choose to buy.)

The most important thing I keep in mind when doing a serious review of a book is whether I am part of the intended target audience. That matters here because the book is intended to teach people, often graduate students, who may never have collected any data on their own before. Its coverage is aimed at more academic settings with the familiar bureaucratic institutions and concerns. It also assumes absolutely zero knowledge of working with data, patiently taking time to explain important fundamentals, like organizing data into tables with rows and columns, without being overly pedantic about it.

Obviously, I only have an incidental overlap with the target audience. As a researcher in the tech industry, at best I collect data about human behavior. I never have to think about writing articles for publication in journals, writing grant proposals, or dealing with things like the IRB. Moreover, I've been doing my job for 15+ years now, so the idea of needing a thorough process for collecting data is very familiar to me. The first three chapters of the book were obviously not for me.

But from chapter 4 onwards, things become more interesting because the book starts going into details about preparing to collect data. For example, chapter 4 explains to readers why human subjects data has unique concerns about handling personally identifiable information (PII), including potential legal concerns. Even in industry, we need to pay attention to these details and there might not be lawyers or IRBs around to remind us about them.

The rest of the book then goes into the many steps involved in collecting data on a large scale. The emphasis here on "large scale" means that you're probably collecting data with the help of others, over longer periods of time, and at a sufficient complexity that the usual instinct of "just collect all the CSVs and dump it into a folder" will be an invitation to disaster. This was also the part where I found things to be useful even for long-time practitioners.

The overall layout of the data collection and management plan in the book isn't some new and unheard-of method. Any practitioner with a couple of years' experience collecting data about humans in the real world would be able to handwave and recite a decent approximation of the model given in the book. Why, of course we make a plan to collect data and have a shared guidebook to train people on how to collect, input, and analyze the data. We'll obviously never, ever, modify or delete our "raw" data, and we will pick methods that try to minimize human input errors. In fact, I had multiple instances of realizing that something I feel is perfectly natural, like using a consistent system for file names, isn't immediately obvious to someone new to this work.

But how many of us have sat down and actually thought about what goes into all of those steps? Have you thought about what are good practices to follow when writing a survey, or storing your data? When you start hiring people to collect data for you, what do you need to teach them? How is the data going to be stored, and what is the file naming convention used? I'm confident all seasoned practitioners would be able to come up with workable answers to any of these questions, but even I'm pretty sure that I'd accidentally forget a couple of items while planning and have to rush to fill in the gaps when I realize something is missing.
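To make the file naming question a little more concrete, here is a minimal Python sketch of what writing a convention down might look like. Everything in it is an assumption for illustration and not something taken from the book: the pattern `<project>_w<wave>_<instrument>_<YYYY-MM-DD>.csv`, the names `build_filename` and `parse_filename`, and the example project name are all hypothetical.

```python
import re
from datetime import date

# Hypothetical convention (an assumption, not from the book):
#   <project>_w<wave>_<instrument>_<YYYY-MM-DD>.csv
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)"
    r"_w(?P<wave>\d+)"
    r"_(?P<instrument>[a-z0-9-]+)"
    r"_(?P<date>\d{4}-\d{2}-\d{2})\.csv$"
)


def build_filename(project: str, wave: int, instrument: str, collected: date) -> str:
    """Build a file name that follows the assumed convention."""
    name = f"{project}_w{wave}_{instrument}_{collected.isoformat()}.csv"
    # Reject components that would break the convention (spaces, capitals, etc.).
    if not FILENAME_PATTERN.match(name):
        raise ValueError(f"components do not fit the convention: {name!r}")
    return name


def parse_filename(name: str) -> dict:
    """Split a conforming file name into its parts, or raise ValueError."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    parts = match.groupdict()
    parts["wave"] = int(parts["wave"])
    # fromisoformat also rejects impossible dates such as 2024-13-01.
    parts["date"] = date.fromisoformat(parts["date"])
    return parts
```

A check like `parse_filename` could be run over a shared folder so that files such as `final_v2 (copy).csv` get flagged before anyone tries to analyze them. The specific pattern matters much less than having one written down where everyone collecting data can find it.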

And this is why I find this book useful even for people like myself. I don't have to do a big formal study at scale very often, but for the rare times I have to, it is very nice to have something that can act like a checklist to remind me of all the things that I may want to plan ahead for. In fact, the Appendix chapter at the back of the book includes a very brief list of the most important concerns to consider at each stage of the research and data collection process.

So, my recommendation is to go take a look at the later portions of the book; it's free to do so anyway. It's very likely that you'll encounter something you've never thought about closely that you could integrate into your own work. Which is pretty cool since you're probably not a large-scale education researcher either.


Standing offer: If you created something and would like me to review or share it w/ the data community — just email me by replying to the newsletter emails.

Guest posts: If you’re interested in writing a data-related post, whether to show off work or share an experience, or if you want help coming up with a topic, please contact me. You don’t need any special credentials or credibility to do so.

"Data People Writing Stuff" webring: Welcomes anyone with a personal site/blog/newsletter/book/etc that is relevant to the data community.


About this newsletter

I’m Randy Au, Quantitative UX researcher, former data analyst, and general-purpose data and tech nerd. Counting Stuff is a weekly newsletter about the less-than-sexy aspects of data science, UX research and tech. With some excursions into other fun topics.

All photos/drawings used are taken/created by Randy unless otherwise credited.

  • randyau.com — Curated archive of evergreen posts. Under re-construction thanks to *waves at everything

Supporting the newsletter

All Tuesday posts to Counting Stuff are always free. The newsletter is self-hosted. Support from subscribers is what makes everything possible. If you love the content, consider supporting the newsletter in any of the following ways:

  • Consider a paid subscription – the self-hosted server/email infra is 100% funded via subscriptions
  • Send a one-time tip (feel free to change the amount)
  • Share posts you like with other people!
  • Join the Approaching Significance Discord — where data folk hang out and can talk a bit about data, and a bit about everything else. Randy moderates the discord. We keep a chill vibe.
  • Get merch! If shirts and stickers are more your style — There’s a survivorship bias shirt!