Wednesday, 31 August 2016

Data management - Post 1 - Intro

By Maxine Whitfield

I still get emails from my MSc supervisor, for whom I have nothing but the utmost respect, but whose name in my inbox is usually accompanied by a heart-quickening "fight or flight" response and feelings of complete terror.

Until recently, we were preparing manuscripts for publication (thankfully they're all published now), and he usually had a question about how I analysed my data. Although I wrapped up the final edits on my dissertation a mere year and a half ago, it often took me a full day of frantic searching through documents scattered to the furthest corners of my laptop, under filenames such as "dissertation final", "dissertation REALLY final" and "dissertation December 2015 kill me now", before I could answer his simple question. I spent hours poring over spreadsheets of data. Did I end up including those outliers in the mixed model? I need to hunt down the R file from that analysis… What the heck would I have named it?



My naming system for folders and files left much to be desired. I shared this Dropbox folder with my supervisor. It must have terrified him.


I had some kind of hunch about the relationship between expected and observed metabolic rates, and spent a crazy afternoon excitedly exploring my data. Yikes. A couple of years down the line and I honestly have no clue what is going on in this spreadsheet.

The point I am trying to make here is that, as a naïve, happy-go-lucky honours graduate entering my Master's degree, I had absolutely no idea just how much data collection, exploration, analysis and writing I would do over the next couple of years, and I had even less of an idea of how to go about managing it all. The point of this blog series, therefore, is to arm the bright-eyed, bushy-tailed future postgrads who read it with the tools to manage their data effectively, so that when publishing their incredible results, they can coolly respond to any co-author's questions with negligible bloodshed.

DataONE, an online repository for worldwide ecological and environmental data, has a fantastic set of tutorials on the various aspects of data management. For each of these blog posts, I'll put a link up to their page of tutorials and exercises, and suggest which ones correspond to the topics I'm discussing in the post. See Lesson 01 here.