Monday 8 August 2016

Why do academics cite? And what can Open Data teach us?

I like thinking and I like my thoughts. I learn by debating and discussing, especially with people who disagree or who approach a topic from a radically different perspective. I want to be able to order and lay out my thoughts to encourage further debate.

I don’t want to cite experts to justify my thinking, as if my thoughts have no value because I am not eminent. I do not want to hide behind experts in an attempt to abdicate personal responsibility for what I’ve written.

I do have to be careful in case the above is an excuse for trying to avoid citing because I find it difficult to develop the discipline of knowing where ideas have come from. I’m pretty eclectic and I’m getting old. I usually couldn’t tell you where I picked up an idea – it might have been a newspaper article, tweet, book I read in 1983, something overheard at a conference, a conversation, a lecture or something I scan-read when looking to see what others have read. And I’m not just eclectic, I also struggle to separate my immediate interaction with someone’s words and the words themselves.

But I digress. Whatever my beliefs and (limited) capability for citing, my thoughts are not my thoughts. They are a constantly evolving product of interaction with other thinkers. I need to keep notes so I can cite others in order to:
  •  give credit to those co-creators (at all stages of research)
  •  provide pointers to how my thinking evolved (when I get to the point of fixing my thinking in a thesis)
  •  signpost people to things I think worth reading (at all stages of research)


We have an established academic route for citation when we interact with texts, with rules for how to cite within our writing and how to reference that citation in the bibliography. That’s fine if the knowing or unknowing co-creators of your thinking are written, disembodied texts. And it will be fine once I’ve learned the discipline of keeping accurate records…

There are newer rules for citing television programmes, videos, websites – almost anything can be cited as long as what we are citing could be accessed by others in the same way that we accessed it.

But what do we do when our co-creators are encountered not as written texts but as conversations, debates, email exchanges, tweets? My reasons for wishing to cite are as valid now as for any encounter I have with a written text.

Back in the 1980s when I wrote a science dissertation, it was permitted to cite the very occasional 'pers comm' if it came from a respected authority. One needed a strong argument for including a personal communication as, although it credited the person and directed you to someone of interest, it did not let other people reflect independently on that communication. In other words, how could others access the information, the data, that was being cited?

It’s at this point that my thoughts moved to the concept and language of Open Data. After all, what is a citation but a way to access data? And in Open Data Institute terms (http://theodi.org/blog/closed-shared-open-data-whats-in-a-name), data come as open, shared and closed. My data are 'things needing citation':
  •  Open is available to all on the same basis.
  •  Shared is available to some (like publications behind a paywall or personal communications).
  •  Closed is private communication so shouldn’t be cited at all.
Some within the Open Data world argue that as much data as possible should be shifted to ‘open’ or ‘closed’, because ‘shared’ is where privilege is maintained through controlled access to information. (As I said, I’ve been ill-disciplined in documenting where I hear things; all I can say is my belief this is true comes from articles I’ve clicked through to from Twitter written by proponents of Open Data, and conversations over dinner before a Gov Camp with someone from an Open Data Institute. If anyone reading this wants to direct me to something I can cite, that would be amazing.)

If the language of Open Data applies to things a researcher might want to cite, does Open Data thinking show us a way to move personal communications out of ‘shared’ and into ‘open’? Possibly. One route would be to shorten the time between finding something out or developing a new idea, and publishing about it. It might be fun if academics had a ‘please publish’ button on their blogs to help work out which blogs most urgently need refining from a work-in-progress into a publication. Or perhaps we need to revisit the status and purpose of blogs, but that would be the subject of a different blog.

Maybe, in this world of DOI (Digital Object Identifier; a code that means your information can be found even if the web address changes), there is scope for preparing a mutually acceptable version of any 'pers comm' and deliberately moving it out of the shadows into the ‘open’ by uploading it digitally in a way that complies with guidance on what makes ‘good open data’ (http://theodi.org/guides/what-open-data).

This might make both parties think twice about how transparent they wish their communication to be. But if transparency is a problem, the communication probably needs to be shifted to the ‘closed’ category and therefore not cited at all.

I have no desire to undermine the status of peer-reviewed published literature. When I write my thesis, it will take pride of place. But to fulfill my three purposes for citation, I will also need to cite beyond that, and that means I will need:
  • a way to cite people and not just their texts
  • a strategy for making sure anything I wish to cite complies with the principle of others being able to access what I've cited.
Discuss!