Monday, 1 December 2008

'Ownership of data' – uuuughh!

'Ownership of data' – uuuughh! The phrase gives me a shiver just thinking about it. It's a contentious area, rife with complication, confusion and misunderstanding. That said, it's a super important area, especially for those of us interested in curation and scientific progress in general (perhaps less so for, say, a landscape gardener).

I was reading the Oaklaw report ‘Building the Infrastructure for Data Access and Reuse in Collaborative Research’ and noticed a couple of interesting pages (‘Chapter 2: Key Concepts’ Paras 2.16 – 2.22 if you’re interested).

The report looked at what is meant by the term “ownership” in relation to data. It identified nine different parties who might claim rights in data. These were:

  1. the creator – the party who creates or generates the data;
  2. the consumer – the party who uses the data;
  3. the compiler – the party who selects and compiles information from different information sources;
  4. the funder – the party who commissions the data to be generated;
  5. the decoder – where information is protected by encoded formats (e.g. encryption), the party who can unlock the information;
  6. the packager – the party who collects information for a particular use and adds value through formatting it for a particular market or set of consumers;
  7. the reader – the person who reads data added to an information repository;
  8. the subject of the data – the person from whom the data is derived or who the data is about; and
  9. the purchaser or licensee – the party who buys or licenses the data.

Your thoughts? What do you think of this list? What does ownership of data mean to you? Do you consider yourself an ‘owner’ of data? And if so, what is your relationship to that data? Am I missing the point and it’s all ridiculously simple?

All input welcomed and appreciated.


Garkbit said...

What happened to "the public" as owner?

But actually, as a former scientist, I don't really recognise these categories. In my former field (astronomy) data can go through a very long pipeline of collection, reduction, interpretation and reuse, so that distinguishing a producer from a consumer is pretty tricky. I think what is needed is probably something more sophisticated than a simple categorisation, and would probably involve ideas about added value, i.e. that you have some rights in the data (even if it's only the right to have your work acknowledged) if you have added value to the data. But I'm probably not the right person to try to put that on a rigorous legal footing.

(And in my extensive experience, science works best when people distribute their data freely. The people who distribute freely get more citations, co-authorships, and acknowledgements and generate the kind of goodwill that translates into future successful grant proposals.)

Mags said...

Thanks for this Garkbit.

I recognise your point about data going through a "pipeline of collection, reduction, interpretation and reuse". It's a definite difficulty: multiple agents holding the same role, or multiple roles applying to the same agent, can be confusing in application. I heard a talk last week where this was discussed and will post on it shortly.

I'm also interested in your experience of science working "best where people distribute their data freely". I have to say this philosophy appeals to me, but as I am not a scientist myself it's genuinely useful to hear the thoughts of someone who has experienced this approach in practice.

Don't get me started on 'public as owner'. That's an excellent point. Please correct me if I'm talking about something different from what you intended here, but the point that facts cannot be copyrighted is frequently overlooked. Certainly from a legal point of view there is often talk of ownership where none exists, and indeed the data belongs to "the public".