The DCC Blawg: December 2008

Wednesday, 31 December 2008

...and a Happy New Year!

I'd like to wish a happy, healthy and peaceful 2009 to everyone out there. Make it a good one!

Image: http://www.flickr.com/photos/nebarnix/854348966/in/set-72157594248654650/ BY-NC-ND

Radical sharing - Science Commons

I'm a bit slow off the mark with this one, but thanks for conference contributions also go to John Wilbanks for a fascinating talk on ‘radical sharing’.

Chris Rusbridge has already given a good overview of the talk on the Digital Curation Blog, so I will just mention a few highlights for me. A couple of the really helpful bits were the analogies John used when describing the operation of copyright in the digital world.

The first was a container and its contents. You can think of copyright as protecting the container but not the contents of the container. This is how copyright has long operated (it doesn’t protect ideas but the expression of those ideas). However, some of the licences users are forced to agree to lock that container. So although the copyright still operates in the same way, access to the contents is reduced (in this case by contract). Open Access solves the legal problem but not the container problem.

The second analogy was helpful in explaining the differences in the way journals/books can be used now when they are (often) electronic as opposed to paper based. He compared this to the difference between buying and renting a house.

When you buy a paper based copy it is a hard copy and it is your hard copy. You are free to do what you like with it. This is like owning a house. Now that many publications are electronic it’s more similar to the renting/lease model. You can still read the publication (live in the house) but as with renting you are regulated by a contract which will add further conditions/limitations.

A further useful point that he highlighted is that using copyleft or share-alike in the data world actually hinders freedom. Suppose two datasets are released under two different licences, each based on copyleft or share-alike (i.e. requiring resulting works to be distributed under the same terms). Someone who integrates that data and wants to put a licence on the result is stuck, because each licence insists the resulting data be made available under its own terms. They can’t distribute the combined data without breaking the terms of one of the licences. Copyleft may work within communities where there is consensus on licensing terms, but if the aim is to make data available outside that community it presents difficulties.
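For the more code-minded, the share-alike deadlock can be sketched as a toy model. This is only an illustration of the reasoning above, not a description of how any real licence works: the `Licence` class, the licence names and the function names are all made up for the example, and the model assumes that a share-alike licence simply demands that the combined result carry that same licence.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Licence:
    """Hypothetical, highly simplified model of a data licence."""
    name: str
    share_alike: bool  # True if derived works must be released under this same licence


def required_output_licences(sources: Iterable[Licence]) -> Set[str]:
    """Collect every licence that insists the combined data carry its own terms."""
    return {lic.name for lic in sources if lic.share_alike}


def can_distribute_combined(sources: Iterable[Licence]) -> bool:
    """The combined data can be licensed only if at most one licence makes that demand."""
    return len(required_output_licences(sources)) <= 1


# Illustrative licences (names are assumptions, not real licence texts).
share_alike_a = Licence("Share-alike licence A", share_alike=True)
share_alike_b = Licence("Share-alike licence B", share_alike=True)
attribution_only = Licence("Attribution-only licence", share_alike=False)

if __name__ == "__main__":
    # Two different share-alike licences pull in different directions.
    print(can_distribute_combined([share_alike_a, share_alike_b]))
    # One share-alike licence plus an attribution-only one leaves a way out.
    print(can_distribute_combined([share_alike_a, attribution_only]))
```

In this sketch the first combination has no valid licence for the result, while the second does, which is the practical appeal of moving to attribution-only terms.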

An example he gave of this was WikiPathways, who have just changed their licence terms from containing a share-alike condition to CC-BY. They’ve given up their right to sue but have given info to the world. They reward people who follow their intention with use of a trademark and ignore the people who don’t. This is the opposite way round from what we have come to expect.

John talked more about the approach of the Science Commons project to data sharing. I’m not going to go into this further here as it will be covered in the DCC Science Commons Legal Watch Paper in the New Year. I will however mention two of his other comments.

Firstly, that people are reluctant to share their data because they are worried someone else might muck it up. But it may not have occurred to them that someone else might do something brilliant with it. Something different from their emphasis, that they would not have done.

Secondly, looking back through history it can be seen that it is not unnatural to be in the position these advocates of openness find themselves in. Dislodging entrenched processes is hard work and stable systems are resistant to change on multiple levels. If this is to be the way ahead it will take some great effort to make it work and require voluntary action on the part of many.

One final thought - it occurred to me how often it is a contractual issue rather than a strict IP issue causing difficulty here. Although it is often the case (and correct me if you feel I’m wrong) that the contracts are made to seem more reasonable (and therefore more readily agreed to) through a misunderstanding/overstatement of the IP rights that actually exist.

Wednesday, 10 December 2008

Slides now available in relation to previous 'Healthy Consent' post

Just a quickie to say the slides for the very interesting keynote at the 4th International Digital Curation Conference given by Professor David Porteous are now available on the DCC website.

Monday, 8 December 2008

Experiences of data sharing in the CARMEN project

Thanks also to Alistair Knowles for his presentation about data sharing in the CARMEN project.

He talked about the difficulties of citation of data in cases of protracted ‘ownership’ and also cases of unknown ‘ownership’ (the inverted commas are mine).

His experience is that scientists are much more comfortable with informal agreements and are able to sort out matters of ownership and citation amongst themselves. But when asked to formalise arrangements they get nervous and become more reluctant to act.

He argued that the solution is not a legal one but a social one (agreement within the community), a conclusion also reached by myself and some others in the Legal and Policy Issues session of the Research Data theme at the JISC Innovation Forum in July of this year. For more details see my earlier post. Do you agree?

Healthy Consent

Spent a couple of days last week at the 4th International Digital Curation Conference in Edinburgh. Lots of great speakers and the first day in particular brought up some interesting legal questions.

The keynote on the first day was provided by Prof David J Porteous, the director of the Centre for Molecular Medicine at the University of Edinburgh. He discussed a project called Generation Scotland which he is involved in. Generation Scotland is a partnership between the Scottish University Medical Schools, Biomedical Research Institutes, the NHS in Scotland and the people of Scotland which aims to create more effective treatments based on gene knowledge. The project works with genomic data, which is personal and therefore of interest to me from a data protection perspective.

Fascinating talk. Loads of interesting explanation of the background, why they do what they do and the potential benefits. An overview of the impact of environment, wealth, diet and smoking on life expectancy and a brief discussion of how much health is dictated by those environmental factors and how much by nature, which is where the genome comes in.

Health is a major priority for the Scottish Government and it only takes a cursory look at Prof Porteous’ ‘disease prevalence’ maps of the UK to see why. The DCC has its headquarters in Edinburgh and many staff at HATII in Glasgow – so you could say the future of excellent digital curation relies on the work of Generation Scotland ;-)

The subjects of the research conducted by Generation Scotland are volunteers. As will be immediately apparent, the kind of data the project collects and uses is not only personal data but sensitive personal data (as defined by the Data Protection Act 1998), which brings up legal as well as ethical issues. The project addresses these through coding and anonymisation of the data to make it ethically sound and secure.

The discussion around consent was particularly interesting. The subjects give ‘open consent’ instead of the usual ‘informed consent’. My understanding is that, on a practical level, this means something like “Trust us. We don’t know what we will do with your data in the future but we will tell you and you have the right to withdraw if you don’t like it”.

This right to withdraw is important and, as Prof Porteous explained, a guiding principle of the project. Generation Scotland needs the ability to remove a subject’s data should they ask to be withdrawn. However, they can’t go back and change anything that’s happened already. This leads to all sorts of questions about what can still be inferred.

There was also some discussion about who owns and controls the data. As I mentioned the other day this is a tricky area. I learnt something new here. The data is owned by the Scottish Government. Some may be surprised by this, indeed that was my immediate reaction and I can see it argued two ways – but that is for another day (do feel free to remind me). But it must be remembered that the owners of the data may not be the same people as those who have rights or corresponding responsibilities in relation to that data. As I said last Monday – eugh!

Prof Porteous finished by touching on a catch 22 situation found in relation to the Generation Scotland work – the requirement for consent to gain consent. On the one hand this is restricting research, and in the vital area of health at that. On the other hand, moves to relax this have been criticised as threatening patient privacy. Do you have any thoughts on this? This is a new area for me and one I know very little of. But it sounds thought-provoking and ripe for a good debate – so let’s start one! (Another link on this...and another one to some people already having a debate)

Lastly, a question that arose for me in response to the answer to a query from a gentleman in the audience. Should consent requirements be different depending on whether the personal data is to be used on behalf of the nation or for commercial benefit? What do you think?

Well that’s me for now. I’ll be back later today. A big thanks to Prof Porteous for such a fascinating start to the conference.

P.S. He mentioned a book called ‘The Grim Reaper’s Road Map - An Atlas of Mortality in Britain’. Great title - got to be worth a look on Amazon at least!

P.P.S. shocking fact – although life expectancy has been increasing over time, the young of today are predicted to live less long than the generation before them. As an advocate of people taking increased responsibility for their own lives, and in particular their own health, this to me sounds like a rather loud call to action. It’s not curation, or legal but it is very important.

Monday, 1 December 2008

Next Legal Watch Paper - Science Commons

The next Legal Watch Paper is on the topic of Science Commons.

What would you like to see covered in there? Any burning questions? Now’s your chance!

By the way, if Science Commons is something that interests you, we have John Wilbanks (VP of Science Commons) speaking at the 4th International Digital Curation Conference tomorrow. Perhaps I’ll see you there. And for those who can’t make it – I’ll blawg about it very soon.

'Ownership of data' – uuuughh!

'Ownership of data' – uuuughh! The phrase gives me a shiver just thinking about it. A contentious area rife with complication, confusion and misunderstanding. That said, a super important area, especially for those of us interested in curation and scientific progress in general (perhaps less so for a landscape gardener say).

I was reading the Oaklaw report ‘Building the Infrastructure for Data Access and Reuse in Collaborative Research’ and noticed a couple of interesting pages (‘Chapter 2: Key Concepts’ Paras 2.16 – 2.22 if you’re interested).

The report looked at what is meant by the term “ownership” in relation to data. It identified nine different parties who might claim rights in data. These were:

the creator – the party who creates or generates the data;
the consumer – the party who uses the data;
the compiler – the party who selects and compiles information from different information sources;
the funder – the party who commissions the data to be generated;
the decoder – where information is protected by encoded formats (e.g. encryption), the party who can unlock the information;
the packager – the party who collects information for a particular use and adds value through formatting it for a particular market or set of consumers;
the reader – the person who reads data added to an information repository;
the subject of the data – the person from whom the data is derived or who the data is about; and
the purchaser or licensee – the party who buys or licenses the data.

Your thoughts? What do you think of this list? What does ownership of data mean to you? Do you consider yourself an ‘owner’ of data? And if so, what is your relationship to that data? Am I missing the point and it’s all ridiculously simple?

All input welcomed and appreciated.
