There are two facets of data sharing and data release (manifested in the concept of transparency, but extendable to any situation involving data repurposing) that differ from the traditional view of data.
The first facet is that once a data producer has made a data set freely available, that producer effectively loses control over both the use and the longevity of the data. Even attempts to restrict use are limited. Recall, for example, when AOL released a data set of web searches a few years back that turned out to expose protected data. AOL pulled the data set once that was revealed, yet just now I was able to find a copy of it available for download. Despite the intent to regain control and restrict access, that data set's lifetime has been extended precisely because it was made freely available. I would summarize this first point as follows: once released, a data set can have an extended lifetime.
The second facet is that the potential for repurposing should change the lifecycle processes themselves. If we already know that a data set is going to be used by many different consumers, then perhaps those consumers' needs should be incorporated into the production scheme for that data.
In most cases there is simply no framework for addressing either of these issues. Rather, this is a data governance challenge: can you define a set of data policies that attach to the use of a released data set? Are there licensing implications, or standards for use? Are there ways to incorporate public response or comment into the information lifecycle? Perhaps we need to envision a method for incorporating the data requirements of an elastic constituency through collaboration and cooperation, arriving at a happy medium: one where the producer promises to incorporate a broad yet incrementally growing set of consumer requirements in return for the consumers' promise to observe usage policies.
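To make the governance questions above concrete, here is a minimal sketch of what "policies attached to a released data set" might look like in machine-readable form. Everything here is illustrative: the class names (`UsagePolicy`, `DataRelease`), the policy fields, and the request-registration channel are assumptions of mine, not an existing standard or API.

```python
from dataclasses import dataclass, field

@dataclass
class UsagePolicy:
    """Hypothetical machine-readable terms attached to a released data set."""
    license: str                # e.g. a license identifier such as "CC-BY-4.0"
    permitted_uses: set         # purposes the producer explicitly allows
    prohibited_uses: set        # purposes the producer explicitly forbids

@dataclass
class DataRelease:
    name: str
    policy: UsagePolicy
    # Channel for the "elastic constituency": consumers record requirements
    # the producer may incorporate incrementally into later releases.
    consumer_requests: list = field(default_factory=list)

    def use_allowed(self, purpose: str) -> bool:
        # Prohibitions take precedence over permissions.
        if purpose in self.policy.prohibited_uses:
            return False
        return purpose in self.policy.permitted_uses

    def register_request(self, requirement: str) -> None:
        self.consumer_requests.append(requirement)

# Example: a producer releases a data set with explicit usage terms,
# and a consumer registers a requirement for a future revision.
release = DataRelease(
    name="web-search-sample",
    policy=UsagePolicy(
        license="CC-BY-4.0",
        permitted_uses={"aggregate-analysis", "research"},
        prohibited_uses={"re-identification"},
    ),
)
release.register_request("add a data dictionary for the query fields")

print(release.use_allowed("research"))           # True
print(release.use_allowed("re-identification"))  # False
```

The point of the sketch is the bargain described above: the policy encodes what consumers promise to observe, while the request list is the producer's standing commitment to hear and incrementally absorb consumer requirements.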