It seems that the National Science Foundation will be asking new grant applicants to submit a data management plan, apparently including plans for how to make their data available to others.
I have mixed feelings about this. I certainly approve of high-value data sets being made available. I've benefitted a great deal from the wonderful people who put together the Penn Treebank, VerbNet and similar projects. There are now some useful data sets included in libraries for R as well. I intend to make the summary data from my pronoun studies available when I publish the associated papers.
That said, getting data together in a form that is interpretable and usable by somebody else is hard. However much I document my own data, whenever I have to go back to look at some old data it takes hours if not days to figure out what I'm looking at. And I'm the one who created it. Fully documenting a data set for someone not associated with the project takes time.
Given that NSF will be paying the salaries of the people who spend the time to document the data sets, it's reasonable to ask whether it's cost-effective. Just how much of a demand is there for data from other labs? I can think of many papers for which I wish I had the original stimuli. The number for which I want the original data is much smaller (though there are some for which it would be really useful).
2 comments:
Yes, I agree! In an ideal world all data would be optimally packaged and documented, and freely available to anyone. But doing this requires valuable resources, and the time spent doing this is time not spent on other things, such as research, teaching and other outreach activities (which the NSF increasingly wants from researchers).
That's a very good point. I have a similar response when I hear people say "let's try to make research software more professional".
You might also mention that it's not just time, but skill -- it takes some practice to define reusable standards, understandable documentation, etc. Just like any other kind of writing. This goes double for coding.
Here's the devil's advocate position: if your data's not in good enough order to distribute, you probably haven't done your experiments cleanly enough to be reliable. And certainly not cleanly enough to be directly replicable.