Welcome to week 3! Today we’re going to talk about research data and your options for maximising its impact. We’ll be thinking mainly about sharing it but also managing it, storing it, depositing it and taking steps to make sure it’s safe and discoverable.
Data is second only to journal articles in terms of importance to research communication and publishing – it’s the rocks from which diamonds are refined. And as a researcher, chances are you’ve got research data lying around on your hard drive or server. Yet a lot of research data never sees the light of day. It used to be difficult to make data available to others, so researchers didn’t unless required to by journals or funder mandates.
But new research has found that putting your research data online can make your work up to 30% more highly cited than keeping your data hidden. Open research data also leads to more replicable studies, and is important to the quality of research overall. And advancements in technology have made it easier than ever to cheaply preserve and make your data Open Access.
In today’s challenge, we’ll share three easy ways to make your data available online: Open Repositories (ORs) like Figshare and Zenodo; Disciplinary Repositories (DRs) like Dryad and ICPSR; and Institutional Repositories (IRs) like NOVA here at the University of Newcastle.
So what does your data look like? It might be the results of all those hours you’ve put into qualitative surveys. Or it might include digital photos taken in archives, de-identified interview voice recordings and transcripts, videos of performance, bibliographies and translations of published work, as well as data from existing databases. One thing is certain: it’s a lot more than figures in a spreadsheet.
Researchers share all the time through informal networks, but now we’re going to look at the options for sharing data that focus on its preservation, discoverability and re-use.
Why post to a data repository?
A common way for many researchers to share their data over the years has been to submit it as a supplementary file to a journal article. But publishers are beginning to encourage researchers to deposit their data to repositories instead.
Publishers recognise that repositories of all persuasions are fantastic places to post your research data. That’s because of two standard features for most repositories: high-quality preservation options and persistent identifiers for your data.
Preservation is a no-brainer – if you’re entrusting your data to a repository, you want to know that it will be around until you decide to remove it.
Persistent identifiers are important because they allow your data to be found if the URL for your data changes, or it’s transferred to another repository when your repository is shuttered, and so on. And with persistent identifiers like DOIs, it’s easy to track citations, shares, mentions, and other reuse and discussion of your data on the Web.
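To make that concrete, here is a minimal Python sketch of how a DOI works as a stable pointer: the DOI itself never changes, and the doi.org resolver redirects to wherever the data currently lives. The function name and the example DOI are hypothetical and used only for illustration.

```python
DOI_RESOLVER = "https://doi.org/"

# Prefixes people often paste along with the DOI itself.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Turn a bare or prefixed DOI into a resolver URL."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with the directory indicator "10."
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return DOI_RESOLVER + cleaned


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example.dataset.1"))
```

If you cite the resolver link rather than the repository's own web address, the citation keeps working even if the dataset moves.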
There are several different types of repository that can host your data depending on your institution and discipline. Let’s dig into the different types of repositories and what each does best.
Open repositories (ORs) like Figshare and Zenodo are repositories that anyone can use, regardless of institutional affiliation, to preserve any type of scholarly output they want. Here are specific advantages and disadvantages of two open repositories.
Figshare offers free deposits for open data up to 250 MB in file size. They issue persistent identifiers for datasets. Users can “version” their data simply by uploading updated files, and can easily embed Figshare datasets in other websites and blogs by copying and pasting a short snippet of code. Other users can comment on datasets and download citation files to their reference managers for later use.
Figshare offers preservation backed by CLOCKSS, a highly trusted, community-governed archive used by repositories around the world. You also get basic information about how many views and social media shares your dataset has received to date.
Zenodo also offers free data deposits and issues DOIs for your datasets. Much like Figshare, the non-profit makes citation information for datasets available in BibTeX, EndNote, and a variety of other library and reference manager formats. Users can add highly detailed metadata for their files – much more than Figshare currently allows – which can aid in discoverability. Other Zenodo users can comment on your files. And best of all, Zenodo makes it easy to sign up with your ORCID identifier. (If you don’t have one yet, no worries! We’re going to cover it in an upcoming challenge.)
Both repositories have open APIs, making them very interoperable with other systems, and they are both user-friendly.
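If you have never seen one, a BibTeX citation for a dataset looks roughly like the output of the sketch below. This is not the exact export either repository produces; the function name, the choice of the `@misc` entry type, and every field value are assumptions made for illustration.

```python
def dataset_bibtex(key, authors, title, year, publisher, doi):
    """Format a dataset citation as a BibTeX @misc entry (illustrative only)."""
    fields = {
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "title": title,
        "year": str(year),
        "publisher": publisher,
        "doi": doi,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


# All values below are hypothetical.
print(dataset_bibtex(
    key="smith2015interviews",
    authors=["Smith, Jane", "Nguyen, Tran"],
    title="De-identified interview transcripts",
    year=2015,
    publisher="Zenodo",
    doi="10.1234/example.dataset.1",
))
```

A reference manager can import an entry like this directly, which is what makes the citation downloads on both repositories handy.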
For some, Figshare’s funding model is a serious drawback; it’s a for-profit company funded by Digital Science, whose parent company, Macmillan Publishing, is the keeper of the Nature Publishing Group empire.
Zenodo’s preservation plan is less robust than Figshare’s, and currently Zenodo can only host files of 2 GB or smaller. Zenodo also lacks public page view and download statistics, meaning that you can’t track the popularity or reuse of the data you submit to the archive.
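As a quick way to apply the size limits quoted above, here is a small Python sketch that reports which of the two open repositories would accept a file of a given size. The limits are simply the figures mentioned in this challenge and may have changed, so check each repository's current policy. Treating MB and GB as decimal units is also an assumption.

```python
# Limits as quoted in this challenge; they may be out of date.
MB = 1000 ** 2  # assumption: decimal megabytes
GB = 1000 ** 3  # assumption: decimal gigabytes

LIMITS_BYTES = {
    "Figshare": 250 * MB,
    "Zenodo": 2 * GB,
}


def repositories_that_fit(size_bytes: int) -> list[str]:
    """Return the repositories whose quoted per-file limit covers this size."""
    return sorted(
        name for name, limit in LIMITS_BYTES.items() if size_bytes <= limit
    )


print(repositories_that_fit(300 * MB))  # a 300 MB file: ['Zenodo']
```

For a real file, `os.path.getsize(path)` from the standard library returns its size in bytes, which you could pass straight to the function.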
Disciplinary repositories (DRs) offer a way to share specialised research data with relevant communities. They offer many of the same features as IRs and ORs, but often with special features for disciplinary data. To get a better idea of the features of discipline-specific repositories, have a browse of the Open Access Directory listing of discipline-based repositories.
Disciplinary repositories like ICPSR (a DR for social sciences data) often allow users to use subject-specific metadata schemas that enhance discoverability. They are focal points for their disciplines, meaning that your data will more likely be seen by those who understand it. Repositories like those in the DataONE network are interoperable with the software that you and other researchers already use to collect and analyze data, making it super easy to deposit data as part of your regular workflow. Depending on the repository, they might offer DOIs for data you’ve deposited.
Not all disciplinary repositories allow you to deposit large datasets. Some do not offer DOIs. And occasionally, grant-funded subject repositories that don’t have sustainable business models shut down after their funding runs out.
Institutional repositories (IRs) are platforms where a university’s faculty and graduate students can preserve their research data and other scholarly outputs. At UON we have access to NOVA.
NOVA is free to use, allows for the addition of both basic and complex data descriptions, and issues persistent identifiers that others can use to cite and find your data easily.
Resources in NOVA are harvested by Google and other major search engines, as well as the National Discovery Service, ensuring that your publications and work can be found via the Internet.
And by virtue of being backed by a university and administered by librarians, institutional repositories have a degree of trust that money can’t buy; many universities have been around for a hundred or more years, librarians have been stewards of the scholarly record since the times of the Ancient Library of Alexandria, and both will likely be around long after the Googles of the world have been shuttered.
Like a lot of repositories, NOVA has limitations on how large the datasets published to it can be, but there are other options for storage within UON’s research support framework. In fact there’s plenty of expertise for you to call on, but we’ll touch more on that when we look at sharing and managing your data in today’s homework.
A user’s guide to sharing
There are some really important things to be aware of when sharing data: How open and publicly available do you want your data to be? Will you mediate it? Can you share it? In addition to some of the drawbacks addressed above, the biggest limitation to the idea of making your data openly available is that not everyone can do it! If you work with sensitive data – defined by the Australian National Data Service (ANDS) as data that can be used to identify an individual, species, object, or location that introduces a risk of discrimination, harm, or unwanted attention – you often can’t share your data openly online.
That said, some repositories like ICPSR do index sensitive data, making it available to registered users. The availability of a metadata record alone can sometimes be enough to cite sensitive data, and so it’s possible that you can still get cited, even if your data isn’t open access. But we don’t recommend keeping your data behind a login or other barrier if you don’t have to.
Unsure if your data is “sensitive”? Check out the ANDS guide to Ethics, consent and data sharing, which can help you identify applicable laws and regulations.
Note: Depositing data with public availability in mind is sometimes a funding requirement. Take the Australian Research Council (ARC) for instance. The ARC does not mandate open data. However, researchers are encouraged to consider the ways in which they can best manage, store, disseminate and re-use data generated through ARC-funded research. Researchers are asked in their grant applications to articulate plans for the management of data generated through the proposed project to make data as openly accessible as possible for the purposes of verification and for the conduct of future research by others.
For today’s homework we’re going to take an in-depth look at preparing to deposit. By now you’ve probably had a good think about your research data and what is most likely to bolster your research impact through being shared. The UON Data Management Toolkit aims to promote good practice in data management and is an absolute wealth of information on data management, sharing and depositing. Take some time to explore the different sections on management, policy, citing data, finding data and sharing it.
Now that you’ve given it some thought and know what’s involved, take advantage of the options UON provides in the areas of data archives, digital repositories and data centres.
If you have data you wish to deposit please contact Vicki Picasso, Senior Librarian Research Support at Vicki.Picasso@newcastle.edu.au. Vicki will be able to recommend the best course of action for sharing your data.
Great! Next time you’ve got a dataset that you want to share with the world, do it!
Tomorrow, we’ll explore publishing Open Access for more citations.