Sunday 2 December 2012

Weeks 17 and 18, ANDS Project

Prepared for the steering committee meeting and had a long meeting with Luis from ANDS about the audited records.  Working manually with the records is very time consuming.  I eventually finished adjusting the records, although it was extremely difficult to work through them methodically.  I was working on the last record, deleting the final erroneous entry, when the record just vanished.  For one horrible moment none of the records seemed to exist, but all of them except the one I was working on did reappear.  I recreated the record, but it was an activity record for which I had drawn information from a website for a research fellowship.  Unfortunately, in the time between creating the record and the record disappearing, the fellowship web site had been updated and the information summarised.  As I had entered the information directly into the RDA web site, I had no backup of it.  This was a big mistake and I will never enter records like this again.
In future I will make a Word or Excel working document to detail the record and then copy and paste across into RDA.

Sunday 18 November 2012

Week 16, ANDS Project

I had recorded the interview with the systems biology director and, even though I had taken notes, I decided to transcribe it.  This did take quite a long time (it was a 50-minute interview), but it was worth it as I had not noted down some important information.  The director had suggested that 20 minutes was an appropriate amount of time to spend interviewing researchers about their practices, so I have cut the number of questions down to 10.  I thought I could then do a follow-up visit for any data collection material, and if they are willing to talk for longer I can take the time.
Spoke to Luis about the implementation plan and I think I have a semi-plan of how to proceed now.  It is quite difficult to resolve who the audience is for that plan.  I believe it is for the executive level of the University and therefore should be at a high, strategic level.
Finally getting around to changing the records to meet the ANDS requirements.  It is incredibly hard to do, mostly because the spreadsheet ANDS uses to return the results does not match the tabs in the interface where you fix the record.  I have printed the spreadsheet out, but because it is so wide you don’t get the element names with the record.  Also, one of the authors is away until the 8th of December.

Monday 12 November 2012

Week 15, ANDS Project

Wrote up notes from the eResearch conference into the blog and completed the monthly report.  I have done some more work on the USQ data management plan.  The difficulty is in remembering that this is a plan to be completed at the start of a project, and therefore the researcher cannot and will not have all the answers needed to deposit a collection into a repository.  Perhaps we need a follow-up project information sheet for the end of a project, which could take the form of a deposit questionnaire for a research data repository.
I went and spoke to the systems biology centre director about RDM.  This was really interesting because I got to ask centre-level questions from an administration perspective, for example: do you keep a list of the centre’s research projects that are happening?  The answer is no.  Therefore even at a centre level there is no clear idea of the total research activity.  This centre uses USQ-provided storage but also relies on computer HDDs and external HDDs.

Took my proposed data management plan to the steering committee on Wednesday.  Minor changes were made, such as adding a size column to the storage needs table and including space for an author to include an ORCID identifier.  Otherwise the plan is ready to be trialled.

Sunday 4 November 2012

Week 14, ANDS Project

The eResearch conference was in Sydney Monday to Wednesday.  I had a terrific time and came back with a lot of ideas.
The keynote speech was from Dr Clifford Jacobs about EarthCube.  He stated that the knowledge environment can be broken into quartiles:
Making knowledge visible
Building knowledge intensity
Building knowledge infrastructure
Building knowledge culture
·         Value and culture
·         Rewards
·         Trust
·         Sharing exchange


Challenges
·         Collections and sufficient metadata
·         Trust
·         Usability
·         Interoperability
·         Diversity
·         Security
·         Education and training
·         Data publication and access
·         Commercial exploitation, e.g. Google Maps
·         New social paradigms – crowd sourcing
·         Preservation and sustainability
·         Stakeholder alignment

Must be able to create value by expanding the available pie
Mitigate harm
Ensure systems are stable and agile
A social network was established early and collaboration was encouraged.  For instance, there were seven different proposals for ontologies and semantic web groups, and they were asked to group together to try to minimise silos of information.

The parts not developed well by researchers were
Governance and standards
Discipline specific needs and drivers
Education and training

The practical things were well taken care of by the researchers.
The top 6 quoted barriers were
·         No time
·         No repository
·         Files too big/ server too small
·         Don’t want their research to be scooped by other researchers
·         Uneven standards of metadata

Seven modes of success
·         They are proactive
·         Began with an end goal
·         Prioritised tasks for instance governance first
·         Emphasised non-competitive and broadly inclusive process
·         Listened to the community
·         Facilitated synergy within and across communities
·         Engaged and energised colleagues

Reasons for failure
·         Unrealistic and mis-aligned expectations
·         An attitude of build it and they will come
·         Not valuing what exists
·         Not advancing the frontier in transformational ways
·         Not engaging researchers
·         Not anticipating the needs of the next generation of researchers
·         Unknowns

Dell cloud product

Data always stored in Australia
For most people the country of storage is the no. 1 issue
They made the point that if you can move the data into the cloud easily, you should also make sure you can move it out.

So the Dell solution is
Open
Integrated via Boomi
Security: the client has access to the security information

Can pay by
·         Hour
·         Month
·         Or blade

Oracle
Digital preservation is a series of managed activities
Necessary to ensure continued access to digital materials for as long as necessary.
Issues of format of file type and storage medium.
Additional challenges
            Humanities
            Data not born digital
Digital data has significant data protection problems, for instance bit flips and bit rot
Multiple copies
Data integrity and validation
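
A minimal sketch of how a fixity check can catch bit rot when multiple copies are kept.  This is my own illustration and was not shown in the talk: it assumes a checksum was recorded when the file was ingested, the function names are hypothetical, and SHA-256 is simply my choice of checksum.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(copies: list[Path], expected: str) -> list[Path]:
    """Return the copies whose checksum no longer matches the recorded one."""
    return [copy for copy in copies if sha256_of(copy) != expected]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "copy_a.bin"
        mirror = Path(tmp) / "copy_b.bin"
        original.write_bytes(b"sensor readings")
        mirror.write_bytes(b"sensor readings")
        recorded = sha256_of(original)  # checksum stored at ingest (assumed)

        # Simulate a single flipped bit in the second copy.
        damaged = bytearray(mirror.read_bytes())
        damaged[0] ^= 0b00000001
        mirror.write_bytes(bytes(damaged))

        for path in find_damaged_copies([original, mirror], recorded):
            print(f"checksum mismatch: {path.name}")
```

A damaged copy found this way could then be replaced from one of the copies that still matches, which is why multiple copies and validation go together.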

A facility must say if there are problems with the data.
Preservation standard OAIS
Software
·         Tessella
·         Fedora Commons
·         DSpace
·         DuraCloud

These packages only put data into a preservation format; they do not carry out data curation
Preservation is a service

Analogue -> Digital
 
Ingest and convert to a preservation format, e.g. PDF/A (PDF for archiving)
                      
Automated verified tiered content infrastructure

JHOVE is an application which shows what file format is used.

So automated transfer from one technology to a new one.
Oracle T10000C provides data integrity by using CRC.
Tape has a shelf life of 30 years

TERN
Using AusGOAL implies copyright, which implies some person had input.  However, in the case of a sensor feed no person is involved and therefore copyright does not exist.

Birds of a feather session Day two

Data management can be measured on two axes
·         How much research funding did we gain?
·         How many publications?

The question is: how much better research did we do?

Data management is about a conversation, so we have to be prepared to listen as well as talk.

Have to show the researchers something that means something
Show them something that they might be interested in

Need to build a collection of success stories that apply to the researcher and the level of research they are doing.
Therefore need to cherry-pick stories to suit the audience.

Communication is the most important thing.
Monash used a forum called bright ideas over breakfast (supplying breakfast)
They used a researcher as the speaker, and the speaker is also a champion of data management practices.
The venue makes a difference
There is a positive correlation between data sharing and a researcher’s h-index
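
For reference, the h-index is the largest number h such that the researcher has h publications with at least h citations each.  A minimal sketch of the calculation (the function name and the citation counts are made up for illustration):

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for five papers.
    print(h_index([10, 8, 5, 4, 3]))
```

If datasets are cited in their own right, they could add to the counts that feed a metric like this, which is one way to read the link with data sharing.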

CAUDIT (Council of Australian University Directors of Information Technology) has been benchmarking IT spending since 2003.  In the first year there was resistance, but then the directors started to share more and more.  CAUDIT now produces a report for its members only, and the members really like the information, with participation now at 100%.  The members are now ringing and asking for the report early.

Intersect talked about the difficulty of measuring how successful their intervention is in the research community.  They talked about the importance of forming an eResearch community of practice.  They found breakfasts worked well when informal with a short presentation.  However they did need a short agenda.
Need to tell the story from the researchers’ perspective, so how is this going to help with funding?  NeCTAR uses hypothetical measures of success, such as usage measured with Google Analytics.

Choose case studies carefully as an early adopter is not necessarily the best case study.

The problems researchers have are
·         Distributed data
·         Messy data
·         How to keep data secure.

QCIF – will have a cloud storage facility by late December

Data citation enables
Better researcher discoverability
Acknowledgement and reward for researchers
The researcher’s h-index is enhanced and therefore the institution’s reputation is enhanced.
Question
How will USQ get visibility for researchers and their data?

The value of data citation

User driven
Good data management practice
Need to reference their own data
A requirement for stable links to data is served through the provision of a DOI


Data collection is expensive
The emphasis is on collaboration not competition
Opportunities can be lost through lack of access to the data
Data is irreplaceable
DOIs are part of the solution
Performance metrics
Policy must be relevant
Data metrics (DOIs, citation indices)
DataCite has a global reach

For data citation, CSIRO uses a system-based automated process.
Researchers are interested in DOIs and understand them.
They are beginning to see papers with digital citations coming through.
ANDS is working with Scopus and Elsevier
Alt-metrics are able to give some metrics without the long wait of traditional publication.

http://altmetrics.org/tools/   has a range of alt-metrics tools (thank you Pat Loria)

Wednesday 31 October 2012

Week 13, ANDS Project

Had two meetings on Monday.  One was with the new member of the working party, who is the researcher representative.  It was very informal but we got to talk about the data management project and I filled her in on some of the background to the project.  I also worked on the outline for the RDM toolkit.
I gave a presentation to the research librarians giving a general overview of data management, and we then had a discussion around data management and its definitions, for instance what a data collection is.  This is so that if they are asked about data management in the faculty they will have a good general idea of the main concepts and know that I am the person to talk to.
Spoke to Luis about preparing an information session around a researcher’s needs.  I am still not sure how to make the session sound attractive to entice researchers.
I went to a HERDC presentation in my lunch time just so I could start to understand some of the metrics that are used to measure researcher and university value.

Sunday 21 October 2012

Week 12, ANDS Project

I completed the first draft of the terms of reference for the reference group, which I will get reviewed for the first reference group meeting.  I have not tried drafting TOR before, so this was a new task for me.  I tried to review what we would really need a researcher perspective on.  The plan is that the reference group members will also become spokespeople for the research data management project.
I completed the presentation for new researchers for the CRN projects on Thursday.  I have tried to take a more research lifecycle point of view rather than a data centric point of view as per the Library Loon’s blog post.  The presentation went well and I think focusing on the researcher milestones rather than the data milestones helped to give the presentation better relevance to the researchers.  A number of researchers came and spoke to me about DM during morning tea, showing interest in practical solutions.
The reference group meeting went well and the terms of reference were accepted almost unchanged.  We have two members with an HPC interest, a climate scientist and an archaeologist.  This appears to be a good number of people to achieve some action.  They read and commented on our draft research management policy.  The only change suggested was a slight change to the wording of one section.
I wrote an information sheet for our research data management tool kit on data storage.  I think I have it at an appropriate practical level.

Wednesday 3 October 2012

Week 11, ANDS Project

Spent time working on an information security procedure for research data management.  Meeting with the policy officer again on Wednesday.
Read a very interesting post from UWS on the workflow of research data management (http://eresearch.uws.edu.au/blog/2012/08/17/potential-research-data-repository-data-management-use-cases-for-discussion/).  They had included some sequence diagrams that helped to express the workflows very neatly, and I think this really adds to the conversation about the workflows here.  Another thing which resonates with me is the Library Loon’s statement that data management lifecycles place data at the centre, but this is not how researchers see the place of data.  Instead we should see how data fits into the research lifecycle (http://gavialib.com/2012/08/data-lifecycles-versus-research-lifecycles/).
The library has bought a copy of the book Managing Research Data (Pryor, G (ed.) 2012, Managing research data, Facet Publishing, London).  The chapters I have read so far are interesting, particularly the one on the life cycle, although it takes a very data-centric point of view rather than a researcher-centric view.  However the information is good.  Some of the chapters take a very library-orientated point of view, so they focus on what data management means to the librarian.
I am still working on the procedures.  Part of the issue is working out the workflow of the researcher, so I will draw on some of Peter Sefton’s sequence diagrams.  I am trying to examine the necessary procedures for information security.