Storage options

Site: Postgraduate online research training
Course: Module 2: Planning the research
Book: Storage options
Date: Thursday, 29 October 2020, 8:01 AM

1. Introduction

Digital storage devices are inherently unreliable and therefore a backup strategy is vital. This section looks first at some suggested storage strategies and then at backup tools that you might wish to consider.

Making regular back-ups of your data protects you from accidental (or malicious) data loss. Common causes of data loss include:

  • hardware failure
  • software faults (corrupted data)
  • virus infection 
  • power failure
  • human error

The statistics for data loss are particularly scary.  

  • A hard drive crashes every 15 seconds.
  • One in five computers suffers a fatal hard drive crash during its lifetime.
  • 25% of lost data is due to the failure of a portable drive.
  • 31% of PC users have lost all of their PC files to events beyond their control.
  • 32% of data loss is caused by human error.
  • 60% of companies that lose their data close down within 6 months of the disaster.

Data Loss Statistics (taken from the SHARD Data Preservation course)

The amount of advice and discussion online about data loss speaks for itself. For instance, the University of Edinburgh Data Library has produced a variety of YouTube videos about data management and, in particular, about storage. Look at the video example below, in which Edinburgh's Jeff Haywood describes the dangers of data loss.

 

This and other videos from the University of Edinburgh's Data Library can be found on their YouTube channel.

2. Making Back-ups

When choosing a system for backing up your research, start by considering exactly what you need to back up. What will you need to restore in the event of data loss?

Consider:

  • Do you need to back up software as well as files?
  • Do you have data on more than one device or different data on different devices? How will you manage back-ups without losing the integrity of the individual data sets?
  • How will you deal with version control, i.e. saving a newer version of a file under a different file name? Keeping versions of the same file is useful when you make significant changes and want the original to remain intact in case you ever need it again.
  • How often do you need to back-up data?
  • How will you organise and label back-up files and media?
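
One way to handle the version-control and labelling questions above is to save each version under a date-stamped name. The following is a minimal sketch only: the function name `save_version`, the `versions` folder and the naming pattern are illustrative assumptions, not a prescribed convention.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional
import shutil


def save_version(original: Path, versions_dir: Path,
                 when: Optional[datetime] = None) -> Path:
    """Copy `original` into `versions_dir` under a date-stamped name.

    The naming pattern (name_YYYY-MM-DD_HHMM.ext) is an assumption made
    for illustration; use whatever labelling scheme suits your project.
    """
    when = when or datetime.now()
    versions_dir.mkdir(parents=True, exist_ok=True)
    # e.g. thesis.docx becomes thesis_2020-10-29_0801.docx
    stamped = f"{original.stem}_{when:%Y-%m-%d_%H%M}{original.suffix}"
    target = versions_dir / stamped
    if target.exists():
        # Never silently replace an earlier version.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(original, target)  # copy2 also keeps the modification time
    return target
```

Because the date comes first in the stamp and is written year-month-day, the versions sort chronologically in an ordinary file listing.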

There are many other questions to consider through this process as well. For instance, the UK Data Archive argues that best practice for backing up should include the following:

  • Store data in non-proprietary or open standard formats for long-term software readability (we will talk about this in the section on software)
  • Copy or migrate data files to new media between two and five years after they were first created, since both optical and magnetic media are subject to physical degradation
  • Check the data integrity of stored data files at regular intervals
  • Use a storage strategy, even for a short-term project, with two different forms of storage, e.g. on hard drive and on CD
  • Create digital versions of paper documentation in PDF/A format for long-term preservation and storage
  • Organise and clearly label stored data so they are easy to locate and physically accessible
  • Ensure that areas and rooms for storage of digital or non-digital data are fit for the purpose, structurally sound, and free from the risk of flood and fire
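
The advice above to check the integrity of stored files at regular intervals can be approached by recording a checksum for each file and comparing against that record later. This is a sketch under stated assumptions: the function names are hypothetical, and SHA-256 is simply one common choice of checksum, not one named by the UK Data Archive.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's path (relative to `folder`) to its checksum."""
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict) -> list:
    """List files that are missing or whose contents differ from the manifest."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

You would build the manifest when the back-up is made, keep it somewhere safe, and run `changed_files` against the stored copy at each check. An empty list means every recorded file is still present and unchanged.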

In this book we will look at some of these questions in detail, starting with advice on making master copies of your research data.

2.1 Making Copies

A back-up is a copy of your files made at a specific moment in time (or automatically updated at a scheduled time). Depending on your needs you may wish to back up daily, weekly, or monthly, but whatever the frequency, set a fixed time for it and keep to it rigidly.

The back-up copy is a ‘master copy’ of your data. In other words, you do not edit individual files within the back-up or work on it as you would on your original version. Think of the back-up as just that: a version of your files that you can use to restore information if the original copy is lost or damaged.

The UK Data Archive suggests that master copies should be made in an open, as opposed to proprietary, format for long-term validity. This might not always be worthwhile during the research but is something that should certainly be considered for long-term preservation strategies. In most cases what you are after with a back-up is a complete and accurate copy of your working files. A back-up should be identical to your original in all ways at the moment of its creation.

What is perhaps more important is to ensure that you have more than one back-up, at least of important files. For example, it’s not enough to have the original copy on a PC or laptop and a back-up on an external hard drive. If there’s a fire or burglary then both devices might be lost. Nor is it wise to rely on a Cloud storage option alone. On the surface you might think that having multiple devices containing the folder is a secure means to ensure data survives even if one device is lost. Not so. If one device is stolen then the Cloud storage files are at risk. There is also no guarantee that data won’t be lost in the Cloud storage system, suddenly leaving you with no data anywhere.

A good back-up strategy would be to go for a mix of options that is regularly and strictly followed. Have at least three copies of your files. There are many variations that you might try. Here are a few examples.

Option 1

  1. The original copy used for editing and creating new data
  2. A back-up on a physical device such as a DVD, external hard drive or USB stick
  3. A Cloud or Internet option where files are stored online and accessible from other devices

Option 2

  1. The original copy used for editing and creating new data
  2. A copy on your institution's server
  3. A copy on an external hard drive kept at home or in another building

Option 3

  1. The original copy used for editing and creating new data
  2. A copy on a CD/DVD stored in the same building as the original copy
  3. A copy on a CD/DVD stored in a different building
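
Where the second and third copies live on locations your computer can reach as folders (for example an external drive and a mounted institutional share), the copying and checking can be scripted. The sketch below is illustrative only: `back_up_to` is a hypothetical helper and the destination folders are assumptions you would replace with your own.

```python
import filecmp
import shutil
from pathlib import Path


def back_up_to(source: Path, destinations: list) -> list:
    """Copy the `source` folder into each destination and verify each copy."""
    copies = []
    # Relative paths of every file in the original, used for verification.
    names = [p.relative_to(source).as_posix()
             for p in source.rglob("*") if p.is_file()]
    for destination in destinations:
        target = Path(destination) / source.name
        # dirs_exist_ok=True refreshes an existing copy in place (Python 3.8+).
        shutil.copytree(source, target, dirs_exist_ok=True)
        # shallow=False compares file contents, not just size and timestamp.
        _, mismatch, errors = filecmp.cmpfiles(source, target, names, shallow=False)
        if mismatch or errors:
            raise RuntimeError(f"Copy to {target} is incomplete: {mismatch + errors}")
        copies.append(target)
    return copies
```

Note that this sketch refreshes the existing copy at each destination rather than keeping older back-ups, and it does not remove files from the copy that you have since deleted from the original.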

2.2 Full, incremental or differential back-ups

There are essentially three types of backup:

Full back-up – This is where you make a copy of all the relevant files (often the complete contents of a hard drive or folder system) and then, at each subsequent back-up, make a new full copy of that data. This process can require a large amount of data to be copied every time.

Suggested media: CD/DVD/external hard drives

Incremental back-up – This consists of first making a copy of all relevant files (often the complete contents of a hard drive or folder system) and then making incremental back-ups of only the files which have altered since the last back-up. The original back-up remains as it is, and each set of changes is added alongside it. To restore a file system, both the original back-up and all of the later updates are therefore needed.

Suggested media: CD/DVD/cloud storage

Differential back-up – This also begins with a full copy of all relevant files. Each later back-up then copies every file that has changed since that full back-up, not just those changed since the most recent back-up. As changes to your data are generally few (i.e. you might have added to one or two files only in your file system) there is no point completely redoing the process of backing up files which haven’t changed. To restore a file system you need only the full back-up and the most recent differential back-up.

Suggested media: external hard drives/cloud storage
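
The practical difference between the incremental and differential approaches is the reference point used to decide which files to copy. The sketch below assumes the common definitions: an incremental back-up copies what has changed since the most recent back-up of any kind, while a differential back-up copies what has changed since the last full back-up. The function names are hypothetical, and file modification times stand in for whatever change detection a real back-up tool uses.

```python
from pathlib import Path


def files_changed_since(folder: Path, timestamp: float) -> list:
    """Return files under `folder` modified after `timestamp` (seconds since the epoch)."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.stat().st_mtime > timestamp
    )


def incremental_selection(folder: Path, last_backup_of_any_kind: float) -> list:
    """Files to copy in an incremental back-up: changed since the latest back-up."""
    return files_changed_since(folder, last_backup_of_any_kind)


def differential_selection(folder: Path, last_full_backup: float) -> list:
    """Files to copy in a differential back-up: changed since the last FULL back-up."""
    return files_changed_since(folder, last_full_backup)
```

The two selections use the same test and differ only in the timestamp passed in, which is why differential back-ups grow over time while incremental ones stay small but need every earlier increment in order to restore.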

The UK Data Archive suggests NOT overwriting old back-ups with new ones. This is good advice but not always possible for practical reasons (such as hard drive space). You therefore need to decide how often a back-up is replaced (i.e. you might wish to hold two back-up versions at any given time).
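
If you settle on holding a fixed number of back-up versions, the oldest can be removed automatically once a new one has been made. This is a minimal sketch: `prune_backups` is a hypothetical helper, and it assumes each back-up is a separate folder whose name sorts chronologically (for example backup_2020-10-29).

```python
import shutil
from pathlib import Path


def prune_backups(backup_root: Path, keep: int = 2) -> list:
    """Delete all but the `keep` newest back-up folders; return those removed."""
    if keep < 1:
        raise ValueError("keep at least one back-up")
    # Assumption: folder names sort chronologically, e.g. backup_2020-10-29.
    backups = sorted(p for p in backup_root.iterdir() if p.is_dir())
    to_remove = backups[:-keep]
    for old in to_remove:
        shutil.rmtree(old)  # permanently deletes the old back-up folder
    return to_remove
```

Run this only after the newest back-up has been completed and checked, since the deletion cannot be undone.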

2.3 Data security

The security of where you place your data is worth consideration, especially if some of your material is sensitive. You need to consider:

Physical Security

Are the data and its back-ups safe from damage or loss? This might involve ensuring that where the data is stored is secure or controlled in terms of who can access it.

Network Security

If the data is sensitive (for example it contains personal information) then ensure that the server or computer is not connected to an unsecured external network through which it could be accessed without your knowledge.

Security of media

Ensure that password protection is active in some way on sensitive data. Generally it is best to ensure that any device you use is password protected so that only you, or those to whom you give the password, can access it.

3. Back-up methods

There are many devices and storage options available to you. Some of them will be expensive whilst others will be relatively cheap or even free. In all cases consider what you need from the back-up solution and how you will manage the back-ups during and after the research project.

3.1 Hard drives

A popular and easy method of backing up files is to use external hard drives. These are fairly affordable now and enable terabytes of data to be stored in the same place. This is useful for full back-ups of data. If you choose this method make sure that you back up regularly and that the drive has enough spare capacity for the increase in files that is likely to occur throughout the research process.
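
Before relying on a drive, it is worth checking that it has room for your data plus some growth. The sketch below uses hypothetical function names, and the default headroom factor of 2 is an arbitrary assumption rather than a recommended figure.

```python
import shutil
from pathlib import Path


def folder_size(folder: Path) -> int:
    """Total size in bytes of all files under `folder`."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def has_room_for(source: Path, drive: Path, headroom: float = 2.0) -> bool:
    """True if `drive` has at least `headroom` times the size of `source` free."""
    free = shutil.disk_usage(drive).free
    return free >= folder_size(source) * headroom
```

Here `drive` is any folder on the back-up drive, for example its mount point or drive letter.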

One thing to remember about external hard drives is that they can fail suddenly and unexpectedly. NEVER rely solely on external hard drives to store files as they will, at some point, let you down. They are, however, good places for back-ups when you have at least one other device containing the same files.

3.2 The Cloud

What is the Cloud? The Cloud is an online storage option where files can be stored and accessed without being available to everyone on the Internet. Think of it as a personal hard drive that can only be accessed online.

The Cloud is an inexpensive option for backing up and storing data. It is generally easy to use and free of complication. One of its benefits is the ability to access data wherever you are (depending on which option you go for). It also means that you don’t have any physical hardware to worry about. If there is a fire at your workplace or home you won’t lose your files, there is no device to forget on a train, and nothing will break down in the way that a hard drive or USB drive might. However, this does not mean that it is entirely safe. Most Cloud storage does not promise data security, so your files can be lost on the Cloud just as easily as they could be on your own drive.

There are of course options which preserve data better by making copies on multiple servers but these generally cost a little more.

There are two main types of cloud storage:

  1. Dedicated back-up service – these services act like a proper back-up of your files. Generally a copy is made of your files and then, at scheduled times, an automatic check compares the current state of your files with what is held on the Cloud. Any changes are then uploaded. This is useful for maintaining a good back-up of your files, but retrieving the data again is often more involved. It is therefore a good preservation strategy but less useful if you wish to access those files quickly.
  2. Cloud sync services – these work very differently. A sync service usually allows you to create a folder on multiple devices (and to access it via web browsers). This folder is synchronised automatically between those devices, so updating or adding a file on one device will update it everywhere that the folder exists.

When choosing a Cloud service you should consider the fact that many start-up companies offer competitive rates but don’t necessarily survive for long – there is generally no guarantee that your files would be accessible if the firm went bust. Also find out if there are any guarantees that you won’t lose data. Obviously no one can guarantee this 100% but there are various different levels of safety to consider. As your data will be stored elsewhere and possibly in many locations there is a higher chance that your data will be erroneously accessed by unauthorised persons. If your data is sensitive this could be a significant concern.

Another thing that you will need to consider is the size of your files (now and in the future). How big is the data set? Costs increase with the amount of space that you require.

Examples of Cloud storage include:

  • Dropbox
  • SkyDrive
  • Box
  • SugarSync
  • MediaFire

3.3 The Internet

A simple means to back up files (at least individually or in small sets) is to e-mail them to yourself. This is a long-standing and easy means to ensure that you have a back-up copy that is easily available to you and secured online (and perhaps on individual devices as well).

If you choose to do this as a regular thing, then you will need to consider how you will find and retrieve these files at a later date. Most e-mail services provide folder systems that allow you to organise your e-mails. Create some of these as you would do a folder system on your own hard drive. By this means you can create a good back up system, although you will also need to ensure:

  • Each e-mail has a subject line that describes its contents. There is no use sending yourself copies of your files if you can’t find them again.
  • Once the file has been sent to your e-mail, check that it hasn’t become corrupted in the process.
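
Both points above can be built into the message itself: a descriptive subject line makes the e-mail easy to find, and a checksum in the body lets you confirm later that the attachment arrived intact. The sketch below only constructs the message. The function name, the subject format and the use of SHA-256 are illustrative assumptions, and sending it (for example with Python's smtplib) depends on your own mail provider's settings, so that step is left out.

```python
import hashlib
from email.message import EmailMessage
from pathlib import Path


def build_backup_email(attachment: Path, address: str, description: str) -> EmailMessage:
    """Build an e-mail to yourself carrying `attachment` and its checksum."""
    data = attachment.read_bytes()
    checksum = hashlib.sha256(data).hexdigest()

    message = EmailMessage()
    message["From"] = address
    message["To"] = address
    # A consistent, descriptive subject makes the file easy to search for later.
    message["Subject"] = f"BACKUP {attachment.name}: {description}"
    # Recording the checksum lets you verify the file after downloading it again.
    message.set_content(f"SHA-256 of {attachment.name}: {checksum}\n")
    message.add_attachment(
        data,
        maintype="application",
        subtype="octet-stream",
        filename=attachment.name,
    )
    return message
```

To check a retrieved copy, compute its SHA-256 checksum again and compare it with the value recorded in the body of the e-mail.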

Another option for using the internet as a place to store files is to use websites or blogs as a storage location. Even free blogs allow for a certain amount of document storage, which means that you can save a copy of your file there for later retrieval or as a simple back-up. In many cases you won’t need to make this publicly available (although you might choose to do that).