The History Data Management Lifecycle (HDML) model

Site: Postgraduate online research training
Course: Module 1: Introduction
Book: The History Data Management Lifecycle (HDML) model

1. Glossary of Terms

Curation Lifecycle Model

DCC’s complex and comprehensive model for plotting the lifecycle of the data management process.

DCC

Digital Curation Centre.

Gestalt

A complete or whole pattern of concepts, activities or events that jointly form more than the sum of their parts.

Gestalt Cycle of Awareness

A cycle paradigm that treats the creative process as a cycle of experience, developed by Joseph Zinker (Zinker 1977) from the work of Fritz Perls (Perls, Hefferline, et al. 1951).

HDML

History Data Management Lifecycle.

HDMT

History Data Management Training.

HRL

History Research Lifecycle.

Linear

“Flattened” versions of the HRL and HDML that present a timescale approach to the Lifecycle process.

Paradigm Shift

Thomas Kuhn’s concept of the revolutionary shift of awareness from one way of thinking to another; seeing things from a different perspective (Kuhn 1962).

Phases

The typical Lifecycle is depicted in “stages” or “phases”. “Stages” imply fixed events to be completed before the next is approached whereas “phases” are more fluid and may overlap or be revisited as necessary.

Phase Cycles

“Mini-cycles” that complete each phase of a larger, overall project cycle.

 

 

2. Introduction

The History Data Management Lifecycle (HDML) model provides a graphic perspective on how a typical research project, research thesis, or any similar piece of research work may be visually conceived.

The Digital Curation Centre (DCC) has developed a highly structured, technical Curation Lifecycle Model that “provides a graphical, high-level overview of the stages required for successful curation and preservation of data”. Details of the model are available at DCC: Curation Lifecycle Model

While the DCC model provides highly developed, process driven insights into the typical project development lifecycle, it is also:

  • complex
  • multi-disciplinary in focus
  • generalised and necessarily genre generic
  • information heavy

The HDML borrows from the DCC's Curation Lifecycle Model, but presents a more history-specific approach that may make the entire History Data Management process more accessible to historians and history researchers.

The following sections invite you to partake in a few exercises that will encourage you to prepare the following:

  • A History Research Lifecycle (HRL)
  • A History Data Management Lifecycle (HDML)

The outcome will be a visual depiction that will accurately reflect your own research work, be it a large or a small research project, a thesis or a dissertation. By developing your own lifecycle model, you will gain insights into the processes that you need to be aware of and to act upon. The successful completion of any project, especially in the digital age, relies on a clear and concise data management strategy. The HDML will enhance the clarity and fill in the detail required to meet these requirements.   

3. The History Research Lifecycle (HRL)

One difficulty with developing the HDML is placing it clearly in context. A serious question that may be asked is:

“Where does the Data Management process fit with the lifecycle of a typical dissertation, thesis or piece of research?”

The answer is simple:

“... at every phase of the research cycle.”

A typical history research project (dissertation, thesis, funded project, etc.) will develop along a path from conceptualisation to completion. Figure 2 below demonstrates a typical History Research Lifecycle (HRL).

 

 Figure 2: The History Research Lifecycle (HRL)

The “phases” in the developing HRL (Conceptualisation, Literature/Material Review, etc.) are not necessarily fixed for every project; projects vary in content, context and perspective, which will affect the relevance, applicability and placement of these phases. However, a typical research project will include the phases shown in Figure 2, which may be used to develop a project lifecycle that is specifically geared to a particular project.

Logically, if any of the phases are missed, skipped or incomplete, there will be an impact on the entire project. This may lead to serious delays, poor quality outcomes, or even complete failure.

The HRL is a continuous, cycling loop that may flow from one project or piece of research into another. The elements of different research may overlap with the current research, and phases may be developed in partnership (as in a multi-disciplinary and/or multi-institutional project). Each piece of research will have its own lifecycle that applies, but is not necessarily independent of other project lifecycles.


Figure 3: Multiple Research Lifecycles

One aspect that is often overlooked is that the logical cycle requires the researcher to regularly review all phases of the cycle throughout the lifespan of the project to ensure that each phase has met, and continues to meet, its desired outcomes. “Review” in this sense may mean a regular check to ensure that the protocols established in an earlier phase are being met and complied with. For example, a thesis that is being written up will need the researcher to regularly verify that the basic conceptualisation phase is being referenced to avoid going off topic, and to ensure that the research aims and outcomes are actually being addressed. Similarly, failure to reference the literature review may well lead to unfocussed and/or irrelevant work being undertaken to the detriment of the overall thesis.

In some cases, the completion phase simply returns to the conceptualisation phase and the process begins again. Large, multiple output, multiple partner projects may fall into this category. Individual research, such as a thesis, may also fall into this category if the researcher has ambitions to continue the research after the thesis has been completed. For example, the researcher may regard their thesis research work as an element of a larger body of work to be developed later in their research career. In any event, the research carried out, despite being highlighted in the thesis, may well be useable again after the logical lifecycle is complete.

3.1 Phase Cycles

Each phase of the lifecycle is an entity that itself may be viewed as a cycle. Each phase leads to change that in turn makes the following phase(s) relevant and logical. These mini-cycles may be described in terms of a gestalt (whole, or pattern of interwoven processes) cycle of awareness. Each phase will pass through the following steps:

  • Beginning of awareness of the phase
  • Understanding of the requirements of the phase
  • Preparation for action needed to carry out the work of the phase
  • Action: putting the preparation into practice
  • Carrying out the research work based on the preparation
  • Verification that the requirements are being/have been met
  • Moving on to the next phase

For example, the Research phase will commence with a beginning of awareness of what is to be researched based on the previous phase (Literature/Material Review). This awareness will grow into understanding of the research requirements as the previous phase is developed. With understanding of the requirements for research, preparation can be made to ensure that the research can be carried out; for example, relevant people, institutions, etc. who have access to relevant identified resources can be contacted to ensure the availability of those materials for the research effort. With actual access to resources (e.g. people, documents, artefacts, curated materials, etc.), action can be taken to put the preparation into practice. The actual research work is carried out; this is the effective expertise of the research historian and does not need to be explained here. Once research is carried out, its effectiveness needs verification to ensure the requirements have been met. This brings the phase to a close and has effectively prepared the researcher for moving on to the writing up phase.

 

Figure 4: Phase Cycles

Each of these mini-phases is a “cycle” (see Figure 4: Phase Cycles), which implies that once complete, it returns to the beginning to be re-run. The process is continuous, and anyone who has completed a project, thesis or dissertation will know the feeling that they could have done so much more; this is because the cycles are still in full flow and will never end. The successful researcher will be aware of their limitations and prescribe a satisfactory cut-off point for the various elements of their project.
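The mini-cycle described in this section, including the researcher's chosen cut-off point, can be sketched in a few lines of Python. This is only an illustration: the names Step and run_phase, the requirements_met callback and the max_passes limit are assumptions made for the sketch and are not part of the HRL or HDML models.

```python
from enum import IntEnum


class Step(IntEnum):
    """Steps of a phase mini-cycle, in the order given in this section."""
    AWARENESS = 1
    UNDERSTANDING = 2
    PREPARATION = 3
    ACTION = 4
    RESEARCH_WORK = 5
    VERIFICATION = 6
    MOVING_ON = 7


def run_phase(requirements_met, max_passes=3):
    """Repeat the mini-cycle until verification succeeds or the cut-off is hit.

    requirements_met is a hypothetical callback that receives the pass number
    and returns True once the requirements of the phase have been met.
    max_passes (assumed to be at least 1) is the researcher's cut-off point.
    """
    log = []
    for pass_number in range(1, max_passes + 1):
        for step in Step:
            if step is Step.MOVING_ON:
                continue  # only move on once the loop below allows it
            log.append((pass_number, step.name))
        if requirements_met(pass_number):
            break
    # Either the requirements were met or the cut-off was reached.
    log.append((pass_number, Step.MOVING_ON.name))
    return log


if __name__ == "__main__":
    # Example: the requirements are only met on the second pass.
    for pass_number, step_name in run_phase(lambda n: n >= 2):
        print(pass_number, step_name)
```

The cut-off matters because, without it, a phase whose requirements are never fully met would be re-run indefinitely.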

3.2 HRL Learning Summary

The HRL highlights, amongst others, the following issues that are vital for researchers to observe:

  • Phases of the cycle are dependent on each other
  • Failure to identify each relevant phase may result in project failure
  • Failure to complete one or more phases may result in project failure
  • Each phase requires continual review to ensure full project completion
  • Phases may overlap with other research projects
  • Each phase effectively has its own mini lifecycle
  • Completion may be contingent on a new cycle beginning

3.3 Creating an HRL

Exercise

Complete a History Research Lifecycle for your own project using the structures indicated here to assist in the process. Remember that there is no right or wrong answer here, and due to the fluid nature of the cycle concept, it can be revised to better suit your project, particularly in the early phases, as project awareness grows.

 

Figure 2: The History Research Lifecycle (HRL) and Figure 4: Phase Cycles may be especially helpful here.

Figure 2: The History Research Lifecycle (HRL)

 

4. Lifecycle or Lifeline?

The cycle concept has much to recommend it. It highlights that each phase of a research project needs to be continually revisited in order to ensure that successful completion may be achieved. One difficulty is that this fluid concept does not lend itself to the very real-world problem of timescales.

The HDML recognises that the “cycle” concept of a project may be better understood in terms of a linear development in certain situations. The idea is that the “circle”, which starts and ends at the conceptualisation/completion phases, is stretched out into a curve or line. This better indicates the vital research consideration of timescales. It may prove helpful when applied to project management processes such as Gantt charts and Critical Path Analysis, but critically, it can be used simply to put a timescale on the various elements that need to be completed in order to reach the final outcome of the project.

The diagram below highlights this linear approach for a typical research project.

 

Figure 5: Cycle into Linear depiction

 

For many this is a clearer way to understand the Cycle, but some people will find that the linear depiction works better from top down. Which way works best for you?
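The linear view described in this section can also be produced with a short script. The following is a minimal sketch, assuming that phases simply follow one another without overlap; the durations in weeks and the start date are invented placeholders, and the function name linear_schedule is hypothetical.

```python
from datetime import date, timedelta

# Hypothetical durations in weeks; replace these with your own estimates.
PHASES = [
    ("Conceptualisation", 4),
    ("Literature/Material Review", 8),
    ("Research", 20),
    ("Write Up", 12),
    ("Assessment", 2),
    ("Project Review", 3),
    ("Publication/Dissemination", 2),
    ("Completion", 1),
]


def linear_schedule(start, phases):
    """Return (phase, start_date, end_date) tuples laid end to end."""
    schedule = []
    current = start
    for name, weeks in phases:
        end = current + timedelta(weeks=weeks)
        schedule.append((name, current, end))
        current = end  # the next phase starts where this one ends
    return schedule


if __name__ == "__main__":
    for name, begin, end in linear_schedule(date(2024, 1, 8), PHASES):
        print(f"{begin:%d %b %Y} to {end:%d %b %Y}  {name}")
```

Because phases in the HRL may overlap or be revisited, a schedule like this is a planning aid rather than a complete description of the cycle.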

5. History Data Management Lifecycle (HDML)

The HDML is a vital concept within the History Research Lifecycle. A well-prepared Research Lifecycle will identify the various elements that comprise the Data Management Lifecycle, but these may appear in a disjointed manner, with separate elements being seemingly unconnected or, even worse, unimportant.

The DCC Curation Lifecycle Model provides a very detailed and complex approach that identifies the processes that are relevant and important. However, this model is highly generic and serves to identify Data Management aspects for any and all research types. It is not discipline specific, neither is it aimed specifically at researchers; rather, it is intended for Data Managers and technicians who have deep insights into data processes.

Comparing the DCC Curation Lifecycle Model with the HDML, it is apparent that there are few clear crossovers or easily identifiable points of similarity. The HDML uses the DCC's identified elements that are vital for successful Data Management, and places them into a history research context.

Multiple phases can be identified in the cycle, and these may often overlap or even appear to be repetitive at times. These phases may be of particular interest to Data Managers, ICT Technicians, and Computer Science personnel and students. They are not necessarily of interest to history researchers, and will be combined into more manageable phases that more closely reflect the work of the Historian.

Figure 6 (Mapping the HRL to the DCC’s Curation Lifecycle Model) highlights the complex process of mapping to a highly developed model. The HDML addresses the various components in terms of the more manageable HRL.

Figure 6: Mapping the HRL to the DCC’s Curation Lifecycle Model

Based upon the HRL, and including the many elements from the Curation Lifecycle Model, the HDML links directly to the recognisable elements that the historian will typically encounter during any research project or process. Each sequential element needs to be carefully considered in order to identify components that require completion or development before the next components may begin.

The HDML is not a standalone cycle. It fits around the HRL and maps directly to the research process that is the work of the historian.

 Figure 7: HDML mapped to HRL

5.1 Conceptualisation

Identify what is being researched:

  • Recognise what “data” are being researched and what the intention is for these data.
  • Where is the data coming from?
  • Why is the data being targeted?
  • How accessible is the data?
  • Is the data relevant to the research remit?

 

Examples:

  • A manuscript from a particular archive will be transcribed and saved as a document to be referenced during the research
  • Sets of data from various sources will be gathered into tables and referenced in the research
  • A series of interviews will be recorded (oral history); information from these interviews will be assimilated and referenced in the research
  • A set of images (photographs, drawings, etc.) will be saved electronically and referenced in the research
  • An online blog will be maintained with responses being saved for research
  • A series of physical artefacts will be gathered and curated as a tangible outcome of the research

 

5.2 Literature/Material review

Describe the data: providing a full description of the data is essential in order to properly understand its relevance and validity. At this point, irrelevant, inaccurate and spurious data can be identified. Data that clearly assists in the research process can be effectively “tagged”. This process is essentially the same as the Literature Review.

  • Each data source must be referenced accurately
  • Each data element, or group of elements should be verified
  • The formats of each element must be clearly identified
  • The physical dimensions, resolution, and/or storage capacity must be identified
  • Copyright and ethical considerations need to be fully accounted for
  • Any licensing restrictions that apply should be identified
  • Protocols for holding the data need to be checked
  • Accurate naming and referencing conventions must be applied to ensure accessibility

 

Example:

Image title: RAF Sopwith Camel

Description: RAF WW1 Sopwith Camel biplane fighter, Sopwith Aviation Company, 1917

License: artistic work created by the United Kingdom Government is in the public domain, HMSO has declared that the expiry of Crown Copyrights applies worldwide

File Size: 1,024 × 768 pixels, 454 KB, jpeg

Unique File reference: Image0095

Image may be reproduced freely
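A description like the one above can also be held as a structured record, which makes it easier to check that nothing has been left out. The sketch below is one possible layout, not a prescribed standard: the class name ImageRecord, its field names and the missing_fields helper are all assumptions made for illustration. The values are taken from the example above.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ImageRecord:
    """Descriptive record for one image; the field names are illustrative."""
    reference: str
    title: str
    description: str
    licence: str
    width_px: int
    height_px: int
    size_kb: int
    file_format: str
    may_reproduce: bool

    def missing_fields(self):
        """Return the names of any text fields that have been left empty."""
        return [
            name
            for name, value in asdict(self).items()
            if isinstance(value, str) and not value.strip()
        ]


record = ImageRecord(
    reference="Image0095",
    title="RAF Sopwith Camel",
    description="RAF WW1 Sopwith Camel biplane fighter, Sopwith Aviation Company, 1917",
    licence="Public domain; HMSO has declared that the expiry of Crown Copyrights applies worldwide",
    width_px=1024,
    height_px=768,
    size_kb=454,
    file_format="jpeg",
    may_reproduce=True,
)

# Serialise the record so it can be stored alongside the image file.
print(json.dumps(asdict(record), indent=2))
```

Keeping one such record per data element means the checklist in this section (source, format, dimensions, licensing, naming) can be reviewed consistently across the whole project.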

 

 

5.3 Research

Preserve 

The research phase incorporates several aspects of the data management process.

Preservation of the data is crucial to its availability and accessibility during the project, and possibly beyond.

  • Identify how the data or material will be maintained during the project
  • Consider whether maintenance is required after the end of the project
  • Generate a plan to secure the data for the required period

 

Example:

“File will be stored on laptop, on a backup drive, a USB data stick and the University Server. The data will be required for at least 3 years after the end of the project; the University has agreed to preserve the data for this period.”
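When a file is held in several places, as in the example above, it is worth checking from time to time that the copies are still identical. One common way to do this is to compare checksums. The sketch below uses SHA-256 from the Python standard library; the function names sha256_of and copies_match are hypothetical, and the approach is an illustration rather than an institutional requirement.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(master, copies):
    """Map each copy to True if it exists and matches the master's checksum."""
    expected = sha256_of(master)
    return {
        str(copy): Path(copy).is_file() and sha256_of(copy) == expected
        for copy in copies
    }
```

A result of False for any copy indicates that it is missing or has changed, and that it should be replaced from a copy that is known to be good.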

 

Conceptualise

Plan how the data will be generated, created or incorporated into the research. Consider how the data will be captured, and how/where it will be stored.

 

Example:

“Images (photographs) taken on location will require an eight megapixel camera, reasonable lighting and reasonable weather conditions. Several attempts may need to be made to capture the relevant images. The images will be saved in .jpg format on a local laptop, a USB drive and the institution server.”

 

Create or Receive

As part of the research process, new and innovative data may be created. Similarly, data new to the project may come to light and be received for incorporation into the project. These forms of data will need to be processed in much the same way as other data. A clear description needs to be provided, preservation needs to be considered, and a concept of how the data will be included in the project needs to be formed.

 

Example:

“Data generated from the logbooks and diaries will be entered into a database which will be saved in a format to be agreed with the project supervisor. It is envisaged that the database will not exceed 2000 entries and will total about 1 MB in size. The resulting database, once populated, will be used to extract tables for inclusion in the thesis as appropriate.

 

 “Similarly, data identified as the project develops will be saved in a useable and robust format (e.g. PDF files).”
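A small database of this kind can be sketched with SQLite, which ships with Python. This is only one possible choice: the example above leaves the format to be agreed with the supervisor, so the use of SQLite, the table name entries, its columns and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# In-memory database for illustration; a real project would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entries (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,        -- e.g. 'logbook' or 'diary'
        entry_date TEXT NOT NULL,    -- ISO 8601 date, e.g. '1917-06-04'
        summary TEXT NOT NULL
    )
    """
)

# Invented sample rows, purely to show the shape of the data.
rows = [
    ("logbook", "1917-06-04", "Sample logbook entry"),
    ("diary", "1917-06-04", "Sample diary entry"),
    ("logbook", "1917-06-05", "Another sample logbook entry"),
]
conn.executemany(
    "INSERT INTO entries (source, entry_date, summary) VALUES (?, ?, ?)", rows
)
conn.commit()


def entries_per_source(connection):
    """Return (source, count) pairs, the kind of table a thesis might include."""
    cursor = connection.execute(
        "SELECT source, COUNT(*) FROM entries GROUP BY source ORDER BY source"
    )
    return cursor.fetchall()


for source, count in entries_per_source(conn):
    print(f"{source}: {count}")
```

Storing dates in a single consistent text format keeps them sortable and makes later extraction of tables more predictable.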

 

Appraise and Select

Data must be evaluated to determine its validity for the research process. It is vital that documented guidance, policies and legal requirements are adhered to.

 

Example:

“The proposed oral history interviews will require research into local [institution] policies to ensure compliance with local regulations, and to act in accordance with applicable laws such as the Data Protection Act. This information will be acquired from Supervisors, Project Managers and Research Leads.”

 

Ingest

Data gathered or created will need to be placed into an archive, repository or data centre. This goes beyond simply holding data on a local computer or a data stick. Transfer to a robust repository is strongly recommended. Again, local institution policies must be adhered to when data is transferred.

 

Example:

“Apart from storage on a local computer, the data will be stored on an institution server. The data transfer will be facilitated by IT Services and/or Library/Archive services. Policies relating to volumes and formats will be adhered to.”

       

 

Preservation Action

Based on the planning carried out in the previous review phase, the long-term preservation of the data must be enacted. This may require the researcher to comply with a strict file naming process, removing inappropriate, irrelevant and/or repeated information. The file structures and integrity of the data must be checked and verified.

 

Example:

“A specific file naming format will be applied in the following form: NumericDate_ProjectName_FileDetails.filetypeextension

  • 20131204_ENACT_localawarenessofarchives.doc
  • 20131213_ENACT_michaelbirdinterview.wav

 

“Files will be checked (read/listened to/viewed/run) to verify their integrity, and accurately named as shown above. Any repeats or varying versions will attract different numeric dates, while repeated or irrelevant data may be deleted.”
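The naming format in the example above can be applied and checked automatically. The sketch below assumes that the numeric date is written as YYYYMMDD (as the two sample names suggest), that the file details are lower-case letters and digits only, and that the extension is lower-case; the function names build_name and is_valid_name are hypothetical.

```python
import re
from datetime import date, datetime

# NumericDate_ProjectName_FileDetails.filetypeextension
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<project>[A-Za-z0-9]+)"
    r"_(?P<details>[a-z0-9]+)\.(?P<ext>[a-z0-9]+)$"
)


def build_name(day, project, details, extension):
    """Build a file name; details are lower-cased and stripped of other characters."""
    cleaned = re.sub(r"[^a-z0-9]", "", details.lower())
    return f"{day:%Y%m%d}_{project}_{cleaned}.{extension.lower().lstrip('.')}"


def is_valid_name(name):
    """True if the name matches the pattern and its date part is a real date."""
    match = NAME_PATTERN.match(name)
    if not match:
        return False
    try:
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(build_name(date(2013, 12, 13), "ENACT", "Michael Bird interview", "wav"))
    print(is_valid_name("20131204_ENACT_localawarenessofarchives.doc"))
```

Checking names in this way before files are transferred to a repository helps to catch mistyped dates and stray characters early.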

 

 

Storage

Once the relevant checks have been carried out, the data should be stored in the appropriate location.

 

Example:

“Institutional repository or server as indicated by IT Services.”

5.4 Write Up

Access and Reuse

Having carried out the relevant research, including the various data management aspects detailed before, the data will now be readily accessible for use and reuse in the writing up of the work. The actual writing up is the remit of the researcher – a basic function of “doing history” and will not be explored here. During this phase, attention should still be focussed on the elements identified during the research phase.

 

Example:

Preservation: Have I determined how I will hold and maintain the data that I collect?

 

Conceptualise: Have I clearly determined how I will locate and acquire the data I will be using?

 

Create or Receive: Have I created or received a database and/or data field structure that works suitably with my collected data?

 

Appraise and Select: Have I checked that each data file or element is relevant and useable in the project?

 

Ingest: Have I determined exactly where my data will be stored? Is it safe and reliable?

 

Preservation Action: Are my data files accessible based on accurate naming, and are they verified as useable?

 

Storage: Is my data stored securely in a repository or on an institutional server?

5.5 Assessment

Assess

The project assessment phase relates to checks carried out by project mentors, tutors, supervisors, team leaders, colleagues, etc. At this point, the researcher is afforded the opportunity to receive constructive criticism and identify areas of required development. Shortfalls in terms of data security or integrity could be examples of identified areas for further work.

 

Example:

“Has the data part of my research (as well as the written or transcribed elements) been assessed by my supervisor and/or project Lead?”


5.6 Project Review

Reappraise

Based on the Assessment phase, identified areas of further development may be addressed. This requires the researcher to repeat all previous phases and implement the identified elements from the assessment. This is also the opportunity for a full review of the processes at each point of the cycle. Corrections and amendments can be safely applied.

 

Example:

“Have I clearly determined which data aspects were identified during the assessment phase? Have I addressed all of the issues that relate to changes, corrections or revisions to my stored data?”

5.7 Publication/Dissemination

Transform

Following the review, with corrections applied and all reasonable steps taken to maximise the integrity of the overall project, the results of the research may be published or disseminated. Published data may take the form of, for example, a database or datasets provided online.

This phase may prompt the researcher to consider making the data available in more than one form. For example, a database may be provided online via a website, as well as held locally on an institutional repository for local use.

Vitally, the extent of the data to be published is determined at this juncture. The researcher will identify which specific elements of the researched data are to be published, and which elements do not need to be made available.

 

Example:

“Check that all revisions and changes have been made, and have been assessed. The final format for data publication will be checked by myself, my supervisor and the relevant IT personnel to ensure it complies with the standards of my institution, is accurately referenced within my other work (written thesis), and that references to my other work within the data are accurately maintained.

 

“The data files will be migrated to the institution’s HYDRA repository (in the case of the University of Hull) and maintained according to the institution’s data management policies. This will include regular backups, data integrity checks and viability studies during the lifespan of the data – typically ten years after the last use of the data.”


5.8 Completion

Completion of a research project invariably leaves the researcher with the insight that if more time and resource were available, even more could have been achieved. In terms of data management, this aspect is vital to the overall process.

Any data collected and collated during the project remains valid and important after the specific research has been carried out and the project is complete. Access to the data remains a legacy of the completed project, and often spawns new or continuing projects that build on the existing research.

Disposal and Migration

At some point the data will eventually be disposed of, or it may need to be moved from one location to another. With awareness of these issues, disposal of useable, valuable data may be avoided. Continuing monitoring of the availability and location of data will ensure its continued viability for future use. Similarly, data are often altered in format to overcome issues of obsolescence in terms of hardware or software. 

Document files maintained in an obsolete format may be reformatted to work with newer programs. During the migration various formatting issues may arise: tables may be incorrectly copied, text may be corrupted or even lost, or the document may simply be completely corrupted and unusable.
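One simple way to spot lost or corrupted text after a migration is to compare the wording of the document before and after. The sketch below assumes that the plain text of both versions has already been extracted (the extraction step depends on the file formats involved and is not shown); the function names normalise and migration_similarity are hypothetical, and the comparison will not detect layout problems such as a table whose cells have been rearranged.

```python
import difflib
import re


def normalise(text):
    """Collapse whitespace and case so that only wording differences remain."""
    return re.sub(r"\s+", " ", text).strip().lower()


def migration_similarity(original_text, migrated_text):
    """Return a ratio between 0.0 and 1.0; 1.0 means identical wording."""
    original_words = normalise(original_text).split()
    migrated_words = normalise(migrated_text).split()
    matcher = difflib.SequenceMatcher(None, original_words, migrated_words)
    return matcher.ratio()


if __name__ == "__main__":
    before = "The squadron moved to the new airfield in June 1917."
    after = "The squadron moved to the new airfield in June"
    print(round(migration_similarity(before, after), 3))
```

A ratio noticeably below 1.0 suggests that the migrated file should be opened and checked by hand before the original is disposed of.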

 

Example: 

“Regular checks on the viability of the data will be carried out. Based on my institution’s policies, I will verify that the data will be made available for the duration of the project, and for any agreed period after the completion of the project. During this phase, based on institutional policy, as indicated in the previous phase, backups and integrity checks will be carried out until the data is no longer required.”

6. Creating an HDML

By following the phases proposed in the previous section, it is possible to create your own History Data Management Lifecycle model. The outcome will be to highlight the requirements and level of resource needed to ensure that the data management aspects of the project will succeed alongside the other phases of the project. By this point it will be clear that data management is an integral aspect of the project rather than an “add-on”.

6.1 A Unique History Data Management Lifecycle

Here is an example of how you might use this training course to help you to create your own History Research Lifecycle (HRL) and Data Management Lifecycle (HDML). This is only a suggestion and you might wish to print this page off and add your own references within this course and elsewhere as a guide for future use.

Exercise

Using the History Research Lifecycle (HRL) created earlier (3.3 Exercise), apply the phases identified in 5. History Data Management Lifecycle (HDML):

History Research Lifecycle      History Data Management Lifecycle  Training Reference

5.1 Conceptualisation           Identify                           Section 2.1
5.2 Literature/Material review  Describe                           Section 3.5
5.3 Research                    Preserve                           Section 2.3
                                Conceptualise                      Section 1.3
                                Create or Receive                  Section 2.1
                                Appraise and Select                Section 3.3, Section 3.4, Section 3.5
                                Ingest                             Section 3.1
                                Preservation Action                Section 3.1
                                Storage                            Section 2.3
5.4 Write Up                    Access and Reuse                   Section 3.1
5.5 Assessment                  Assess                             Section 3.1
5.6 Project Review              Reappraise                         Section 3.1
5.7 Publication/Dissemination   Transform                          Section 3.1
5.8 Completion                  Disposal and Migration             Section 3.1

7. Bibliography

'DCC Because good research needs good data', Digital Curation Centre (2013), Retrieved 10 July 2013.

Kuhn, T. S., The structure of scientific revolutions (University of Chicago Press: Chicago, London: 1962).

Perls, F. S., R. F. Hefferline, et al., Gestalt therapy: excitement and growth in human personality (New York: 1951).

'Research and Innovation', University of Hull (2013), Retrieved 15 July 2013.

'Support for researchers', University of Hull (2013), Retrieved 20 July 2013.

'Research Data Management', University of Hull, Library and Learning Innovation (2013), Retrieved 20 July 2013.

Zinker, J. C., Creative process in Gestalt therapy. (Brunner/Mazel: New York, 1977).