Sunday, January 18, 2015

This blog is not dead....

This blog is not dead. I am tempted to say that it is hibernating, but since it went into shutdown mode in June it doesn't seem like an appropriate metaphor. It is definitely on hiatus.

My spare time has been taken up with other archaeological matters, but I hope that in February I will get back to posting links to archaeobotanical work - and writing about how to archive it and make the data open.


Monday, June 23, 2014

Impact of open data in archaeobotany?

Thinking about the "impact" of sharing my archaeobotanical data so far. Here are a few statistics as of the evening of 23 June 2014

Figshare statistics

So far the documents uploaded to Figshare (25 of them) have been viewed 430 times. Sometimes this was me checking links. Papers are more likely to be downloaded than datasets (this is a pity, I was hoping that people would do nice stuff with the data). Nobody is sharing the papers and datasets. I have not filled out a profile or used Figshare as a work space, or a networking platform. Most views must be generated by results within Figshare's search facility. I'm not sure what to make of these statistics. One curiosity is the fact that the archaeobotanical report from Ballinglanna North 1 had only 3 views (the lowest of all of my documents on Figshare, and all of those views almost certainly by me), while the dataset for the same site had 34 views (one of the highest hit counts). Odd. I begin to suspect that many of these "views" are not by people... It's hard to know, therefore, whether this data sharing exercise is having an "impact" at all. My Figshare data is available to view here.

Zenodo statistics

I have 12 documents uploaded to Zenodo. In some cases this includes different versions of what is essentially the same report. Zenodo does not currently provide statistics for viewing and downloads, so there are no potential impact measurements at this site. My uploads are available to view here.

JOAD statistics

My article written for the Journal of Open Archaeology Data was published in May this year, referencing datasets and reports stored in Zenodo. After approximately six weeks online this has 27 views and 7 downloads. I have not promoted this online, but have sent an email to a few colleagues who may have been interested. Here's the ref.: Johnston, P 2014. Archaeobotanical Data from Two Middle and Later Bronze Age Round House Sites in Cork, Ireland. Journal of Open Archaeology Data 3:e1, DOI: http://dx.doi.org/10.5334/joad.ac


What's the impact?

It's difficult to assess whether this is having an impact or not. Certainly the datasets don't appear to have been used yet and the journal article hasn't been cited. However, it is likely that measuring this kind of impact is something that really occurs over the long term. Wait and see. Nevertheless, I am curious about how those high viewing figures are generated on Figshare.

Friday, June 20, 2014

Final datasets from Cork uploaded

The remaining datasets that I have from County Cork are from the following sites:

  • Gortnahown 2 (E2426)
  • Killydonoghoe (01E0481)
  • Killydonoghoe (01E0495)
  • Ballinvinny North (01E0802)
  • Brooklodge (99E0438)
At present, Gortnahown 2 is the only one of these that I am in a position to upload. 


Gortnahown 2 (Gortnahown 2 dataset, archaeobotany report, archaeology report)
This was an early medieval metalworking site (with structures). Cereals and other seeds were sparsely dispersed in the deposits at the site, primarily oats with small quantities of barley, wheat and arable weeds.
For cross referencing purposes, another blog post where Gortnahown 2 is mentioned can be found here.




Friday, May 2, 2014

Latest datasets uploaded

These past few weeks (my progress is slowing down) I have uploaded material from Gortore 1b, Ballinglanna North 3, Ballynacarriga 2 and Ballinglanna North 1.

As mentioned in my last post I am trying out different licence types. I am also trying out a variety of repositories. Datasets linked to below are stored in both Zenodo and Figshare, using CC-BY and CC-0 licences.

Gortore 1b was a multi-period site that included Mesolithic pits, an Early Neolithic house, early medieval features, as well as samples from undated contexts. 

Mesolithic plant remains were associated with pits near the banks of the Funshion river. The remains appeared to be predominantly collected food waste, and they included hazelnut shell fragments and some berry pips and fruit stones.

A small quantity of Early Neolithic plant remains were recovered from a truncated rectangular house. They included just a small amount of hazelnut shell fragments, some indeterminate cereal grains and possible tuber fragments.

Samples from early medieval contexts (approximately seventh to eighth centuries AD) were amongst the richest from the site, particularly in terms of cereal remains. However, preservation quality was poor. Of the identifiable remains, roughly equal quantities of wheat and barley were recovered. A small quantity of the wheat remains were identified as the primitive glume wheat, emmer (this is very unusual in contexts from the historic period). It was not possible to identify any of the barley grains to type. One immature oat grain was also recovered from these samples and this is unusual as oat is often recovered in abundance in samples from sites of early medieval date.

Most of the samples from this site were not in features from dateable contexts. Hazelnut shell fragments were found scattered throughout the deposits and their widespread distribution suggests that the nuts were probably utilised through all phases of activity at the site.

For purposes of cross referencing, the blog post where I mention the Gortore 1b grey literature report is here.

The site at Ballinglanna North 3 comprised one well-preserved Early Neolithic rectangular house, a second, disturbed, possible house site, pits and post-holes suggesting extensive activity, and two Bronze Age burnt mounds.

The majority of samples with plant remains contained charred hazelnut shell fragments, with over 500 fragments counted. The cereals from this site were identified as emmer wheat and barley, with emmer being the most common cereal type from deposits associated with the Early Neolithic structures.

Ballinglanna North 3 grey literature report is also mentioned here.

An early medieval enclosure and souterrain was excavated at Ballynacarriga 2 in Co. Cork. Plant remains were widely distributed in the samples from this site, although they were generally only recovered in small quantities. Cereal types found included oats, wheat, barley and a small quantity of rye.

Ballynacarriga 2 grey literature report is also mentioned here.

Ballinglanna North 1 (Ballinglanna North 1 dataset, archaeobotany report, archaeology report)
The site at Ballinglanna North 1 comprised a post-medieval structure, a habitation area, a drainage system, a ditch, a metal-working area, two large pits and a burnt mound/fulacht fiadh. There was extensive disturbance of the archaeological deposits in some areas on the site.

Deposits from the area around the burnt mound and associated features were relatively rich in plant remains, including cereals such as barley, rye and wheat. It appears likely that these were re-deposited, as rye is very rare in Irish charred plant remains assemblages before the late medieval period.

A pit associated with small quantities of metalworking waste also contained plant remains, primarily oat grains, while another metalworking area contained seeds that were predominantly barley grains. It is speculated that some of the grains found in these deposits may be the burnt residues of excess/glut or damaged crops that were used for fuel instead of for human and/or animal consumption.

Ballinglanna North 1 grey literature report is mentioned here.






Saturday, April 26, 2014

Licence choices for static and dynamic open data


Uploading dynamic datasets

My recent reading on the prospects for open data within archaeology in general has highlighted the distinction between open data that is static (such as a .pdf file that includes the finalised version of a report) and dynamic (data that can be incorporated into other datasets, can be analysed, and can be added to and updated). 

As I mentioned in my last post I am beginning to experiment with uploading dynamic data. As my data is from archaeobotanical analysis, these take the form of .csv files that list the identifications of different seed types found in archaeological deposits, and their quantities. These datasets, uploaded to both my Zenodo account and my Figshare account, have the potential to become dynamic open data, because they are stored in formats that can be reused. This is why they have been stored as .csv files: csv (comma separated value) is a simple text-based format that can be read by many different applications. The format is used to exchange data between applications that do not otherwise "talk" to each other.
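To give a sense of what "can be read by many different applications" means in practice, here is a minimal sketch in Python. The column names (sample, taxon, count) and the numbers are invented for illustration; they are an assumption about how such a table might be laid out, not the actual structure of the files on Zenodo or Figshare.

```python
import csv
import io
from collections import Counter

# Invented example data: the column names and counts are hypothetical,
# not taken from any of the uploaded datasets.
SAMPLE_CSV = """sample,taxon,count
S1,Oat,12
S1,Barley,3
S2,Oat,5
S2,Wheat,1
"""


def totals_by_taxon(csv_text):
    """Sum the seed counts for each taxon across all samples."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["taxon"]] += int(row["count"])
    return dict(totals)


totals = totals_by_taxon(SAMPLE_CSV)
print(totals)
```

With a real file, the text would come from open("some_file.csv", newline="") rather than from a string, and the column names would need to match whatever headers that file actually uses.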

A .csv file has great benefits for re-use over the .pdf file, as with a .pdf it is difficult to re-use the data without re-typing the information that it contains. (However, the .pdf does have other advantages, particularly that it can be used for version control, and therefore for referencing. This is one of the most obvious advantages of the static format.)

What licence to choose?

While I have been converting Excel and Open Office files into .csv formats, I have been thinking about licences to choose. When I uploaded reports, I licensed them under a Creative Commons Attribution licence, CC-BY. But these were those static .pdf files with their version control. Datasets are different. In Figshare, the CC-0 licence is the default. This means that the data can be re-used by anyone without attribution. (Figshare do give reasons for this.) There is therefore no legal obligation for the person using the data to cite it, although, as Figshare point out, the moral obligation remains. The conventions of academic citation mean that it is unlikely that someone using the data for genuine purposes would actually not bother to cite. Or if they did fail to cite, it is unlikely that they would retain their credibility (assuming they were caught).

I struggled with this at first. Mostly, I hope, out of an apprehension that the practice of removing the legal obligations could begin to erode the moral and academic necessity of citation (rather than out of a vain wish to be cited as much as possible). I initially began uploading datasets to Zenodo, because they provide a range of licence choices, for datasets in particular, while Figshare is more limited. 

And then I went to a talk by Puneet Kishor from Creative Commons, one of the people who worked on the creation of the CC-0 licence. He pointed out that the Creative Commons licences do not stop people with no conscience from using data and text without attribution, instead the CC-0 licence is a way to help the people with good intentions, those who wish to work within the law, to re-use datasets from other researchers. And my ideas about using the CC-0 licence have changed. 

Subsequent reading has made me question whether I have the right to license the data anyway; a fact is not copyright-able... and it is a fact that a grain of emmer wheat was found in Sample X from Site Y. (Although this could be questionable when the layers of interpretation that went into the retrieval and identification of that emmer wheat grain are taken into account... where to sample, how much to take, how to process, the identification decisions made based on the morphology of the grain, and so on.)

Nevertheless, I am about to start using the open licence more.

Monday, March 31, 2014

What's the point of open data in archaeobotany?

This quote is from the referee statement that accompanies an Internet Archaeology open data paper (Richards and Roskams 2013).

The importance of the dataset thus lies in its contribution to a broader programme of research whose cumulative results have the potential to generate something approaching a holistic view of.... (Thomas 2013).

The sentence could be continued with a statement about whatever research area is pertinent to the dataset that is being opened up. For the data that I have been making open, this is about the history (and prehistory) of plant use in Cork and evidence for arable agriculture at various different times in the past. I think. Maybe others could use the data for something different?

The problems

I have spent some time uploading results from archaeobotanical analyses to online repositories over the past two months. Some issues emerged/rose to the surface of my consciousness as I set about doing this, and I'll discuss a few of these below.

First of all, the errors! These are archaeobotanical reports written quite a few years ago, and some written in a great hurry because of time and budget constraints. They contain typographical and formatting errors. Although I am aware of how off-putting these can be for readers, I am also now operating under time constraints (this is completely un-funded and it takes up my leisure time). I consider it more important to make the material available and open, rather than fussing too much about embarrassing slips in copy-editing attention.

Secondly, these technical reports served a fairly restricted function. The reports stand as documents of their time and their purpose when they were written. But this means that they are limited. They were written specifically as appendices to archaeological excavation reports, and this is all they are; they do not function well as stand-alone documents. For this reason I have spent some time providing links between the plant remains reports and other, relevant and related material, such as the excavation report if it is online. But this means a lot of additional work for anyone seeking to re-use the material.

Thirdly, I have been concentrating on adding .pdfs of grey literature reports to the repositories, and the .pdf format does not really allow for easy re-use. It allows others to read what you have written about a certain assemblage, but it does not necessarily allow them to easily add/incorporate the results into their own work. As it stands then, the collection of reports in a repository acts as an information source, it serves a function of allowing my archaeobotanical colleagues (around 6 others specialise in Irish material) sight of what has been found at different sites. But in order to go any further, as the data stands as a .pdf file within the grey literature repository, if others want to use/re-use my results, currently they need to re-type. In terms of actually encouraging their use for archaeobotanical research, these reports are only a first step in the process, and making the data available openly is the next one.

The solution?

I have decided to make the tables of identification from my grey literature reports available as easily importable .csv files. At the moment I am concentrating on assemblages that contain more than 25 cereal grains. Although this is quite a small number, it is based on the cut-off point used in a study of early medieval archaeobotanical remains where archaeobotanical reports from multiple sources were re-used and compiled to produce a large scale analysis of plant material from Ireland in this period (McCormick et al. 2011, 52). These datasets are slowly being added to the Zenodo repository, under a CC-BY licence.
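The cut-off can be expressed as a simple check. This is only a sketch: the taxon labels treated as cereals and the example counts are invented, and in practice I apply the rule by eye when reading through each table.

```python
# Assumed labels for cereal taxa; a real dataset may use different names
# (for example Latin binomials or "cf." identifications).
CEREAL_TAXA = {"Oat", "Barley", "Wheat", "Rye"}

# The rule is "more than 25 cereal grains", so 25 itself does not qualify.
MIN_CEREAL_GRAINS = 25


def cereal_grain_total(rows):
    """Add up the counts for rows whose taxon is a cereal."""
    return sum(count for taxon, count in rows if taxon in CEREAL_TAXA)


def meets_cutoff(rows):
    """True if the assemblage has more than 25 cereal grains."""
    return cereal_grain_total(rows) > MIN_CEREAL_GRAINS


# Hypothetical assemblages as (taxon, count) pairs.
site_a = [("Oat", 20), ("Barley", 6), ("Hazelnut shell", 40)]
site_b = [("Wheat", 25), ("Hazelnut shell", 100)]

print(meets_cutoff(site_a), meets_cutoff(site_b))
```

Note that hazelnut shell and other non-cereal remains do not count towards the threshold here, however abundant they are.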

More problems?

As I gradually add datasets to the repository, I have noted more potential pit-falls in the open-ness of the datasets. Most notable of these is the fact that many of the technical reports that the .csv files are based on were written/assembled long before radiocarbon results were obtained and the phasing of multi-period sites was sorted out. This means that some contain data from more than one period of occupation at any given site. While some phasing has usually been incorporated into the discussion of the archaeobotanical results, the samples from each phase have not always been clearly separated within the datasets. In addition, it is likely that, as it was not possible to radiocarbon date each and every sample, there will always be material from some contexts where the origin date is ambiguous. Nevertheless, in order to make the dataset more relevant for archaeobotanists to re-use it, it will be necessary to go back over the datasets again. And who knows what additional problems will emerge as the iterative process continues?

See a video discussion of the difficulties of restructuring data that was originally created for a different specific set of purposes, in order to make it open and viable as linked data (specifically Hugh Corley's comments c. minute 33) at:
https://www.youtube.com/watch?v=bkBmstZmRdM 


References


McCormick, F., Kerr, T., McClatchie, M., & O’Sullivan, A. (2011). The Archaeology of Livestock and Cereal Production in Early Medieval Ireland, AD 400 - 1100. Retrieved from http://www.emap.ie/documents/EMAP_Report_5_Archaeology_of_Livestock_and_Cereal_Production_WEB.pdf

Richards, J., and Roskams, S. (2013). Burdale: An Anglian Settlement in the Yorkshire Wolds (Data Paper). Internet Archaeology, (35). doi:10.11141/ia.35.8

Thomas, G. 'Referee Statement' in Richards, J., and Roskams, S. (2013). Burdale: An Anglian Settlement in the Yorkshire Wolds (Data Paper). Internet Archaeology, (35). doi:10.11141/ia.35.8

Open archaeobotanical datasets from Cork (Ballynacarriga 3, Caherdrinny 3 and Gortore 1)

Lately, I have challenged myself to make re-usable datasets available through online repositories. These are currently being uploaded using a CC-BY licence, and anyone can re-use them, for whatever purpose, as long as they attribute the source. 

So far, datasets from the following sites in Cork have been uploaded:
The excavation report for Ballynacarriga 3 indicates that this was a multi-period site, with an Early Neolithic hearth, Late Neolithic occupation/activity, Beaker pottery, Early Bronze Age ring ditches and burials, as well as limited evidence for unspecified Iron Age activity. The plant remains report from Ballynacarriga 3 (grey literature format) outlines that the Early Neolithic hearth was associated with the remains of emmer wheat (grains and chaff, suggesting cereal processing), while a small amount of barley was recovered from samples associated with the Late Neolithic activity. Slightly later plant material (from the Early Bronze Age burials and ring ditches) included barley grains and quite a large amount of weed seeds. A small quantity of barley grains was found associated with the Iron Age area of activity.

These datasets have potential for re-use in research into charred plant material found in prehistoric Ireland (or in a wider area) in general. However, there are potential difficulties in re-using the dataset as it has been saved here, as all the periods of activity at the site are included in this one dataset. Separating the results from plant identifications into an individual .csv file for each separate phase of activity is an obvious next step, and something to add to the "to do" list.
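As a rough sketch of that next step, the Python below splits one combined table into a separate piece of .csv text per phase. It assumes the table already has a column recording the phase of each sample (here called "phase", a hypothetical name), which is exactly the information that is not yet cleanly recorded in my files. The rows are invented.

```python
import csv
import io
from collections import defaultdict


def split_by_phase(csv_text, phase_column="phase"):
    """Return a dict mapping each phase label to its own CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames

    # Group the rows by whatever label appears in the phase column.
    groups = defaultdict(list)
    for row in reader:
        groups[row[phase_column]].append(row)

    # Write each group back out with the original header row.
    outputs = {}
    for phase, rows in groups.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[phase] = buffer.getvalue()
    return outputs


# Invented rows; not the real Ballynacarriga 3 data.
COMBINED = """sample,phase,taxon,count
S1,Early Neolithic,Emmer wheat,4
S2,Early Bronze Age,Barley,7
S3,Early Neolithic,Hazelnut shell,9
"""

parts = split_by_phase(COMBINED)
for phase, text in parts.items():
    print(phase)
    print(text)
```

Samples that cannot be assigned to a phase could be given a label such as "Undated" in the same column, so that they end up in their own file rather than being dropped.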


The excavation report from Caherdrinny 3 outlines the details of excavation of another multi-period site, with an Early Neolithic rectangular house and an extensive area of activity with radiocarbon dating evidence to suggest that there was also Mesolithic, Early Bronze Age, Iron Age, early and late medieval occupation at the site.


The plant remains report for Caherdrinny 3 details the recovery and analysis of the plant remains assemblage. A small quantity of plant remains, including hazelnut shell fragments, some fruit seeds and some indeterminate cereal grains, were found in deposits associated with the Early Neolithic house. A much wider variety of plant material was found in the area surrounding the house, including many identifiable cereal types, as well as weeds and legumes. The richest sample was from a kiln deposit, where more than 600 seeds were found; an Early Bronze Age radiocarbon date was obtained from barley grains found in this deposit. Once again, the results in this dataset would be easier for others to use if they were split into relevant time periods (where possible).


The excavation report from Gortore 1 describes a site where an Early Neolithic rectangular house was found, as well as an isolated Bronze Age pit.


The plant remains report from Gortore 1 documents the recovery of wheat (in particular emmer wheat), barley and crab apple.


I have also mentioned links to the grey literature files in earlier blog posts: