Meeting minutes 2020-07-15 · Archivematica Product Support Program

15 July 2020 3:00 p.m. UTC

Attendees

Artefactual: Sara Allain, Sarah Romkey, Justin Simpson, Jesús Garcia Crespo, Kelly Stewart
Bodleian: Susan Thomas, Matthew Neely, James Mooney, Wendy Maule
International Institute of Social History: Eric de Ruijter, Robert Gillesse, Lucien van Wouw
Max Communications: Geoff Blissitt, Tim Schofield
Picturae: Wim van Dongen
Wellcome Collection: Tom Scott, Jonathan Tweed

Regrets: none

Agenda

3:00 UTC - Meeting starts (Sarah Romkey)

3:05 UTC - Housekeeping (Sara Allain)

3:10 UTC - Artefactual update (Kelly)

3:15 UTC - Use cases for reporting (All members)

3:40 UTC - Core definition progress (Sarah Romkey, Justin, Jesús)

4:00 UTC - Query builder demonstration (Geoff)

4:05 UTC - Mellon email grant (Susan)

Notes

Housekeeping

None

Artefactual update

Led by: Kelly
Company reorganized in January - split dev team into two units (project delivery [PDT], software architecture and development [SAD]); the split follows billable vs non-billable work rather than AtoM vs Archivematica
Software architecture and development team focuses on how to maintain sustainable software - the unsexy stuff we need to do to keep AM and AtoM viable (dependencies, releases, etc.)
Spent first few months getting ourselves organized, then started sprinting in late April
Development resources - billable work comes first, then whatever is left over goes to sustainability efforts.
In this time we’ve released Archivematica 1.11.1, 1.11.2, AtoM 2.5.4, and AtoM 2.6 - able to move a lot quicker on releasing now
Work is tracked on a public Trello board - https://trello.com/b/jw8nuM36/sad-sprint-board. We are hoping that external contributors like PSP members can use this board as a way to contribute to upcoming releases.
Suggestion from GB: make it clear which issues would be easy for an external dev to pick up

Use cases for reporting

Led by: Sarah Romkey
SR: Bodleian and Picturae have provided examples from the work they have done
- Reporting project is a dev project, so it is being led by the Project Delivery team in consultation with the SAD team
- PDT member Peter van Garderen has been working on a proof of concept for reporting. Started with an Architectural Decision Record (ADR) to explore three options: integrate reporting directly in the dashboard, use other tools that can do what we need, or develop a new tool. Landed on the last option and developed AIPscan.
- AIPscan is being developed against Picturae’s requirements as an AIP storage reporting tool - e.g. for everything we’ve stored this year, what has been stored overall, what formats have we got, which formats are taking up the most space, etc.
- Peter has prepared a screencast of the work in progress so far, which will be shared with the group
WvD: Peter still working on adding more reports based on priority for Picturae clients
- Tool will be extended to connect to MySQL database in future, not just AIP METS files
SR: Doesn’t prevent reporting using other tools - nothing is hardcoded into Archivematica, could still use Grafana or others
WvD: In future version of AM, more fields will be available for in-dashboard reporting, as well as CSV download of AIP storage search results
SR: This is the first time we’ve had the chance to work with a client to provide detailed storage reporting
GB: this aligns with the use cases we’ve seen
JT: We’ve built some reporting tools this year, which include operational monitoring but also information about packages and files in storage; waiting to finish METS file work before extending that reporting
TScott: reporting tool needs to work for both Archivematica and the other tool that Wellcome uses to get material into storage
RG: IISH also needs reporting tool to work on the storage independently from AM because they clean up the pipeline after each transfer - SR confirms that this is the intention with AIPscan, doesn’t rely on AM
MN: Created user stories prior to testing AM. One of the metrics that Bodleian wants is progress of big packages - how long is it taking to get into the system. Related is better reporting about errors, so that they are more understandable.
- All the storage metrics that have been done are of interest as well.
JM: Using Grafana to help see what the system is doing in the background (e.g. when is it running out of memory, measuring volume of the service) to see how materials are progressing through the system. AM front-end doesn’t show you these things so having Grafana running in the background provides that view.
ST: Sometimes have inquiries about uncatalogued materials, so being able to query storage to look for these items is something that would be really helpful.
SR: Grafana has a learning curve but it can be very powerful.
- Two areas of reporting: reporting on content, and reporting on management of processing pipelines. Artefactual sees that AIPscan might be the way to go for the first, and Grafana for the second.
SR: Any requested further actions?
- Will set up a separate demo session to show off AIPscan

Core definition progress

Led by: Sarah Romkey, Justin, Jesús
SR: At last meeting we presented an outline and got some feedback - promised to fill in the outline, which we have been doing.
- Added introduction, context statement
- Started to talk about the idea of using plain language to describe what AM can do now, rather than all of the things that AM might do in the future. Started with #7, “Archivematica enacts preservation plans as defined by the user” - we take it as core that AM can implement a preservation plan created by a user. The aim is to show that this can be achieved in several ways.
- Added motivation and consequences of each statement - e.g. “Archivematica creates standards-compliant, systems agnostic self-describing AIPs” creates a bottleneck because each AIP has to be serialized.
- Justin and Sarah didn’t start with a number of statements in mind but ended up with 12. Removed some statements that touched on functionality that is not truly core to AM - e.g. Appraisal tab
- Absence of statement doesn’t mean that it’s not going to be maintained, it just means that it isn’t core
- Focused on ensuring that statements applied to both small and large institutions with varied computing environments and deployment methods
JHS: focused on inputs and outputs, how AM interfaces with the world; a lot of users think of AM as a repository but we often think about it as much more ephemeral than that.
- Writing these statements allowed us to think about AM functionality more broadly - e.g. for preservation plans, statement doesn’t mean that AM has to provide a method for writing preservation plans; could build out an API to allow users to provide a preservation plan as an input (we’ve done this as a POC with Preservica through the Preservation Action Registries [PAR] project)
- Similarly, storage locations - current implementation is tightly coupled to storage locations defined in Storage Service, but doesn’t have to be this way.
- Feedback that we’re looking for - do these describe the ways that you’re using AM?
SR: considered whether the statements should be ranked - Sarah sent around a spreadsheet where each institution can rank the statements
GB: how do we expect this document to be used?
JHS: lower in the document is the data model section, looking to explicitly define the data model that is required to fulfil these required functions. Could use this to build out the API.
SR: Next steps - would love more comments. Could pick it up at next meeting or could have a breakout session.

Query builder demonstration

Led by: Geoff and Tim
GB: Max Communications has developed a tool for Imperial College. Many departments in the college will be using the service, so need to be able to report on who did what and when (esp. information such as which departments are using the most data).
TSchofield: [demonstration - unfortunately technical issues prevented a full demo]

Mellon email grant

Led by: Susan
ST: CFP went out a few months ago - know that email archiving is a priority for many of us here.
JHS: Artefactual helped to develop the report that the CFP is based on so think there’s a good chance that Archivematica-focused work would be successful.
ST: next step - collate expressions of interest to discuss in more detail. Please send an email to Susan.

Call for new members

Led by: Justin
JHS: In January we set July as date when we would call for new members. Still things we’d need to do to make that happen (publish TOR, create application form, confirm process for approving). Just want to check to make sure there is no concern with a call.
All: sounds fine
JHS: will follow up via email

Meeting wrap-up

Next meeting - 14 October 2020

Action items

SR to provide deadline for feedback on core definition
SA to look at how best to mark issues for external contributors