Open Data/Software in Particle Physics

Image: D. Montage (via Wikipedia)

Lyon is perhaps best known as one of France’s great gastronomical hubs, with the Burgundy wine regions to the north and the fruit basket of Provence to the south, all linked by the mighty Rhône river. But Lyon is also home to a major computing hub: the Centre de Calcul (Data Processing Centre) of the French national institute for particle physics. It is one of the major computing nodes of the LHC’s worldwide computing grid, and it also supports many activities across nuclear physics and astronomy. It is also a convenient non-Paris location for meetings of representatives of French laboratories. One such meeting was the Open Data Workshop for Particle Physics, which I attended a few weeks ago to take part in a panel discussion on open software.

Particle physics has a somewhat complex relationship with open science. In some respects we are way ahead of other fields; in others we have more difficulty. On the plus side, high energy physics papers are all stored in an open-access archive (called arXiv: the X is actually a Greek chi, so the word is pronounced “archive”), which means anyone, anywhere can access all the field’s scientific content without paying (sometimes ludicrous) journal subscription fees. Many of the top journals in the field are open access too. So on the “open science” front we are doing great.

Where we struggle more is on “open data”. Indeed, unlike in other fields (such as astrophysics), there is not much point in publicly releasing the LHC’s collision data as collected by ATLAS and CMS. It requires the work of a whole collaboration to convert the raw data (which is a collection of thousands of electrical signals from the various sub-detectors) into usable physics objects, and that’s before calibration and alignment procedures. These require expert knowledge from the people who built the instrument. So releasing raw “open data” is not really meaningful, because there is not much that people outside could do with it. There is work ongoing to release the reconstructed and calibrated data (CMS are pretty good at this and ATLAS are catching up), but its usage is rather complex (there is a reason it takes years and years for the collaboration members to actually produce physics results from it). So the question that arises is: is this the best way to make our results reproducible? At the workshop I made the argument that it’s better to concentrate on publishing additional material to make the statistical analysis reusable and repeatable (basically, following the FAIR principles) than to spend vast resources on releasing the raw data. So far (as far as I know), zero new scientific results have come out of Open Data efforts, while hundreds have come from progress in analysis preservation and re-interpretation. This is something that I have been striving towards for a large part of my career; see the Reinterpretation page for more details.

Another aspect of this discussion is Open Software, which was the subject of the panel discussion I took part in. There is a known problem in particle physics: the people who develop and maintain software, despite being absolutely crucial to the success of the science programme, are not well recognised (in the sense that their achievements are not ranked favourably against physics analysis results, for example in permanent job applications). One of the discussions at the workshop was about how the French system is well set up to create positions for Domain-Specific Software Engineers: people who make a career out of being a software developer who also knows about physics (and is also aware of the sometimes dubious technical skill level of the average physics PhD student). Indeed, French labs have a whole career path for engineers and technicians (Ingénieurs de Recherche) which mirrors that of staff researchers (Chargés de Recherche). If we can attract software-oriented people into permanent jobs in physics, then this could help us to build sustainable software and ensure that the results of the LHC remain relevant and reusable all the way to the next hadron collider, maybe half a century in the future.

All in all, the Centre de Calcul was a well-chosen place for this important discussion. Recommendations based on the workshop are now being written to advise the heads of particle physics in France on how we can best move forward in the years to come. I’m very proud to be part of that process.
