Talk:Physical measurement - Mesure physique

From KarstLink

As promised to Frederic, I will comment on his "Physical measurement" proposal here, and present an alternative solution.

1) history:

  This entity was introduced briefly in 2022, but was not part of the voted items of the KarstLink core.
  We elaborated together, then presented at the UIS congress (paper co-signed by Frederic Urien, Peter Matthews and myself, Eric Madelaine), a proposal for an extension dedicated to measurements recorded by (mainly underground) sensors as time-series, which many cavers use for temperature, pressure, conductivity, gas rates, etc.

Here, monitoring can mean anything from short to very long series of measurements for a single campaign, typically weeks or several months, as opposed to individual measures such as one would take with a hand-held thermometer.

  In April/May 2024, in relation to the planned implementation of a physical measurement database module in the Grottocenter platform, Frederic introduced in the KarstLink wiki his proposal for Physical Measurements, which covers the two kinds of measures in a single Entity.

2) Encoding of physical measurements versus time-series:

  In addition to the obvious differences in the devices and in the way the measures are taken, the structure of the results of observations is quite different. This shows clearly in the item of proposal 1 about results, defined as "made up of one or more Time/Value pairs". My point is (please refer to the UIS congress communication for more details) that a time-series typically contains a hundred thousand sets of quantities (e.g. temperature, pressure, conductivity), one set for each time at which the sensor registered a measure. You definitely do not want to build such a complex object as a huge data structure stored in your database. In practice, the device providers encode this data in compact formats (usually text formats), and what is important to store, in addition to this text file, is its structure (the format of the timestamps, and which quantity/unit is in each column).
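As an illustration of this point, here is a minimal sketch of keeping the raw sensor text file untouched and storing only a small description of its structure alongside it. The names `Column`, `SeriesFormat` and `read_quantity`, as well as the sample file layout (semicolon-separated, timestamp in the first column), are assumptions made for the example; they are not part of either proposal.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Column:
    quantity: str  # e.g. "temperature"
    unit: str      # e.g. "degC"


@dataclass(frozen=True)
class SeriesFormat:
    """Hypothetical descriptor stored next to the raw sensor file."""
    timestamp_format: str        # strptime pattern of the first column
    delimiter: str               # column separator used by the device
    columns: tuple[Column, ...]  # columns after the timestamp, in file order


def read_quantity(raw_text: str, fmt: SeriesFormat, quantity: str):
    """Yield (timestamp, value) pairs for one quantity, parsed on demand."""
    # +1 because column 0 of the file holds the timestamp.
    index = [c.quantity for c in fmt.columns].index(quantity) + 1
    for row in csv.reader(io.StringIO(raw_text), delimiter=fmt.delimiter):
        if not row:
            continue  # skip blank lines
        yield datetime.strptime(row[0], fmt.timestamp_format), float(row[index])


# Invented two-line sample standing in for a sensor export.
RAW = "2024-05-01 12:00;8.4;1013.2\n2024-05-01 12:10;8.5;1013.0\n"
FMT = SeriesFormat(
    timestamp_format="%Y-%m-%d %H:%M",
    delimiter=";",
    columns=(Column("temperature", "degC"), Column("pressure", "hPa")),
)
pressures = list(read_quantity(RAW, FMT, "pressure"))
```

With such a descriptor, the database holds one file reference and a few fields per series, and the time/value pairs are only materialised when a client app actually asks for a given quantity.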

3) I find Frederic's proposal quite complicated because he wants to define a single ontology entity for both a single measurement and a time-series, which forces him to underspecify the relations within the entity. One typical case is time stamps: a single measurement would _necessarily_ be associated with a DateTimeStamp, and nothing else, while a time series would certainly _necessarily_ be associated with a start date and an end date. These would allow applications using such data to make requests based on the start/end dates, without having to scan the full list of time/value pairs...
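To make the time stamp argument concrete, here is a small sketch of two separate record types, each with its own mandatory dates, and of a date-window request answered from the start/end metadata alone. The class and field names (`SingleMeasurement`, `TimeSeries`, `document`, ...) and the file names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SingleMeasurement:
    quantity: str
    unit: str
    value: float
    timestamp: datetime  # mandatory: the one instant of the measure


@dataclass(frozen=True)
class TimeSeries:
    quantities: tuple[str, ...]
    start: datetime  # mandatory
    end: datetime    # mandatory
    document: str    # reference to the raw sensor file


def series_overlapping(series, window_start, window_end):
    """Select series by their start/end metadata only; no file is opened."""
    return [s for s in series if s.start <= window_end and s.end >= window_start]


campaigns = [
    TimeSeries(("temperature",), datetime(2023, 1, 1), datetime(2023, 3, 1), "logger_a.csv"),
    TimeSeries(("temperature", "pressure"), datetime(2023, 6, 1), datetime(2023, 9, 1), "logger_b.csv"),
]
summer = series_overlapping(campaigns, datetime(2023, 7, 1), datetime(2023, 7, 31))
```

Here only the second campaign is selected, and the decision never touches the hundred thousand time/value pairs inside the files.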

4) My proposal (B) is then to separate these 2 kinds of physical measurements. Of course there is a subset that will be common to the two "ObservationTypes", but each of them will be simpler and more precise.

5) In addition, from my experience using such sensors, it would be very useful to add a notion of "quality" of the measures, which usually depends on the characteristics of the sensors, but also on poor calibration procedures, for example. When people extract sensor data from our database, say 20 years from now, to analyse the effects of climate change, this will be important information. I have no precise idea what this "quality" could be; if someone has references to existing ontologies, I will be interested.

    • Response from Frédéric

==> my answers, in turn, within your text.

1) The physical measurements were presented succinctly in 2020. In 2022 Eric made isolated updates to the initial presentation.

Copying the history of the physical measurements page

2) The data structure proposed for the standardized format of Proposal 1 is composed of a series of timestamps and values, as for the data provided by the sensor. The difference is that each sensor provides data in a different format, which will require each user to transform the data. There is a very important difference between providing standardized data and offering a file produced by the sensor: the standardized file will only provide the requested data, whereas the sensor file contains all the data collected, including data that does not correspond to the user's request.

==> This is an interesting topic. But clearly, the ontology will not provide a proper common format that encodes all possible interesting sensors (at least, Proposal A does not do this). My point of view (quite partial, as my experience comes from 20+ years of visualization applications, not other kinds of analysis) is that this is not the purpose of the physical measurement database, but of some "next step" app, which will be specialized in extracting data depending on the target analysis goal. For visualization purposes, there is no reason to restrict the set of interesting data.

3) Eric's proposal is incomplete for the exchange of data, which is the objective of KarstLink: information is missing on the license, on the author, and on the organization which made the data available. It seems to me that Eric's proposal includes elements that were defined in proposal 1, while not retaining these mandatory elements and not allowing data in a standardized format to be made available.

==> This point was referring to a preliminary, very incomplete version of Proposal B. The missing information has since been added, namely in the *common* part of the physical measurement subclass.

4) There is no difference between a single value and a series of a million measurements at the metadata level. The only difference is that the single measurement will not be associated with a file corresponding to the original data. Proposal 1 would not be different if it allowed describing a single measurement or measurements coming from a sensor, except for the point indicated. It is not useful to construct 2 different entities.

==> The devil is in the details: in proposal A, to allow two different "sub-types" of measurements in a single vocabulary, Frederic had to make several important notions "optional", in particular document, and start/end dates. This lets people *contribute* measurement data without any problem, but it will *not* allow client apps to send queries without separately identifying the 2 subtypes of physical measurements. I am convinced that we can converge on some model that provides adequate structures both for storing and for retrieving this data, but we definitely need to discuss this (rather technical) topic.
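A rough sketch of what this means for a client app, assuming a merged entity where the sub-type-specific notions are optional. The `Measurement` class and the `covers` helper are hypothetical and do not reproduce the actual vocabulary of proposal A.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical merged entity: fields used by one sub-type only are optional."""
    quantity: str
    timestamp: Optional[datetime] = None  # set for a single measurement
    start: Optional[datetime] = None      # set for a time series
    end: Optional[datetime] = None        # set for a time series
    document: Optional[str] = None        # raw sensor file, time series only


def covers(m: Measurement, instant: datetime) -> bool:
    """Does this record hold data for `instant`?

    The client has to work out the sub-type from which optional
    fields are filled in before it can answer the question.
    """
    if m.start is not None and m.end is not None:
        return m.start <= instant <= m.end
    if m.timestamp is not None:
        return m.timestamp == instant
    raise ValueError("record carries neither a timestamp nor a start/end pair")
```

The branching, and the error case for records where nothing was filled in, would have to be repeated in every client; with two separate sub-types the mandatory fields make both unnecessary.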

5) The notion of quality cannot be defined with a single criterion: part of it is linked to the sensor, with a number of more or less important criteria which can affect the quality of the data. There is also a human factor which can lead to imprecise data. In Proposal 1, anyone can add a comment / a description (subject to copyright management). This will allow the author, then other users, to give an opinion on the quality of the data, but also to make other information available in a single field. Adding a specific text field for quality does not seem to me to be a satisfactory solution.

==> This, indeed, is not finalised; we need more experience, and more contributions from people who *do* analyse this kind of data... I know of some works (e.g. inventories of dye tracing of underground rivers) that have defined proper criteria allowing analyses to take quality into account. Again, this is mainly important for analyses (i.e. for the "client apps"), but if the information is not stored with the measurements from the beginning, we will not have proper information in the end.