This is a first-pass example of how the physical table structures could be arranged for recording Security Level and Data Quality (Accuracy, Lineage, Age, Completeness, and Logical Consistency). Comments and suggested improvements are certainly invited. See also the further explanatory notes below.
Tips for using this page:
+--- The example single- (SV) and multi-valued ¦ (MV) tables below in this column show how ¦ to link individual field values to their ¦ data quality, which is recorded in the ¦ table in the column at the right. ¦ ¦ ¦ CA0000 ¦ Master table (a single-valued example) ¦ +---+-------------------------+----+ 1 ¦ ¦227¦Cave ID ¦A10*¦¦»-------+ ¦ 1¦Karst feature type ¦A2 ¦ CID ¦ ¦ 9¦Quantity of entrances ¦S ¦ 1 ¦ ¦ 10¦Entrance type ¦A1 ¦¦»-------+ ¦ ¦etc ¦ ¦ FID ¦ ¦ ¦etc ¦ ¦ ¦ +---+-------------------------+----+ ¦ * Quality of field 10[Entrance type] is ¦ via a lookup of CaveID + FieldID(10) + ¦ LatestUpdateID(from Hist table CA0447). ¦ * This example shows the Cave master ¦ table. Other single-valued tables and ¦ other entities would be done similarly. ¦ ¦ ¦ CA0030 ¦ Discovery (a multi-valued example) ¦ +---+-------------------------+----+ 1 ¦ ¦227¦Cave ID ¦A10*¦¦»-------+ ¦ 73¦Value sequence ¦S* ¦ CID ¦ ¦ 30¦Discoverer type ¦A1 ¦ ¦ ¦419¦Discoverer - person ID ¦A10 ¦ ¦ ¦420¦Discoverer - org ID ¦A9 ¦ ¦ ¦ 31¦Discoverer name ¦A45 ¦ 1 ¦ ¦ 34¦Discoverer year ¦A4 ¦¦»-------+ ¦ 33¦Discoverer month ¦A2 ¦ FID ¦ ¦ 32¦Discoverer day ¦A2 ¦ ¦ ¦477¦Discoverer date qualifier¦A2 ¦ 1 ¦ ¦464¦Value ID ¦S ¦¦»-------+ +---+-------------------------+----+ VID ¦ * Quality of field 34[Discoverer year] is ¦ via a lookup of: ¦ CaveID + FieldID(34) + ValueID + UpdateID. ¦ * If a record is updated, e.g. the value of ¦ any field in the table is changed, then a ¦ new VID is issued for the record and added ¦ to the Data Quality table (here CA0464 on ¦ right) for each field in the above table. ¦ * Other multi-valued tables containing one ¦ or more data fields would be done ¦ similarly. ¦ ¦ ********* ¦ ¦ For data security values, the table below ¦ is an alternative to including the security ¦ field 467[Data security level] in the data ¦ quality table in the right-hand column. For ¦ example, if security is always being shown ¦ only on a per field basis and not on a per ¦ field value basis, or if it is wanted to ¦ store it separately from data quality ¦ anyway. ¦ ¦ CA0467 ¦ Cave Data Security (at field level) ¦ +---+-------------------------+----+ 0:1 « ¦ ¦227¦Cave ID ¦A10*¦¦«-------+ ¦ 82¦Field ID ¦S* ¦¦ ¦467¦Data security level ¦A2 ¦L +---+-------------------------+----+ |
»--------------------------------------------+ ¦ CA0464 ¦ Cave Data Quality (for example) ¦ +---+-------------------------+----+ 0:1 ¦ ¦227¦Cave ID ¦A10*¦3 5» ¦ « ¦ ¦ 82¦Field ID ¦S* ¦3 ¦«--+ ¦464¦Value ID ¦S* ¦1 3 ¦ ¦ ¦421¦Update ID ¦A10*¦2 3» ¦ ¦ ¦520¦Data - key-in ID ¦A10 ¦6»Key-Ins¦ ¦521¦Data - upload ID ¦A10 ¦6»Uploads¦ ¦469¦Type of update ¦A1 ¦L3 ¦ ¦470¦Data supplier - person ID¦A10 ¦L6»People¦ ¦471¦Data supplier - org ID ¦A9 ¦L6»Orgs ¦ ¦465¦Data source type ¦A2 ¦L8 ¦ ¦466¦Data confidence level ¦A2 ¦L8 ¦ ¦467¦Data security level ¦A2 ¦L7 8 ¦ ¦468¦Data - reference ID ¦A11 ¦L8»Refs ¦ ¦472¦Data date - year ¦A4 ¦4 8 ¦ ¦473¦Data date - month ¦A2 ¦4 8 ¦ ¦474¦Data date - day ¦A2 ¦4 8 ¦ ¦475¦Data date - qualifier ¦A2 ¦L4 8 ¦ +---+-------------------------+----+ ¦ ¦ ¦ CA0550 ¦ Cave Data Quality Comments ¦ +---+-------------------------+----+ 0:1 ¦ ¦227¦Cave ID ¦A10*¦3 ¦ « ¦ ¦ 82¦Field ID ¦S* ¦3 ¦«-----+ ¦464¦Value ID ¦S* ¦1 ¦ ¦421¦Update ID ¦A10*¦3 ¦ ¦550¦Data quality comment memo¦M ¦8 +---+-------------------------+----+ One record in the main table above stores the properties of one individual field value existing in the site's Cave entity tables. Similar table for each other entity, because ID does not identify which entity type. The lower table stores only an occasional record when there is a comment about the respective value's quality. (1) [Value ID] is null for SV fields because value can be identified by Update ID. (2) [Update ID] distinguishes VID owners when record is in a central database, and identifies SV field values. It also permits the history of data quality to be seen by allowing older values to be retained in the system. (3) Auto-filled by the system. (4) Note that these are the dates of the actual data as opposed to the date recorded in the Update batch when it was submitted for data entry. (5) This ID field chosen to suit respective entity of the field values being tracked. Table filename would follow same entity. (6) If [Update ID] is available, i.e. Data History being kept, this field can be auto-filled by the system, or else not stored but calculated when needed. If Data History not being kept, this field would be explicitly entered by a user (but not 421, 520, 521). (7) Security Level for SV fields could be considered a field-level property rather than value-level, and stored in a separate table keyed on only CaveID+FieldID to differentiate between caves (see table at left). But for MV fields, there may be variations on Security Level between the values of a field, so Security Level must be stored in a table keyed like the above. (8) Would need individual or block entry by a user. |
By convention, "Data Quality" is made up of the following
properties: Accuracy, Lineage, Age, Completeness, and Logical Consistency. So
far, the first three are covered by fields shown here while the last two can
be covered in the Comments field.
Accuracy: Use 466[Data confidence level]
Lineage: Covered in the Data History tables
via the Update ID, Key-in ID and Upload ID fields. Further information
about the processing history of the data can be
included in 549[Data quality comment line] or 550[Data quality comment memo].
Age: Use 472[Data date - year], 473[Data date - month],
474[Data date - day], and 475[Data date - qualifier].
Completeness: Use 549[Data quality comment line] or 550[Data
quality comment memo] for individual field values.
However note that for the overall entity, e.g. cave, field 512[Entity update
status] is available.
Logical Consistency: Use 549[Data quality comment line] or 550[Data
quality comment memo].
Security Level has also been included for convenience in the table, as it too can be a value-level property. However if preferred, it could be moved out to a separate table using the same or similar key structure depending on the granularity required.
Identifying values In order to be able to record various items of data quality information against each field value without clogging up the normal tables, it is necessary to introduce a way of identifying each field value so that the information can be stored in separate linked tables dedicated to data quality.
The values of single-valued fields, e.g. those in an entity's master table, can be identified by using the UpdateID which supplied them, or, if data history is not being kept and there is only the latest value in existence, then there is only one candidate value anyway. The latter would be referenced by a key of CaveID+FieldID, the former by CaveID+FieldID+UpdateID.
For the values of multi-valued (MV) fields, we need to distinguish between these multiple values, and therefore one extra field for [Value ID] needs to be added to each MV table. This is sufficient regardless of how many fields are in a record of the MV table. Any existing [Value sequence] field is not suitable for identifying the values because [Value sequence] might get changed at some time in order to re-sequence the multiple values. Each MV value would therefore be referenced by a key of CaveID+FieldID+ValueID+UpdateID.
Value ID series Because of its use in conjunction with an Update ID, the value for 464[Value ID] can be a simple short integer as there is no longer a requirement for it be be unique in itself in order to differentiate itself when added to a central site. It can be taken from a separate series per field per cave per database site, which means that the next [Value ID] can be calculated from a simple lookup of existing [Value ID]s for that cave and field on the site, and also there will be no risk of running out of numbers.
Key structure The key for a Data Quality table would be made up
from 4 fields as follows. Note there would be a separate
quality table for each entity type, because an entity's ID does not
indicate what entity-type it is, so here this would be identified by the table
its quality is stored in.
Entity ID: For example, [Cave ID]. This field identifies which cave
is being referred to.
Field ID: This is the UIS Field ID (FID) and identifies which
field is being referred to. The db program will need a process or method
to find out the FID of a selected field so that this can be used when
looking up the Data Quality table for that field.
Value ID: Only needed for multi-valued fields, ValueID
identifies which of the one or more values for the field is
being referred to. For single-valued fields, ValueID would be empty.
Update ID: If data history is being kept, the UpdateID will identify
where the target field's value came from, and will make the data quality
record unique when mixed with cave records from other sources in a
central database. It also allows the history of data quality for a
given field to be stored.
Displaying data quality The dotted lines linking the tables show how the fields would be linked to the data quality table. Taking for example, the field 10[Entrance type] in the first sample table: if this field had been selected via the screen cursor, and a menu choice selected or a button clicked to "Show Quality", then a process would run with input parameters of CaveID, FieldID (=10), and latest UpdateID for the cave, and thereby select the appropriate record from the data quality table CA0464 and display its fields, possibly in a pop-up window. A similar method could be used for entering data quality, including generating a new [Value ID] if it was a multi-valued field.
Dates There is a distinction made between data date and the date it was submitted for data entry (Update date). The Data Quality Table stores the data date, whereas the Data History Update Batches Table stores the submission date. For example, if in 2002 a cave records keeper found an early report dated 1987 which contained some new information, and he submitted that information for data entry in 2002, then the data quality field 472[Data date - year] would be 1987, and the data history field 424[Update - date] would be the 2002 date.
Columns shown: 1. Standard UIS Field ID (FID). 2. Field name. 3. Data type. Codes: * = A primary key. L = A lookup table is used to set this field's values. R = Right-justified. » = This field provides a link to another entity, i.e. is a "foreign key". - The names above the table descriptions are example filenames for the tables using max 8 chars. - Any numbers to the right of a field refer to the notes below that table. Table linkages (respective instances in detail table per instance in master table): 0:1 = from 0 to 1 instances 0:n = from 0 to many instances 1:1 = always 1 instance 1:n = at least 1 instance The vertical bar(s) at each end of a link show the actual fields linked. Data types: S = Short integer, up to 32,768. N = Numeric, decimals allowed. Annn = Alphanumeric up to 255 chars long (A1-A255). D = Date. M = Memo (variable length free text). B = BLOB (Binary Large Object, e.g. a photo image).[ Top ] [ Tables ]
15-Jul-2002. Initial release.