An Open-source Web-based National Cave Database

[Presented at the 14th International Congress of Speleology, Greece, 26 Aug 2005]

Authors:

Michael Lake, Australian Speleological Federation
Peter Matthews, UIS Informatics Commission

Author Contact Information

Michael Lake
2 Derribong Place
Thornleigh NSW 2120
Sydney, Australia
MikeL@speleonics.com.au

Peter Matthews
+61 (3) 5263-1686
matthews@melbpc.org.au

Abstract

Building on its successful experience in 1985 with the Australian Karst Index Database, or KID (140 fields, 6600 caves and karst features, 2400 maps, 925 references, and the associated 500-page book [ 1 ]), the Australian Speleological Federation teamed up with the Informatics Commission of the International Union of Speleology to expand and convert the KID to a modern, web-based, fully relational database using open-source software. Those responsible for each of Australia's 355 cave areas can keep it up to date and control access to the information using a simple web interface. The read-only KID using the 1985 data has been operational on the web since 2001, and the updateable version since mid-2005 (http://www.caves.org.au). ASF produced the software and the installation, while the Informatics Commission (http://www.uisic.uis-speleo.org) advised on the field definitions and database structures, for which this installation also acts as a pilot. The Commission invites suggestions for many more fields from the caving, research and management communities.

The software was professionally written and documented to ASF's detailed specifications, and released under the GNU General Public Licence so that it would be freely available to other groups. It runs on a GNU/Linux server, but could also run on other platforms because it uses MySQL, Perl and Apache, all of which are open-source and multi-platform. Though easy to use, it requires a reasonable knowledge of computer servers to install and manage. Attention has been paid to ease of conversion of the programs and data to languages other than English, and to ease of adapting the user interface, though there may be internationalisation problems for some languages. It is hoped that its ease of use, adaptability and economy will lead to wide use and that, with its formally defined fields, it will facilitate the ready exchange and consolidation of caving and scientific data.

Introduction

The Australian Speleological Federation's national Karst Index Database is accessible on the Internet at http://www.caves.org.au/ (click on "Karst Index database"). Most of the information on the 6 600 caves is accessible to anyone via a guest login. The software was written so that it can be used via any Web browser, including text-based browsers, so users with vision impairment can access all functionality of the KID. The database itself consists of approximately 500 fields structured into 69 tables.

The ASF's KID is the first Web-based database to implement the suggested information standards developed by the International Union of Speleology Informatics Commission (UISIC) [ 3 ]. These Standards include definitions for the cave and karst data fields, numeric codes for their values, and suggested table structures for cave and related databases.

Functionality Provided by the ASF's KID

Features of the Web-based KID for users include:
* a web-based interface that is easy to use;
* works with text-based browsers, which is important for visually impaired users;
* users can make queries about areas, caves, maps, persons and organisations;
* updaters can update areas, caves, maps, persons and organisations;
* a record of changes lets users and State Coordinators easily identify what data has changed;
* the software creates Cave and Map Summary Forms in PDF format, for archiving and in-field use, directly from the database;
* data attribution keeps track of contributions.

Features for administrators include:
* web-based user administration system;
* administrators can create, delete or edit users;
* administrators can assign user access from state down to cave area level;
* administrators can restrict user access to individual fields.

Additional features include:
* UISIC field compatible;
* open-sourced under the GPL so others can use it and contribute to it;
* source code is well documented;
* scalable to cope with tens of thousands of caves.

Anyone can use the KID to make queries; however, updating functions are only available to updaters via a username and password. Screen shots and detailed tutorials for updaters can be viewed by anyone in the documentation section (http://www.caves.org.au/kid/doc). This means that anyone in a caving club in Australia wishing to help update caves can see how updating works, and overseas organisations wishing to gain an overview of the updating system can also see how updating is done.

The integrity of the data in the KID is of paramount importance. For this reason, checking of all updates by independent checkers is part of the KID updating system, which reduces the possibility of introducing errors into the KID. Effectively, a peer-review system is used: updates go into a staging table, a checker then reviews the changes, and only if they are passed do the changes proceed into the main data tables, where they can then be seen by users making a query.

This also means that no single user is responsible for errors that get into the database. Errors can still make their way into any database, so data history is recorded, allowing all changes to be traced and mistakes corrected. This history also covers the data quality fields.
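The checking workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the KID's actual code: it uses Python with an in-memory SQLite database in place of Perl and MySQL, and the table names (cave, staged_update, change_history), column names, and function names are all hypothetical.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with main, staging and history tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE cave (cave_id TEXT PRIMARY KEY, name TEXT, length_m REAL);
        CREATE TABLE staged_update (
            update_id    INTEGER PRIMARY KEY,
            cave_id      TEXT NOT NULL,
            new_length_m REAL NOT NULL,
            submitted_by TEXT NOT NULL,
            status       TEXT NOT NULL DEFAULT 'pending'
        );
        CREATE TABLE change_history (
            cave_id TEXT, old_length_m REAL, new_length_m REAL,
            submitted_by TEXT, checked_by TEXT
        );
        """
    )
    return db


def submit_update(db, cave_id, new_length_m, updater):
    """An updater's change goes into the staging table only."""
    cur = db.execute(
        "INSERT INTO staged_update (cave_id, new_length_m, submitted_by) "
        "VALUES (?, ?, ?)",
        (cave_id, new_length_m, updater),
    )
    db.commit()
    return cur.lastrowid


def review_update(db, update_id, checker, approve):
    """A different person passes or rejects the staged change."""
    row = db.execute(
        "SELECT cave_id, new_length_m, submitted_by, status "
        "FROM staged_update WHERE update_id = ?",
        (update_id,),
    ).fetchone()
    if row is None or row[3] != "pending":
        raise ValueError("no pending update with that id")
    cave_id, new_length_m, submitted_by, _ = row
    if checker == submitted_by:
        raise PermissionError("updates must be checked by someone else")
    with db:  # one transaction: commit on success, roll back on error
        if approve:
            old = db.execute(
                "SELECT length_m FROM cave WHERE cave_id = ?", (cave_id,)
            ).fetchone()
            if old is None:
                raise ValueError("unknown cave")
            # Only a passed change reaches the main table; keep a history row.
            db.execute(
                "UPDATE cave SET length_m = ? WHERE cave_id = ?",
                (new_length_m, cave_id),
            )
            db.execute(
                "INSERT INTO change_history VALUES (?, ?, ?, ?, ?)",
                (cave_id, old[0], new_length_m, submitted_by, checker),
            )
        db.execute(
            "UPDATE staged_update SET status = ? WHERE update_id = ?",
            ("passed" if approve else "rejected", update_id),
        )
```

In this sketch a query against the cave table never sees a pending or rejected change, and each passed change leaves a history row naming both the updater and the checker.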

Data quality for fields is also recorded so that the accuracy of the entered data can be specified (e.g. data quality for the "discovery date" may be "probably correct" while data quality for a cave's length may be "known to be greater than"). There are 27 available data quality fields varying from simple ranges such as "correct" to "wrong" to more complex statements about data accuracy.
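One way to picture a value stored together with its data quality is the sketch below. It is illustrative only: it is written in Python rather than the KID's Perl, it lists just the four qualifiers quoted above out of the 27, and the numeric values and class names are placeholders rather than the codes actually used.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    # Only the qualifiers quoted in the text; the numbers are placeholders.
    CORRECT = 1
    PROBABLY_CORRECT = 2
    KNOWN_TO_BE_GREATER_THAN = 3
    WRONG = 4


@dataclass(frozen=True)
class QualifiedValue:
    """A field value paired with a statement about its accuracy."""
    value: object
    quality: Quality

    def describe(self, unit: str = "") -> str:
        if self.quality is Quality.KNOWN_TO_BE_GREATER_THAN:
            return f"more than {self.value}{unit}"
        label = self.quality.name.replace("_", " ").lower()
        return f"{self.value}{unit} ({label})"


discovery = QualifiedValue(1985, Quality.PROBABLY_CORRECT)
length = QualifiedValue(1200, Quality.KNOWN_TO_BE_GREATER_THAN)
print(discovery.describe())    # 1985 (probably correct)
print(length.describe(" m"))   # more than 1200 m
```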

Data attribution is also tracked so that data is attributed to the organisation that produced that data.

The ASF KID Licence

The ASF's KID is released under the GNU General Public License [ 4 ]. Indeed, one of the requirements of the ASF specification to the programmer was that the software used in the KID be open source and released under the GPL. This means that the ASF did not need to purchase any software or software licences, and we can use other people's high-quality, open-source code in our KID. Because the KID software is released under the GPL, it is also available for other speleological groups and individuals to use and, we hope, contribute to.

There is no publicly available link on our website for the source code; however, you only need to email the KID Administrator, and a link from which you can download the latest version will be emailed to you. The complete code is 2.9 MB and some sample data is 200 kB in size.

There is no published link yet because:
a) The software is developing rapidly and we probably will not keep the link up-to-date with the latest version.
b) We would like to know who is interested in the software. If you decide you don't want to use it, we'd like to know why: was it too difficult to install? Was the interface not what you expected?
c) It is not client software and cannot be easily installed on Windows machines.

If people do download and use the KID, and make changes to the system, we would like to know. We would prefer that the code does not "fork". Keep track of your changes and, where possible and applicable, we will try to incorporate them into the KID code so that other countries and ourselves can benefit.

Software and System Requirements

The KID software was written by a professional programmer from our detailed specifications. The software runs on a Debian GNU/Linux server; however, any other Linux distribution can be used. With minor changes to the installation procedure it will also install and run under Mac OS X. The database used is MySQL (http://www.mysql.com). The web server is Apache 1.3 with mod_perl. The software is written in the Perl programming language and uses several modules from CPAN (the Comprehensive Perl Archive Network, http://www.cpan.org). This Perl code runs to over 150 000 lines. We are also using some open-source relational database interface modules developed by Praxis for rapid application development. All of this software is available under either the GPL or another open-source licence.

There is a detailed Installation Guide and a Maintenance Guide for the KID available on the KID Documentation page. However, installing and managing the KID does require a reasonable level of computer knowledge. You will need to know how to set up, configure and run a web server, install software from a tarball, an RPM or a Debian package, and install Perl modules from CPAN.

Finally, the security of the system was carefully considered throughout coding. However, as with all systems that are connected to the Internet, you should install and run as few applications and services as possible, keep your system up to date with the latest security patches, read your log files regularly, and run an intrusion detection system and check it regularly.

UISIC Database Schemas

UISIC aims to facilitate local and international storage, use and exchange of data related to caves and karst by developing and publishing related information-handling standards (http://www.uisic.uis-speleo.org/exchange/exchprop.html).

These standards include definitions for cave and karst fields and their values, and suggested table structures for cave-related databases. The ASF's Web-based KID is the first Web-based database to implement these standards. The draft field definitions and most of the suggested table structures are used and the ASF will try to follow these standards as they evolve.

UISIC has identified the following three requirements to allow the valid transfer, comparison and/or consolidation of cave/karst data between independent databases. It is not required that the same software or database structure be used at each end of the transfer. These recommendations are still in draft, and will be discussed via international UISIC working groups before being finalised. All are invited to contribute.

1. Record Identifier: Use of a record identifier which is internationally unique and permanent for each cave or karst feature or other entity being transferred. The unique identifier for an entity being transferred consists of the concatenation of an ISO 2-letter country code, a 3-letter organisation code issued within that country, and a serial number issued by that organisation. The identifier is therefore issued locally by the creator of the record, yet is unique internationally.

2. Field Definitions: Use of internationally agreed definitions for the data fields and field values to be transferred. The draft fields and definitions can be seen on the above web pages. Fields and their values are designated by numeric codes so that they are independent of any national language yet can be expressed in any language.

3. Transfer Format: Export and import of the exchange data from/to the database via an intermediate standard UIS transfer format. A UISIC working group is currently establishing the transfer format using XML. Its name is CaveXML. Use of a standard transfer format means that the various independent database systems need allow for export/import to only one format to be able to transfer data to/from any other participating database system. (See http://www.cavexml.uis-speleo.org).
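The record identifier in requirement 1 can be illustrated with a short sketch. It rests on assumptions the draft standard may not share: the three parts are joined with no separator, codes are upper-case ASCII letters, and the serial number is a plain non-negative integer of any length. The function names and the example codes "AU" and "ASF" are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed layout: 2-letter country + 3-letter organisation + digits.
_RECORD_ID = re.compile(r"([A-Z]{2})([A-Z]{3})([0-9]+)")


class RecordId(NamedTuple):
    country: str        # ISO 2-letter country code
    organisation: str   # 3-letter code issued within that country
    serial: int         # serial number issued by that organisation


def format_record_id(country: str, organisation: str, serial: int) -> str:
    """Concatenate the three parts into one identifier string."""
    text = f"{country.upper()}{organisation.upper()}{serial}"
    if _RECORD_ID.fullmatch(text) is None:
        raise ValueError(f"cannot form a valid identifier from {text!r}")
    return text


def parse_record_id(text: str) -> RecordId:
    """Split an identifier back into its three parts."""
    match = _RECORD_ID.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid record identifier: {text!r}")
    return RecordId(match.group(1), match.group(2), int(match.group(3)))


print(format_record_id("au", "asf", 1234))   # AUASF1234
print(parse_record_id("AUASF1234"))
```

Because the country and organisation codes are fixed in length, the identifier can be split unambiguously even without separators, which is why the organisation that creates a record can issue its identifier locally.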

The ASF is trying out the record identifiers, field definitions and values, and suggested table structures in a real-life situation. However, the CaveXML transfer format has not yet reached draft stage.
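The language independence that numeric field codes provide (requirement 2) can be shown with a small sketch. Every code and label below is invented for illustration and is not taken from the UISIC field definitions.

```python
# Hypothetical field codes and labels, for illustration only.
FIELD_LABELS = {
    "en": {1001: "Cave name", 1002: "Length (m)"},
    "fr": {1001: "Nom de la grotte", 1002: "Longueur (m)"},
}

# A record is stored and exchanged by numeric code, not by label.
record = {1001: "Demo Cave", 1002: 250}


def render(record: dict, language: str) -> dict:
    """Attach labels in the requested language to a coded record."""
    labels = FIELD_LABELS[language]
    return {labels[code]: value for code, value in record.items()}


print(render(record, "en"))  # {'Cave name': 'Demo Cave', 'Length (m)': 250}
print(render(record, "fr"))  # {'Nom de la grotte': 'Demo Cave', 'Longueur (m)': 250}
```

The stored record is identical in both cases; only the label table consulted at display time differs.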

System Documentation

The software is very well documented. There is a detailed Installation Guide and a Maintenance Guide available on the KID Documentation pages. The field definitions used by the ASF are available as HTML pages produced on-the-fly from the KID database itself. The table relationships are also produced on-the-fly.

In addition to the code itself being well commented, the objects and methods of every Perl module in which object classes are defined are documented in HTML format.

Updaters require documentation to assist them in understanding the overall KID system, the procedural aspects of field data collection, collation and updating and in understanding the many fields in the KID and their meaning. This documentation is well advanced but much remains to be written.

Future Directions

Internationalisation: The software currently uses the English language. Future versions of the software should be internationalised so that it can be easily adapted by other countries. The KID already has excellent separation of the Perl code from the HTML markup, so redesigning the user interface for other speleo groups is not too difficult. However, there are areas of code which will present some problems, especially for countries that use non-Latin character sets.

Further entities: Other entities we expect to add in the future include lists of articles and papers, lists of biological species, etc.

References

1. Australian Karst Index 1985, ASF Inc., Edited by Peter G. Matthews, ISBN 0 9588857 0 2

2. Australian Speleological Federation's Karst Index Database http://www.caves.org.au

3. The International Union of Speleology Informatics Commission http://www.uisic.uis-speleo.org

4. GNU General Public Licence http://www.gnu.org/licenses/licenses.html