CpEvaluationResults


Practical experiences in concurrent collaborative ontology building using Collaborative Protégé

Supplementary Material


Contents:

  1. Collaborative Protege setup
  2. Collaborative Protege performance
  3. Statistical evaluation on generated numeric data

1. Collaborative Protege setup

Since Collaborative Protege (CP) is an extension of the standard Protégé 3 ontology editor, most users were quickly familiar with the CP GUI. Users of other ontology tools found the Protégé GUI quite complex, but were still able to contribute in a valuable manner after a reasonably short orientation phase.

CP deploys a client-server style architecture and thereby allows for concurrent editing of a single OWL file. It features notes on representational units (RU), a change tracking log for RUs (such as an edit on a class), a discussion thread and an instant messaging client for real time chat.

Figure 1: A screenshot of the CP GUI is shown. The Collaboration Panel can be seen at the left, with an annotated class highlighted in the hierarchy. A (deep) discussion thread on the Chromatography Platform is shown in the CP panel to the right. CP_GUI_example_deep_thread

All changes receive a user and date stamp. The tool captures all changes, notes and discussions as instances of an integrated Change and Annotation Ontology (ChAO), thereby providing a granular audit trail of edits and decision making.

Figure 2: A screenshot of the ChAO ontology hierarchy (to the left) in Protege. The middle pane shows the list of "Advice" instances, with the last one (an advice to remove toy classes) marked and shown in detail to the right. The inset box displays the overall metrics of the ChAO KB associated with the OBI project. Inst_of_Annot_obi

From a technical perspective, the setup was quick (30 minutes). The installation guide was clear and easy to follow. Only minor updating of the installation guideline was required.

CP can be configured with user names and passwords. We provided the same password for all our users, which led to some abuse of the ability to log on as other users (three cases of 'identity fraud').

The normal Protege plugins show up and work in CP; some, however, need to be installed on the server as well as on the clients to work. Since most are part of the standard distribution, these are directly usable.

The CP tabs can be configured according to a project's needs, and the default set of annotation properties created on class creation could conveniently be configured according to OBI policies.

Setting the browser slot: CP copes with more complicated ontology setups. When configured correctly, it shows the rdfs:label values instead of numeric IDs as display keys in the class hierarchy, just as specified in the pprj file.

2. Collaborative Protege performance

A minor GUI restriction was encountered: expanding the full class hierarchy at once in larger artefacts (it took ca. 20 sec with OBI), or opening a class with many direct subclasses, slows down the client the first time it is done. After this had been done once, it did not slow down the clients anymore. The reason is the same as stated in the 'Reasoning performance' section.

General performance can be improved by increasing the heap size to 1.5GB. The maximum heap size on the client side was set to 800MB.

Removing the concurrent example projects from the metaproject knowledge base (KB), which otherwise all get loaded simultaneously every time CP is started, also improved performance.

The final Protégé project took 3 min to load on a 512MB P4 PC, two minutes for the project and one for the GUI.

Discussion and annotation updates propagated slowly to the clients. To see an annotation update, users had to change a frame; only then was the GUI updated. This bug has now been rectified.

The OWL file was loaded directly, rather than converted into a relational database (DTB) such as MySQL. Performance can be increased and the risk of data loss further minimized when the OWL file is converted into a DTB using the provided database backend: RAM requirements are lower and dynamic loading increases overall performance.

Chats are stored in a separate chat project KB, but are overwritten with each server restart.

All messages are stored with message type, username and datestamp, and hence can be evaluated in detail, e.g. via SPARQL or the Queries Tab.
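As a minimal sketch of the kind of evaluation this makes possible, the following tallies messages per user and per message type once they have been exported from the KB. The record layout (the keys "type", "user" and "day") and the sample records are purely hypothetical and do not reflect the actual chat project KB schema or the meeting data.

```python
from collections import Counter

# Hypothetical exported records; keys and values are illustrative
# assumptions, not data from the actual chat project KB.
messages = [
    {"type": "chat", "user": "user 5", "day": "day 1"},
    {"type": "chat", "user": "user 7", "day": "day 1"},
    {"type": "comment", "user": "user 5", "day": "day 2"},
]


def tally(records, key):
    """Count how many records share each value of the given key."""
    return Counter(record[key] for record in records)


per_user = tally(messages, "user")
per_type = tally(messages, "type")

# List users from most to least active.
for user, count in per_user.most_common():
    print(user, count)
```

The same tally could be applied to any other stored field, such as the datestamp, to compare activity between meeting days.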

3. Statistical evaluation on generated numeric data

In the following, we describe the tables and diagrams, which can be found in the supplementary Excel sheet.

Table 1: Increase of ontology size

Table 1 shows the development of the project metrics over time (RU and RA). The absolute values, as computed from the OBI.owl file at the start and at the end of the meeting, are given, as well as the increase in the number of individual entities (abs and %). The annotation_OBI.rdf file grew by 46.1% per day, so it nearly doubled in size during the 2 day meeting, indicating linear growth and no performance problems when projected into the future. The OBI file grew 4.3% over the course of the meeting, whereby the increase in added defined classes (5 abs, 10.2%) was roughly double that of added primitive classes (4.8%). This considerably high number is also due to the fact that the ontogenesis participants were all quite experienced in creating logical class definitions, compared with, e.g., the whole OBI dev group, where the number of primitive classes grows more rapidly than that of defined classes. The large 'max siblings' value is due to the OBI class obsoletion policy, which bins all deleted classes as siblings in an '_obsoleted' helper class. Only 3 object properties were created during the meeting. These were used in a total of 68 new existential restrictions, representing a 9.7% increase.
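The "nearly doubled" statement can be checked with a short calculation. The sketch below assumes that the 46.1% daily rate is relative to the starting size (linear growth, as stated above) and uses a normalised starting size of 100 units, which is an assumption for illustration and not a measured file size.

```python
def percent_increase(start: float, end: float) -> float:
    """Relative increase from start to end, in percent."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start * 100.0


def linear_projection(start: float, daily_rate_percent: float, days: int) -> float:
    """Size after `days` when every day adds the same absolute amount,
    namely daily_rate_percent of the starting size."""
    return start * (1.0 + daily_rate_percent / 100.0 * days)


# Normalised starting size (assumption, not a measured value).
start_size = 100.0
end_size = linear_projection(start_size, 46.1, 2)

print(round(end_size, 1))                                # 192.2
print(round(percent_increase(start_size, end_size), 1))  # 92.2
```

Under this assumption the file ends at about 1.92 times its starting size, which matches "nearly doubled".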

Diagram 1: The above results from table 1 are here visualized as diagram: Diagram_1

Table 2: Changes done on the ontology

Table 2 illustrates the distribution of changes/actions made on the ontology by individual users (stored as ChAO instances). The general trend seems to be that users who chatted a lot also made more changes to the ontology. The table also indicates that certain users tackled certain topics more frequently than others, e.g. user 7 worked more on relations and user 5 more on annotations. In general, the large differences in overall activity appear to be the result of the personality, experience and confidence level of the users.

The quality of the changes has not been evaluated at this time. Many classes were edited by several editors, with an average of two editors per class. In total, 13 classes were changed (removed and added restrictions, changed superclasses, changed from primitive to defined, added annotations). The ratio of created to deleted classes was 2.1 for user 7, 2.2 for user 8, 2.3 for user 3, 3 for user 6, 4 for user 5, 4.1 for user 4 and 13.5 for user 2. We hypothesize that the ratio is smaller in users that generally made more changes (outlier: user 4) than in more 'careful' users. In some cases, users making the comments and suggestions were not the ones actually making the changes. Seemingly more 'experienced' users created tasks for others (e.g., 'add metadata', 'remove redundancy'), so patterns in annotation behaviour can be used to infer roles of users, e.g. a 'moderator' role vs 'commenter', 'chatter' and 'changer' roles.
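The reported ratios can be summarised as follows. This is only an illustrative sketch: the ratios are those listed above, while the cutoff of twice the median is an arbitrary assumption chosen for the example and not a criterion used in the evaluation.

```python
from statistics import median

# Created-to-deleted class ratios as reported in the text above.
ratios = {
    "user 7": 2.1,
    "user 8": 2.2,
    "user 3": 2.3,
    "user 6": 3.0,
    "user 5": 4.0,
    "user 4": 4.1,
    "user 2": 13.5,
}


def flag_high_ratios(values, factor=2.0):
    """Return users whose ratio exceeds `factor` times the median ratio.

    The default factor of 2 is an arbitrary illustrative threshold.
    """
    cutoff = factor * median(values.values())
    return sorted(user for user, r in values.items() if r > cutoff)


print(median(ratios.values()))   # 3.0
print(flag_high_ratios(ratios))  # ['user 2']
```

With this threshold only user 2 stands out as creating far more classes than deleting them.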

There was no power law distribution for the number of comments made per person: most users made around 10 comments. Only one, user 5, made twice as many comments as the others, hinting at this user's 'moderator role'. This role is further indicated by the use of more granular annotation types, e.g. advice and explanation, as indicated in Table 3. Regarding participant roles, it would be interesting to interview the participants on their self-assessment or to try to find the motivations behind certain actions, e.g. competition, altruism, narcissism or self-interest.

The mean depth of threads created was 2-3; the maximum thread depth was 5 (responses). Regarding the chats, only 12 messages used internal hyperlinks (4 on day 1, 8 on day 2). Issues discussed via chat were mainly about what to work on next, modeling issues, and new features and their implementation. Experimental helper classes were created: '_Kearon's collect devices by function classes', 'Frank's new meaning of function' and 'asserted_gibbon_disco', but only one user adhered to the OBI policy of marking such play classes with the underscore prefix (see the first example).

Diagram 2: The above results from table 2 are here visualized as diagram: Diagram_2

One class was created redundantly (liquid chromatography device: http://purl.obofoundry.org/obo/Class_46_2 and http://purl.obofoundry.org/obo/Class_46). Note that user 3 and user 1 did not take part for much of day two, and user 1 is more of an OBOEdit than a Protege expert.

Table 3: Usage of Annotation Types

Table 3 shows the distribution of annotation types used by individual users, sorted according to frequency (total, from the last annotation_obi InstanceTab). Unfortunately, no annotations were made on changes to the ontology; all annotations referred to RUs in the ontology. This hints at low awareness of this possibility in the user group, which is mainly due to the hidden Change ontology. No SimpleProposal, FiveStarProposal, FiveStarVote or seeAlso annotations were made.

By far the most abundant annotation/note was the comment (89 abs, presumably due to its 'default' setting). The distribution of comments over classes followed a power law: a few classes had a large number of annotations (> 15 each) and a large number of classes had only one or two annotations/comments. For two users (1 and 4), the comment was the only annotation type they used. Then followed Advices (14) and AgreeDisagreeVotes (12). There were a few AgreeDisagreeVoteProposals and Questions (6 each). Example and Explanation were used only seldom, and by more experienced 'senior' users. In general, the spread of annotations across annotation types was widest among the latter (i.e. user 5).
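To put these counts into proportion, the sketch below computes each type's share among the annotation types for which an absolute count is given above. The counts for Example and Explanation are not stated in the text, so they are left out; the shares are therefore relative to the listed types only, not to all annotations.

```python
# Absolute counts as reported in the text above. Example and Explanation
# are omitted because their counts are not stated.
counts = {
    "Comment": 89,
    "Advice": 14,
    "AgreeDisagreeVote": 12,
    "AgreeDisagreeVoteProposal": 6,
    "Question": 6,
}


def shares(type_counts):
    """Return each type's percentage share of the summed counts."""
    total = sum(type_counts.values())
    return {name: n / total * 100.0 for name, n in type_counts.items()}


annotation_shares = shares(counts)

# Print the types from most to least frequent with one decimal place.
for name, pct in sorted(annotation_shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

Among the listed types, comments account for roughly 70% of the annotations.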

Diagram 3: The above results of table 3 are here visualized as diagram: Diagram_3


Contact: schober@imbi.uni-freiburg.de

last edited 2009-04-03 11:29:51 by DanielSchober