OntogenesisMeetingEightNotes

1. Ontogenesis Meeting Eight Notes

University of Manchester, 12-13 May 2009


Present

Robert Stevens - U Manchester

James Malone - EBI

Frank Gibson - AbCam

Simon Jupp - U. Manchester

Nick Drummond - U Manchester

Graham Klyne - Dept. Zoology, University of Oxford

Dave Randall - Manchester Metropolitan University

Helen Parkinson - EBI

Tony Burdett - EBI

Andrew Howes -?

Phil Lord - Newcastle University

Kearon McNicol - Freshwater Biological Assn

Mike Haft - Freshwater Biological Assn

Robert Davey - Natl Collection of Yeast Cultures, IFR, Norwich.

Franck Tanoh (MyGrid)

Alan Rector - U Manchester

Matt Horridge - U Manchester

Arif Shaon - Rutherford Appleton Lab

Phil - (Dave Randall's colleague)

Brian Matthews - Rutherford Appleton Lab

2. Intro Robert Stevens:

Previously we have normalised a hand crafted ontology, and Dave Randall, ethnographer, looked at the process; both were last summer. In December we looked at OBI and used collaborative protege 3 to build some ontology over the space of two days. We used different groups of people, 8 in total, working in pairs or singly on different tasks, to look at the collab process with tool support. Collab protege works, doesn't crash, is editable. It lacks support for communication: how do you talk to people electronically? There have been further releases since then, we will see if it's better. Quite impressive: in an informal way we built a lot of ontology, as good as not doing it with collab protege. Today we want to look at the same thing with the cell type ontology, and with collab protege, start to build a bit of ontology. The shape of the meeting is that we will decide on scope and use cases, come up with a plan and then do it. Simulated distributed, as we are all here. For me the two things of interest:

1. Artefact itself

2. Primary goal, the process itself, how do we decide what we want to do, what resources we use, when are we successful alone/paired etc. Details will come in a bit.


Round table intros.

Guarantee of confidentiality from Dave R.

Robert:

Already fed back info to the developers of CP3. We may not have told them new things, but there are many approaches, and we are interested to know what you think about CP3. We have no vested interest here, so feel free to make negative and pos comments, and comments on the process. Don't want this meeting to be constrained, as we don't yet know how to run these sorts of meetings. We are getting more experience. Using OWL, compositional approach, use the reasoner; methodology is AR's normalization methodology. For those of you who are not familiar with it, it will become apparent as we go. As with previous experience, we'll have a couple of use cases from James and then Frank, plus contributions from the floor. SW preservation, props of SW: the requirements will be informative. Once we have a set of use cases and scope, we will develop a plan: what do we want to record, what do we want in the ontology, what modules do we want? What are the restrictions on a class we need? Then we will find which resources exist and what we need to build, then we can come up with tasks and assign these.

Said nothing about later today and tomorrow, this will likely happen OK. We have done it with 10 people or fewer - hoping it won't disintegrate.

James Malone


In the ma group at EBI we are building an extension to the AE database, the Atlas. We want to annotate the data with ontology terms and use an ontology to visualise and query. This already works; we hope to expand to more complex queries. Variables are about the expt process, expt design, properties of materials. We want to expand to SW in analysis, R packages, and already in some depth: in the context of OBI we are looking at a GenePattern use case, a web based workflow tool for ma pipeline analysis. We want to capture some of the SW details. We have a detailed use case from GenePattern on a s/sheet, which we will send around. He has a set of sw modules that we want in an ontology. The task is to break these down and talk about what we want to represent. We have started within OBI. E.g. GenePattern for K means: the GenePattern SW module has other classes working with it, input, output, objectives (different intent per module), and we also have parameters. Discussion on whether to model these sep. The Broad, who build GenePattern, want to have a GUI that presents tasks and SW modules that can be used to achieve a task and build a pipeline. Already done some of this work, have nearly 100 sw modules; I'd like to get this content into the SW ontology.

RS: Are you interested in what platform they will run on, versions, dates?

JM: Yes, but not done that so far. GenePattern is all pipeline based, implemented in R, excel input, C etc. Format of input/output important

RS: Do you care about the algorithm?

JM: We capture the algorithm and the objectives. We have some of the data transformations, we'd like to make the link between the sw that can run the algorithm and the task e.g. clustering or classification

BM: We have started to write an ontology for data/sw preservation, this may be a test bed for this group. Done some trials in the lab, look promising. Offer it as a thing to start with.

Franck Tanoh (plus slides):

The MyGrid ontology


Service discovery, web service annotation in MyGrid - running. building, describing wfs.

Service discovery problem - where are they, what do they do, how to invoke, what are their parameters. e.g. emma for clustalW

semantic annotation of services - myGrid ontology

Service ontology - part 1: where is it, how many inputs/outputs, who hosted, what is the 'type'; part 2: domain ontology - bioinformatic algs, tasks, molecular biology, formats, domain specific data sources

Map ontology to elements of WSDL, simple tagging using the ontologies.

GUI called FETA Search by task, output/input, resources used, method used, etc

http://www.mygrid.org.uk/tools/service-management/feta/

Ontology 7 years old, ~600 classes, automated workflow composition. Better support for hiding complexity and exposing semantics clearly, etc

Biocatalog aims to create a library of workflows; collab ontology building is important for biocatalog, with different ontologies coming together in one registry.


RS: Experience in MG is the frustration when you find a ws and the operations have odd names, all params are 'string' no matter what complexity, and using the ontology to find the input and output is very imp. SW is not named informatively: BLAST is an acronym, we don't know what emma is. Things are changing. Projects looking at planning techniques want descriptions of pre and post conditions: know that it takes a protein seq or data, does it need to be normalised; recording this detail is needed. Already see that there is a commonality: task appears, input, output; these are annotation tasks.

JM: Genepattern is also about template building.

Arif: The annotation mechanism - there is a GUI, FETA - is that for annotating a web service, and is it automated?

FT: Before it was hard; we now have a web front end on biocatalog, and the model defines what you need to annotate. You submit a service and we can annotate the input and the output.

RS: Most of the services are undocumented and then Franck infers what they do

FT: We want to build an env where we can contact the providers

PL: Do you annotate quality, e.g. quality is the ws is not doc

FT: We monitor q of services, thumbs up/down depending on annotations. Social engineering

PL: There are uptime criteria, what happens when it exits

FT: these are subjective and objective - e.g. good for small amt data, runs fast. Split into the sub/obj - service is there running.

AR: Also an issue that we can crit things that are in no state to be criticised. In other bits of ontology and SW we need to know the status

RS: Dead alive, supported

BM: We seek to address that

Brian Matthews (plus slides)


Preservation of SW, attributes, functionality, environment, dependencies. User interactions, SW that we have and preserve, GUI changes, command line etc. Factors to take into account. Needed a FW to express the preservation properties of SW. Parts of the FW may be useful.

FW has three parts: performance model - what does it mean to preserve, retrieve, reconstruct, replay (RRR); adequacy of performance

model to describe artefacts, digital objects, versions and variants

properties - for R, R, R

built an ontology.

Product-version-variant-instance

product - whole sw, e.g. linux, gross functionality

version - release, changes in funct

variant - version for a platform, os and env

instance - variant in a place, ownership, one licence, fixed to mac or ip address

JM: What's the relationship between these?

Arif: Product - has_version, has_variant relns
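The four levels BM describes can be pictured as nested records. The following is only a rough Python sketch, not the preservation ontology itself: the `has_version` and `has_variant` relation names come from the discussion, while `has_instance`, the field names and every example value are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """A variant installed in one place (fields are hypothetical)."""
    location: str        # e.g. a host name, MAC or IP address
    licence_holder: str


@dataclass
class Variant:
    """A version built for one platform, OS and environment."""
    platform: str
    has_instance: List[Instance] = field(default_factory=list)  # assumed relation name


@dataclass
class Version:
    """A release, distinguished by changes in functionality."""
    label: str
    has_variant: List[Variant] = field(default_factory=list)


@dataclass
class Product:
    """The whole piece of software, described by its gross functionality."""
    name: str
    has_version: List[Version] = field(default_factory=list)


def instances_of(product: Product) -> List[Instance]:
    """Walk product -> version -> variant -> instance and collect the leaves."""
    return [
        inst
        for version in product.has_version
        for variant in version.has_variant
        for inst in variant.has_instance
    ]


# Illustrative values only; none of these come from the meeting.
example = Product(
    name="ExampleParser",
    has_version=[
        Version(
            label="1.0",
            has_variant=[
                Variant(
                    platform="linux-x86",
                    has_instance=[Instance("host-a.example.org", "Lab A")],
                )
            ],
        )
    ],
)
```

Keeping each level as its own record means properties such as licence or platform can be attached at the level where they vary, which is the point BM makes about organising the complexity.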

MH: Do you need to sep the binary copy and the running copy, seems complex

BM: we don't do that here. The model can cope with both, we don't care whether we preserve the source code, or the binary version

PL: two ways version, one sw, one name, one sw, one name difft sw e.g. genespring new is not related to the last thing. XP/Vista

BM: these are different

PL: what's the sim then?

BM: Xerces XML parser, looks like one product, different for c and java, but they are different development paths. Sep between things is vague.

PL: Dependencies, e.g. test suites, sw fulfils a test on one platform not on another.

BM: Tests are low level at variant

GK: When you worked on thesaurus you had broader than narrower than in terms of info retrieval, dependencies between a and b could use the same relns.

BM: yes

FRBR model from library world, analogy here

Component model -

What was described is abstract; associate e.g. binary files with these as components: source, binary, config, doc, test

Relate all this to an archive model OAIS.

Product props - licence, who offers the licence, inputs/outputs, description. e.g. Xerces -

HP: Do you deal with defects

GK: If it's known then it's part of the documentation.

GK: suggestion - getting a sense between version/variant is arbitrary and possibly broken.

BM: may not be useful in all cases.

HP: we can see if the reasoner will figure it out, suspect that it's not all that clear

BM: Instance properties - not a huge no of properties for preservation, more specificity, URLS, not that useful for preservation, capacity there in the model.

MH: is Xerces C++ a class?

BM: is an instance of a product

MH: when you talk about mac addresses, is complex, many levels

BM: trying to break down the complexity. Model was to organise the complexity

RS: digital preservation different from the bioinf task description. In Helen's world the idea that get some data in an array expt, used some SW, people don't want to recreate that SW, that is not a requirement, could be.

BM: Don't need to get locked down into preservation; the model to add info is the same.

Shows ontology.

RS: Anything to say about use cases before we start stage 2. Chance to chip in with micro use cases later ?

RS: We have done some info gathering, we have aspects of SW to describe, want to list this, go through these. We will not spend two days philosophising. We will use some aspects of upper ontologies for organisational purposes.

3. Group listing of SW desiderata

* Task - what does it do?

Biological perspective vs. BM described technical view. DNA and EST are the same data, but have different qualities. HP: these are different in how the data are generated. RS: SW supports a task, data have bio aspects as well as technical. Bio aspect affects what SW is used. KM: hypothetical use case might help. RS: keep James's use case in mind, where have a database where want to record info about an ma expt, produced some data, has been analysed and record aspects of that for search purposes. Also want to find a sw, same problem. KM do we care how it does it? RS: yes, if SW does a task on some data, and SW did it with x algorithm, then I don't trust that. KM: we care about the algorithms, but not the technologies.

* SW task

* SW

* Data

* Input

* Output

* Constraints - string, int,

* preconditions

* post conditions

* provenance (history/origin)

* access control

* terms of use

* licencing

* variables

* parameters

* quality (reputation, is it any good? both subjective, objective).

* Algorithmic properties - complexity, alg vs heuristic (objective/subjective)

* Performance

* Operators - functions - readers writers

* Interface style - command line/GUI/web service/

* Platform (hardware)

* Implementation language

* Documentation

* Serial number

discussion on where this lives and whether we care. Seems not much.

* API

* Hardware dependencies/SW dependencies - technical point of view, also user point of view.

* Time dependence e.g. on licences and other things

* People, Organization, owner - (also have time aspect e.g. for support)

* Source - where do we get it from - download from SF, buy a physical CD.

* Service vs. downloadable SW (GK: makes a real difference practically)

* Expertise needed - e.g. type of user - biologist vs. bioinformatician

* Is the process lossy or not (GK: lossy vs non lossy transform, important when building a pipeline)

* Software project (e.g. SF)

* Architecture

* Deployment

* Format

*Content/meaning

*Biology

*Data format schema

Discussion of the list


RS: seems primary distinctions are SW, Data, everything else is about SW and data.

HP: People and organizations are different.

GK: are we concerned with SW production or just SW - we think that we just care about SW.

KM: People want to find SW, understand which version, don't care how the version was developed.

JM: We can come up with some competency questions

ND: Primary need to describe programs used in biomedical investigations. We don't want to duplicate the artefact ontology in a more general sense, pus to the bio sense.

HP: for me this is about the task

RS: from the talks, the task is one of the primary things we care about.

GK: card sorting, we can stick these to the whiteboard. Three headings, sw, data, people and organisations. Most things are under SW.

Organising the list as a group (see photos)

HP: Can we get an example?

4. Discussion

Dependencies - what does this mean

Arif: need requirements and make it more granular

Platform, dependencies, user expertise,

HP: Do we know what we mean by architecture - components and the way that they fit together.

AR: There are categories of the ways that you put things together. Not clear to me if the GUI is about how it works, or is a thick or a thin client.

RS: UI in HCI terms is direct manipulation, wizard form filling, command line, GK: batch, interactive

RD: some things like R have command line UI but do produce GUIs

GK: Should we see if we can form another cluster and see what easily groups. What does it do, what alg does it use, these are 'functionality' - groups these together.

RS: also add inputs and outputs, error condition

HP: isn't this another kind of output

FG: are we collab doing this with paper rather than protege

RS: we can do this in collab protege. After this pass we can start doing that. Sub question, when do we move to actually a more tech based discussion. Approaching that, worth doing large scale face-to-face

ND: Do we still need to think about scope. Is all the stuff on the board in the ontology

RS: I am noting what's out of scope for me. Clustering, we can look at lumps that are out of scope.

BM: Think about the task doing, my talk stages we went through were finding SW, rebuild it, run it to complete a task. Use case.

PL: If you are doing use case, then we have different outputs, sys admin, bioinformatician, the system you described is more towards the sys admin role, rather than the bioinformatician

RS: in MyGrid world building it and deploying are irrelevant; looking for what's there and available, 4000 services, what do they do, what are the ins and outs.

BM: yes but, protege, I don't build it, I want the version, I am my sysadmin for this sw. If I need to go to sf and get source code

GK: leads back to the user -

PL: I was asking which user do we care about?

HP: there are different users, command line, GUI etc

RS: the functionality cluster seems most important, the others are not irrelevant, maybe not this scenario

GK: focus in on functionality and draw in others as we need

RS: complete the stuff under SW, does more go into the functionality SW

GK: current functionality: SW task, what does it do, what algs are used, what data is used (bio and tech), what performance, post and pre cond, what input is needed, what output, what vars and params, constraints on the data. Non orthogonal

others: media, quality, reputation. data and people - provenance, version, people, expertise, users, docs, serial no

FT: This is what we do in biocatalogue, these are converging. Slides.

Functional aspect, service model - how it works, social standing etc.

GK: we have more detail, and defined categories.

RS: reassuring. We have this basic classification, everything on the board is relevant. Functionality cluster is the bit we can concentrate on. Next stage, to decide what's overlapping, constraint, precondition, post condition,

HP: we can also come bottom up, and do that separately, maybe tomorrow.


LUNCH


RS: Two ways forward, take the top level categories, put these in ontology, record etc and start e.g. Arif's suggestion about requirements h'archy. One group can start that. Helen suggested - not top down, look at the bottom up process, tasks that the SW support and get things here. In that case we can also use collab protege at each end of the room then we do things with collab protege. May not become the ontology, way of starting to organise things. We then report about this, and has worked in previous meetings. Any other tasks that people want to do?

ND: Simon and I will look at SWP into the collab protege, software preservation ontology so we don't start from scratch

RS: once we decide what to do, look at what can be plundered or extended.

ND: from the bottom up side can look at the mygrid ontology stuff

GK: Top down needs data from bottom up

RS: we can all do bottom up, and see if there's enough to do top down

HP: we can do two groups of bottom up

ND: we can use service vs. other views on things

RS: if we were fewer than 10 people, we could use the whiteboard, as we are more, not sure useful. Borderline for this group.

5. We split the groups and both work on the bottom up 'task' part of things

We decide to use the same ontology (so we can use the chat feature) and have two different classes at the top. Group1 and Group2 classes.

KM: Do we want to set down some collaborative ontology rules? Only use the chat; preferable to use chat to get someone's attention, for e.g.

RS: CP is just protege 3; there's a mechanism for attaching notes and threads to things, and there's a chat aspect. In the December meeting, when you left notes on things there was no notification mechanism; you had to look at the class and see what's there. With the chat feature people are notified that someone has made a note. So we only used chat. CP3 has not changed in this aspect. KM is right, if we use this then we need to use the chat as a notify tool. Chat's not threaded, about or to anything. 'Group1 look at this'

HP: as biologists we can think of SW instances - and see what functions they fulfil.

JM: if you are familiar with protege then this is easy to do with CP chat and discussion.

RS: OK let's try, if inadequate for the planning stage of things

James Malone gives a demo of CP3 in server mode.

We decide not to delete anything, we will use a parent obsolete node.
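The no-delete rule amounts to re-parenting a class rather than removing it. A minimal Python sketch of that idea follows; it does not use the Protege API, and the class name `DraftClass` and the `retire` helper are hypothetical.

```python
from typing import Dict

OBSOLETE = "Obsolete"

# child class -> parent class. "DraftClass" is a hypothetical example class.
parents: Dict[str, str] = {
    "Group1": "Thing",
    "Group2": "Thing",
    OBSOLETE: "Thing",
    "DraftClass": "Group1",
}


def retire(hierarchy: Dict[str, str], cls: str) -> Dict[str, str]:
    """Return a copy of the hierarchy with cls moved under the obsolete node."""
    if cls not in hierarchy:
        raise KeyError(f"unknown class: {cls}")
    updated = dict(hierarchy)
    updated[cls] = OBSOLETE
    return updated


after = retire(parents, "DraftClass")
```

The retired class keeps its name and any notes attached to it, so other editors can still see what was there and why it was moved.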


Frank G presents in protege for group 1.


sw defined in terms of input, output, task, e.g. blast. Algorithms, e.g. Blastn, blastp. Formats are defined as well.

Data format - transform the same info in different formats - does it matter if this is lossy at transformation, what does lossy mean, losing XML tags on convert to plain text.

Discussion about parameters, default values are instantiated in some implementations, and not available for the users.

Need to constrain the input e.g. DNA/RNA

Approach was to take a program and describe it from scratch, BLAST.

Next task is to do the user experience

RS: was the order important

Tony B presents in protege for group 2.


Different approach - we had issues with collab protege, dual effort: data into a s/sheet, Tony and Simon adding to protege, lot of SW descriptions, not categorized. Decent spread of applications and we have some input and output data types, and the tasks are categorized.

JM: We concentrated on concepts of interest, tasks, SW, data types, definition and cols of aboutness, input/output/format - if we can then code these into the ontology. Split task, SW, data, format.

TB: There's a lot of overlap.

HP: We have some implicit relations between things that we need to model, sw that implements algorithm

RS: we have started to model along the lines we discussed this am. We started to get to users. We have got a consistent list of things. These are all ways of classifying SW - can we draw a list of stuff that we will talk about?

ND: Do you mean competency qus

RS: No, we have a consensus

HP: These are the easy things, or the first things,

RS: What to do next? Plan for tomorrow. If we ran along the same agenda as for the cell type ontology, we now have the initial axes of classification, we can build a single ontology with a tangled h'archy, we can choose some of these things, or some of these, then what h'archy do we build.

RD: Functionality parts are easy to do - output input, etc all easy to do

GK: how much do we need to do to get an ontology for sw not an ontology for blast.

RD: the other group has a lot of types of sw, text mining tools, normalization things, biopython, this addresses the granularity

RS: there are 1000s of sw. What's a representative sample for this? When we did the cell type ontology work there were 450 types of cell, by end of day two, we had done 20 of them. Suspect that we can do 10 bits of SW.

HP: if we use the OPPL script then it will go many times as fast

RS: we had supporting ontologies was fast

HP: the biology was hard with the CTO though

RS: we can choose the primary axis of classification, in the bio world there was no single axis. We were looking ontoclean rigid characteristics - there were none. Ploidy, nucleation, etc - things are not temporally stable. Then we use the reasoner to build the hierarchy e.g. all SW that takes DNA for an input. We use s/sheet to capture the info, and don't do anything with collaborative protege.
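The idea of letting the reasoner build the hierarchy from descriptions can be imitated in a few lines. This is a plain Python stand-in for what an OWL reasoner would do with defined classes, not the approach used at the meeting; the tool names, data types and class names are all made up for the sketch.

```python
from typing import Dict, Set

# Hypothetical spreadsheet rows: software name -> asserted input data types.
software_inputs: Dict[str, Set[str]] = {
    "tool_a": {"DNA sequence"},
    "tool_b": {"protein sequence"},
    "tool_c": {"DNA sequence", "protein sequence"},
}

# Defined classes: a class name plus the input type that membership requires.
defined_classes: Dict[str, str] = {
    "SoftwareWithDNAInput": "DNA sequence",
    "SoftwareWithProteinInput": "protein sequence",
}


def classify(
    inputs: Dict[str, Set[str]], classes: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Place each piece of software under every defined class whose condition it meets."""
    hierarchy: Dict[str, Set[str]] = {name: set() for name in classes}
    for software, types in inputs.items():
        for class_name, required_type in classes.items():
            if required_type in types:
                hierarchy[class_name].add(software)
    return hierarchy


inferred = classify(software_inputs, defined_classes)
```

Nothing is asserted about where a tool sits; `tool_c` ends up under both classes purely because of its recorded inputs, which is the tangled hierarchy the normalization approach expects.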

[RS]we may want to use the s/sheet to describe the software

ND: you'd need to know that the pattern is good.

RS: yes Mikel had to write rules, most of the work we did was with the spreadsheet, and the plan

HP: depends on what the goal is - we usually want to build an ontology as fast as possible, if you want to use collab protege that's a different goal

[RD]the fact that we can't have nested restriction is a problem

FG: we want to run the reasoner after each restriction, performance issues

ND: are there going to be obvious nested expression, are these useful

HP: universally applicable

AI: Decide on three bits of sw, build an ontology for that. Take the list of terms that we described and get a list of diverse applications.

RD: Bioperl, biopython, chained events e.g. sequences alignment, where inputs become outputs, k means clustering etc

RS: is twelve too many people.

ND: the tasks need to be merged.

HP: collaboratively with three editors.

RS: still have restriction issue - then export to protege 4 and do this at the end.

ND: that will take 20 mins per ontology once it's in P4

AR: suggest we give Stanford a ring and see if we can sort this out this evening.

RS: we need to go through for the class SW which restrictions we put in it, what those are going to be etc


6. Wednesday 13 May 2009

RS: Suggest now that we go through the list of SW and some top down classes, and start to make the list more formal. We need to write some natural language defs of the classes and then decide on what the inputs etc are. Tony and Rob were recording groups 1 and 2. Rob, shall we start with your blast example?

ND: blast is not asserted to be a SW, SW should match based on the inputs and outputs, it's a domain class, input of blast database, fasta sequence, we want that to be a union.

RS: is an input database a parameter for blast

PL: is both

HP/GK: what's the difference between a parameter and an input

RS: seems like an input, grep -i makes it case insensitive

GK: this is an artefact of invocation

FG: parameters constrain the input set

RS: difference between flicking a switch

PL: formally no distinction, commonly used. All inputs to the program

CB: I see a difference, params are finite set of possibilities, data is open ended, any sort

general disagreement

CB: still setting a param e.g. line length, x, y, z - data could be a huge block

ND: the seq param is using the blast DB, querying the sequence

PL: it's a user orientated distinction, the thing you care about and what modifies

JM: you could put in the same data and different params. We discussed this in OBI, the diff between data and param is convention as inputs. It's a role of some input.

PL: cashpoint, output are cash, card, receipt, you never leave without cash, you might leave without card

TB: context is imp, UI can be parameterised, and invoke difft programs, the code has no param, but the GUI does

RS: Enough, has input database, has input x, and this has a role of being a parameter, or has_parameter.

JM: either is OK

GK: don't want to consider this

FG: we do this as has_input
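One way to picture the compromise, where everything is a `has_input` and "parameter" is a role some inputs play, is sketched below in Python. The `role` field and its two values are assumptions for illustration; the grep `-i` switch is the example RS raised.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Input:
    name: str
    role: str = "data"  # "data" or "parameter"; this role vocabulary is an assumption


@dataclass
class SoftwareDescription:
    name: str
    has_input: List[Input]

    def parameters(self) -> List[Input]:
        """has_parameter is derived: the inputs whose role is 'parameter'."""
        return [i for i in self.has_input if i.role == "parameter"]


grep = SoftwareDescription(
    name="grep",
    has_input=[
        Input("text to search"),
        Input("search pattern"),
        Input("-i (case-insensitive switch)", role="parameter"),
    ],
)
```

Because the role is only a label on an input, a description stays valid for people who, like PL, see no formal distinction, while user-facing tools can still list the parameters separately.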

HP: can you run blast without the db?

RD: some cases use a fasta

RS: input data is in blast format

RD: blast needs at least one fasta sequence. blast db is not necessary, need something to search against.

RS: min 2 fasta seq are the input.

PL: if you just use 2 seqs then that's a stupid way to do blast

HP: but for a pairwise alignment then some algs will have 2 inputs and we will want to model this, so we should be precise

PL: the blast user doesn't need to know about the format

RD: are we modelling the users of SW, or the SW

PL: blast database needs a name

RS: we have a precondition of format, may not be appropriate here. Ever a case where we care that this is in blast format

FT: if you want to run your own blast then maybe, front end one then you don't care

RS: if I have my own blast, and my seq database in a taverna workflow, then I need to reformat the blast database. When we use a web front end, then no-one cares. Phil, Mr formal ontology man, if I were to describe there's a condition of blast format, how would we represent that. Has precondition.

FT: depends on what the precondition is for, is it the SW or the user

PL: input of blast database and a precond that the blast database is useable

RD: some of blast's algs don't need a database

PL: if we do all of the ors we will get lost, let's model one of the ways to invoke

RS: shall we put something into protege

PL: id for a blast db, precondition that the db is in the correct format

GK: the precondition in isolation is not meaningful, precondition for what, we need the context

HP: maybe preconditions have context and are therefore special

GK: if the precondition is not satisfied what are the consequences

FG: does run, but doesn't give what you expect

HP: error is an output

GK: had a different view of precondition. if you run the program then the preconds hold

RS: if I produce some data, pass to this program, the data has to be of this type, satisfy these precond to be a sensible input, how formally shall we capture this.

GK: how formally capture this

RS: so that the program works

ND: GK is describing a state machine

CB: also an issue of granularity. Input must be in comma sep values, or more detail than that

GK: hypothetical program, csv, or xml input, output csv, xml paired precondition

ND: now describing everything that the program does

GK: is that not the point?

ND: depends on the use case, for taverna need less info

GK: I made the point I wanted to make and I can back off

FG: inputs, and there are preconditions

JM: we are modelling inputs and outputs as bags, GK is more precise

FG: as a first pass then we can do this less granular, and then be more precise

HP: Graham are you happy?

FG: I don't think we disagree

GK: ok lets continue
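GK's hypothetical program can be written down as paired conditions. The sketch below is one reading of that example (csv in gives csv out, xml in gives xml out), in plain Python rather than OWL; the function names are invented for the illustration.

```python
from typing import Dict, Optional

# Each accepted input format (the precondition) is paired with the
# output format it yields (the postcondition).
FORMAT_PAIRS: Dict[str, str] = {"csv": "csv", "xml": "xml"}


def precondition_holds(input_format: str) -> bool:
    """True if the program accepts this input format at all."""
    return input_format in FORMAT_PAIRS


def output_format(input_format: str) -> Optional[str]:
    """Return the paired output format, or None if the precondition fails."""
    return FORMAT_PAIRS.get(input_format)
```

Treating inputs and outputs as unpaired bags, as in the first pass agreed here, would keep only the keys and the values of this table and drop the link between them.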

RS: we can't relate input and output together Duncan Hull did this and it didn't work

- some group editing -

FG: there's also a subset of a fasta as an input

HP: fasta seq is a format, the subset is the range

FT: can also be an accession number from an external example

- some more group editing -

PL: the output is a blast output

RD: prioritized alignment is the output

HP: is there a task about alignment?

ND: output some prioritized output list, has format, plain text or xml

RD: data structure which the output is, and that is represented in some kind of format

RS: has format and has precond, are these different?

ND: we imply that satisfy the inputs is a precondition

HP: is a precondition a minimum set of inputs

FG: no - as all inputs have preconds

ND: do we want to specify the min inputs

RS: blast is very complex

JM: plenty of neural nets that we could have specified

FG: optional input/is required input

ND: there's a modelling issue of optionality in OWL

RS: think FG's point about required and optional is valid, but leave as comment or note.

AI: optionality/required is important but hard in OWL so deferred
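Until optionality is modelled in OWL, the required/optional note FG asks for could be kept alongside each input, for example in the spreadsheet. A small Python sketch of such a record follows; the `required` flag and the choice of which BLAST input is optional follow RD's remark that a database is not always needed, and are otherwise assumptions.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class InputSpec:
    name: str
    required: bool


# Illustrative only: query sequence required, database optional.
blast_inputs: List[InputSpec] = [
    InputSpec("query FASTA sequence", required=True),
    InputSpec("blast database", required=False),
]


def missing_required(spec: List[InputSpec], supplied: Set[str]) -> List[str]:
    """List the required inputs that have not been supplied."""
    return [s.name for s in spec if s.required and s.name not in supplied]
```

For instance, `missing_required(blast_inputs, {"blast database"})` reports the query sequence as missing, while supplying only the query sequence reports nothing.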

ND: task is sequence similarity and pairwise alignment

RD: there's a difference between the task for the program does and what the user wants to do

RD: user tasks are easier to model, stats algs are hard

ND: do we want to distinguish? Appl may do seq sim, but the visn is the important part

FG: sw implements alg and satisfies some task

ND: does the alg include the visn

FG: does any SW produce a visn -

HP: what does graphvis do then

FG: draws an image

RS: task and alg - are they different

FG: that is the difference - user and sw tasks

RS: there are many difft algs for sequence alignment

FG: do we need to say more than SW implements alg

RG: if we model algs then that's hard

RS: pseudo code in OWL

RD: do we want to describe what algs do?

-- small gap in the notes --

RS: If SW has a version is that a datatype property with a string at the end?

GK: can a bioinformatician, looking up blast n and p are different sequences, they are different programs, do they implement the same alg

PL: yes,

GK: here they are different algs

PL: difft UI on the same program

RD: most people use blast all

FT: we are missing some params, matrix, gap alignments etc

RS: yes, we glossed over these, and went onto something else, we need to return to the params

ND: sounds like the blast program has 5/6 difft things it can do, each one of these has different inputs. Appl then the services each services has sets of params

FT: the blast service operations are difft, each constrained by the inputs that they expect

RS: Nothing we have said so far is untrue; we have underspecified, that's all, on purpose. Blast p is a subclass of this, input blast n, blast p

GK: on the screen - 'implements some blastn, blastp etc' implements range is an algorithm, and these are modelled as alg. This is not the role of blastp/n - this is a user task not an algorithm

RD: yes, correct

FT: depends on what we mean by algorithm, blastn and blastp have the same algorithm

RS: we just need blast algorithm the blastn blastp describe parameters

RD: these are constraints to nucl or protein

HP: we need the task to be more specific

RS: subclass of task, nucl seq sim, has input blastn. doesn't need to be complete, can be underspecified

GK: ok blastn/p etc make these subclasses of seq sim

RS: blastn/blastp are not the task

RD: blastn is not a task

GK: then I don't know what the task is. What does that mean?

RS: nucl seq/seq sim that's a task, blastn is not a task

GK: ok

ND: task is seq sim, application is blastn

RS: like the difference between case insensitive search and -i on grep

FT: we model blast and then the functional units of blast

RS: leave the description - and tasks in nucl seq-seq and input is blastn

FG: blastn has specific params, blastp has specific params, agree with RS (** Frank agrees - special occasion)

RS: to complete a broad sweep: what version of blast? Would I use a datatype property plus string?

PL: yes, blast is forked, many implementations

HP: the front ends are different, as are the outputs

RD: with blast the params are different e.g. fragmented genome

PL: the code is different, so it is a difft algorithm

-- blast is complete --
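
A minimal sketch of one reading of the BLAST model discussed above, written as plain Python triples rather than OWL. The property names (`implements`, `satisfies_task`, `has_input`, `has_version`, `subclass_of`) and class names are illustrative assumptions, not names taken from the ontology built in the session; the version is a placeholder string, standing in for a datatype property with a string value.

```python
# Triples are (subject, property, value). All names are illustrative.
TRIPLES = {
    ("blast", "implements", "blast_algorithm"),
    ("blast", "satisfies_task", "sequence_similarity"),
    # Version as a datatype property: the value is a plain string literal.
    ("blast", "has_version", "unspecified"),
    # blastn / blastp are applications (kinds of blast), not tasks.
    ("blastn", "subclass_of", "blast"),
    ("blastp", "subclass_of", "blast"),
    # They differ in the input they are constrained to.
    ("blastn", "has_input", "nucleotide_sequence"),
    ("blastp", "has_input", "protein_sequence"),
    ("blastn", "satisfies_task", "nucleotide_sequence_similarity"),
}


def superclasses(cls, triples):
    """Return cls plus every ancestor reachable via subclass_of."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in triples if s == current and p == "subclass_of")
    return seen


def values(cls, prop, triples=TRIPLES):
    """Collect values of prop stated on cls or inherited from its ancestors."""
    ancestors = superclasses(cls, triples)
    return {o for s, p, o in triples if s in ancestors and p == prop}
```

Here `values("blastp", "implements")` picks up the algorithm stated once on `blast`, which matches the point that blastn and blastp share an algorithm and differ only in their constraints. The model stays underspecified: params such as matrix and gap settings are left out.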

RS: OK blast is done, underspecified, but OK. Now we will divide into smaller groups, each with one bioinf and one protege person; each group chooses a tool in which they have expertise and models it. Or we could get Franck Tanoh to take some alg descriptions from the MyGrid ontology and add these to the ontology

FT: we have both tasks and alg and they are linked

PL: alg is limited in usage in bioinf e.g. smith waterman, blast includes SW and low complexity masking. In my experience the alg is not that useful sep from the program. Not useful to sep the alg from the implementation

FT: true, people don't really care, technical users don't care, they care about the tasks of the program

PL: there are some cases of 30-40 implementations of different algorithms, so model the user tasks

RS: how do we do that? Some people describe SW, do they say their user task or make requests to FT to do that. FT can make a tree

JM: use collab prot to do this

PL: if you pull the tasks we have in then we'll have most of it

RS: FT will do that then

HP: how many are there?

FT: 12 tasks - add them all in

JM: we have some in OBI

HP: can JM and FT work together?

RS: we have a lot of bioinformaticians


PM session


Review of the classes that we have added in the am session.

Discussion on thing of interest vs. reference

AI: SW project vs SWCollection - discussion on what the difference is needs to be resolved

JM: explains how OBI and the IAO deal with this e.g. the code, MS Word, and a copy of that is an instance and these relate to the class. This is what BFO deals with. James and Frank think this example is the wrong way to do this.

FG: the question is whether it is helpful to model SW as instances; does anyone have a reason why it's better modelled as instances?

TB: difference between SW application and SW library -

PL: should be sibs, what's the distinction, API/Interface. SW library has an API, not any other interface

GK: one is style of interface, other thing is the library is a collection

ND: library is a component part of SW application

PL: use in the services env for FT: all services have an API only, so all services = library

ND: we need to be strict about API

GK: SW and library is sim to workflow and the program in a wf. May not be a distinction that it's helpful to make. There are things that are chained together, we want to do that.

PL: specific solution?

RD: we intended sw project and lib to be descriptive classes, and SW project - versions etc. Not modelling capabilities, modelling the meta data

PL: application and library could be roles

FG: roles are transitory, do they get lost?

PL: we need a soln, suggest that we are clear there is a use, but not the distinction, we move sw appl into sw and see what turns up under there.

FG: appl does have a set of restrictions, suggest we use the restrictions for binning
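
A minimal sketch of binning by restrictions, as suggested above: software is described by the interfaces it offers and sorted into bins by rule, rather than asserted to be an application or a library. The software names, interface labels and the two rules are illustrative assumptions; the library rule follows the point that a SW library has an API and no other interface.

```python
# Hypothetical descriptions: software name -> kinds of interface it offers.
SOFTWARE = {
    "example_library": {"API"},
    "example_desktop_tool": {"GUI"},
    "example_cli_with_api": {"API", "command_line"},
}


def bins(interfaces):
    """Bin software by restrictions on its interfaces, not by assertion."""
    result = set()
    if interfaces == {"API"}:
        result.add("library")      # has an API and no other interface
    if interfaces - {"API"}:
        result.add("application")  # has at least one non-API interface
    return result


def bin_all(software=SOFTWARE):
    """Return a mapping of software name to the bins it falls into."""
    return {name: bins(ifaces) for name, ifaces in software.items()}
```

Under these rules every API-only service lands in the library bin, which is the consequence raised for the services environment.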

next task - define more stuff.

GK: how will this work to discover services - do we have a good enough description to meet this use case?

HP: if we can tie the sw to the task and know the input and the output then we can answer that, suggest GK looks at the tasks and ties a piece of SW to each task.
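
A minimal sketch of the discovery use case, assuming each piece of SW is tied to a task with a known input and output. The catalogue entries, the field names and the `alignment_report` output label are illustrative assumptions, not content from the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    task: str
    input_type: str
    output_type: str


# Hypothetical catalogue; every value here is illustrative.
CATALOGUE = [
    Service("blastn", "nucleotide_sequence_similarity",
            "nucleotide_sequence", "alignment_report"),
    Service("blastp", "protein_sequence_similarity",
            "protein_sequence", "alignment_report"),
]


def discover(catalogue, task=None, input_type=None, output_type=None):
    """Return services matching every criterion that was supplied."""
    wanted = {"task": task, "input_type": input_type, "output_type": output_type}
    return [
        service for service in catalogue
        if all(value is None or getattr(service, field) == value
               for field, value in wanted.items())
    ]
```

For example, `discover(CATALOGUE, input_type="protein_sequence")` returns only the blastp entry, while filtering on the shared output alone returns both.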

ND: I will do the disjoints

JM: I couldn't do that when I tried

last edited 2009-05-13 13:29:45 by helen parkinson