1. Ontogenesis Meeting Eight Notes
University of Manchester, 12-13 May 2009
Present
Robert Stevens - U Manchester
James Malone - EBI
Frank Gibson - AbCam
Simon Jupp - U. Manchester
Nick Drummond - U Manchester
Graham Klyne - Dept. Zoology, University of Oxford
Dave Randall - Manchester Metropolitan University
Helen Parkinson - EBI
Tony Burdett - EBI
Andrew Howes -?
Phil Lord - Newcastle University
Kearon McNicol - Freshwater Biological Assn
Mike Haft - Freshwater Biological Assn
Robert Davey - Natl Collection of Yeast Cultures, IFR, Norwich.
Franck Tanoh (MyGrid)
Alan Rector - U Manchester
Matt Horridge - U Manchester
Arif Shaon - Rutherford Appleton Lab
Phil - (Dave Randall's colleague)
Brian Matthews - Rutherford Appleton Lab
2. Intro Robert Stevens:
Previously we normalised a hand-crafted ontology, and Dave Randall, an ethnographer, looked at the process; both were last summer. In December we looked at OBI and used Collaborative Protege 3 to build some ontology over the space of two days. We used different groups of people, 8 in total, working in pairs or singly on different tasks, to look at the collab process with tool support. Collab Protege works, doesn't crash, and is editable. It lacks support for communication: how do you talk to people electronically? There have been further releases since then, so we will see if it's better. Quite impressive: in an informal way we built a lot of ontology, as good as not doing it with Collab Protege. Today we want to look at the same thing with the cell type ontology, and with Collab Protege, start to build a bit of ontology. The shape of the meeting is that we will decide on scope and use cases, come up with a plan, and then do it. It is simulated distributed working, as we are all here. For me the two things of interest are:
1. Artefact itself
2. Primary goal, the process itself, how do we decide what we want to do, what resources we use, when are we successful alone/paired etc. Details will come in a bit.
Round table intros.
Guarantee of confidentiality from Dave R.
Robert:
Already fed back info to the developers of CP3; we may not have told them new things, but there are many approaches, and we are interested to know what you think about CP3. We have no vested interest here, so feel free to make negative and positive comments, and comments on the process. Don't want this meeting to be constrained, as we don't yet know how to run these sorts of meetings. We are getting more experience. Using OWL, compositional approach, use the reasoner; methodology is AR's normalization methodology. For those of you who are not familiar with it, it will become apparent as we go. Previous experience, we'll have a couple of use cases from James and then Frank, plus contributions from the floor. SW preservation props of SW, requirements will be informative. Once we have a set of use cases and scope, then we will develop a plan: what do we want to record, what do we want in the ontology, what modules do we want? What are the restrictions on a class we need? Then we will find which resources exist, what we need to build, then we can come up with tasks and assign these.
Said nothing about later today and tomorrow, this will likely happen OK. We have done it with 10 people or fewer - hoping it won't disintegrate.
James Malone
In the ma group at EBI we are building an extension to the AE database, the Atlas; we want to annotate the data with ontology terms and we want to use an ontology to visualise and query. This already works, hope to expand to more complex queries. Variables are about the expt process, design expt, properties of materials. We want to expand SW in analysis, R packages, and already in some depth - in the context of OBI we are looking at a GenePattern use case, web based workflow tool for ma pipeline analysis. We want to capture some of the SW details. We have a detailed use case from GenePattern on a s/sheet, which we will send around. He has a set of sw modules that we want in an ontology. Task to break these down, talk about what we want to represent. We have started within OBI. E.g. GenePattern for K means, GenePattern SW module has other classes working with this, input, output, objectives - different intent per module, and we also have parameters. Discussion on whether to model these separately. The Broad, who build GenePattern, want to have a GUI that presents tasks and SW modules that can be used to achieve a task, build a pipeline. Already done some of this work, have nearly 100 sw modules, I'd like to get this content into the SW ontology.
RS: Are you interested in what platform they will run on, versions, dates?
JM: Yes, but not done that so far. GenePattern is all pipeline based, implemented in R, excel input, C etc. Format of input/output important
RS: Do you care about the algorithm?
JM: We capture the algorithm and the objectives. We have some of the data transformations, we'd like to make the link between the sw that can run the algorithm and the task e.g. clustering or classification
BM: We have started to write an ontology for data/sw preservation, this may be a test bed for this group. Done some trials in the lab, look promising. Offer it as a thing to start with.
Franck Tanoh (plus slides):
The MyGrid ontology
Service discovery, web service annotation in MyGrid - running. building, describing wfs.
Service discovery problem - where are they, what do they do, how to invoke, what are their parameters. e.g. emma for clustalW
semantic annotation of services - myGrid ontology
Service ontology, - part 1. where is it, how many inputs/outputs, who hosted, what is the 'type' part 2. domain ontology - bioinformatic algs, tasks, molecular biology, formats, domain specific data sources
Map ontology to elements of WSDL, simple tagging using the ontologies.
GUI called FETA Search by task, output/input, resources used, method used, etc
http://www.mygrid.org.uk/tools/service-management/feta/
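The tagging and search-by-task idea could be sketched roughly as below. This is an illustration only, not how FETA is implemented: the `ServiceAnnotation` record, the `find_services` helper, and all tag names are hypothetical placeholders rather than myGrid ontology terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAnnotation:
    """Ontology tags attached to one service operation (illustrative only)."""
    name: str
    task: str
    inputs: frozenset
    outputs: frozenset


# Hypothetical annotations; the tag strings are placeholders.
REGISTRY = [
    ServiceAnnotation("emma", "multiple_sequence_alignment",
                      frozenset({"protein_sequence"}), frozenset({"alignment"})),
    ServiceAnnotation("blast_service", "sequence_similarity_search",
                      frozenset({"protein_sequence"}), frozenset({"hit_list"})),
]


def find_services(registry, task=None, takes=None, produces=None):
    """Return names of services matching every criterion that was supplied."""
    hits = []
    for service in registry:
        if task is not None and service.task != task:
            continue
        if takes is not None and takes not in service.inputs:
            continue
        if produces is not None and produces not in service.outputs:
            continue
        hits.append(service.name)
    return hits
```

With tags like these, a user who has never heard of emma can still find it by asking for the task rather than the name.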
Ontology 7 years old, ~600 classes, automated workflow composition. Better support for hiding complexity and exposing semantics clearly, etc
Biocatalog, aims to create a library of workflows, collab ontology building important for biocatalog, different ontologies coming together in one registry.
RS: Experience in MG is the frustration when you find a ws and the operations have odd names, all param are 'string' no matter what complexity and using the ontology to find the input and output is very imp. SW is not named informatively, BLAST is an acronym, we don't know what emma is. Things are changing, projects looking at planning techniques, want descriptions of pre and post conditions, know that takes a protein seq or data, does it need to be normalised, recording this detail is needed. Already see that there is a commonality, task appears, input, output, these are annotation tasks.
JM: Genepattern is also about template building.
Areep. The annotation mechanism - there is a GUI FETA - is that for annotating a webservice, automated?
FT: Before was hard, we have a web front end on biocatalog, model defines what you need to annotate. You submit a service and we can annotate the input and the output.
RS: Most of the services are undocumented and then Franck infers what they do
FT: We want to build an env where we can contact the providers
PL: Do you annotate quality, e.g. quality is the ws is not doc
FT: We monitor q of services, thumbs up/down depending on annotations. Social engineering
PL: There are uptime criteria, what happens when it exits
FT: these are subjective and objective - e.g. good for small amt data, runs fast. Split into the sub/obj - service is there running.
AR: Also an issue that we can crit things that are in no state to be criticised. In other bits of ontology and SW we need to know the status
RS: Dead alive, supported
BM: We seek to address that
Brian Matthews (plus slides)
Preservation of SW, attributes, functionality, environment, dependencies. User interactions, SW that we have and preserve, GUI changes, command line etc. Factors to take into account. Needed a FW to express the preservation properties of SW. FW, parts of may be useful.
FW has three parts, performance model, - what does it mean to preserve, retrieve, reconstruct, replay (RRR) Adequacy of performance
model to describe artefacts, digital objects, versions and variants
what is the thing, simple model for doing this - id components
properties - for R, R, R
built an ontology.
Product-version-variant-instance
product - whole sw, e.g. linux, gross functionality
version - release, changes in funct
variant - version for a platform, os and env
instance - variant in a place, ownership, one licence, fixed to mac or ip address
JM: What's the relationship between these?
Areep: Product - has_version, has_variant relns
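A minimal sketch of the product-version-variant-instance chain, written as plain Python records rather than OWL. It assumes that has_version hangs off the product, has_variant off the version, and that instances sit under variants; the field names and the example data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    location: str            # where this copy is installed (assumed field)
    licence: str = ""


@dataclass
class Variant:
    platform: str            # operating system and environment
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Version:
    release: str
    variants: List[Variant] = field(default_factory=list)


@dataclass
class Product:
    name: str
    versions: List[Version] = field(default_factory=list)

    def platforms(self):
        """All platforms for which any version has a variant."""
        return sorted({var.platform for ver in self.versions for var in ver.variants})

    def instance_count(self):
        """Number of installed copies across every version and variant."""
        return sum(len(var.instances) for ver in self.versions for var in ver.variants)


# Hypothetical product used only to exercise the structure.
example = Product("example_parser", [
    Version("1.0", [Variant("linux", [Instance("host-a")])]),
    Version("2.0", [Variant("linux"),
                    Variant("windows", [Instance("host-b"), Instance("host-c")])]),
])
```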
MH: Do you need to sep the binary copy and the running copy, seems complex
BM: we don't do that here. The model can cope with both, we don't care whether we preserve the source code, or the binary version
PL: two ways version, one sw, one name, one sw, one name difft sw e.g. genespring new is not related to the last thing. XP/Vista
BM: these are different
PL: what's the sim then?
BM: Xerces XML parser, looks like one product, different for c and java, but they are different development paths. Sep between things is vague.
PL: Dependencies, e.g. test suites, sw fulfils a test on one platform, not on another.
BM: Tests are low level at variant
GK: When you worked on thesaurus you had broader than narrower than in terms of info retrieval, dependencies between a and b could use the same relns.
BM: yes
FRBR model from library world, analogy here
Component model -
What described are abstract, associate binary files e.g. with these as components. source, binary, config, doc, test
Relate all this to an archive model OAIS.
Product props - licence, who offers the licence, inputs/outputs, description. e.g. Xerces -
HP: Do you deal with defects
GK: If it's known then it's part of the documentation.
GK: suggestion - getting a sense between version/variant is arbitrary and possibly broken.
BM: may not be useful in all cases.
HP: we can see if the reasoner will figure it out, suspect that it's not all that clear
BM: Instance properties - not a huge no of properties for preservation, more specificity, URLS, not that useful for preservation, capacity there in the model.
MH: is Xerces C++ a class?
BM: is an instance of a product
MH: when you talk about mac addresses, is complex, many levels
BM: trying to break down the complexity. Model was to organise the complexity.
RS: digital preservation different from the bioinf task description. In Helen's world the idea that get some data in an array expt, used some SW, people don't want to recreate that SW, that is not a requirement, could be.
BM: Don't want to get locked down into preservation, model to add info is the same.
Shows ontology.
RS: Anything to say about use cases before we start stage 2. Chance to chip in with micro use cases later ?
RS: We have done some info gathering, we have aspects of SW to describe, want to list this, go through these. We will not spend two days philosophising. We will use some aspects of upper ontologies for organisational purposes.
3. Group listing of SW desiderata
* Task - what does it do?
Biological perspective vs. BM described technical view. DNA and EST are the same data, but have different qualities. HP: these are different in how the data are generated. RS: SW supports a task, data have bio aspects as well as technical. Bio aspect affects what SW is used. KM: hypothetical use case might help. RS: keep James's use case in mind, where have a database where want to record info about an ma expt, produced some data, has been analysed and record aspects of that for search purposes. Also want to find a sw, same problem. KM do we care how it does it? RS: yes, if SW does a task on some data, and SW did it with x algorithm, then I don't trust that. KM: we care about the algorithms, but not the technologies.
* SW task
* SW
* Data
* Input
* Output
* Constraints - string, int,
* preconditions
* post conditions
* provenance (history/origin)
* access control
* terms of use
* licencing
* variables
* parameters
* quality (reputation, is it any good? both subjective, objective).
* Algorithmic properties - complexity, alg vs heuristic (objective/subjective)
* Performance
* Operators - functions - readers writers
* Interface style - command line/GUI/web service/
* Platform (hardware)
* Implementation language
* Documentation
* Serial number
discussion on where this lives and whether we care. Seems not much.
* API
* Hardware dependencies/SW dependencies - technical point of view, also user point of view.
* Time dependence e.g. on licences and other things
* People, Organization, owner - (also have time aspect e.g. for support)
* Source - where do we get it from - download from SF, buy a physical CD.
* Service vs. downloadable SW (GK: makes a real difference practically)
* Expertise needed - e.g. type of user - biologist vs. bioinformatician
* Is the process lossy or not (GK: lossy vs non lossy transform, important when building a pipeline)
* Software project (e.g. SF)
* Architecture
* Deployment
* Format
* Content/meaning
* Biology
* Data format schema
Discussion of the list
RS: seems primary distinctions are SW, Data, everything else is about SW and data.
HP: People and organizations are different.
GK: are we concerned with SW production or just SW - we think that we just care about SW.
KM: People want to find SW, understand which version, don't care how the version was developed.
JM: We can come up with some competency questions
ND: Primary need to describe programs used in biomedical investigations. We don't want to duplicate the artefact ontology in a more general sense, push to the bio sense.
HP: for me this is about the task
RS: from the talks, the task is one of the primary things we care about.
GK: card sorting, we can stick these to the whiteboard. Three headings, sw, data, people and organisations. Most things are under SW.
Organising the list as a group (see photos)
HP: Can we get an example?
4. Discussion
Dependencies - what does this mean
Arif: need requirements and make it more granular
Platform, dependencies, user expertise,
HP: Do we know what we mean by architecture - components and the way that they fit together.
AR: There are categories of the ways that you put things together. Not clear to me if the GUI is about how it works, or is a thick or a thin client.
RS: UI in HCI terms is direct manipulation, wizard form filling, command line, GK: batch, interactive
RD: some things like R have command line UI but do produce GUIs
GK: Should we see if we can form another cluster and see what easily groups. What does it do, what alg does it use, these are 'functionality' - groups these together.
RS: also add inputs and outputs, error condition
HP: isn't this another kind of output
FG: are we collab doing this with paper rather than protege
RS: we can do this in collab protege. After this pass we can start doing that. Sub question, when do we move to actually a more tech based discussion. Approaching that, worth doing large scale face-to-face
ND: Do we still need to think about scope. Is all the stuff on the board in the ontology
RS: I am noting what's out of scope for me. Clustering, we can look at lumps that are out of scope.
BM: Think about the task doing, my talk stages we went through were finding SW, rebuild it, run it to complete a task. Use case.
PL: If you are doing use case, then we have different outputs, sys admin, bioinformatician, the system you described is more towards the sys admin role, rather than the bioinformatician
RS: in MyGrid world building it and deploying are irrelevant, looking for what's there and available 4000 services, what do they do. what are the ins and outs.
BM: yes but, protege, I don't build it, I want the version, I am my sysadmin for this sw. If I need to go to sf and get source code
GK: leads back to the user -
PL: I was asking which user do we care about?
HP: there are different users, command line, GUI etc
RS: the functionality cluster seems most important, the others are not irrelevant, maybe not this scenario
GK: focus in on functionality and draw in others as we need
RS: complete the stuff under SW, does more go into the functionality SW
GK: current functionality SW task, what does it do, what algs are used, what data is used (bio and tech), what performance, post and pre cond, what input is needed, what output, what vars and params, constraints on the data. Non orthogonal
others: media, quality, reputation. data and people - provenance, version, people, expertise, users, docs, serial no
FT: This is what we do in biocatalogue, these are converging. Slides.
Functional aspect, service model - how it works, social standing etc.
GK: we have more detail, and defined categories.
RS: reassuring. We have this basic classification, everything on the board is relevant. Functionality cluster is the bit we can concentrate on. Next stage, to decide what's overlapping, constraint, precondition, post condition,
HP: we can also come bottom up, and do that separately, maybe tomorrow.
LUNCH
RS: Two ways forward, take the top level categories, put these in ontology, record etc and start e.g. Arif's suggestion about requirements h'archy. One group can start that. Helen suggested - not top down, look at the bottom up process, tasks that the SW support and get things here. In that case we can also use collab protege at each end of the room then we do things with collab protege. May not become the ontology, way of starting to organise things. We then report about this, and has worked in previous meetings. Any other tasks that people want to do?
ND: Simon and I will look at SWP into the collab protege, software preservation ontology so we don't start from scratch
RS: once we decide what to do, look at what can be plundered or extended.
ND: from the bottom up side can look at the mygrid ontology stuff
GK: Top down needs data from bottom up
RS: we can all do bottom up, and see if there's enough to do top down
HP: we can do two groups of bottom up
ND: we can use service vs. other views on things
RS: if we were fewer than 10 people, we could use the whiteboard, as we are more, not sure useful. Borderline for this group.
5. We split the groups and both work on the bottom up 'task' part of things
We decide to use the same ontology (so we can use the chat feature) and have two different classes at the top. Group1 and Group2 classes.
KM: Do we want to set down some collaborative ontology rules. Only use the chat, preferable to use chat to get someones attention for e.g.
RS: CP is just protege 3, there's a mechanism for attaching notes and threads to things, and there's chat aspect. In December meeting when you left notes on things, there was no notification mechanism, to look at the class and see what's there. Chat feature are notified that someone has made a note. So we only used chat. CP3 has not changed in this aspect. KM is right, if we use this then we need to use the chat as a notify tool. Chat's not threaded, about or to anything. 'Group1 look at this'
HP: as biologists we can think of SW instances - and see what functions they fulfil.
JM: if you are familiar with protege then this is easy to do with CP chat and discussion.
RS: OK let's try, if inadequate for the planning stage of things
James Malone gives a demo of CP3 in server mode.
We decide not to delete anything, we will use a parent obsolete node.
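The no-delete rule could look roughly like this on a simple child-to-parent map. This is a sketch of the convention only, not a Collab Protege feature; the `retire` helper and the class names are hypothetical.

```python
OBSOLETE = "Obsolete"


def retire(parent_of, cls):
    """Move cls under the Obsolete node instead of deleting it; return its old parent."""
    if cls not in parent_of:
        raise KeyError(cls)
    old_parent = parent_of[cls]
    parent_of[cls] = OBSOLETE
    return old_parent


# Hypothetical class tree, child -> parent.
tree = {"Obsolete": "Thing", "Group1": "Thing", "draft_class": "Group1"}
previous = retire(tree, "draft_class")
```

The class stays in the ontology, so any notes or chat references to it still resolve.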
Frank G presents in protege for group 1.
sw defined in terms of input, output, task, e.g. blast. Algorithms, e.g. Blastn, blastp. Formats are defined as well.
Data format - transform the same info in different formats - does it matter if this is lossy at transformation, what does lossy mean, losing XML tags on convert to plain text.
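One way to make "lossy" concrete: a transformation is lossy if two different inputs give the same output, so the original cannot be recovered. The sketch below uses Python's standard `xml.etree.ElementTree`; the `xml_to_plain_text` helper and the two records are made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_to_plain_text(xml_string):
    """Keep only the character data; element names and structure are discarded."""
    root = ET.fromstring(xml_string)
    return " ".join(text.strip() for text in root.itertext() if text.strip())


# Two different hypothetical records: the tags are swapped between them.
record_a = "<entry><id>X1</id><name>demo</name></entry>"
record_b = "<entry><name>X1</name><id>demo</id></entry>"

text_a = xml_to_plain_text(record_a)
text_b = xml_to_plain_text(record_b)
```

Both records flatten to the same plain text, so which value was the id and which was the name is lost in the conversion.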
Discussion about parameters, default values are instantiated in some implementations, and not available for the users.
Need to constrain the input e.g. DNA/RNA
Approach was to take a program and describe it from scratch, BLAST.
Next task is to do the user experience
RS: was the order important
Tony B presents in protege for group 2.
Different approach - we had issues with collab protege, dual effort, data into a s/sheet, Tony and Simon adding to protege, lot of SW descriptions, not categorized. Decent spread of applications and we have some input and output data types, and the tasks are categorized.
JM: We concentrated on concepts of interest, tasks, SW, data types, definition and cols of aboutness, input/output/format - if we can then code these into the ontology. Split task, SW, data, format.
TB: There's a lot of overlap.
HP: We have some implicit relations between things that we need to model, sw that implements algorithm
RS: we have started to model along the lines we discussed this am. We started to get to users. We have got a consistent list of things. These are all ways of classifying SW - can we draw a list of stuff that we will talk about?
ND: Do you mean competency qus
RS: No we have a consensus
HP: These are the easy things, or the first things,
RS: What to do next? Plan for tomorrow. If we ran along the same agenda as for the cell type ontology, we now have the initial axes of classification, we can build a single ontology with a tangled h'archy, we can choose some of these things, or some of these, then what h'archy do we build.
RD: Functionality parts are easy to do - output input, etc all easy to do
GK: how much do we need to do to get an ontology for sw not an ontology for blast.
RD: the other group has a lot of types of sw, text mining tools, normalization things, biopython, this addresses the granularity
RS: there are 1000s of sw. What's a representative sample for this? When we did the cell type ontology work there were 450 types of cell, by end of day two, we had done 20 of them. Suspect that we can do 10 bits of SW.
HP: if we use the OPPL script then it will go many times as fast
RS: we had supporting ontologies was fast
HP: the biology was hard with the CTO though
RS: we can choose the primary axis of classification, in the bio world there was no single axis. We were looking ontoclean rigid characteristics - there were none. Ploidy, nucleation, etc - things are not temporally stable. Then we use the reasoner to build the hierarchy e.g. all SW that takes DNA for an input. We use s/sheet to capture the info, and don't do anything with collaborative protege.
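The idea of letting the hierarchy fall out of the descriptions, e.g. "all SW that takes DNA for an input", can be imitated in a few lines. This is a closed-world filter over a table, not an OWL reasoner, and every tool and type name in it is a hypothetical placeholder.

```python
# Hypothetical data-type hierarchy, child -> parent.
PARENT = {
    "DNA_sequence": "nucleotide_sequence",
    "RNA_sequence": "nucleotide_sequence",
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
}


def is_a(term, ancestor):
    """True if term equals ancestor or lies below it in PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


# Asserted descriptions, as they might be captured in a spreadsheet.
HAS_INPUT = {
    "tool_a": ["DNA_sequence"],
    "tool_b": ["protein_sequence"],
    "tool_c": ["RNA_sequence", "protein_sequence"],
}


def software_with_input(kind):
    """Members of the defined class 'software that has some input of this kind'."""
    return sorted(sw for sw, inputs in HAS_INPUT.items()
                  if any(is_a(i, kind) for i in inputs))
```

Nothing is asserted about which tools are "nucleotide tools"; that grouping is computed from the inputs, which is the point of the normalisation approach.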
we may want to use the s/sheet to describe the software
ND: you'd need to know that the pattern is good.
RS: yes Mikel had to write rules, most of the work we did was with the spreadsheet, and the plan
HP: depends on what the goal is - we usually want to build an ontology as fast as possible, if you want to use collab protege that's a different goal
the fact that we can't have nested restriction is a problem
FG: we want to run the reasoner after each restriction, performance issues
ND: are there going to be obvious nested expression, are these useful
HP: universally applicable
AI: Decide on three bits of sw, build an ontology for that. Take the list of terms that we described and get a list of diverse applications.
RD: Bioperl, biopython, chained events e.g. sequences alignment, where inputs become outputs, k means clustering etc
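Chaining, where one step's output becomes the next step's input, suggests a simple compatibility check once inputs and outputs are described. A sketch with hypothetical step and type names:

```python
# Hypothetical steps; the type names are placeholders.
pipeline = [
    {"step": "fetch_sequences", "input": "accession_list", "output": "sequence_set"},
    {"step": "align_sequences", "input": "sequence_set", "output": "alignment"},
    {"step": "cluster", "input": "distance_matrix", "output": "clusters"},
]


def first_mismatch(steps):
    """Index of the first step whose input differs from the previous output, else None."""
    for i in range(1, len(steps)):
        if steps[i]["input"] != steps[i - 1]["output"]:
            return i
    return None
```

Here the third step expects a distance matrix but is handed an alignment, so the check flags index 2.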
RS: is twelve too many people.
ND: the tasks need to be merged.
HP: collaboratively with three editors.
RS: still have restriction issue - then export to protege 4 and do this at the end.
ND: that will take 20 mins per ontology once it's in P4
AR: suggest we give Stanford a ring and see if we can sort this out this evening.
RS: we need to go through for the class SW which restrictions we put in it, what those are going to be etc
6. Wednesday 13 May 2009
RS: Suggest now that we go through the list of SW and some top down classes and now we can start to make the list more formal. We need to write some natural language defs of the classes and then decide on what the inputs etc are. Tony and Rob were recording groups 1 and 2. Rob shall we start with your blast example.
ND: blast is not asserted to be a SW, SW should match based on the inputs and outputs, it's a domain class, input of blast database, fasta sequence, we want that to be a union.
RS: is an input database a parameter for blast
PL: is both
HP/GK: what's the difference between a parameter and an input
RS: seems like an input, grep -i makes it case insensitive
GK: this is an artefact of invocation
FG: parameters constrain the input set
RS: difference between flicking a switch
PL: formally no distinction, commonly used. All inputs to the program
CB: I see a difference, params are finite set of possibilities, data is open ended, any sort
general disagreement
CB: still setting a param e.g. line length, x, y, z - data could be a huge block
ND: the seq param is using the blast DB, querying the sequence
PL: it's a user orientated distinction, the thing you care about and what modifies
JM: you could put in the same data and different params. We discussed this in OBI, the diff between data and param is convention as inputs. It's a role of some input.
PL: cashpoint, output are cash, card, receipt, you never leave without cash, you might leave without card
TB: context is imp, UI can be parameterised, and invoke difft programs, the code has no param, but the GUI does
RS: Enough, has input database, has input x, and this has a role of being a parameter, or has_parameter.
JM: either is OK
GK: don't want to consider this
FG: we do this as has_input
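The "everything is an input, parameter is a role" position might be sketched as follows. The `Role` enum and the grep-like input names are hypothetical, chosen only to mirror the grep -i example from the discussion.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DATA = "data"
    PARAMETER = "parameter"


@dataclass(frozen=True)
class Input:
    name: str
    role: Role


# Hypothetical description of a grep-like search.
search_inputs = [
    Input("text_to_search", Role.DATA),
    Input("ignore_case", Role.PARAMETER),
]


def with_role(inputs, role):
    """Names of the inputs that play the given role."""
    return [i.name for i in inputs if i.role is role]
```

Every item is reachable through the single has_input list; the role is extra information that can be ignored when the distinction does not matter.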
HP: can you run blast without the db?
RD: some cases use a fasta
RS: input data is in blast format
RD: blast needs at least one fasta sequence. blast db is not necessary, need something to search again.
RS: min 2 fasta seq are the input.
PL: if you just use 2 seqs then that's a stupid way to do blast
HP: but for a pairwise alignment then some algs will have 2 inputs and we will want to model this, so we should be precise
PL: the blast user doesn't need to know about the format
RD: are we modelling the users of SW, or the SW
PL: blast database needs a name
RS: we have a precondition of format, may not be appropriate here. Ever a case where we care that this is in blast format
FT: if you want to run your own blast then maybe, front end one then you don't care
RS: if I have my own blast, and my seq database in a taverna workflow, then I need to reformat the blast database. When we use a web front end, then no-one cares. Phil, Mr formal ontology man, if I were to describe there's a condition of blast format, how would we represent that. Has precondition.
FT: depends on what the precondition is for, is it the SW or the user
PL: input of blast database and a precond that the blast database is useable
RD: some of blast's algs don't need a database
PL: if we do all of the ors we will get lost, let's model one of the ways to invoke
RS: shall we put something into protege
PL: id for a blast db, precondition that the db is in the correct format
GK: the precondition in isolation is not meaningful, precondition for what, we need the context
HP: maybe preconditions have context and are therefore special
GK: if the precondition is not satisfied what are the consequences
FG: does run, but doesn't give what you expect
HP: error is an output
GK: had a different view of precondition. if you run the program then the preconds hold
RS: if I produce some data, pass to this program, the data has to be of this type, satisfy these precond to be a sensible input, how formally shall we capture this.
GK: how formally capture this
RS: so that the program works
ND: GK is describing a state machine
CB: also an issue of granularity. Input must be in comma sep values, or more detail than that
GK: hypothetical program, csv, or xml input, output csv, xml paired precondition
ND: now describing everything that the program does
GK: is that not the point?
ND: depends on the use case, for taverna need less info
GK: I made the point I wanted to make and I can back off
FG: inputs, and there are preconditions
JM: we are modelling inputs and outputs as bags, GK is more precise
FG: as a first pass then we can do this less granular, and then be more precise
HP: Graham are you happy?
FG: I don't think we disagree
GK: ok lets continue
RS: we can't relate input and output together Duncan Hull did this and it didn't work
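The gap between the bag model and GK's paired view can be shown on his hypothetical csv/xml program. The reading assumed here is that csv input yields csv output and xml yields xml; the helper names are made up.

```python
# Bag model: inputs and outputs are listed independently of each other.
bag = {"inputs": {"csv", "xml"}, "outputs": {"csv", "xml"}}

# Paired model (assumed reading): each input format is tied to the output it yields.
paired = {"csv": "csv", "xml": "xml"}


def bag_allows(in_fmt, out_fmt):
    """True if the bag description is consistent with this input/output combination."""
    return in_fmt in bag["inputs"] and out_fmt in bag["outputs"]


def paired_allows(in_fmt, out_fmt):
    """True only if this input format is tied to this output format."""
    return paired.get(in_fmt) == out_fmt
```

The bag description cannot rule out csv in, xml out; the paired one can. That extra precision is what is being given up in the first pass.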
- some group editing -
FG: there's also a subset of a fasta as an input
HP: fasta seq is a format, the subset is the range
FT: can also be an accession number from an external example
- some more group editing -
PL: the output is a blast output
RD: prioritized alignment is the output
HP: is there a task about alignment?
ND: output some prioritized output list, has format, plain text or xml
RD: data structure which the output is, and that is represented in some kind of format
RS: has format and has precond are these different?
ND: we imply that satisfy the inputs is a precondition
HP: is a precondition a minimum set of inputs
FG: no - as all inputs have preconds
ND: do we want to specify the min inputs
RS: blast is very complex
JM: plenty of neural nets that we could have specified
FG: optional input/is required input
ND: there's a modelling issue of optionality in OWL
RS: think FG's point about required and optional is valid, but leave as comment or note.
AI: optionality/required is important but hard in OWL so deferred
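Outside OWL, required versus optional is easy to record as a flag, which may be one reason to keep it as a note for now. A sketch with hypothetical input names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSpec:
    name: str
    required: bool


# Hypothetical specification; which inputs are truly required was not settled.
specs = [
    InputSpec("query_sequence", required=True),
    InputSpec("scoring_matrix", required=False),
]


def missing_required(input_specs, supplied):
    """Names of required inputs that were not supplied."""
    return sorted(s.name for s in input_specs if s.required and s.name not in supplied)
```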
ND: task is sequence similarity and pairwise alignment
RD: there's a difference between the task the program does and what the user wants to do
RD: user tasks are easier to model, stats algs are hard
ND: do we want to distinguish? Appl may do seq sim, but the visn is the important part
FG: sw implements alg and satisfies some task
ND: does the alg include the visn
FG: does any SW produce a visn -
HP: what does graphvis do then
FG: draws an image
RS: task and alg - are they different
FG: that is the difference - user and sw tasks
RS: there are many difft algs for sequence alignment
FG: do we need to say more than SW implements alg
RG: if we model algs then that's hard
RS: pseudo code in OWL
RD: do we want to describe what algs do?
-- small gap in the notes --
RS: If SW has a version is that a datatype property with a string at the end?
GK: can a bioinformatician, looking up blast n and p are different sequences, they are different programs, do they implement the same alg
PL: yes,
GK: here they are different algs
PL: difft UI on the same program
RD: most people use blast all
FT: we are missing some params, matrix, gap alignments etc
RS: yes, we glossed over these, and went onto something else, we need to return to the params
ND: sounds like the blast program has 5/6 difft things it can do, each one of these has different inputs. Appl then the services each services has sets of params
FT: the blast service operations are difft, each constrained by the inputs that they expect
RS: Nothing we have said so far is untrue; we have underspecified, that's all, on purpose. Blast p is a subclass of this, input blast n, blast p
GK: on the screen - 'implements some blastn, blastp etc' implements range is an algorithm, and these are modelled as alg. This is not the role of blastp/n - this is a user task not an algorithm
RD: yes, correct
FT: depends on what we mean by algorithm, blastn and blastp have the same algorithm
RS: we just need blast algorithm the blastn blastp describe parameters
RD: these are constraints to nucl or protein
HP: we need the task to be more specific
RS: subclass of task, nucl seq sim, has input blastn. doesn't need to be complete, can be underspecified
GK: ok blastn/p etc make these subclasses of seq sim
RS: blastn/blastp are not the task
RD: blastn is not a task
GK: then I don't know what the task is then? what does that mean?
RS: nucl seq/seq sim that's a task, blastn is not a task
GK: ok
ND: task is seq sim, application is blastn
RS: like the difference between case insensitive search and -i on grep
FT: we model blast and then the functional units of blast
RS: leave the description - and tasks in nucl seq-seq and input is blastn
FG: blastn has specific params, blastp has specific params, agree with RS (** Frank agrees - special occasion)
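One reading of where this landed: a single blast algorithm, with blastn and blastp distinguished by their input constraint and by the more specific task they support. The sketch below takes that reading as an assumption; the `Tool` record and the task names are hypothetical and not taken from the ontology on screen.

```python
from dataclasses import dataclass

# Hypothetical task hierarchy, child -> parent.
TASK_PARENT = {
    "nucleotide_sequence_similarity": "sequence_similarity",
    "protein_sequence_similarity": "sequence_similarity",
}


@dataclass(frozen=True)
class Tool:
    name: str
    implements: str   # algorithm
    task: str         # most specific user task supported
    input_type: str   # constraint on the input


TOOLS = [
    Tool("blastn", "blast_algorithm", "nucleotide_sequence_similarity", "nucleotide_sequence"),
    Tool("blastp", "blast_algorithm", "protein_sequence_similarity", "protein_sequence"),
]


def supports(tool, task):
    """True if the tool's task is the given task or a subclass of it."""
    current = tool.task
    while current is not None:
        if current == task:
            return True
        current = TASK_PARENT.get(current)
    return False


def tools_for(task):
    """Names of tools that support the given task."""
    return [t.name for t in TOOLS if supports(t, task)]
```

Under this reading blastn is neither a task nor an algorithm: it is a tool whose description points at both.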
RS: complete a broad sweep, what version of blast, would I use a data type property plus string?
PL: yes, blast is forked, many implementations
HP: the front ends are different, as are the outputs
RD: with blast the params are different e.g. fragmented genome
PL: the code is different so is difft algorithm
-- blast is complete --
RS: OK blast is done, underspecified, but OK. Now we will divide into smaller groups, and has one bioinf, protege person, each group chooses a tool which they have expertise and models it. Or we could get Franck Tanoh, to take some alg description from the MyGrid ontology, and add these to the ontology
FT: we have both tasks and alg and they are linked
PL: alg is limited in usage in bioinf e.g. smith waterman, blast includes SW and low complexity masking. Experience alg is not that useful sep from the program. Not useful to sep the alg from the implementation
FT: true, people don't really care, technical users don't care, care about tasks of the program
PL: there are some cases of 30-40 implementations of different algorithms, so model the user tasks
RS: how do we do that? Some people describe SW, do they say their user task or make requests to FT to do that. FT can make a tree
JM: use collab prot to do this
PL: if you pull the tasks we have in then we'll have most of it
RS: FT will do that then
HP: how many are there?
FT: 12 tasks - add them all in
JM: we have some in OBI
HP: can Jm and FT work together
RS: we have a lot of bioinformaticians
PM session
Review of the classes that we have added in the am session.
Discussion on thing of interest vs. reference
AI: SW project vs SWCollection - discussion on what the difference is needs to be resolved
JM: explains how OBI and the IAO deal with this e.g. the code, MS Word, and a copy of that is an instance and these relate to the class. This is what BFO deals with. James and Frank think this example is the wrong way to do this.
FG: is it helpful to model SW as instances is the question, does anyone have a reason why it's better modelled as instance
TB: difference between SW application and SW library -
PL: should be sibs, what's the distinction, API/Interface. SW library has an API, not any other interface
GK: one is style of interface, other thing is the library is a collection
ND: library is a component part of SW application
PL: use in the services env for FT, all services have an API only all services = library
ND: we need to be strict about API
GK: SW and library is sim to workflow and the program in a wf. may not be a distinction that it's helpful to make. There are things that are chained together, we want to do that.
PL: specific solution?
RD: we intended, sw project and lib were descriptive classes, and SW project - versions etc. Not modelling capabilities, modelling the meta data
PL: application and library could be roles
FG: roles are transitory, do they get lost?
PL: we need a soln, suggest that we are clear there is a use, but not the distinction, we move sw appl into sw and see what turns up under there.
FG: appl does have a set of restrictions, suggest we use the restrictions for binning
next task - define more stuff.
GK: how will this work discover services - do we have a good enough description to meet this use case
HP: if we can tie the sw to the task and know the input and the output then we can answer that, suggest GK looks at the tasks and ties a piece of SW to each task.
ND: I will do the disjoints
JM: I couldn't do that when I tried