Data Entry Query

Roger Hyam roger at HYAM.NET
Thu Jan 9 10:24:25 CST 2003


I mainly just 'lurk' on this list. I no longer work as a taxonomist and
so I just read the digest from time to time to see how things are
going. I trained/worked as a plant taxonomist (about 15 years of my life
in all I suppose) but towards the end of my career I became more and
more involved in information technology and 3 years ago made the jump to
the commercial world - greatly increased my salary and security and
started a family before it was too late! I tell you all this because I
believe it is relevant to the points made below. I am out of the loop.
Nobody who reads this is going to be peer reviewing a paper I submit
next week or judging whether or not I get a grant next month. Basically
I can say what I feel - which I couldn't do when I was working as a
non-tenured academic.

Here are my points:

1) I am staggered that there is still confusion about taxonomic data and
taxonomic opinion. That a specimen exists in a certain collection and
was collected by a certain person on a certain date in a certain place
are all "fact" as far as they can be - and will not change. What taxon
this specimen belongs to is opinion and may well change.

2) Databasing of *data* is good as it helps people who are working with
the specimens answer questions like "What else was collected here?" and
"What else did this person collect at this time?"

3) Databasing *opinion* is of limited value and is really only a note on
the 'main' database record for a specimen. It may be very important from
the point of view of the output of the taxonomic process but not for the
number crunching to facilitate the process - see below. The trouble is
people are so used to specimens being arranged in taxonomic order in
collections and, dare I say it, value their own opinions on such matters
so highly that they can't see this. It is the only way they can find the
thing both in the collection and in their own mind. They hate the fact
that it could be arranged differently in someone else's mind/collection.
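
To make the split in points 1 to 3 concrete, here is a minimal sketch of
a specimen record in which the collection facts are fixed fields and
each taxonomic opinion is just a dated note attached to it. All class,
field and function names, and the sample values, are hypothetical
placeholders for illustration, not any real system's schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Determination:
    """An opinion: which taxon someone thinks the specimen belongs to."""
    taxon: str
    determiner: str
    determined_on: date


@dataclass
class Specimen:
    """Collection facts, recorded once and not expected to change."""
    collection: str
    collector: str
    collected_on: date
    locality: str
    # Opinions are notes on the main record and may accumulate over time.
    determinations: List[Determination] = field(default_factory=list)


def collected_at(specimens, locality):
    """Answer "What else was collected here?" from the facts alone."""
    return [s for s in specimens if s.locality == locality]


def collected_by_on(specimens, collector, day):
    """Answer "What else did this person collect at this time?"."""
    return [s for s in specimens
            if s.collector == collector and s.collected_on == day]


# Invented placeholder records, not real specimens.
specimens = [
    Specimen("Herbarium A", "Collector X", date(1990, 5, 1), "Site 1"),
    Specimen("Herbarium A", "Collector X", date(1990, 5, 1), "Site 2"),
    Specimen("Herbarium B", "Collector Y", date(1991, 6, 2), "Site 1"),
]

# A change of opinion adds a note; the collection facts are untouched.
specimens[0].determinations.append(
    Determination("Taxon one", "Reviewer P", date(1995, 1, 1)))
specimens[0].determinations.append(
    Determination("Taxon two", "Reviewer Q", date(2001, 1, 1)))
```

Neither query touches the determinations at all, which is the point:
the number crunching runs on the facts, and the opinions ride along as
history.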

4) It either 'hurts' to put data into a system or to get it out -
whether that system is a computer or a collection. If specimens are
labelled and catalogued well as they go in and the collection is
maintained well then taxonomists will find it easy to reach opinions
that they can make available to others (I presume that is what we are
trying to do). This is the same for card indexes as for computers.
Curation is expensive - period. Databasing of collections is curation.
Taxonomists should complain when it is done badly but shouldn't complain
about it being done.

5) "Rapid Data Entry" is a fallacy. There are 4 variables in any project
(not just IT): Quality; Time; Scope; and Resources.

* Quality shouldn't be movable - though in some cases it can be dropped
a bit!
* Time is a useful variable - we can always extend the project another
year. This is the main thing that is changed in floristic projects!
* Scope (i.e. what the project does). In reality this is the main
management tool.
* Resources (i.e. people & machines). It is difficult to change this as
it affects cost so much. Also you can't chuck more people onto a project
as they need to have skills and training. It takes 9 months for a woman
to have a baby but 9 women can't produce one in a month!

So if you want data in your database quickly (i.e. time is fixed) you
can either drop quality (not on as the database will be useless),
increase resources (probably not on as money is always tight plus the
people need to be trained) or narrow the scope (the only thing you can
control). How much can you narrow the scope till the database becomes
useless? That is probably the goal of your "Rapid Data Entry Assessment
Project".

6) There is a lot of bad IT about both in taxonomy and the commercial
world. I know some people in IT in taxonomy who are very good dedicated
people but I feel that many people with an interest/ability to run
large database projects will have quit taxonomy by the time they have
the maturity and experience to run the projects (i.e. the technology and
people side of things). I see adverts for posts that I think would be
exciting and that I think I would apply for. Posts where one could
really make something happen. Then I read the salary and the fact that
they are in central London/New York and realise that I could never
provide for a wife and kids on that money. The posts are either taken by
incredibly dedicated monk/nun-like individuals or young post docs, many
of whom move on.

7) Ownership of data. The main institutions need to take an open source
policy to their big datasets as this will allow true and fast innovation.
If you don't understand why this is true you probably don't know enough
about how to grow a large scale IT project :)

Hope this answers your question a little, Fran, and that my blatant
ranting doesn't get in the way too much.

Roger
(an hour of my boss's time was stolen for this contribution!)



