I think this forum provides a great venue to discuss photo-identification and pattern recognition for whale sharks, and I would like to share my own experiences using I3S, the ECOCEAN Whale Shark Photo-identification Library, the modified Groth algorithm, and the affine transformation-based algorithm created by the developers of I3S. I see a very positive integration of these technologies developing. If you're interested in these topics, read on.
For those of you I didn't meet at the International Whale Shark Conference in Perth, WA, in 2005: I am the programmer for the ECOCEAN Library and one of the modifiers of the Groth algorithm (originally developed as part of the Hubble Space Telescope program), which is now also used to identify whale sharks from photographs.
We published our results in December 2005 in the Journal of Applied Ecology. The algorithm uses the natural spot patterning in photographs of whale sharks to identify individuals uniquely against a large catalog of other photos. In 2005, I won the Duke's Choice Award from Sun Microsystems for Innovative Use of Java technology for incorporating this algorithm as one of many features built into the ECOCEAN Library, a web-based research application tailored for the whale shark community. Please see the "ECOCEAN Library Updates and Maintenance" section of this forum for a list of some of the functionality the library offers. The ECOCEAN Library provides an easy portal for submitting photographic mark-recapture data for whale sharks, from the research community as well as from ecotourists and diving enthusiasts around the world. Under the Rolex Award for Enterprise granted to Brad Norman in 2006, we will be expanding the library to include 20 new active research stations in 2007 and 2008, in addition to its role as a global collection site for whale shark data.
Since I started working with Brad Norman of ECOCEAN in 2002, I have seen interest in whale shark research grow, and I have been happy to see the tools available to the scientific community grow as well. In 2003, the ECOCEAN Library went online to begin collecting data. In 2004, we completed and integrated the modified Groth algorithm to provide rapid and accurate identification of whale sharks. Since then, both the functionality of the ECOCEAN Library and the data it has collected have grown quite rapidly.
2007 brings new tools to the whale shark research community in the form of I3S. The algorithm created by the I3S developers has now been published in the Journal of Applied Ecology, with specific application to spotted ragged-tooth sharks but also with clear application to whale sharks. The algorithm is packaged in a free, user-friendly tool called I3S, which can be downloaded from http://www.reijns.com/i3s. The application of this algorithm to whale sharks has now also been published by Conrad Speed et al. from Charles Darwin University.
At the beginning of 2007, the whale shark research community is now supported by two algorithms (modified Groth and I3S) and two research software clients (ECOCEAN Library and I3S). For those of you with active whale shark mark-recapture projects or for those of you thinking of undertaking the same, a new question arises: which to use?
Here's the quick answer: all of them.
As the programmer for the ECOCEAN Library, I am thoroughly familiar with its underlying code (I wrote it). Having worked with, and now modified, the I3S client, whose code base is open source, I am very comfortable discussing its structure as well. As the only person familiar with both systems, here are my observations.
Algorithms for pattern recognition and identification of whale sharks
The modified Groth and the I3S algorithms take different approaches to identifying patterns between two reference images. The I3S algorithm uses an affine transformation to put two comparison patterns into the same reference space and then examines the linear distances between potentially matched spots as an indicator of similarity in patterning. The modified Groth algorithm builds triangles from all combinations of three spots in each comparison pattern, forms relationships between the two sets of triangles by examining their internal angles, and then filters the matched triangle lists looking for commonality in magnification.
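To make the triangle approach concrete, here is a minimal Python sketch of a Groth-style matcher. It is written for illustration only and is not the ECOCEAN implementation. It assumes spots are (x, y) pixel coordinates, describes each triangle by its two smallest internal angles, pairs triangles whose angles agree within a hypothetical tolerance (`angle_tol`), and then iteratively discards pairs whose log magnification (the log of the perimeter ratio) is an outlier. Unlike the rotation-sensitive version described later in this post, this sketch is rotation-invariant (and, because the angles are sorted, mirror-invariant), and it builds every combination of three spots, so it slows down quickly as patterns grow.

```python
import itertools
import math
import statistics
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def triangle_shape(a: Point, b: Point, c: Point) -> Optional[Tuple[float, float, float]]:
    """Return (smallest angle, middle angle, perimeter), or None if degenerate."""
    sides = (math.dist(b, c), math.dist(a, c), math.dist(a, b))  # opposite a, b, c
    if min(sides) < 1e-9:
        return None
    angles = []
    for i in range(3):
        opp, s1, s2 = sides[i], sides[(i + 1) % 3], sides[(i + 2) % 3]
        cos_angle = (s1 ** 2 + s2 ** 2 - opp ** 2) / (2 * s1 * s2)  # law of cosines
        angles.append(math.acos(max(-1.0, min(1.0, cos_angle))))
    angles.sort()
    return angles[0], angles[1], sum(sides)


def match_triangles(spots_a: Sequence[Point], spots_b: Sequence[Point],
                    angle_tol: float = 0.02, n_sigma: float = 2.0) -> List[tuple]:
    """Pair triangles with similar internal angles, then keep only pairs that
    agree on a common magnification. Returns (idx_a, idx_b, log_mag) tuples."""
    def build(spots):
        tris = []
        for idx in itertools.combinations(range(len(spots)), 3):
            shape = triangle_shape(*(spots[i] for i in idx))
            if shape is not None:
                tris.append((idx, shape))
        return tris

    pairs = []
    for idx_a, (s1a, s2a, pa) in build(spots_a):
        for idx_b, (s1b, s2b, pb) in build(spots_b):
            if abs(s1a - s1b) < angle_tol and abs(s2a - s2b) < angle_tol:
                pairs.append((idx_a, idx_b, math.log(pb / pa)))

    # Iteratively reject pairs whose log magnification is an outlier.
    while len(pairs) >= 3:
        mags = [m for _, _, m in pairs]
        mean, sd = statistics.fmean(mags), statistics.stdev(mags)
        limit = max(n_sigma * sd, 1e-9)  # floor avoids dropping exact agreements
        kept = [p for p in pairs if abs(p[2] - mean) <= limit]
        if len(kept) == len(pairs):
            break
        pairs = kept
    return pairs


# Hypothetical check: the same four spots, scaled by 2, rotated 30 degrees, shifted.
spots = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0), (5.0, 5.0)]
theta = math.radians(30)
copy = [(2 * (x * math.cos(theta) - y * math.sin(theta)) + 7,
         2 * (x * math.sin(theta) + y * math.cos(theta)) - 3) for x, y in spots]
pairs = match_triangles(spots, copy)
print(len(pairs))  # 4
```

The magnification filter is what separates a real match from coincidentally similar triangles: true correspondences share one scale factor between the two photos, while chance angle matches scatter.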
Which is better? Really, the data to establish the superiority of either is not out there. The modified Groth algorithm was published with an 86% rate of successful identification on tested images, where the criterion was a first-place ranking for the correct match. This value is an average over sharks with one or multiple patterns to test against. The I3S algorithm was published (Van Tienhoven et al.) with a 72% rate (exhaustive search option) of the correct match ranking first for patterns with only one matching pattern, and a 92% rate if three reference matches exist. A rough average of the two values gives 82%. Given the different data sets, different operators, etc., the published efficacy of the two is essentially equivalent.
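Because the published figures come from different data sets, the only fair comparison is to score both algorithms on the same test images. A minimal sketch of the first-ranking criterion, assuming each test query yields a list of candidate shark IDs ordered best-first (the IDs and the test run below are hypothetical):

```python
from typing import List, Sequence, Tuple


def rank_one_rate(results: List[Tuple[Sequence[str], str]]) -> float:
    """Fraction of queries whose true match is ranked first.

    results: list of (ranked_candidate_ids, true_id) pairs, best candidate first.
    """
    if not results:
        raise ValueError("no test queries")
    hits = sum(1 for ranked, true_id in results if ranked and ranked[0] == true_id)
    return hits / len(results)


# Hypothetical test run: four queries, three with the correct shark ranked first.
test_run = [
    (["WS-12", "WS-40", "WS-07"], "WS-12"),
    (["WS-03", "WS-12"], "WS-03"),
    (["WS-40", "WS-09"], "WS-09"),  # correct match only ranked second
    (["WS-07"], "WS-07"),
]
print(rank_one_rate(test_run))  # 0.75

# The rough I3S figure quoted above is an unweighted mean of the two published rates.
print((72 + 92) / 2)  # 82.0
```

Note that an unweighted mean ignores how many test sharks fall into each category, which is one more reason to treat 82% as a rough figure.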
The developers of I3S make the following claim:
"The key feature of the software is that it is not fully automated. The user must point out the reference points, which in the case of C. taurus are the fin origins, and the most distinctive marks. Finally, the user must select the best match from a ranked list of possible known shark images. As the user manually points out the natural marks, image artefacts such as particle reflection in the water, backscatter from incorrect flash position and flash overexposure of the flanks, can be ignored. Only those natural marks that can be clearly discerned by the human eye are selected, thereby ensuring the best possible choice. Additionally, the use of this software is beneficial as a clear image focus is not as stringent a requirement for spot patterns as it would be for other natural marks, such as notches and tears in fins. We believe that this is more beneficial to correct identification of individuals and represents a preferable option over the system reported by Arzoumanian, Holmberg & Norman (2005)."
This is an interesting statement, but unfortunately it is incorrect. The authors do not have access to the ECOCEAN Library (and never asked for it). In both the I3S client and the Spot Extractor created by ECOCEAN, spots are user-selected. Both systems therefore gain the same benefit of avoiding the "noise spots" a fully automated system would produce, for example when the computer selects a light reflection as a false spot.
The authors also make this claim:
"An added benefit of our system is that only one computer-based package is required and the entire process, from image download, spot selection and matching, requires less than 5 min if the shark already exists in the database. If a new shark is recorded that has not been previously identified, then a more rigorous visual inspection of the database is necessary but still will be completed in a substantially shorter time than reported for other identification software packages (Arzoumanian, Holmberg & Norman 2005)."
Here the authors claim that the I3S spot selection system is faster. This is true: spot selection is much easier with the I3S client than with the Spot Extractor I developed for ECOCEAN, and the I3S client interface is quite user-friendly. The subsequent sentence, however, is unfortunate:
"...a more rigorous visual inspection of the database is necessary but still will be completed in a substantially shorter time than reported for other identification software packages (Arzoumanian, Holmberg & Norman 2005)."
Again, the authors do not have access to the ECOCEAN Library (and never asked). Claims about visual inspection speed are difficult to document: who is the observer, how many photos are involved, etc.? The ECOCEAN Library does feature a built-in suite of photo keyword tools and search capabilities that narrow visual searches considerably. I3S also allows visual comparison, but without search parameters to narrow the list. I make no claim as to which is faster for visual inspection; there's no objective way to determine it.
Speed et al. also make a claim regarding the modified Groth algorithm vs. the I3S algorithm:
"One such system has been developed from an algorithm originally designed for stellar pattern recognition, and is currently being employed by the ECOCEAN whale shark database [20]. This system has great potential; however, the procedure for entering and matching patterns is complex, and neither the algorithm nor results are publicly available."
This is also an interesting statement, considering the author has no experience with the ECOCEAN tools or the ECOCEAN Library. I agree that extracting patterns is simpler in I3S (probably four times as fast, in fact), but "matching patterns is complex" is an odd statement. In the ECOCEAN Library, I just press a button to get the results, with match scores and potentially matched spots remapped onto the original side-by-side images for comparison and evaluation. Click. One step. Not overly complex, I think.
So, given the literature and the authors' statements, which algorithm should you use? My opinion: both of them. In fact, I have now incorporated the I3S algorithm alongside the modified Groth algorithm in the ECOCEAN Library. For every new pattern, we run both and evaluate the results of both. Over time and with large data sets we may find a preference, but my work with both to date has convinced me that both are of value and should be used side by side. Therefore, ECOCEAN is committed to putting the modified Groth algorithm into I3S as a C++ library that can be used for dual-algorithm comparison, just as we already do in the Library. We hope to finish this within the next three months. More on this later in this posting. We stand by the strength and efficacy of the modified Groth algorithm, and we’ve enjoyed how well it has scaled over the large data set in the ECOCEAN Library. We’re looking forward to testing the I3S algorithm to its fullest on the same data.
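Running both algorithms on every new pattern leaves you with two ranked lists to review. One simple way to combine them, sketched below under the assumption that each algorithm returns catalog IDs ordered best-first, is to show candidates that appear in both top-k lists first. This is an illustrative merge rule, not the ECOCEAN Library's actual logic, and the IDs and the `top_k` default are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, Optional[int], Optional[int]]  # (id, Groth rank, I3S rank)


def dual_shortlist(groth_ranked: Sequence[str], i3s_ranked: Sequence[str],
                   top_k: int = 10) -> List[Row]:
    """Merge two best-first candidate lists into one review shortlist."""
    pos_g = {cid: r for r, cid in enumerate(groth_ranked[:top_k], start=1)}
    pos_i = {cid: r for r, cid in enumerate(i3s_ranked[:top_k], start=1)}
    rows = [(cid, pos_g.get(cid), pos_i.get(cid)) for cid in set(pos_g) | set(pos_i)]
    # Candidates found by both algorithms first, then by best rank in either list,
    # then by ID so the order is deterministic.
    rows.sort(key=lambda r: (r[1] is None or r[2] is None,
                             min(x for x in r[1:] if x is not None),
                             r[0]))
    return rows


groth = ["A-001", "A-017", "A-042"]
i3s = ["A-017", "A-003", "A-001"]
for row in dual_shortlist(groth, i3s, top_k=3):
    print(row)
```

Agreement between two algorithms that work in very different ways is a useful signal, but the final call on any match still belongs to the researcher looking at the photos.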
I am also looking for ways to improve both algorithms based on the strengths of the other. Here are some thoughts:
-The I3S algorithm requires the user to select three reference spots for the affine transformation in addition to the spots on the flank of the whale shark. These reference spots are not required by the modified Groth algorithm. Their order and location must be standardized between data sets for effective comparison using I3S. I recommend the research community standardize on the three reference spots presented by Speed et al.: 1) top of the 5th gill, 2) point on the flank corresponding to the posterior point of the pectoral fin, 3) bottom of the 5th gill. Because spots are user-selected, each new spot introduces the potential for human error, and adding three reference spots to the pattern spots adds that much more...but that can't be helped in an affine-transformation-based system. Pointing out the anterior insertion point of the pectoral fin can be challenging when the fin itself blocks the observer from seeing the exact point.
-The I3S and modified Groth algorithms can share a single spot selection process; I3S simply requires three additional clicks to select the reference points. The original Groth algorithm is rotation-invariant; however, we chose to make our version rotation-sensitive, because whale sharks have a distinctive top and bottom that we can orient to for greater matching power. We allow only a fifteen-degree difference in the rotation of the shark pattern before match degradation occurs. Therefore, before processing spots, we rotate each image so that the vertebral column above the patterning area is parallel to the horizontal. If you are interested in using both algorithms in either I3S (near future) or the ECOCEAN Library (now), make sure you add the reference points and rotate the image correctly. Because we use both, ECOCEAN now accommodates both, and we are busy updating all of our patterns to support the I3S algorithm by adding the three reference points. They are all already properly oriented.
-I really see no advantage to the Quick Scan of the I3S algorithm. Unless you need to make a snap identification of a shark, use the exhaustive option...that's the version I put into the ECOCEAN Library.
-The I3S client currently supports a maximum of 40 spots and requires a minimum of 12. In my modified version of I3S, I removed these limits. The Groth algorithm has no such restriction. In the ECOCEAN Library, the number of spots for each whale shark within the selected region behind the fifth gill and above the pectoral fin varies from 8 to 60+. I believe the tools should meet the research needs, not vice versa. As the I3S authors describe it:
"The user then points out the most distinct pigment spots on the left flank of each shark. Between 12 and 40 spots are selected within the reference area..." Choosing the most distinctive spots is far too arbitrary and subjective. I have removed this limit in a copy of the I3S client I will distribute for whale sharks.
-For pattern matching, we recommend always comparing a smaller number of spots to a larger number of spots. Comparing a larger number of spots to a smaller number of spots reduces the potential match scores by forcing the larger pattern to try to match spots not found in the smaller pattern. For the modified Groth algorithm, when comparing two patterns, we always compare smaller to larger to ensure that we’re not trying to force false matches. I recommend the same for the I3S algorithm.
-When we incorporate the modified Groth algorithm as an option in a whale shark-specific version of I3S (more on this later) we also look forward to recommendations for improvement in the modified Groth from the I3S team. I think they'll have some excellent ideas, and we hope we can make more effective improvements together...perhaps even the creation of a single, dual-pass algorithm?
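The affine step in I3S and the smaller-to-larger recommendation above can be illustrated together. The sketch below is a simplified stand-in, not I3S's actual scoring function: it solves for the affine transform that maps the query's three reference points exactly onto the catalog pattern's three (assumed to be clicked in the Speed et al. order), applies it to the query's flank spots, and then matches each spot in the smaller pattern to its nearest neighbor in the larger one. The distance threshold `max_dist` and all coordinates are hypothetical, and unlike a real matcher this nearest-neighbor step does not enforce one-to-one pairing.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[float, float, float, float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_refs(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f from three point pairs (Cramer's rule)."""
    m = [[x, y, 1.0] for x, y in src]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")

    def solve(rhs: List[float]) -> List[float]:
        coeffs = []
        for col in range(3):
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]  # replace one column with the right-hand side
            coeffs.append(_det3(mc) / det)
        return coeffs

    a, b, c = solve([p[0] for p in dst])
    d, e, f = solve([p[1] for p in dst])
    return a, b, c, d, e, f


def apply_affine(t: Affine, pts: Sequence[Point]) -> List[Point]:
    a, b, c, d, e, f = t
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in pts]


def smaller_into_larger_score(p: Sequence[Point], q: Sequence[Point],
                              max_dist: float) -> Tuple[int, float]:
    """Match each spot of the smaller pattern to its nearest spot in the larger one.

    Returns (number of spots matched within max_dist, mean matched distance).
    """
    small, large = sorted((p, q), key=len)
    nearest = (min(math.dist(s, l) for l in large) for s in small)
    dists = [d for d in nearest if d <= max_dist]
    mean = sum(dists) / len(dists) if dists else math.inf
    return len(dists), mean


# Hypothetical coordinates; reference points in Speed et al. order.
query_refs = [(0.0, 0.0), (10.0, 5.0), (0.0, 20.0)]
catalog_refs = [(3.0, 1.0), (23.0, 11.0), (3.0, 41.0)]
query_spots = [(2.0, 3.0), (5.0, 8.0), (7.0, 12.0)]
catalog_spots = [(7.0, 7.0), (13.0, 17.0), (17.0, 25.0), (30.0, 30.0)]

t = affine_from_refs(query_refs, catalog_refs)  # (2.0, 0.0, 3.0, 0.0, 2.0, 1.0)
mapped = apply_affine(t, query_spots)
print(smaller_into_larger_score(mapped, catalog_spots, max_dist=1.0))  # (3, 0.0)
```

Because three point pairs determine an affine transform exactly, any error in clicking a reference point propagates directly into every mapped spot, which is why standardizing and carefully placing the reference points matters.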
Spot pattern extraction clients: I3S AND ECOCEAN
The following statement by Speed et al. points to what I think is a common misconception about I3S:
"Therefore, a simple, yet reliable algorithm accessible to the public is needed to incorporate effectively a large number of photographs from a wide range of researchers, tourist operators and private organizations. Such a software package has recently been developed and is known as Interactive Individual Identification System (I3S)."
What I3S is: A good tool for researchers to extract spot patterns and to look for matches in a catalog of photos on a single hard drive. Because the algorithm itself is written in C++ (and not Java), pattern matching may also be faster.
What I3S is not: I3S is not an integrated database of photographs and related metadata (sighting location, GPS, approximate length, date, scarring photos, submitter contact information, parameter searching, etc.). While it is publicly available, it is not a tool for the public. Ecotourists and divers are generally not interested in carrying out their own personal mark-recapture studies, and I3S provides no publicly available system to collect their photos and manage their data in a standardized way. Nor does it include an email system to keep them interested and aware of how their data is being used. I3S is also not a multi-user tool. It is designed for a single operator looking at a catalog on a single computer. I3S clients cannot communicate with each other, though data could be shared via email attachments. If you're looking for these advanced features, use the ECOCEAN Library to share and protect your data and to gain access to a global data set collected by anyone who can get in the water with a whale shark and a camera. The ECOCEAN Library allows you to scan patterns using two algorithms (I3S and modified Groth) against all contributed patterns rather than against a single data set, which is important for a highly migratory species.
That said, I3S is useful too. Hands down, the I3S client is easier and faster to use than the Spot Extractor I created for the ECOCEAN Library. I am now using a modified version of I3S to submit patterns to the ECOCEAN Library, although I use the dual algorithm capability of the ECOCEAN Library for finding matches. I made the following changes and will distribute these and other improvements in a modified version of I3S (called Interconnect) shortly.
-Removed the minimum and maximum spot limits to reflect the reality of whale shark patterning
-Added direct submission to the ECOCEAN Library. I can now mark up a pattern, save it to the I3S file system and send it to ECOCEAN for storage and matching.
I also updated the web-based component of the ECOCEAN Library to store the additional three reference points to allow me to "extract spots once, scan twice" using both algorithms.
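Storing the three reference points alongside the flank spots is what makes "extract spots once, scan twice" possible. A minimal sketch of such a record follows; the field names and JSON layout are hypothetical and are not the ECOCEAN Library's actual schema. The minimum of three spots for the Groth side simply reflects that triangle matching needs at least one triangle.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SpotPattern:
    encounter_id: str                    # hypothetical identifier
    spots: List[Point]                   # user-selected flank spots
    # Speed et al. order: top of 5th gill, pectoral fin point, bottom of 5th gill
    reference_points: List[Point] = field(default_factory=list)
    rotated_to_horizontal: bool = False  # vertebral column leveled before extraction

    def usable_by_groth(self) -> bool:
        # Triangle matching needs at least one triangle and a leveled image.
        return self.rotated_to_horizontal and len(self.spots) >= 3

    def usable_by_i3s(self) -> bool:
        # The affine transformation needs exactly three reference points.
        return len(self.reference_points) == 3 and len(self.spots) > 0

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "SpotPattern":
        data = json.loads(text)
        # JSON has no tuples, so restore (x, y) pairs explicitly.
        data["spots"] = [tuple(p) for p in data["spots"]]
        data["reference_points"] = [tuple(p) for p in data["reference_points"]]
        return cls(**data)


pattern = SpotPattern("ENC-0001", [(1.0, 2.0), (3.0, 4.0), (5.0, 1.0)],
                      rotated_to_horizontal=True)
print(pattern.usable_by_groth(), pattern.usable_by_i3s())  # True False
pattern.reference_points = [(0.0, 0.0), (10.0, 5.0), (0.0, 20.0)]
print(pattern.usable_by_i3s())  # True
```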
For those of you currently using I3S, if you also rotate your images to make sure the vertebral column above the spot pattern region is flat against a horizontal line, you can also take advantage of the Groth algorithm when it's available in I3S. We're adding reference points in the ECOCEAN Library to make sure we can work with the patterns you have already extracted. This is a great opportunity for the whale shark community to process mark-recapture data in a standardized fashion that is comparable between researchers and with photos collected from the general public (using the ECOCEAN Library) if we all choose to participate.
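Leveling the vertebral column is a simple rotation. A minimal sketch, assuming the operator clicks two hypothetical points along the vertebral column above the pattern area and that spots are in the same pixel coordinates: the tilt is folded into (-90°, 90°] so that the correction never exceeds 90° and therefore never turns the image upside down, whichever end of the column was clicked first. Because the angle and the rotation are computed in the same coordinate system, this works whether the image y-axis points up or down.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def column_tilt(p0: Point, p1: Point) -> float:
    """Angle (radians) of the vertebral line p0->p1, folded into (-pi/2, pi/2]."""
    angle = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    if angle > math.pi / 2:
        angle -= math.pi
    elif angle <= -math.pi / 2:
        angle += math.pi
    return angle


def level_pattern(spots: List[Point], p0: Point, p1: Point) -> List[Point]:
    """Rotate spots about p0 so the vertebral line p0->p1 becomes horizontal."""
    angle = column_tilt(p0, p1)
    c, s = math.cos(-angle), math.sin(-angle)
    ox, oy = p0
    return [(ox + (x - ox) * c - (y - oy) * s,
             oy + (x - ox) * s + (y - oy) * c) for x, y in spots]


tilt = column_tilt((0.0, 0.0), (10.0, 10.0))
print(round(math.degrees(tilt), 6))  # 45.0
leveled = level_pattern([(10.0, 10.0), (0.0, 10.0)], (0.0, 0.0), (10.0, 10.0))
```

A 45° tilt like this one is well beyond the fifteen-degree tolerance mentioned above, so leveling before spot extraction is what keeps such a pattern usable for the modified Groth comparison.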
Interconnect
ECOCEAN will now support a unified tool set to help the research community get a global picture of whale sharks as a threatened species and to assist in their conservation. This tool set will integrate the easy spot extraction of I3S with the advanced feature set of the ECOCEAN Library. I will soon begin distributing a new version of I3S called Interconnect. Based on the modified version of I3S that I am already running, it will integrate future I3S feature additions as they are released, and I will also be adding new features specific to whale sharks. In addition to the changes I've already listed above, ECOCEAN is committing to adding the following:
1. Adding the Groth algorithm for dual algorithm pattern matching and for faster pattern recognition using C++ client-side. Run them both. Be twice as confident in your matches. This will be finished in 2-3 months.
2. If you choose to contribute data to the ECOCEAN Library's global database, Interconnect will also give you permission to synchronize (download) all patterns in the Library related to your region of interest. You'll be able to run matches against everything in the Library right from your desktop or from the web-based Library itself using the speed of C++. This should be finished in mid-April. We will of course work to ensure that those interested in publishing retain their rights to publish on their data. We will consult with the research community on how to best implement this.
Interconnect will allow you to continue working independently or to directly submit data to the ECOCEAN Library and work with its advanced features. It's up to you. And Interconnect will also be free and open source. You can modify it yourself if you want.
Hopefully this analysis (and the upcoming Interconnect client) is of value to you. Let's discuss...
...and let's work together using all of the tools available...
...and remember to add reference points AND to rotate your vertebral columns when processing images....
Cheers,
Jason Holmberg
ECOCEAN Whale Shark Photo-identification Library
http://www.whaleshark.org