ECOCEAN Whale Shark Forum
Topic: A tale of two algorithms
Jason Holmberg
Posted on: January 31, 2007, 02:19:47 PM

I think this forum provides a great venue to discuss photo-identification and pattern recognition for whale sharks, and I would like to share my own experiences using I3S, the ECOCEAN Whale Shark Photo-identification Library, the modified Groth algorithm, and the affine transformation-based algorithm created by the developers of I3S. I see a very positive integration of these technologies developing. If you're interested in these topics, read on.

For those of you I didn't meet at the International Whale Shark Conference in Perth, WA in 2005, I am the programmer for the ECOCEAN Library and one of the modifiers of the Groth algorithm (originally developed as part of the Hubble Space Telescope program) now also used to identify whale sharks from photographs. We published our results in December 2005 in the Journal of Applied Ecology. The algorithm uses the natural spot patterning from photographs of whale sharks to uniquely identify them from a large catalog of other photos. In 2005, I won the Duke's Choice Award from Sun Microsystems for Innovative Use of Java technology for incorporating this algorithm as one of many features built into the ECOCEAN Library, which is a web-based research application tailored for the whale shark community. Please see the "ECOCEAN Library Updates and Maintenance" section of this forum for a list of some of the functionality offered by the library. The ECOCEAN Library provides an easy portal for the submission of photographic mark-recapture data for whale sharks from the research community as well as ecotourists and diving enthusiasts around the world. Under the Rolex Award for Enterprise granted to Brad Norman in 2006, we will be expanding the library to include 20 new active research stations in 2007 and 2008, in addition to its role as a global collection site of whale shark data.

Since I started working with Brad Norman of ECOCEAN in 2002, I have seen interest in whale shark research grow, and I have been happy to see the tools available to the scientific community grow as well. In 2003, the ECOCEAN Library went online to begin collecting data. In 2004, we completed and integrated the modified Groth algorithm to provide rapid and accurate identification of whale sharks. Since then, the functionality of the ECOCEAN Library and the data it has collected have grown quite rapidly.

2007 brings new tools to the whale shark research community in the form of I3S. The algorithm created by the I3S developers has now been published in the Journal of Applied Ecology with specific application to spotted ragged-tooth sharks, but also with clear application to whale sharks. The algorithm is packaged in a free and user-friendly tool called I3S, which can be downloaded from http://www.reijns.com/i3s. The application of this algorithm to whale sharks has now also been published by Conrad Speed et al. from Charles Darwin University.

At the beginning of 2007, the whale shark research community is supported by two algorithms (modified Groth and I3S) and two research software clients (ECOCEAN Library and I3S). For those of you with active whale shark mark-recapture projects, or those of you thinking of undertaking one, a new question arises: which to use?

Here's the quick answer: all of them.

As the programmer for the ECOCEAN Library, I am thoroughly familiar with its underlying code (I wrote it). Having worked with the I3S client, whose code base is open source, and having now modified it, I am very comfortable discussing its structure as well. As the only person familiar with both systems, I offer the following observations.



Algorithms for pattern recognition and identification of whale sharks


The modified Groth and the I3S algorithms take different approaches to identifying patterns between two reference images. The I3S algorithm uses an affine transformation to put two comparison patterns into the same reference space and then examines the linear distances between potentially matched spots as an indicator of similarity in patterning. The modified Groth algorithm builds triangles from all combinations of three spots in each comparison pattern, forms relationships between the two sets of triangles by examining their internal angles, and then filters the matched triangle lists looking for commonality in magnification.
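For readers curious about the mechanics, here is a much-simplified sketch of the triangle idea described above. It is not the published modified Groth code: the function names, the 2-degree angle tolerance and the 0.05 log-magnification window are hypothetical, and the sketch compares triangles by internal angles only, so it ignores the rotation sensitivity discussed further down.

```python
import math
from itertools import combinations
from statistics import median


def triangle_signature(a, b, c):
    """Return (sorted internal angles in degrees, perimeter), or None if degenerate."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    if min(ab, bc, ca) == 0.0:
        return None  # two spots coincide, so no usable triangle

    def angle(opposite, side1, side2):
        # Law of cosines, clamped against floating-point overshoot.
        cos_v = (side1 ** 2 + side2 ** 2 - opposite ** 2) / (2 * side1 * side2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_v))))

    # Angles at a, b and c respectively, sorted so shape comparison ignores vertex order.
    angles = sorted((angle(bc, ab, ca), angle(ca, ab, bc), angle(ab, bc, ca)))
    return angles, ab + bc + ca


def build_triangles(spots):
    """Every combination of three spots, reduced to its shape (angles) and size (perimeter)."""
    sigs = (triangle_signature(*tri) for tri in combinations(spots, 3))
    return [s for s in sigs if s is not None]


def triangle_match_score(query, candidate, angle_tol=2.0, mag_tol=0.05):
    """Count triangle pairs that agree in shape and share the dominant magnification."""
    q_tris, c_tris = build_triangles(query), build_triangles(candidate)

    # Step 1: pair triangles whose sorted internal angles agree within angle_tol degrees.
    log_mags = [
        math.log(c_perim / q_perim)
        for q_angles, q_perim in q_tris
        for c_angles, c_perim in c_tris
        if all(abs(x - y) <= angle_tol for x, y in zip(q_angles, c_angles))
    ]
    if not log_mags:
        return 0

    # Step 2: a true match implies one common scale factor, so keep only the
    # pairs whose log-magnification sits near the median of all pairs.
    centre = median(log_mags)
    return sum(1 for m in log_mags if abs(m - centre) <= mag_tol)
```

For example, with spots = [(0, 0), (4, 1), (1, 5), (7, 6), (3, 9)] and a copy of that pattern scaled by 2 and shifted, all ten triangles (5 choose 3) pair up at the same magnification and the score is 10. Working in log-magnification makes "twice as large" and "half as large" symmetric around zero.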

Which is better? Really, the data is not out there to support the ascendance of either. The modified Groth algorithm was published listing an 86% chance of successful identification using tested images. The criterion was a first-place ranking for the correct match among the test images, and this value is an average over sharks with one or multiple patterns to test against. The I3S algorithm was published (Van Tienhoven et al.) listing 72% (exhaustive search option) as the rate at which the correct match ranks first when only one matching pattern exists, and 92% when three reference matches exist. A rough average of the two values gives 82%. Given different data sets, different operators, etc., the published efficacies of the two are effectively equivalent.
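The "first ranking" criterion used in both papers is a top-1 identification rate. A minimal sketch of how such a rate can be computed from ranked match lists follows; the function name and the input layout are assumptions for illustration, not code from either paper.

```python
def top_k_rate(trials, k=1):
    """Share of test comparisons whose correct shark is ranked within the top k.

    trials: list of (ranked_candidate_ids, true_id) pairs, best candidate first.
    """
    if not trials:
        raise ValueError("need at least one trial")
    hits = sum(1 for ranked, true_id in trials if true_id in ranked[:k])
    return hits / len(trials)


# The rough I3S figure quoted above is an unweighted mean of the two published
# rates; a weighted mean would depend on how many test sharks had one versus
# three reference patterns available.
I3S_ROUGH_AVERAGE = (72 + 92) / 2  # 82.0
```

Raising k shows how often the correct shark is at least near the top of the list, which matters when a human checks the first few candidates visually anyway.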

The developers of I3S make the following claim:

"The key feature of the software is that it is not fully automated. The user must point out the reference points, which in the case of C. taurus are the fin origins, and the most distinctive marks. Finally, the user must select the best match from a ranked list of possible known shark images. As the user manually points out the natural marks, image artefacts such as particle reflection in the water, backscatter from incorrect flash position and flash overexposure of the flanks, can be ignored. Only those natural marks that can be clearly discerned by the human eye are selected, thereby ensuring the best possible choice. Additionally, the use of this software is beneficial as a clear image focus is not as stringent a requirement for spot patterns as it would be for other natural marks, such as notches and tears in fins. We believe that this is more beneficial to correct identification of individuals and represents a preferable option over the system reported by Arzoumanian, Holmberg & Norman (2005)."

This is an interesting statement but unfortunately incorrect. The authors do not have access to the ECOCEAN Library (and never asked for it). With both the I3S client and the Spot Extractor created by ECOCEAN, spots are user-selected. Both systems therefore gain the same benefit: they avoid the "noise spots" that a fully automated system can produce, for example when the computer selects a light reflection as a false spot.

The authors also make this claim:
"An added benefit of our system is that only one computer-based package is required and the entire process, from image download, spot selection and matching, requires less than 5 min if the shark already exists in the database. If a new shark is recorded that has not been previously identified, then a more rigorous visual inspection of the database is necessary but still will be completed in a substantially shorter time than reported for other identification software packages (Arzoumanian, Holmberg & Norman 2005)."

Here the authors make a claim that the I3S spot selection system is faster. Actually, this is true. Spot selection is much easier with the I3S client than with the Spot Extractor I developed for ECOCEAN. The I3S client interface is quite user-friendly. The subsequent sentence "...a more rigorous visual inspection of the database is necessary but still will be completed in a substantially shorter time than reported for other identification software packages (Arzoumanian, Holmberg & Norman 2005)." is unfortunate because again the authors do not have access to the ECOCEAN Library (and never asked). Claims about visual inspection speed are...well, difficult to document...who is the observer...how many photos...etc. The ECOCEAN Library does feature a built-in suite of photo keyword tools and search capabilities to help narrow visual searches considerably. I3S also allows for visual comparison but without search parameters to narrow the list. I make no claim as to which is faster for visual inspection...there's no objective way to determine it.

Speed et al. also make a claim regarding the modified Groth algorithm vs. the I3S algorithm:
"One such system has been developed from an algorithm originally designed for stellar pattern recognition, and is currently being employed by the ECOCEAN whale shark database [20]. This system has great potential; however, the procedure for entering and matching patterns is complex, and neither the algorithm nor results are publicly available."

This is also an interesting statement considering the authors have no experience with the ECOCEAN tools or the ECOCEAN Library. I agree that extracting patterns is simpler in I3S...probably four times as fast, in fact...but "matching patterns is complex" is an odd statement. In the ECOCEAN Library I just press a button to get the results, with match scores and potentially matched spots remapped onto the original images, side by side, for comparison and evaluation. Click. One step. Not overly complex, I think.

So, given the literature and the authors' statements, which algorithm should you use? My opinion: both of them. In fact, I have now incorporated the I3S algorithm alongside the modified Groth algorithm in the ECOCEAN Library. For every new pattern, we run both and evaluate both sets of results. Over time and with large data sets we may find a preference, but my work with both to date has convinced me that both are of value and should be used side by side. Therefore, ECOCEAN is committed to putting the modified Groth algorithm into I3S as a C++ library that can be used for dual-algorithm comparison, just as we already offer in the Library. We hope to finish this within the next three months. More on this later in this posting. We stand by the strength and efficacy of the modified Groth algorithm, and we've enjoyed how well it has scaled over the large data set in the ECOCEAN Library. We're looking forward to testing the I3S algorithm to its fullest on the same data.

I am also looking for ways to improve each algorithm based on the strengths of the other. Here are some thoughts:

-The I3S algorithm requires the selection of three additional reference spots for the affine transformation, on top of the spots on the flank of the whale shark. These reference spots are not required for the modified Groth algorithm. Their order and location must be standardized between data sets for effective comparison using I3S. I recommend the research community standardize on the three reference spots presented by Speed et al.: 1) top of the 5th gill, 2) the point on the flank corresponding to the posterior point of the pectoral fin, 3) bottom of the 5th gill. Because spots are user-selected, each new spot introduces the potential for human error. Adding three reference spots in addition to the pattern spots adds that much more potential for human error...but that can't be helped with an affine-transformation-based system. Pointing out the anterior insertion point of the pectoral fin can be challenging when the fin itself blocks the observer from seeing the exact point.

-The I3S and modified Groth algorithms can share a single spot selection process; I3S simply requires three additional clicks for the reference points. The original Groth algorithm is rotation-invariant; however, we chose to make our modified version rotation-sensitive, because whale sharks have a distinctive top and bottom that we can orient to for greater matching power. We allow only a fifteen-degree difference in rotation of the shark pattern before match degradation occurs. Therefore, before we process spots, we rotate the image so that the vertebral column of the whale shark above the patterning area is parallel to a horizontal line. If you are interested in using both algorithms in either I3S (near future) or the ECOCEAN Library (now), make sure you add the reference points and rotate the image correctly. Because we use both, ECOCEAN now accommodates both, and we are busy updating all of our patterns to also support the I3S algorithm by adding the three reference points. They are all already properly oriented.

-I really see no advantage to the Quick Scan of the I3S algorithm. Unless you need to make a snap identification of a shark, use the exhaustive option...that's the version I put into the ECOCEAN Library.

-The I3S client currently supports a maximum of 40 spots and requires a minimum of 12. The Groth algorithm has no such restriction. In the ECOCEAN Library, the number of spots for each whale shark within the selected region behind the fifth gill and above the pectoral fin varies from 8 to 60+. I believe the tools should meet the research needs and not vice versa. The I3S paper states: "The user then points out the most distinct pigment spots on the left flank of each shark. Between 12 and 40 spots are selected within the reference area..." Choosing only the "most distinct" spots is far too arbitrary and subjective. I have removed these limits in my modified copy of the I3S client, which I will distribute for whale sharks.

-For pattern matching, we recommend always comparing a smaller number of spots to a larger number of spots. Comparing a larger number of spots to a smaller number of spots reduces the potential match scores by forcing the larger pattern to try to match spots not found in the smaller pattern. For the modified Groth algorithm, when comparing two patterns, we always compare smaller to larger to ensure that we’re not trying to force false matches. I recommend the same for the I3S algorithm.

-When we incorporate the modified Groth algorithm as an option in a whale shark-specific version of I3S (more on this later) we also look forward to recommendations for improvement in the modified Groth from the I3S team. I think they'll have some excellent ideas, and we hope we can make more effective improvements together...perhaps even the creation of a single, dual-pass algorithm?
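The preprocessing recommendations above (level the spine, map the three ordered reference points, compare smaller to larger) can be sketched together as follows. This is a hypothetical illustration, not code from I3S or the ECOCEAN Library: the function names, the two user-marked spine points and the simple coverage score are assumptions, and the affine step only shows how three point pairs pin down an affine map, not how I3S scores the transformed patterns.

```python
import math


# --- Rotation: level the vertebral column before rotation-sensitive matching ---
def rotate_to_horizontal(spots, spine_start, spine_end):
    """Rotate spots about spine_start so the spine_start -> spine_end line is horizontal.

    spine_start and spine_end are hypothetical user-marked points along the spine.
    """
    theta = -math.atan2(spine_end[1] - spine_start[1], spine_end[0] - spine_start[0])
    c, s = math.cos(theta), math.sin(theta)
    ox, oy = spine_start
    return [(ox + (x - ox) * c - (y - oy) * s, oy + (x - ox) * s + (y - oy) * c)
            for x, y in spots]


def within_rotation_tolerance(tilt_a_deg, tilt_b_deg, limit_deg=15.0):
    """True if two pattern orientations differ by at most limit_deg (15 degrees per the post)."""
    diff = abs(tilt_a_deg - tilt_b_deg) % 360.0
    return min(diff, 360.0 - diff) <= limit_deg


# --- Affine map fixed by three ordered reference points ---
def _det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_references(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f so that src[i] maps onto dst[i].

    Both lists must give the reference points in the same order
    (e.g. top of 5th gill, pectoral-fin point, bottom of 5th gill).
    """
    m = [[x, y, 1.0] for x, y in src]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("reference points are collinear")

    def solve(rhs):
        # Cramer's rule: replace each column of m by rhs in turn.
        out = []
        for col in range(3):
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            out.append(_det3(mc) / det)
        return out

    a, b, c = solve([x for x, _ in dst])
    d, e, f = solve([y for _, y in dst])
    return a, b, c, d, e, f


def apply_affine(t, points):
    a, b, c, d, e, f = t
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]


# --- Compare the smaller pattern against the larger one ---
def order_for_matching(pattern_a, pattern_b):
    """Return (query, reference) with the pattern holding fewer spots as the query."""
    if len(pattern_a) <= len(pattern_b):
        return pattern_a, pattern_b
    return pattern_b, pattern_a


def coverage(query, reference, radius=1.0):
    """Hypothetical score: share of query spots that have a reference spot within radius."""
    hits = sum(1 for p in query if any(math.dist(p, q) <= radius for q in reference))
    return hits / len(query)
```

With a two-spot pattern that is a subset of a four-spot pattern, coverage(small, large) is 1.0 while coverage(large, small) can be at most 0.5, because the larger pattern contains spots the smaller one never recorded. That asymmetry is the reason for always matching smaller against larger.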


Spot pattern extraction clients: I3S AND ECOCEAN

The following statement by the authors Speed et al. pointed out what I think is a common misconception about I3S:
"Therefore, a simple, yet reliable algorithm accessible to the public is needed to incorporate effectively a large number of photographs from a wide range of researchers, tourist operators and private organizations. Such a software package has recently been developed and is known as Interactive Individual Identification System (I3S)."

What I3S is: A good tool for researchers to extract spot patterns and to look for matches in a catalog of photos on a single hard drive. Because the algorithm itself is written in C++ (and not Java), pattern matching may also be faster.

What I3S is not: I3S is not an integrated database of photographs and related metadata (sighting location, GPS, approx. length, date, scarring photos, submitter contact information, parameter searching, etc.). While it is also publicly available, it is not a tool for the public. Ecotourists and divers are generally not interested in carrying out their own personal mark-recapture studies, and I3S provides no publicly available system to collect their photos and manage their data in a standardized way. It also does not incorporate an email system to automatically keep them interested and aware of how their data is being used. I3S is also not a multi-user tool. It is designed for a single operator looking at a catalog on a single computer. I3S clients cannot speak to each other, though data could be shared via email attachments. If you're looking for these advanced features, use the ECOCEAN Library to share and protect your data and to gain access to a global data set collected by anyone who can get in the water with a whale shark and a camera. The ECOCEAN Library allows you to scan patterns using two algorithms (I3S and modified Groth) against all contributed patterns rather than just against a single data set, which is important for a massively migratory species.

That said, I3S is useful too. Hands down, the I3S client is easier and faster to use than the Spot Extractor I created for the ECOCEAN Library. I am now using a modified version of I3S to submit patterns to the ECOCEAN Library, although I use the dual algorithm capability of the ECOCEAN Library for finding matches. I made the following changes and will distribute these and other improvements in a modified version of I3S (called Interconnect) shortly.

-Removed spot minimum and maximums to reflect the reality of whale shark patterning
-Added direct submission to the ECOCEAN Library. I can now mark up a pattern, save it to the I3S file system and send it to ECOCEAN for storage and matching.

I also updated the web-based component of the ECOCEAN Library to store the additional three reference points to allow me to "extract spots once, scan twice" using both algorithms.
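"Extract spots once, scan twice" amounts to storing one record per extraction that carries both the flank spots and the optional three reference points. A minimal sketch under stated assumptions: the field and function names are hypothetical, the two scoring functions are plug-ins (neither the real modified Groth nor the real I3S scorer is shown), and both are assumed to be higher-is-better, so a distance-style score would need to be negated first.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Spot = tuple[float, float]


@dataclass
class SpotPattern:
    """One extraction, reusable by both algorithms (hypothetical field names)."""
    shark_id: str
    spots: list[Spot]
    reference_points: Optional[list[Spot]] = None  # three I3S points, fixed order

    def supports_i3s(self) -> bool:
        return self.reference_points is not None and len(self.reference_points) == 3


Scorer = Callable[[SpotPattern, SpotPattern], float]


def scan_twice(query: SpotPattern, catalog: list[SpotPattern],
               groth_score: Scorer, i3s_score: Scorer) -> dict[str, list[str]]:
    """Rank the catalog with both scorers from a single extraction (best first)."""
    groth = sorted(catalog, key=lambda p: groth_score(query, p), reverse=True)
    i3s: list[SpotPattern] = []
    if query.supports_i3s():
        # Only patterns that already carry reference points can enter the I3S ranking.
        usable = [p for p in catalog if p.supports_i3s()]
        i3s = sorted(usable, key=lambda p: i3s_score(query, p), reverse=True)
    return {"groth": [p.shark_id for p in groth], "i3s": [p.shark_id for p in i3s]}
```

Patterns that predate the reference points still take part in the Groth ranking, which matches the gradual back-filling of reference points described above.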

For those of you currently using I3S, if you also rotate your images to make sure the vertebral column above the spot pattern region is flat against a horizontal line, you can also take advantage of the Groth algorithm when it's available in I3S. We're adding reference points in the ECOCEAN Library to make sure we can work with the patterns you have already extracted. This is a great opportunity for the whale shark community to process mark-recapture data in a standardized fashion that is comparable between researchers and with photos collected from the general public (using the ECOCEAN Library) if we all choose to participate.

Interconnect

ECOCEAN will now be supporting a unified tool set to help the research community get a global picture of whale sharks as a threatened species and to assist their conservation. This tool set will integrate the easy spot extraction of I3S with the advanced feature set of the ECOCEAN Library. I will soon begin distributing a new version of I3S called Interconnect. Based on the modified version of I3S that I am already running, it will integrate future I3S feature additions as they are released, and I will also be adding new features specific to whale sharks. In addition to the changes I've already made listed above, ECOCEAN is committing to adding the following:

1. Adding the Groth algorithm for dual algorithm pattern matching and for faster pattern recognition using C++ client-side. Run them both. Be twice as confident in your matches. This will be finished in 2-3 months.

2. If you choose to contribute data to the ECOCEAN Library's global database, Interconnect will also give you permission to synchronize (download) all patterns in the Library related to your region of interest. You'll be able to run matches against everything in the Library right from your desktop or from the web-based Library itself using the speed of C++. This should be finished in mid-April. We will of course work to ensure that those interested in publishing retain their rights to publish on their data. We will consult with the research community on how to best implement this.

Interconnect will allow you to continue working independently or to directly submit data to the ECOCEAN Library and work with its advanced features. It's up to you. And Interconnect will also be free and open source. You can modify it yourself if you want.

Hopefully this analysis (and the upcoming Interconnect client) is of value to you. Let's discuss...

...and let's work together using all of the tools available...

...and remember to add reference points AND to rotate your vertebral columns when processing images....


Cheers,
Jason Holmberg
ECOCEAN Whale Shark Photo-identification Library
http://www.whaleshark.org
Last edit: February 16, 2007, 04:40:35 AM by Jason Holmberg
jurgen
Reply #1, posted on: February 04, 2007, 01:12:50 AM

Here's a response from the I3S development team. First, a quick introduction: we are Jurgen den Hartog and Renate Reijns, the developers of I3S. I3S was originally developed in 2004 for the identification of ragged-tooth sharks, and as such it has been put in the public domain. Details of our method will be published in the Journal of Applied Ecology in 2007 but can also be found in the manual (cf. http://www.reijns.com/i3s).

In the past few days there have been two posts on this forum to which we would like to respond:

• To start with, we welcome the integration of I3S and the Ecocean software environment! Anything that advances the state of the art in shark identification and helps in their conservation will be supported by us.

• Unfortunately, after a very brief e-mail contact initiated by us (December 13, 2006), we never heard from Ecocean about its interest and had to learn of this integration from others. Jason Holmberg wrote on this forum that he would like to cooperate with us. Obviously, we are more than willing to work together. The issue here is that I3S was developed for 'Raggies' and might need some further fine-tuning for whale sharks. Apart from the discussion on the maximum number of spots, like any algorithm, there are several parameters to consider. As the developers of I3S we have the expertise, and therefore we would like to be involved in the correct setting of those parameters and in the evaluation of the algorithm.

• With I3S we want to support researchers worldwide with identification issues, so I3S is not specifically targeted at a single species like whale sharks. As Ecocean now benefits from I3S, I3S might benefit likewise from Ecocean. We therefore call upon Ecocean to make their algorithm available to us, so that hopefully we can improve I3S in a similar way.

• We strongly believe in the open software concept. Therefore, I3S is free software distributed under the GNU General Public License (GPL). However, this also means that all software based on I3S (such as Interconnect) must be distributed with all sources and under the same GPL. In this way it is ensured that redistribution of such software is not restricted and that new extensions and improvements will also remain open source.

Hope to be in contact soon,

Jurgen den Hartog & Renate Reijns
Jason Holmberg
Reply #2, posted on: February 04, 2007, 03:54:53 AM

Thanks for responding, Jurgen and Renate. The Groth algorithm will be put into the public domain and will be moved from Java into C++ specifically to support I3S and the derived Interconnect client. We are happy to make this publicly available and will do so in a manner that makes it easy for us both to employ it (i.e., we'll use methods and classes that directly interface with your own). I think we'll both find ways to improve both algorithms by sharing the two, and we may even want to look at a single dual-pass algorithm as a future development.

I have a Java version of the I3S algorithm available as well if you ever have need of it, though the Java implementation is of course somewhat slower than the C++ version of the same. We used Java to more easily embed it into a web-based, distributed computing framework. So rather than a file system based approach, the code is based on sending and receiving data via HTTP from the ECOCEAN web site. We've tested it against the I3S client to ensure the returned match values are identical.

To be honest, I still find some of the statements in your paper and in that of Speed et al. to be misleading with regard to our technology and our approach. Since both papers were submitted before December 2006, before any contact between our teams, I didn't see an effort made to better understand our approach before drawing a negative comparison. Can we make a gentleman's agreement not to take this approach in the future? I think there's a great paper out there to be done comparing and contrasting the strengths of both over large data sets. I'll soon have the capability to run automated tests of both over our collected whale shark data set. We'll be able to compare and contrast where both algorithms succeed and fail, and perhaps create a hybrid. Would you be interested in a joint project and publication?

Cheers,
Jason Holmberg
ECOCEAN Whale Shark Photo-identification Library
http://www.whaleshark.org
montse
Reply #3, posted on: February 13, 2007, 01:40:22 AM

hello all,

I am a whale shark researcher without much understanding of what goes on behind these two programs; I stick to population studies :D. But I CLEARLY see the benefit of using these programs.
I strongly appreciate both teams' efforts. I also appreciate this conversation; it is clearing up many doubts and putting everything into really clear language.

THANK YOU ALL,
montse
www.domino.conanp.gob.mx

Montserrat Trigo Mendoza

mtrigo@conanp.gob.mx
Simon Pierce
Reply #4, posted on: March 19, 2007, 06:34:22 PM

Hi all,

It's great to see the software converge. I'm sure it'll be beneficial for users of both I3S and the ECOCEAN database, and ultimately the sharks themselves.

I3S and its user manual were initially designed for use on Carcharias taurus, and Jason helpfully points out some areas where they could be improved for use on whale sharks. Jurgen and Renate have just released an update, which removes some of the program's previous assumptions about reference points, available at http://www.reijns.com/i3s/.

It's also important that new users use consistent 'landmarks' if databases are to be combined in the future. I've uploaded a brief PDF manual for using I3S on whale sharks to http://mozmarinescience.googlepages.com/news&researchupdates. This draft version assumes that the user has already read the standard I3S manual. Any suggestions, corrections or additions to the text will be welcomed.

Cheers,
   Simon.

Simon Pierce
Lead Scientist - Whale Sharks
Manta Ray & Whale Shark Research Centre
Tofo Beach, Mozambique
 
Ph: +258 2935 6254
Email: simon_j_pierce@hotmail.com
Website: http://mozmarinescience.googlepages.com/

Adopt a Whale Shark! See our website for details: http://mozmarinescience.googlepages.com/adoptionprogram 
Jason Holmberg
Reply #5, posted on: March 20, 2007, 04:18:42 PM

Hi Simon,

Thanks for bringing this to the forum. This is a good first stab at a protocol, and I think the next logical step is to open this up to discussion at a conference. Mexico in 2007 would be ideal.

A data collection protocol for truly comparable image libraries that can be used to generate robust and comparable population metrics should be community-driven and emphasize compatibility between research groups (for example, why not add image pre-processing for use with both available algorithms as published here in the forum?). It should also address some seemingly basic questions that have a huge impact on the results we'll see as more and more research stations come online and begin amassing large enough data sets for accurate population analysis. Here are a few examples:

1. When is a shark a shark?
Sounds silly, eh? But when do users of I3S justifiably call an extracted pattern with no match a new marked animal? With the I3S algorithm, there is a real, non-zero chance of misidentifying a pattern even if it is perfectly oriented. Should a reported encounter count as a "new" shark when it has a perfect left-side OR right-side pattern, or should one side be required to prevent double-counting? Within the ECOCEAN Library, we must have a properly-oriented, left-side pattern before we allocate an animal as "marked" for population analysis. We also back that up with internal peer review (two pairs of eyes are better than one) and photo keyword matching tools to look for identifiable visual features that we can match on (I'll be publishing a list of our keywords and examples here in the forum shortly). I think the discussion around this must be held at a conference and be issued as a joint protocol. By recommending image pre-processing and mark-recapture guidelines, we can prevent DIY science (one would hope journal peer review would prevent this...but can we really expect reviewers to go through hundreds or thousands of photos and analyze matches?) that may produce population metrics with large misidentification errors and misrepresent the status of this animal at a given location.

2. When is a spot a spot?
Another seemingly basic question, but having extracted about 2500 patterns myself using my own tool and now a modified form of I3S, there are times when a "blotch" could be a spot, a part of a line, or just some pigmentation. Just defining a spot as "a reflection of pure white pixels" in an image may help users better choose visible features and therefore better extract and match patterns.

3. What angle of skew away from a perpendicular photo of the fiducial region is allowable to "mark" a new whale shark?
As skew increases, misidentification potential also increases. In the ECOCEAN Library, we leave data unallocated if we don't believe the chance of identification is high enough because of skew in the image. First identifications are the most critical, and I would argue it's better to leave questionable images unallocated and accept a lower capture rate (p) in models than it is to double-count a shark.
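The three questions above could eventually be written down as an explicit, testable marking rule. The sketch below only illustrates that idea: the left-side and orientation requirements come from the ECOCEAN practice described above, but the field names, the two-reviewer minimum, the pure-white pixel test and the 10% foreshortening limit are hypothetical placeholders for values a community protocol would have to set. The skew check uses the flat-surface, orthographic approximation that distances along the tilt direction shrink by cos(skew), which ignores the curvature of a real flank.

```python
import math
from dataclasses import dataclass


@dataclass
class Encounter:
    side: str               # "left" or "right" flank pattern
    oriented: bool          # spine levelled and reference points added
    matched_existing: bool  # an accepted match was found (either algorithm or visual search)
    confirmations: int      # people who independently checked the no-match result
    skew_deg: float         # estimated angle away from a perpendicular photo


def looks_like_spot(patch_rgb):
    """Question 2 (hypothetical rule): accept a blotch only if it contains a pure-white pixel."""
    return any(px == (255, 255, 255) for row in patch_rgb for px in row)


def foreshortening(skew_deg):
    """Question 3: fractional shrinkage along the tilt direction, 1 - cos(skew)."""
    return 1.0 - math.cos(math.radians(skew_deg))


def max_skew_deg(max_distortion):
    """Largest skew whose foreshortening stays within max_distortion (0.1 = 10%)."""
    return math.degrees(math.acos(1.0 - max_distortion))


def can_mark_as_new(enc, max_distortion=0.1, min_confirmations=2):
    """Question 1: only a reviewed, oriented, low-skew left-side pattern with no match."""
    return (enc.side == "left"
            and enc.oriented
            and not enc.matched_existing
            and enc.confirmations >= min_confirmations
            and foreshortening(enc.skew_deg) <= max_distortion)
```

With these placeholder numbers, max_skew_deg(0.1) is about 25.8 degrees; anything more oblique would be left unallocated rather than risk double-counting a shark.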

In summary, I think a protocol should be community-driven and should emphasize the quality of the objective (accurate population analyses) rather than the tool used.

Just my two cents. I'm hoping the Mexico conference organizers will follow your lead here and pick this up as a conference topic for panel discussion.

Cheers,
Jason