|
Simon Pierce
|
 |
« on: August 22, 2006, 09:54:21 PM » |
|
Hi all,
I've been working on whale sharks in Mozambique for a little while now, and initially faced the problem of identifying individual sharks. Given that email access 'on-site' isn't particularly reliable, I also wanted a software program that ran easily on my laptop so that I could process my own sighting/resighting data. A little bit of asking around led me to IRIS, the Interactive Raggie Identification System, which was designed for use on ragged-tooth sharks (= grey nurse sharks) in South Africa. It's completely free shareware, downloadable from http://staff.science.uva.nl/~rest/Iris/. Thanks to Vic Peddemors for putting me on to it!
I've found IRIS to be very effective, with almost all of my confirmed resightings being the 'top ranked' match. It's also very simple to use. I use the flank of the shark, with the top and bottom of the fifth gill slit and the trailing edge of the pectoral fin as the fixed reference points that the program requires. This area is easy to photograph, has a suitable number of spots for ID purposes, and also has the advantage of being the same area that the ECOCEAN program uses if you're contributing your photographs to their data bank.
I'm guessing that other researchers may also find this program useful, so I thought I'd pass it on. Shoot me an email at simon_j_pierce@hotmail.com or reply to this thread if you have any questions, and you can learn from my mistakes!
Cheers,
Simon.
Simon Pierce
Queensland Shark & Ray Research Group
School of Biomedical Sciences
The University of Queensland
Brisbane, Australia
Ph: +61 7 3365 2720
Email: simon_j_pierce@hotmail.com
|
jholmberg
Guest
|
 |
« Reply #1 on: August 24, 2006, 03:46:34 AM » |
|
Do you know what algorithm is used for identification? The documentation is sparse there. The following statement from the IRIS documentation is very limiting for use in the whale shark problem space:
Select approximately 30 spots in each image. The system requires a minimum of 15 and will not allow more than 40 spots. Experiments with 50 to 60 spots also resulted in an increase of mismatches. The images in the database with many spots turned up nearly always in the top 10 of best matches. 30 seems to be the best number as a rule of thumb.
In other words, the power of the algorithm breaks down for larger numbers of spots. Across the 500+ sharks documented in the ECOCEAN Library, we have found that the number of spots in the fiducial region of a whale shark ranges from 8 to more than 60. For scalability and accuracy, additional spots should help to generate fewer mismatches, not more. Our experience with the Groth algorithm has been very positive: it has produced accurate identifications for both larger and smaller numbers of spots, in small and large data sets.
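To make the quoted limits concrete, here is a minimal sketch of how a user might trim a spot list to fit them before entering it. This is not IRIS code: the function name, the `(x, y, prominence)` tuple layout and the idea of a numeric prominence score are all assumptions made for illustration. Only the 15-spot minimum and 40-spot maximum come from the documentation quoted above.

```python
MIN_SPOTS = 15  # minimum quoted from the IRIS documentation
MAX_SPOTS = 40  # maximum quoted from the IRIS documentation


def limit_spots(spots, min_spots=MIN_SPOTS, max_spots=MAX_SPOTS):
    """Keep at most max_spots spots, preferring the most prominent ones.

    spots is a list of (x, y, prominence) tuples. The prominence score is a
    hypothetical user-assigned number; IRIS itself does not provide one.
    """
    if len(spots) < min_spots:
        raise ValueError(
            f"need at least {min_spots} spots, got {len(spots)}"
        )
    # Sort from most to least prominent, then cut the list at the maximum.
    ranked = sorted(spots, key=lambda spot: spot[2], reverse=True)
    return ranked[:max_spots]
```

A shark with fewer than 15 visible spots in the fiducial region would be rejected by this check, which is exactly the limitation raised above for animals at the low end of the 8 to 60+ range.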
Simon, do you have any sense of how scalable IRIS is? Pattern recognition in small data sets can be easily accomplished. Scaling to larger data sets (thousands of patterns) is much more challenging and requires a careful set of assumptions, algorithmic parameter tuning, standardized data collection, standardized image processing, peer review, and cohesive metadata to bind all encounter data (right side spots, left side spots, photo keywords for scarring, time, date, etc.).
|
Simon Pierce
|
 |
« Reply #2 on: August 24, 2006, 11:15:17 AM » |
|
Hi Jason,
I’m afraid that I don’t know the answer to your technical questions, but I can certainly appreciate your points. I’ll try to address them as best I can.
My ‘standard’ photo is taken from parallel to the flank of the shark, and includes the fifth gill slit and trailing edge of the pectoral as reference points. In effect, the user-identified reference points allow IRIS to rescale and rotate the spot patterns for comparison between sharks. I use the same points on each side, but change the order so that the program treats each side separately. Within these fiducial regions I’ve rarely had problems with the 40 spot maximum. In a small number of particularly ‘spotty’ sharks I’ve been forced to be a little more selective with my spot allocation, but by choosing only the most prominent spots (working under the assumption that these will show up best on lower-quality photographs), I’ve been able to get around this problem.
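The role of the three reference points can be illustrated with a short sketch. This is not IRIS's actual algorithm, which is not documented in this thread; it simply shows one standard way three fixed points can remove differences in scale, rotation and position between photographs. The function name and argument names are hypothetical.

```python
def normalise_spots(ref_a, ref_b, ref_c, spots):
    """Express each spot in the frame defined by three reference points.

    ref_a maps to (0, 0), ref_b to (1, 0) and ref_c to (0, 1), so the
    returned coordinates do not depend on how large, rotated or shifted
    the shark appears in the photograph.
    """
    ax, ay = ref_a
    ux, uy = ref_b[0] - ax, ref_b[1] - ay  # first basis vector, a -> b
    vx, vy = ref_c[0] - ax, ref_c[1] - ay  # second basis vector, a -> c
    det = ux * vy - uy * vx
    if abs(det) < 1e-12:
        raise ValueError("reference points must not lie on one line")

    normalised = []
    for x, y in spots:
        dx, dy = x - ax, y - ay
        # Solve (dx, dy) = s * u + t * v for s and t (Cramer's rule).
        s = (dx * vy - dy * vx) / det
        t = (ux * dy - uy * dx) / det
        normalised.append((s, t))
    return normalised
```

In a scheme like this, entering the same three points in a different order produces a different frame, so patterns from the left and right sides would not line up with each other. That is consistent with the side-separation trick described above, although whether IRIS works this way internally is an assumption.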
My database is relatively small compared to yours, with hundreds rather than thousands of images. Originally I included high-resolution photographs in the database, with up to three images of each side, so an individual shark’s folder was several megabytes in size. This did lead to the program slowing down when I was up to about 130 sharks, but I’ve temporarily solved the problem by resizing photos (since I’m only viewing them on a screen, I realised that they didn’t need to be A3 quality!) and including only the best shot of each side for each shark. While this is obviously a short-term ‘fix’, it should work with up to around 1000 sharks.
IRIS is designed to automatically sort a photo database to find the closest match to an unknown shark. Potential matches are confirmed visually by the user. All my (known) re-sightings so far have been one of the top two sharks 'found' in the database, so I generally only check the top five matches. I take ID shots from both sides of my sharks as a semi-independent confirmation, and then if length, sex and scarring data match I’m confident that the shark is a resighting. Since IRIS is purely an identification program, it has no way of keeping track of metadata and I have to use a separate program for cataloguing my images. It’s not hard to keep track of IDs; I just use file names to keep tabs on which shark I’m looking at (e.g. WS0001-LS, for the left-side ID shot). It’d certainly be great to have a single program that did both, such as your ECOCEAN database (which sounds great!) but I’m not aware of any that I can use on my own desktop for processing my sighting/resighting data. Since the ECOCEAN database uses a similar area of the shark for ID purposes, photographs can be used by either program when the necessary reference points are included. I like IRIS because my internet access is hopeless, and it makes it easy to keep on top of my incoming data. It has drawbacks, but it’s certainly better than wading through images manually!
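A file-name convention like the one above is easy to read back programmatically when building a separate catalogue. The sketch below assumes a four-digit number after `WS` and assumes `RS` as the right-side counterpart of `LS`; only `WS0001-LS` is given in the post, so both details are guesses, and the function name is hypothetical.

```python
import re

# "WS" + four digits, a hyphen, then a side code.
# "RS" for right-side shots is an assumption; the post only shows "LS".
ID_SHOT_PATTERN = re.compile(r"^(WS\d{4})-(LS|RS)$")

SIDE_NAMES = {"LS": "left", "RS": "right"}


def parse_id_shot(stem):
    """Split a file name stem such as 'WS0001-LS' into (shark_id, side)."""
    match = ID_SHOT_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"unrecognised ID shot name: {stem!r}")
    shark_id, side_code = match.groups()
    return shark_id, SIDE_NAMES[side_code]
```

For example, `parse_id_shot("WS0001-LS")` gives the shark ID and side as separate values, which can then be joined to length, sex and scarring records kept elsewhere.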
Cheers,
Simon.
|
David Rowat
|
 |
« Reply #3 on: August 25, 2006, 07:40:27 PM » |
|
Hi Simon & Jason,
We have also been trialling the IRIS programme and I know of at least one other large-scale trial using IRIS on whale shark photos. At the moment we are testing the error index on dummy targets with respect to angles of acceptance in both horizontal and vertical planes... it seems that it has a wider latitude in the vertical orientation than in the lateral, correctly identifying shots taken up to 60° off-axis in the vertical plane and up to 45° in the horizontal.
We too wanted to have a method of rapidly identifying our in-water photo IDs to see if this 'virtual tagging' can be used as effectively as our marker tagging programme. We are currently working our way through 200+ ID sets from last season, so by the end of this we should have a good idea about the capacities of IRIS on this species. We are also testing the programme on dorsal fin shots, effectively setting up four separate databases (left gill, left dorsal, right gill and right dorsal), so we also hope to see if there is a difference in generating IDs from these specific areas.
As Simon says, the programme does not like big images, so we have scaled ours down to help speed things along, but I suspect it's more the amount of RAM that's important, and my system is pretty puny!
Maybe there are some more IRIS users out there who would like to comment?
best regards
David
|
jholmberg
Guest
|
 |
« Reply #4 on: August 26, 2006, 12:35:58 AM » |
|
It's very odd that image size would have any effect on the application matching speed. From a digital perspective, spots are a list of x, y coordinates. Just simple numbers for crunching. Sounds like IRIS is loading the image for every comparison instead of storing the coordinates separately after processing the image once. This would cause a large and unnecessary processor hit.
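To illustrate the point about storing coordinates separately, here is a minimal sketch of a spot "fingerprint" written once to a small file and read back for matching, with no image involved. The JSON layout and function names are assumptions for illustration only; IRIS's own file format is not described in this thread.

```python
import json
from pathlib import Path


def save_fingerprint(path, shark_id, spots):
    """Write the spot coordinates for one ID shot to a small JSON file."""
    record = {"shark_id": shark_id, "spots": [list(p) for p in spots]}
    Path(path).write_text(json.dumps(record))


def load_fingerprint(path):
    """Read a fingerprint back as (shark_id, list of (x, y) tuples)."""
    record = json.loads(Path(path).read_text())
    return record["shark_id"], [tuple(p) for p in record["spots"]]
```

With this arrangement the image is only needed while the user clicks the spots. Every later comparison reads a file of a few hundred bytes, so image resolution should have no bearing on matching speed.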
|
David Rowat
|
 |
« Reply #5 on: August 26, 2006, 11:30:31 AM » |
|
Hi Jason,
It's not at the matching phase that the programme slows, at least not in our trials to date; it's at the initial fingerprinting stage. What we have found is that if you use a large image file the programme will let you set the three landmark points but then will not allow you to store the individual spots... if you reduce the file size then it has no problem.
The fingerprint files are indeed very small (normally under 1 KB) and the programme searches through the database almost instantaneously, at least with our relatively small number of images thus far. It only calls the images when you select the individual images to compare them visually.
The drawback of the system is the way that it automatically references a particular directory to search for matches, rather than allowing you to define a particular directory for the search. If the latter were possible then it would be easy to set up as many different categories for matching as required, e.g. left gill slit vs. left dorsal. However, at the moment the programme runs using an environment variable that the user has to set to the path for the specific database folder. I'm sure it would be a simple matter to rewrite the code to enable a number of data searches but I don't have that sort of skill... the simplest fix is to set up the different databases on different computers, which is what we have done.
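Since the database folder is taken from an environment variable, one possible alternative to separate computers is to change that variable before each launch. The sketch below is untested against IRIS itself: the variable name `IRIS_DATABASE`, the folder names and the function name are placeholders, because the thread does not give the real variable name, and it assumes the programme reads the variable at start-up.

```python
import os
from contextlib import contextmanager


@contextmanager
def database_folder(path, var_name="IRIS_DATABASE"):
    """Temporarily point an environment variable at one database folder.

    var_name is a placeholder; substitute the variable that IRIS's
    documentation actually asks the user to set.
    """
    previous = os.environ.get(var_name)
    os.environ[var_name] = str(path)
    try:
        yield
    finally:
        # Restore whatever was set before, so other sessions are unaffected.
        if previous is None:
            os.environ.pop(var_name, None)
        else:
            os.environ[var_name] = previous


# Hypothetical usage: start the programme from inside the "with" block so
# that it inherits the chosen folder, e.g. one folder per body region.
# with database_folder("C:/iris/left_gill"):
#     ...launch the IRIS executable here...
```

Any programme started inside the `with` block inherits the modified variable, so one computer could in principle hold left gill, left dorsal, right gill and right dorsal databases side by side.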
I note that Simon has worked around this by altering the order of the three landmark points so that the programme tends to compare like with like. We had kept the same order of points as we were unsure if the programme would function properly if we changed them, it seems as though it does!
|
Jason Holmberg
|
 |
« Reply #7 on: January 31, 2007, 02:20:31 PM » |
|
If you're following this topic, please see my new post: "A tale of two algorithms".
|