August 27, 2005

Comments

Reminds me of the time I was working in speech-based biometrics: the difference between speaker identification (who is speaking?) and speaker authentication (is it really that person?). They use similar kinds of technology: the first must look swiftly through a vast library of voice profiles and pick the correct one (so optimize for speed), while the second just has to compare a voice sample with one 'model', come up with a match score, and decide whether it is the right voice (optimize for equal error rate).

The 'similarity' technology you describe actually sounds more like the first one. So I can imagine you guys coming up with a library of easy-to-recognize icons (an iPod, a bicycle, the Eiffel tower, Mickey Mouse) with associated tags.
If there's any chance of getting into your beta program, I'd be delighted :-) This is really interesting stuff!
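
A rough sketch of the identification versus authentication distinction described above. It assumes each voice (or image) has already been reduced to a small feature vector; the cosine score, the 0.9 threshold, and the names `identify` and `verify` are illustrative assumptions, not how any particular product works.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(sample, library):
    """Identification (1:N): search the whole library, return the best-matching name."""
    return max(library, key=lambda name: cosine_similarity(sample, library[name]))

def verify(sample, model, threshold=0.9):
    """Authentication (1:1): compare with one model, return (accepted, score)."""
    score = cosine_similarity(sample, model)
    return score >= threshold, score

# Made-up profiles, for illustration only.
library = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.0]}
sample = [0.9, 0.1, 0.2]

best_match = identify(sample, library)
accepted_as_alice, _ = verify(sample, library["alice"])
accepted_as_bob, _ = verify(sample, library["bob"])
```

Identification cost grows with the size of the library, which is why speed matters there; authentication is a single comparison, so the interesting knob is the threshold that trades false accepts against false rejects.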

Peter - we would love to have you in the beta - just send an email to beta at ojos-inc.com

Content-based image retrieval using computed similarity metrics can provide a decent starting point for either follow-up tagging (by humans), or as a way to dredge up a collection of otherwise unlocatable scenes, again for review by humans.

Some of the major weaknesses of systems implemented in the past have been reduced, due to the commoditization of computing and storage resources. More interestingly, the commoditization of communications cost to the "end user" opens the possibility of human user feedback to the image search system that hasn't been usefully present in the past, by aggregating both the image search click-throughs and the clusters of associated tags accumulated over time.

At present, to me this makes the most sense as a service targeted at topical communities of users, rather than on my desktop or on my hosted photos only. I'd love to have my photos magically tag themselves based on who's in them, which isn't practical unless I'm Paris Hilton (and thus hang out with other well-known, recognizable people). It seems reasonably possible to identify an interesting set of landmarks and buildings, and perhaps a number of celebrity faces.

It didn't sound like you were trying for the Visionics-style parametric face recognition. Even without it, you could probably generate a page full of photos generally similar to, say, Elvis Presley, or photos that might be of (name-your-favorite-celebrity-here), which could then be more or less voted on (by click-throughs and tagging) by a community of users forming an opinion about how well the images met their expectations.

The stock photo / video / audio agencies have been grappling with the tagging / keywording problem for a long time, typically ending up with a small set of people who know particular collections and associated keywords.

Image content-based search can be a useful starting point, but the 2-way web, incorporating the users' collective knowledge, is just becoming possible and could really make things interesting.

Exactly. All of the computer vision technologies alone can only do so well. User tagging alone doesn't scale (we are just too lazy to do it for the tens of thousands of photos we have). If you can marry the two, with computers doing 80% of the work and users just fixing errors and handling exceptions, you have a very powerful solution. This is what we are shooting for.

Much like the nuances of our gait affect the wear on our shoes, you leave an impression on the photos you take. Any algorithm for analyzing photos will involve configurations and thresholds. To meet the needs of a broad audience, median values may be chosen for the best results most of the time. There is another solution.

The computer alone falls short and tagging doesn't scale. I agree. The solution is to use the computer to analyze but get the user to provide frequent and minimal feedback.

Take the algorithm you employ and concatenate all the variables required to configure it. Convert this concatenation into a raw string of zeros and ones. This is the DNA of an instance of the algorithm. Ship the product with a population of a couple dozen individual strings of DNA. Any time the user does a search a random individual is chosen to configure the algorithm. Then monitor the user reaction to the search. If they click on a bunch of photos maybe conclude it worked. If they start a new search right away assume it was a failed search. Give that individual DNA an appropriate score. After some number of searches throw away the lowest scoring individual DNA. Then "breed" a second generation.

Take two strings of 0's and 1's, choose a splice location, and take the front of one and the end of the other. Sprinkle in a couple of mutations and continue until you have generation 2.

Over many iterations the population of DNA should move toward individuals that perform the best searches. This will be a configuration tailored to the user. The great thing is that if the user's photo collection changes with time, so should the search.
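
The breeding step described above might look roughly like this. It is a minimal sketch under a few assumptions the description leaves open: parents are picked uniformly from the survivors, the new generation fully replaces the old one, and the 2% mutation rate is arbitrary. The function names are made up for illustration.

```python
import random

def crossover(a, b, rng):
    """Splice at a random location: the front of `a` joined to the end of `b`."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(dna, rng, rate=0.02):
    """Flip each bit independently with probability `rate` (an assumed value)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in dna
    )

def next_generation(scored, rng, drop=1):
    """scored is a list of (dna, score) pairs.

    Throw away the `drop` lowest scorers, then breed a full new
    generation of the same size from the survivors.
    """
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    survivors = [dna for dna, _ in ranked[:len(ranked) - drop]]
    children = []
    while len(children) < len(scored):
        a, b = rng.sample(survivors, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children

# Toy population with made-up scores.
rng = random.Random(0)
scored = [("00000000", 3.0), ("11111111", 1.0), ("00001111", 2.0), ("10101010", -1.0)]
generation_2 = next_generation(scored, rng)
```

Dropping only the single worst individual keeps the selection pressure gentle, which probably suits noisy scores inferred from user behaviour; a more aggressive cut would converge faster but could lock in a lucky configuration.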

Michael, you seem to be describing a sort of genetic algorithm for tagging/determining relevancy. We are exploring ways to learn from user responses to make these better. This is an interesting approach. What I don't understand is exactly what to encode and what objective function to use.

Munjal,

Sorry I wasn't clearer in my last post.

You want to encode all inputs to your algorithm except for the image. This includes all values that might currently be hardcoded into the code.

These would include values that are used in heuristic calculations. For instance, if you use some sort of threshold to guess whether a photo is taken indoor or outdoor, this threshold value should be encoded in the "DNA".

The concatenation of all these encoded values would comprise the "DNA".
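
To make the encoding concrete, here is one possible sketch. The parameter names, ranges, and the choice of 8 bits per value are all hypothetical; the only idea taken from the description above is that each tunable value becomes a fixed-width run of bits and the runs are concatenated.

```python
# Hypothetical tunable values: (name, lowest value, highest value, bits used).
PARAMS = [
    ("indoor_threshold", 0.0, 1.0, 8),
    ("color_weight", 0.0, 4.0, 8),
]

def encode(values):
    """Quantize each parameter to its bit width and concatenate into one DNA string."""
    dna = ""
    for name, low, high, bits in PARAMS:
        levels = (1 << bits) - 1
        step = round((values[name] - low) / (high - low) * levels)
        step = min(max(step, 0), levels)  # clamp out-of-range inputs
        dna += format(step, f"0{bits}b")
    return dna

def decode(dna):
    """Inverse of encode: slice the DNA string back into parameter values."""
    values, pos = {}, 0
    for name, low, high, bits in PARAMS:
        levels = (1 << bits) - 1
        step = int(dna[pos:pos + bits], 2)
        values[name] = low + (high - low) * step / levels
        pos += bits
    return values

dna = encode({"indoor_threshold": 0.5, "color_weight": 2.0})
config = decode(dna)
```

Because every bit pattern decodes to a value inside the declared range, any spliced or mutated string is still a valid configuration, which is what lets the breeding step stay ignorant of what the bits mean.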

The objective function is a little tricky in this case. You basically want to know if this instance of the algorithm did a "good" job.

In the most basic form you could ask the user if they liked the results. This could be a little tedious. Another solution would be to infer from the user's actions whether or not the results were "good". This would be more error prone but may average out in the long run.

What you might do is monitor what they do after they run your algorithm. If they immediately run it again you might infer that they didn't like the results. If they look through the results for a while and then select some of them then this might have been a good search. If they look through the results for a long time and don't select any, then maybe that is a bad search.
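
As a sketch of such an inferred objective function: the rules below follow the three cases just described, but the time cutoffs and the score values are guesses that would need tuning, and `SearchSession` is a made-up record of what the user did.

```python
from dataclasses import dataclass

@dataclass
class SearchSession:
    """What the user did after one search (hypothetical fields)."""
    seconds_before_next_search: float
    seconds_browsing: float
    photos_selected: int

def infer_score(session, quick_retry=5.0, long_browse=60.0):
    """Guess whether a search was 'good' from implicit behaviour.

    The thresholds (in seconds) and score values are illustrative assumptions.
    """
    if session.seconds_before_next_search < quick_retry:
        return -1.0  # searched again immediately: probably a failed search
    if session.photos_selected > 0:
        return 1.0   # browsed and selected something: probably a good search
    if session.seconds_browsing >= long_browse:
        return -0.5  # browsed for a long time but selected nothing
    return 0.0       # not enough evidence either way

retry_score = infer_score(SearchSession(2.0, 2.0, 0))
success_score = infer_score(SearchSession(90.0, 40.0, 3))
fruitless_score = infer_score(SearchSession(120.0, 110.0, 0))
```

Any single session is weak evidence, so the scores would presumably be averaged over several searches per individual before anything is thrown away.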

This is all theoretical, but I think you might have some success with this technique. I have some experience applying genetic algorithms to graphics algorithms, feel free to email me if you would like to talk in more depth.

Good luck.

Michael - the encoding seems complex, but the idea of inferring whether the system was correct from the user's implicit actions is of course very scalable input.

You're correct about the complexity. The upside is that this complexity is just tacked onto the front of the software. It doesn't have to affect the development of your algorithms.

You can easily design the software so that you run it regularly or within the genetic algorithm container. The genetic algorithm container contains all the encoding complexity and hides that nicely from the rest of the software.
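
One way to picture that container, as a sketch only: `GeneticContainer`, `plain_search`, and the one-bit `decode` below are hypothetical stand-ins. The point is that the search function takes ordinary keyword arguments and knows nothing about DNA.

```python
import random

class GeneticContainer:
    """Wraps a search function; holds the DNA population and the scores."""

    def __init__(self, search_fn, population, decode, seed=None):
        self.search_fn = search_fn
        self.decode = decode  # turns a DNA string into keyword arguments
        self.scores = {dna: [] for dna in population}
        self.rng = random.Random(seed)
        self.current = None

    def search(self, query):
        """Pick a random individual and run the search with its configuration."""
        self.current = self.rng.choice(list(self.scores))
        return self.search_fn(query, **self.decode(self.current))

    def report(self, score):
        """Credit the individual used for the most recent search."""
        if self.current is None:
            raise RuntimeError("report() called before any search()")
        self.scores[self.current].append(score)

def plain_search(query, threshold=0.5):
    """Stand-in for the real algorithm; it just echoes its configuration."""
    return f"{query}@{threshold}"

# Run it regularly, with the hardcoded default...
direct = plain_search("beach")

# ...or inside the container, which chooses the configuration.
container = GeneticContainer(
    plain_search, ["0", "1"], lambda dna: {"threshold": int(dna, 2)}, seed=1
)
wrapped = container.search("beach")
container.report(1.0)
```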

I look forward to reading more about your solution to this problem and your other adventures with Ojos.

Cheers,
Michael Artemiw
