New software helps us search the world’s largest photo album: the Internet!

There are more and more photos on the Internet, which is a good thing: after all, a picture is worth a thousand words. However, with the rapid increase in the number of photos, finding the right one has become more and more difficult, especially because many people do not tag their photos. Fortunately, scientists have come to the rescue!

Is it a good thing that all these images are on the Internet? The more the better, you might think, but the correct answer is actually both “yes” and “no”. “Yes” because you have more choice and a greater chance of finding a suitable image; “no” because you may never find that image. The reason is simply that in many cases there are few or no tags (metadata describing the picture content).

Billions of pictures
The revolution in photography began with digital cameras. Suddenly, we were able to take as many photos as needed because the incremental cost of taking additional photos was practically zero.
With the advent of smartphones with (high-quality) cameras, even more digital photos found their way online. As a result, the number of images on the Internet has grown exponentially. The photo site Flickr alone has 6 billion photos, and an estimated 1 trillion photos are uploaded to Facebook every year.

Some of these images are tagged, some are not. The tags that users do assign are often incomplete, unreliable and highly personal. This makes it difficult to find specific photos, especially in large photo collections.

Photo: Chris_Parfitt (via

How do the experts do it?
The well-known image provider Getty Images has a collection of “only” 80 million pictures. An entire team of experts tags each photo, not with one or two tags but with dozens. Take the picture on the right of a man with a dog in the snow. The number of tags that Getty Images puts on such an image can easily range from 20 to 40. Getty Images could use the following tags: people, cold temperature, casual clothes, friendship, happiness, enjoyment, getting away from it all, deep snow, nature, holiday, horizontal, full length, tomatoes, parsley, caucasian, animal, walking, dog, winter, day, snow, forest, one person, adult, middle-aged, 30-39 years old, color image, frozen, one animal, weekend activity, man, middle-aged man, only one middle-aged man, only one man, leisure activities, photography, pets, warm clothing, adults only. And you can probably think of more; after all, a picture is worth a thousand words.

Photos on Flickr, Facebook or any other photo-sharing site are not tagged nearly as extensively. Yet tags are important: the more relevant tags are used, the more valuable the Getty Images collection (or any other large collection of images) becomes, because people and customers can actually find the images. Unfortunately, tagging trillions of images cannot be done manually. Can it be done by computer and software?

Researchers at the University of Amsterdam have developed software (in the COMMIT SEALINC project) that can automatically determine which tags created by users are most relevant. The software can even suggest tags based on tags used in similar photos.

Automatic tagging using tag relevance. Picture: COMMIT.

Automatically tag photos
The main principle of automatically tagging images is quite simple. Take a photo, for example of a bridge (see above), to which the user has added the tags “bridge”, “bicycle”, “perfect” and “MyWinner”. Then, using image recognition, look for images that are visually similar to the bridge image. If most of these images carry the same tag, it is likely that this tag (or these tags) best describes the image. For the picture above, that is “bridge”. The relevance of each tag depends on the number of times that similar images are tagged with that particular tag. In the image above, you can see four similar images tagged “bridge”. The “tag relevance” of this tag is then 4, and the tag frequency is raised to 4.
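The counting step described above can be sketched in a few lines of Python. This is only a simplified illustration, not the researchers' actual software: it assumes the visually similar images have already been found by some image recognition step, and the function name `tag_relevance` and the tag lists of the similar images are invented for the example.

```python
from collections import Counter


def tag_relevance(user_tags, neighbour_tag_lists):
    """For each user tag, count how many similar images also carry it."""
    wanted = set(user_tags)
    votes = Counter()
    for tags in neighbour_tag_lists:
        # Each similar image votes at most once per tag, hence the set().
        for tag in set(tags) & wanted:
            votes[tag] += 1
    # Tags that received no votes get a relevance of 0.
    return {tag: votes[tag] for tag in user_tags}


# Hypothetical data mirroring the bridge photo described above.
user_tags = ["bridge", "bicycle", "perfect", "MyWinner"]
neighbours = [
    ["bridge", "amsterdam"],
    ["bridge", "canal", "bicycle"],
    ["bridge"],
    ["bridge", "night"],
    ["river", "boat"],
]

scores = tag_relevance(user_tags, neighbours)
best = max(scores, key=scores.get)
print(scores)
print(best)
```

In this made-up example four of the five similar images carry “bridge”, so that tag ends up with a relevance of 4, while personal tags such as “perfect” and “MyWinner” receive no votes at all.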

giraffe. Photo: Doug Wheller (CC via

Fooling the software
For the software to work properly, a number of conditions must be met. For example, certain tags should be ignored, such as tags that state a year like 2001 or 2012. The relevance of these tags is limited because they describe when the photo was taken rather than what it shows. The software must also be able to assume that users choose correct tags. If everyone consistently tags a bridge as “giraffe”, the software will adopt that and call every bridge a giraffe.
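Ignoring year tags could look something like the sketch below. It is an assumption about how such a filter might be written, not a description of the researchers' software: the helper name `drop_year_tags` and the choice to treat any four-digit tag starting with 19 or 20 as a year are both made up for the illustration.

```python
import re

# Assumption: a "year tag" is exactly four digits starting with 19 or 20.
YEAR_TAG = re.compile(r"(19|20)\d{2}")


def drop_year_tags(tags):
    """Return the tags that are not plain years, keeping their order."""
    return [tag for tag in tags if not YEAR_TAG.fullmatch(tag)]


cleaned = drop_year_tags(["bridge", "2001", "amsterdam", "2012"])
print(cleaned)
```

Because the pattern has to match the whole tag, a tag such as “route66” that merely contains digits is left alone.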

Researchers at the University of Amsterdam, among them Dr. Cees Snoek, conducted experiments on 3.5 million Flickr images to find out whether their algorithm improves image ranking, that is, whether a search for “bridge” (after the tag relevance has been determined) actually produces more relevant results. In each study, their algorithm did produce significantly better results than existing methods, such as Google’s visual page ranking. However, don’t expect the algorithm to generate tags comparable to those of Getty Images. That remains a manual and laborious process, because not everything can be determined by image recognition. Abstract concepts such as “togetherness” or the age of the person in the image are difficult (if not impossible) to determine through image recognition. But the algorithm that has been developed is a good start towards automatic tagging, and towards improving image tags on the Internet by determining tag relevance.

The COMMIT program is a public-private partnership in the field of ICT research. The program has 15 different projects, including one on tagging images. Earlier articles about COMMIT projects covered TaSST, a device that makes it possible to touch someone remotely, and a study by Erasmus University that should give people a better grip on the constantly evolving Internet.