Go Beyond Keywords! Perform a Visual Image Search

Large collections of diverse images, graphics, and video are becoming common, both on the Web and in media asset databases. As image collections grow, it becomes increasingly difficult to classify and find images matching specific criteria. Current image searches rely on keywords and proximity text to find relevant images. However, such searches invariably miss or misclassify content because they don't search the images themselves (see the Sidebar: Why Text-based Image Searches Are Inadequate). Visual search rectifies that problem by matching color, shape, texture, and 3D shading directly within the image.

What you need:
Windows NT 4/2000, Solaris 2.7+, Red Hat Linux 7, or Mac OS X

The general visual search paradigm is simple. A user selects a sample query image, and then the search engine finds and ranks visually similar images by matching objects and attributes like color, texture and shape. Essentially the user tells the computer: “Find an image that looks similar to this one.” For more information, see the Sidebar: How Object-based Visual Searches Work.

Java-based Visual Search toolkit
In this article, you'll see how to create a visual search application using the Java-based eVe Visual Search engine SDK from eVision. The eVe toolkit is free to developers who want to evaluate the API and build prototype applications. The free version limits media collections to 500 images; however, that's sufficient for experimentation and for validating the functionality of visual search on your own image collection. For some ideas, see the Sidebar: Applications for Visual Search.

I’ve tested the API by searching collections of images from PhotoDisc CDs. The results are exciting.

To create a visual search application with the eVe SDK, you follow four basic steps:

  1. Analyze New Images: set the number of segmentation regions, load an image, and automatically compute its Visual Signature.
  2. Store Analyzed Images: save a copy of the image, its segmentation map, any metadata, and its Visual Signature to a MediaObject, and add it to a MediaCollection.
  3. Segmentation Map & User Selection: let the user view the segmentation map and select the relevant objects to search.
  4. Search and Retrieval: select a query image, perform the search, and retrieve and display the resulting images.

Step 1: Analyze new images
Before you can use eVe to search and retrieve an image from a database or flat file, you must process the image. When the toolkit processes a new image, it automatically segments it into regions that correspond roughly to objects or parts of objects in the image. It then applies statistical pattern recognition techniques to automatically extract four distinct attributes from each region:

  • Color
  • Shape
  • Texture
  • 3D shading

eVe stores a condensed descriptor of these attributes in several vectors called Visual Signatures. During the search phase, you compare the similarity of images based on any single attribute (color, shape, and so on) or on a weighted sum of attributes (see Figure 1).

Figure 1: A Visual Signature contains four visual attributes for every object in the image: color, shape, texture, and 3D shading.
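The weighted-sum comparison that a Visual Signature enables can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic, not the eVe API; the attribute order and the normalization step are assumptions.

```java
// Illustrative only (not the eVe API): combine per-attribute
// similarity scores in [0.0, 1.0] into one overall score using
// caller-supplied weights.
public class WeightedSimilarity {
    public static double combine(double[] scores, double[] weights) {
        if (scores.length != weights.length)
            throw new IllegalArgumentException("length mismatch");
        double sum = 0.0, weightTotal = 0.0;
        for (int i = 0; i < scores.length; i++) {
            sum += scores[i] * weights[i];
            weightTotal += weights[i];
        }
        // Normalize so the result stays in [0, 1] even when the
        // weights do not sum to exactly 1.0.
        return weightTotal == 0.0 ? 0.0 : sum / weightTotal;
    }

    public static void main(String[] args) {
        // Assumed order: color, shape, texture, 3D shading.
        double[] scores  = {0.9, 0.4, 0.7, 0.6};
        double[] weights = {0.5, 0.0, 0.5, 0.0}; // color and texture only
        System.out.println(combine(scores, weights)); // ~0.8
    }
}
```

Because unused attributes simply get a weight of 0.0, the same routine covers both single-attribute and blended searches.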

To organize and keep track of the images, eVe stores a thumbnail of the image and all the information related to that image in a MediaObject. The MediaObject includes keywords or descriptions (textual metadata), the file name and path, the Visual Signature, and the segmentation mask (see the section “Step 3: Segmentation Map & User Selection” later in this article).

To start the project, you create a new MediaObject, insert an image into it and analyze the newly created MediaObject with the analyze() method of the Analyze class from the eVe SDK.

First, you need to import the eVe SDK classes.

   import com.evisionglobal.eve.*;
   import com.evisionglobal.eve.kernel.*;
   ...

Next, create a String object that represents the input path to the image.

   String imagePath = new String       ("/myMediaCollection/image/myImage.jpg");

Now instantiate the MediaObject and load the image into it.

   MediaObject myMediaObject =
      (MediaObject) Eve.newMediaObject();
   myMediaObject.loadImage(imagePath);

Finally, instantiate an Analyze object and call the analyze() method, passing the MediaObject containing the image as a parameter. The analyze() method extracts the Visual Signature for the image. A properties file (an editable text file stored in /com/evisionglobal/eve/) defines the default parameter values; eVe.maxRegions and eVe.maxIterations are set to 3 and 999, respectively. eVe.maxRegions defines the maximum number of object regions into which the image will be divided. It is an upper limit; the image may yield fewer regions, depending on its complexity. eVe.maxIterations is the maximum number of iterations the analysis is allowed to perform. The segmentation process starts with an initial partition and iteratively improves it, and some images require far fewer iterations than others. Increasing maxRegions and maxIterations increases accuracy, but setting them too high results in very long analysis times.

   Analyze myAnalyzer = (Analyze) Eve.newAnalyze();
   myAnalyzer.analyze(myMediaObject, Eve.maxRegions,
      Eve.maxIterations);
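To get a feel for what the two parameters control, consider a toy segmentation loop. The sketch below is emphatically not eVe's algorithm; it is a minimal one-dimensional k-means over gray levels, in which maxRegions caps the number of clusters and maxIterations caps the number of refinement passes.

```java
import java.util.Arrays;

// Toy illustration of maxRegions/maxIterations (NOT the eVe algorithm):
// a 1-D k-means over gray levels. maxRegions caps the cluster count;
// maxIterations caps how long the partition may be refined.
public class ToySegmentation {
    public static int[] segment(double[] pixels, int maxRegions, int maxIterations) {
        int k = Math.min(maxRegions, pixels.length);
        double[] centers = new double[k];
        for (int i = 0; i < k; i++) centers[i] = pixels[i * pixels.length / k];
        int[] labels = new int[pixels.length];
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assign each pixel to its nearest center.
            for (int p = 0; p < pixels.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (Math.abs(pixels[p] - centers[c]) < Math.abs(pixels[p] - centers[best]))
                        best = c;
                if (labels[p] != best) { labels[p] = best; changed = true; }
            }
            if (!changed) break; // converged before hitting the cap
            // Recompute each center as the mean of its members.
            double[] sum = new double[k];
            int[] count = new int[k];
            for (int p = 0; p < pixels.length; p++) { sum[labels[p]] += pixels[p]; count[labels[p]]++; }
            for (int c = 0; c < k; c++) if (count[c] > 0) centers[c] = sum[c] / count[c];
        }
        return labels;
    }

    public static void main(String[] args) {
        double[] pixels = {0.1, 0.15, 0.5, 0.55, 0.9, 0.95};
        System.out.println(Arrays.toString(segment(pixels, 3, 999))); // [0, 0, 1, 1, 2, 2]
    }
}
```

As in eVe, a simple image can converge in very few passes, while the iteration cap keeps worst-case analysis time bounded.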

At this point, you can also add metadata, such as keywords. For example, to add the original filename of the image to the MediaObject, use myMediaObject.setProperty("originalFilename", "myImage.jpg"); the MediaObject will then have the metadata key "originalFilename" with the value "myImage.jpg".

Step 2: Store Analyzed Images
You save your MediaObjects (the analyzed image data and associated metadata) in one or more MediaCollections. A MediaCollection is a container or category used to access a collection of MediaObjects. Searching operates on individual MediaCollections (such as collections of animal images, undersea images, faces, or medical images). You can extend your searches to span multiple media collections.


The Strings below represent the input path to the previously analyzed MediaObject and the output path for the new MediaCollection.

   String mediaCollectionPath = new String
      ("/myMediaCollection/");
   String analyzedImagePath = new String
      ("/myMediaCollection/edf/myImage.edf");

Now instantiate a MediaCollection and create it on the disk.

   MediaCollection myMediaCollection =
      (MediaCollection) Eve.newMediaCollection();
   myMediaCollection.create(mediaCollectionPath);

Add the MediaObject to the new MediaCollection, and close the MediaCollection.

   myMediaCollection.add(myMediaObject);
   myMediaCollection.close();
   ...

Step 3: Segmentation Map & User Selection
When the toolkit processes a new image, it automatically segments the image into regions, which correspond roughly to objects or parts of objects in the image. The set of regions is called a segmentation mask or map. The toolkit stores the segmentation mask in the MediaObject. Using the toolkit, you can create interfaces that let the user select specific objects in the image to help create a more focused search: a partial-image search (see Figure 2).

Figure 2

As part of the analysis process, each MediaObject automatically includes a copy of the image that was analyzed and a segmentation mask for that image. The segmentation mask is a two-dimensional (2D) byte array the same size as the image that maps each pixel of the image to a particular region. For example, if you analyzed an image at a maximum resolution of 128, the analysis routines store an image in the MediaObject that is 128×96 pixels in size. The MediaObject also contains the segmentation mask, which in this example would be a 2D byte array (128×96). For an image with three possible unique regions, each byte would contain the value 0, 1, or 2, corresponding to the region that the pixel was mapped to (see Figure 3).
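The byte-array layout described above is easy to model directly. The following sketch uses a hypothetical stand-in for the real mask (a small 4×4 array with three regions) and tallies how many pixels each region owns, indexing the mask as mask[row][column].

```java
// Sketch of the segmentation-mask representation (hypothetical data,
// not produced by the eVe SDK): each byte maps one pixel to a region.
public class MaskDemo {
    // Count how many pixels fall into each of numRegions regions.
    public static int[] regionSizes(byte[][] mask, int numRegions) {
        int[] sizes = new int[numRegions];
        for (byte[] row : mask)
            for (byte region : row)
                sizes[region]++;
        return sizes;
    }

    public static void main(String[] args) {
        // A made-up 4x4 segmentation mask with regions 0, 1, and 2.
        byte[][] mask = {
            {0, 0, 1, 1},
            {0, 0, 1, 1},
            {2, 2, 1, 1},
            {2, 2, 2, 2},
        };
        int[] sizes = regionSizes(mask, 3);
        System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]); // 4 6 6
    }
}
```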


The getProperty method of the MediaObject can return both the image and the segmentation mask. For example, mediaObject.getProperty("image") returns a byte array, which is actually a raw JPEG byte stream. Calling mediaObject.getProperty("segmentationMask") returns a 2D byte array: the segmentation mask.

To perform an object-specific search you:

  1. let users select search regions by clicking on them
  2. extract the Visual Signatures for the regions from the MediaObject
  3. construct a “dummy” MediaObject containing the Signatures extracted in step 2
  4. pass the "dummy" MediaObject to the search method as the query source

Here’s an example using a previously analyzed MediaObject. This will be your query image.

   String analyzedImagePath =
      new String("/myMediaCollection/edf/myImage.edf");
   MediaObject myMediaObject =
      (MediaObject) Eve.newMediaObject();
   myMediaObject.loadFrom(analyzedImagePath);

Then create an empty “dummy” MediaObject that will be filled with just the object regions that the user selects.

   MediaObject tempMediaObject =
      (MediaObject) Eve.newMediaObject();

To create the interface so the user can select particular objects, retrieve the segmentation mask image from myMediaObject. By default, this returns a new image with three object regions separated by the colors red, green, and blue. Alternatively, you can call getSegmentationMaskImageIcon() to retrieve an ImageIcon for use in a Swing application.

Author's Note: The number of object regions can be increased or decreased by editing eVe.maxRegions in the properties file or by changing the value passed to the analyze() method when the engine analyzes the MediaObject.

   Image segmentationImage =
      Eve.newImageManager().getSegmentationMaskImage(
         myMediaObject);

Now that you have an image divided into three separate regions, you need to specify which regions to search. In a simple GUI, you can accomplish this by capturing the (x,y) coordinates of a mouse click and looking them up in the segmentation mask.
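Translating a mouse click into a region then reduces to a mask lookup. The RegionPicker class below is a hypothetical helper, not part of the eVe SDK; note that a click at (x, y) indexes the mask as mask[y][x], and that clicking a region a second time deselects it (a design choice of this sketch).

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical click-to-region selection helper (not the eVe API).
public class RegionPicker {
    private final byte[][] mask;                     // mask[y][x] -> region id
    private final Set<Integer> selected = new LinkedHashSet<>();

    public RegionPicker(byte[][] mask) { this.mask = mask; }

    // Toggle the region under an (x, y) mouse click.
    public void click(int x, int y) {
        int region = mask[y][x];
        if (!selected.remove(region)) selected.add(region);
    }

    // The regions[] array to pass on to the search step.
    public int[] selectedRegions() {
        return selected.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        byte[][] mask = {
            {0, 0, 1},
            {2, 2, 1},
        };
        RegionPicker picker = new RegionPicker(mask);
        picker.click(0, 0); // selects region 0
        picker.click(2, 1); // selects region 1 (mask[1][2] == 1)
        System.out.println(java.util.Arrays.toString(picker.selectedRegions())); // [0, 1]
    }
}
```

In a Swing application, click(x, y) would be called from a MouseListener with the coordinates from the MouseEvent.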

For simplicity, this example just sets an int array directly. To search only the red region, set regions = {0}; for blue, set regions = {1}; and for green, set regions = {2}. To search multiple regions, such as red and blue, set regions = {0,1}; for blue and green, set regions = {1,2}. Note that searching all three regions ({0,1,2}) would be the same as searching the entire MediaObject, defeating the purpose of this section. The example code below searches only the red and green regions within the images.

   int[] regions = {0, 2};

Create a Vector array and add all of the MediaObject's analyzed information to it. Note: after a MediaObject is analyzed, its data is stored in a Vector array of length 4, where Vector[0] holds the color information, Vector[1] the texture, Vector[2] the shape attributes, and Vector[3] the region information for the MediaObject. Essentially, you are building an analyzed MediaObject manually.

   Vector[] myMediaObjectVector = {
      myMediaObject.getIndex(Eve.COLOR),
      myMediaObject.getIndex(Eve.TEXTURE),
      myMediaObject.getIndex(Eve.SHAPE),
      myMediaObject.getIndex(Eve.REGION)};

Now create an array and set up a loop that adds the information for each attribute to the empty "dummy" MediaObject.

   int[] attributes =
      {Eve.COLOR, Eve.TEXTURE, Eve.SHAPE, Eve.REGION};

Iterate once for each attribute: COLOR, TEXTURE, SHAPE, and REGION.

   for (int i = 0; i < attributes.length; i++) {
      // Create a new vector that will temporarily
      // hold the MediaObject attribute.
      Vector tempAttributeVector = new Vector();
      // Each iteration of this loop adds the
      // current attribute data only for the
      // region(s) we specified: 0, 1, or 2.
      for (int j = 0; j < myMediaObjectVector[i].size(); j++) {
         for (int n = 0; n < regions.length; n++) {
            if (regions[n] == j) {
               tempAttributeVector.addElement(
                  myMediaObjectVector[i].elementAt(j));
            }
         }
      }
      // Add the attribute of the selected regions
      // to the empty MediaObject.
      tempMediaObject.setIndex(attributes[i],
         tempAttributeVector);
   }

Now you have a MediaObject with only the selected region(s). At this point, this MediaObject can be used as the query MediaObject for a search in a MediaCollection.

   myMediaObject = tempMediaObject;

Step 4: Search
eVe lets you perform visual searches, keyword searches, or a combination of the two. For example, you might start with a text (keyword) search to find images in a certain category, and then use visual queries to refine the search after retrieving an initial set of images (e.g., do a keyword search for "tigers" and then click on an image of a tiger to find more pictures of tigers).

Search types are known as indexes in the eVe API. When you perform a search, you can specify the relative importance of each of the four basic indexes. The four indexes are Eve.COLOR, Eve.SHAPE, Eve.TEXTURE, and Eve.REGION.

Search by color: This is the easiest type of search to understand. It matches the predominant colors in the source and target images. The more colors the objects in the images have in common, and the greater their similarity to the source image, the higher the similarity score.

Search by shape: This type of search uses the two-dimensional outlines of objects within the image as patterns to match in the target image. The closer the shapes of the objects are, the higher the similarity score.

Search by texture: A texture search identifies unique patterns of light and dark pixels within objects in the image, and then ranks the target images based on their similarity to that texture.

Search by region: Region searches are the most advanced search type. In a region search, eVe models the color surface of the region. This essentially gives a 3D shading representation of each object in the image.

For example, if you had a collection of animal pictures, you could retrieve all orange and black tigers (color), or all elephants (shape), or all longhaired furry animals (texture). When objects are not well defined against their background, a region search can usually still detect patterns and similarities.
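To get a feel for what a color match computes, consider histogram intersection, a classic color-similarity measure. This is a standard technique shown for illustration only; the article doesn't state that eVe uses it internally, and the three-bin histograms below are made up.

```java
// Illustration of color matching via histogram intersection
// (a standard technique, not necessarily eVe's internal method).
public class ColorMatch {
    // Sum of bin-wise minimums of two normalized color histograms;
    // 1.0 means identical color distributions, 0.0 means disjoint.
    public static double intersection(double[] h1, double[] h2) {
        double score = 0.0;
        for (int i = 0; i < h1.length; i++)
            score += Math.min(h1[i], h2[i]);
        return score;
    }

    public static void main(String[] args) {
        // Made-up 3-bin histograms: orange, black, white.
        double[] tiger1 = {0.6, 0.3, 0.1};
        double[] tiger2 = {0.5, 0.4, 0.1};
        double[] ocean  = {0.0, 0.1, 0.9};
        System.out.println(intersection(tiger1, tiger2)); // high (~0.9)
        System.out.println(intersection(tiger1, ocean));  // low  (~0.2)
    }
}
```

Two tiger images score high against each other and low against an ocean scene, which is exactly the ranking behavior a color search should produce.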

To run a search, you simply:

  1. use an existing MediaCollection to select a MediaObject as the query image source for the search method, or pass the "dummy" MediaObject created from the user's segmentation-map region selections
  2. define the SearchParameters
  3. perform the search, iterate through the SearchResults, and output the results (see Figure 4)

Figure 4: From a database of faces, beach scenes, and baby shots, eVe selects the images that most closely resemble the query image.

Here's an example using a previously analyzed MediaObject as the query image.

   String sampleMediaObjectPath =
      new String("/myMediaCollection/edf/mySample.edf");
   MediaObject sampleMediaObject =
      (MediaObject) Eve.newMediaObject();
   sampleMediaObject.loadFrom(sampleMediaObjectPath);

Load the MediaCollection that's most likely to contain search matches.

   MediaCollection myMediaCollection =
      (MediaCollection) Eve.newMediaCollection();
   String mediaCollectionPath = new String
      ("/myMediaCollection/");
   myMediaCollection.open(mediaCollectionPath);

You need to retrieve the unique keys of all the MediaObjects in the MediaCollection and store them in a long array. The keys identify the MediaObjects within the MediaCollection. Any MediaObject can be uniquely identified by its key together with the key of the MediaCollection in which it is stored.

   long keys[] = myMediaCollection.getKeys();

Create an instance of the SearchParameters class and customize it to your specifications. The setSearch() method accepts three parameters: an int representing the search type (shape, texture, color, or region); a boolean that determines whether to include that search type in the query; and a weight value, the relative importance of the search type on a scale of 0.0 to 1.0. For example, the following SearchParameters instance uses the Color and Region search types and assigns them equal importance:

   SearchParameters mySearchParameters =
      (SearchParameters) Eve.newSearchParameters();
   mySearchParameters.setSearch(Eve.COLOR, true, 0.5);
   mySearchParameters.setSearch(Eve.REGION, true, 0.5);

You perform the search by passing the search method the sample MediaObject and the SearchParameters. The search method returns an array of SearchResults objects.

   SearchResults mySearchResults[] =
      myMediaCollection.search(sampleMediaObject,
         mySearchParameters);

After obtaining the SearchResults array, you simply need to output the findings. While it would be nice to display the actual images in a GUI, this example just prints the filenames of the images that matched the search and their respective similarities to the sample MediaObject.

   for (int i = 0; i < mySearchResults.length; i++) {
      MediaObject temp = myMediaCollection.getMediaObject(
         mySearchResults[i].getKey());
      System.out.println(i + " : " +
         mySearchResults[i].getKey());
      String filename = (String)
         temp.getProperty("originalFilename");
      System.out.println("Target: " + i + " File: "
         + filename + " Similarity: " +
         mySearchResults[i].getSimilarity());
   }

Further Steps
At this point, you could let the user select one of the search results as a new visual query image and further refine the search.

No search tool is perfect, and visual search is no exception. Sometimes the content of an image is so obscured, or so dependent on human semantic understanding, that a visual search turns up few relevant results. However, when used with clear images and good search parameters, the eVe search results are usually quite accurate. When combined with a text search, visual search results can be absolutely astounding.

Although you've seen a very simple example in this article, more advanced applications are easy to create. Perhaps the most complex part is coming up with a graphical user interface that makes it clear to the user how to select objects and how to best search visually. No doubt, visual search interfaces will improve as visual search appears in more and more venues.

