SQL Simplicity for Java Value Mapping

Java does not have a convenient way to treat a function as an object and then pass it around, but suppose it could. Say a function were an implementation of the following interface:

    interface Function<X, Y> {
        Y apply(X x);
    }

This may not be as convenient as what Python, Ruby, or JavaScript offers, but it gives the programmer a new dimension. For instance, you could apply a function to a collection, element-wise, and produce another collection. (Functional programming has special terms for such an operation, but this example is in Java, so let’s not make things too complicated.)
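As a minimal sketch of the element-wise application mentioned above: the helper name applyToAll and the class Functions are assumptions made for this illustration, not part of the article's code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// The article's function interface, with its type parameters spelled out.
interface Function<X, Y> {
    Y apply(X x);
}

// Hypothetical helper class, introduced only for this sketch.
final class Functions {
    private Functions() {}

    // Applies f to every element of xs, in iteration order, and collects the results.
    static <X, Y> List<Y> applyToAll(Function<X, Y> f, Collection<X> xs) {
        List<Y> result = new ArrayList<Y>(xs.size());
        for (X x : xs) {
            result.add(f.apply(x));
        }
        return result;
    }
}
```

Given a Function<String, Integer> that returns the length of its argument, Functions.applyToAll would turn the list ("mammoth", "hunter") into the list (7, 6).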

In a sense, Map<X,Y> is very similar to Function<X,Y>. The main difference is that a map has a predefined set of keys, while a function is defined for every value of its argument type. With a Map<X,Y>, you can easily build a function based on that map:

    <X, Y> Function<X, Y> function(final Map<X, Y> map) {
        return new Function<X, Y>() {
            public Y apply(X x) {
                // Keys that are absent from the map yield null.
                return map.get(x);
            }
        };
    }

For argument values that are not among the map's keys, the function will return null. You could modify function() to take a second parameter, the default value for the function.
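The default-value variant suggested above might look like the following sketch. The class name MapFunctions is an assumption, and using containsKey (so that a key explicitly mapped to null still returns null) is one possible design choice, not something the article prescribes.

```java
import java.util.Map;

// The article's function interface, with its type parameters spelled out.
interface Function<X, Y> {
    Y apply(X x);
}

// Hypothetical holder class for this sketch.
final class MapFunctions {
    private MapFunctions() {}

    // Returns map.get(x) for keys present in the map and defaultValue for all others.
    static <X, Y> Function<X, Y> function(final Map<X, Y> map, final Y defaultValue) {
        return new Function<X, Y>() {
            public Y apply(X x) {
                // containsKey distinguishes a missing key from a key mapped to null.
                return map.containsKey(x) ? map.get(x) : defaultValue;
            }
        };
    }
}
```

With a map that records three mammoths for "Ugh" and a default of 0, the resulting function returns 3 for "Ugh" and 0 for any hunter the map has never heard of.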

On the other hand, if you have a Function<X,Y>, you need a Set<X> of keys to build a Map<X,Y> that corresponds to that function. But in essence, a Map is very different from a Function. When you create a map, you provide a collection of facts: certain keys are connected with certain values. You can add facts to this body of knowledge or remove them from it, and if, for a given key, a map produces the value associated with that key, it is just a convenience: like a select…where statement in SQL, it facilitates knowledge retrieval.
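The opposite direction, building a map from a function and a set of keys, can be sketched as follows. The names FunctionMaps and toMap are assumptions; the result is a snapshot, so later changes to the key set are not reflected in it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// The article's function interface, with its type parameters spelled out.
interface Function<X, Y> {
    Y apply(X x);
}

// Hypothetical holder class for this sketch.
final class FunctionMaps {
    private FunctionMaps() {}

    // Tabulates f over the given keys: each key is mapped to f.apply(key).
    static <X, Y> Map<X, Y> toMap(Function<X, Y> f, Set<X> keys) {
        Map<X, Y> result = new LinkedHashMap<X, Y>();
        for (X key : keys) {
            result.put(key, f.apply(key));
        }
        return result;
    }
}
```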

This article introduces maps that have two sets of keys, and shows how useful this class can be in everyday Java coding.

Introducing Two Keys
Often, mapping values of one type to another does not match the general knowledge of a domain. Suppose you have Hunters who from time to time bring home Mammoths, and you want to record who, when, and how big. If you were using SQL, the solution would be obvious. You would use something like this:

    CREATE TABLE Log (Hunter VARCHAR(64), Time TIMESTAMP, Weight NUMERIC);

In Java, however, this is hard to express. Even if you have a class Hunter, how do you link these three data items together? One solution would be to create a special class:

    class Event {
        Hunter hunter;
        Timestamp time;
        double mammothWeight; // won't fit in float
    }

The disadvantage of this is that it does not help you to find facts about a certain hunter or a certain date. Even if you have a Collection<Event>, you will have to scan it all.
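To make the cost concrete, here is a sketch of such a scan. HuntEvent is a simplified stand-in for the Event class above: it assumes the hunter is identified by name and the time is given in epoch milliseconds, and both EventScan and eventsOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Simplified stand-in for the article's Event (assumption: String and long
// replace the Hunter and Timestamp types to keep the sketch self-contained).
final class HuntEvent {
    final String hunter;
    final long time;
    final double mammothWeight;

    HuntEvent(String hunter, long time, double mammothWeight) {
        this.hunter = hunter;
        this.time = time;
        this.mammothWeight = mammothWeight;
    }
}

// Hypothetical helper class for this sketch.
final class EventScan {
    private EventScan() {}

    // Visits every event in the log, so the cost grows linearly with the log size.
    static List<HuntEvent> eventsOf(String hunter, Collection<HuntEvent> log) {
        List<HuntEvent> found = new ArrayList<HuntEvent>();
        for (HuntEvent e : log) {
            if (e.hunter.equals(hunter)) {
                found.add(e);
            }
        }
        return found;
    }
}
```

A query by date would need a second, equally linear scan over the same collection.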

A natural, database-like solution would be to index such events. But what would be the key? J2EE suggests having a special class for keys, like this:

    class EventKey {
        Hunter hunter;
        Timestamp time;
    }

Such a key, while useful for retrieving Entity Beans, does not make any practical sense. There is no such thing as “hunter-timestamp”: hunters are hunters and time is time. Moreover, this kind of “key” does not help you trace the history of successes (or failures) for any given hunter, nor the history of the tribe’s feasts and troubles. This means that you need to introduce separate indexes for hunters and for time. Depending on the problem you think you are solving, you can have one index or two:

    Map<Hunter, Map<Timestamp, Event>> hunterIndex;
    Map<Timestamp, Map<Hunter, Event>> timeIndex;

Now imagine that in addition to a Collection<Event> you have to maintain two maps. Every time you add an event to the collection, you have to look up hunterIndex and check whether the entry exists. If it doesn’t, you create one with an empty map and then insert a new fact into that map. The same is true with deletion; only now you also have to ask a colleague whether it would be wise to remove empty secondary maps or if keeping them there is okay. Or maybe you know the answer, but your colleague who does your code review knows a different answer. Et cetera, et cetera, et cetera. I don’t know about you, but I create such cascading maps several times a year.
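The bookkeeping described above can be sketched as follows. To keep the example self-contained it makes several assumptions: hunters are names, times are epoch milliseconds, and the stored fact is just the mammoth weight rather than a full Event. HuntLog and its method names are hypothetical, and removing empty secondary maps is only one of the two policies mentioned.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-maintained pair of cascading maps.
final class HuntLog {
    private final Map<String, Map<Long, Double>> hunterIndex = new HashMap<>();
    private final Map<Long, Map<String, Double>> timeIndex = new HashMap<>();

    void add(String hunter, long time, double weight) {
        // Look up the secondary map and create it if this hunter is new.
        Map<Long, Double> byTime = hunterIndex.get(hunter);
        if (byTime == null) {
            byTime = new HashMap<>();
            hunterIndex.put(hunter, byTime);
        }
        byTime.put(time, weight);

        // The same dance again for the second index.
        Map<String, Double> byHunter = timeIndex.get(time);
        if (byHunter == null) {
            byHunter = new HashMap<>();
            timeIndex.put(time, byHunter);
        }
        byHunter.put(hunter, weight);
    }

    void remove(String hunter, long time) {
        Map<Long, Double> byTime = hunterIndex.get(hunter);
        if (byTime != null) {
            byTime.remove(time);
            // Policy choice: drop secondary maps once they become empty.
            if (byTime.isEmpty()) hunterIndex.remove(hunter);
        }
        Map<String, Double> byHunter = timeIndex.get(time);
        if (byHunter != null) {
            byHunter.remove(hunter);
            if (byHunter.isEmpty()) timeIndex.remove(time);
        }
    }

    // Read-only view of one hunter's history, keyed by time.
    Map<Long, Double> historyOf(String hunter) {
        Map<Long, Double> byTime = hunterIndex.get(hunter);
        return byTime == null
                ? Collections.<Long, Double>emptyMap()
                : Collections.unmodifiableMap(byTime);
    }

    // Read-only view of everything brought home at a given time, keyed by hunter.
    Map<String, Double> huntsAt(long time) {
        Map<String, Double> byHunter = timeIndex.get(time);
        return byHunter == null
                ? Collections.<String, Double>emptyMap()
                : Collections.unmodifiableMap(byHunter);
    }
}
```

Every method repeats the same lookup-create-update pattern twice, which is exactly the duplication the rest of the article sets out to remove.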

In practice, when people have such cascading maps, they rarely bother to keep a separate Collection because it seems to be a waste of time and space, except maybe when the collection is passed down from above or they have to recount the size of the collection. In that case, practical programmers employ one of two very different strategies:

  1. When requested, scan through hunterIndex, adding up the sizes of secondary maps.
  2. Keep a separate counter by “caching” it, and update it on each addition or deletion. In this case, the programmer must take care of threads and exceptions, and imagine the application running for months and never recounting its mammoths.
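Both strategies can be sketched side by side. The class CountedIndex and its method names are assumptions, as are the simplified key and value types; synchronized methods are used here as the simplest way to address the threading concern, not as the only one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index that supports both ways of counting its facts.
final class CountedIndex {
    private final Map<String, Map<Long, Double>> hunterIndex = new HashMap<>();
    private int cachedSize = 0;

    synchronized void add(String hunter, long time, double weight) {
        Map<Long, Double> byTime = hunterIndex.get(hunter);
        if (byTime == null) {
            byTime = new HashMap<>();
            hunterIndex.put(hunter, byTime);
        }
        // put returns the previous value; count the fact only if it is new.
        if (byTime.put(time, weight) == null) {
            cachedSize++;
        }
    }

    synchronized void remove(String hunter, long time) {
        Map<Long, Double> byTime = hunterIndex.get(hunter);
        // Decrement only if something was actually removed.
        if (byTime != null && byTime.remove(time) != null) {
            cachedSize--;
        }
    }

    // Strategy 1: scan the primary index, adding up the sizes of the secondary maps.
    synchronized int sizeByScan() {
        int total = 0;
        for (Map<Long, Double> byTime : hunterIndex.values()) {
            total += byTime.size();
        }
        return total;
    }

    // Strategy 2: return the counter maintained on each addition and deletion.
    synchronized int sizeCached() {
        return cachedSize;
    }
}
```

The cached counter stays correct only as long as every mutation path updates it, which is why overwrites and no-op removals are checked explicitly.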

As I see it, all this happens because Java programmers are used to thinking in terms of existing classes, and they just pick up whatever they find in java.util or java.lang. Python programmers do not even encounter such problems, and JavaScript programmers do not have a choice: their only option is an associative array with strings as keys.

What Would a SQL Programmer Do?
A SQL programmer would have a Collection and, when specified, would also have the necessary indexes automatically updated on all changes. Maybe you could imitate this behavior. What if, in addition to the interface Map<K,V> (which has been around since the mammoth times), you introduce one more key (just one more) and have a new interface, Map2<X,Y,V>, where X and Y are key types and V is a value type? The following code is almost the same as Map<K,V>, but the entries have two keys, so you would have two sets of keys:

    import java.util.Collection;
    import java.util.Map;
    import java.util.Set;

    public interface Map2<X, Y, V> {
        int size();
        boolean containsKeyPair(Object key1, Object key2);
        V get(X key1, Y key2);
        V put(X key1, Y key2, V value);
        V remove(X key1, Y key2);
        Set<X> keySet1();
        Set<Y> keySet2();
        Collection<V> values();
        Map<Y, V> curry1(X key1);   // fixes the first key
        Map<X, V> curry2(Y key2);   // fixes the second key

        interface Entry<X, Y, V> {
            X getKey1();
            Y getKey2();
            V getValue();
            V setValue(V value);
        }

        Set<Entry<X, Y, V>> entrySet();
    }

Note two new methods, curry1 and curry2. They take one key and return a map from another key to values. The names come from currying, the functional programming term for this operation.
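For readers unfamiliar with the term, currying in its functional sense can be sketched with the Function interface from the beginning of the article. Function2, Currying, and curry are hypothetical names introduced for this illustration; curry1 plays the same role for maps, fixing the first key and leaving a one-key map.

```java
// The article's function interface, with its type parameters spelled out.
interface Function<X, Y> {
    Y apply(X x);
}

// Hypothetical two-argument counterpart, analogous to a map with two keys.
interface Function2<X, Y, V> {
    V apply(X x, Y y);
}

// Hypothetical holder class for this sketch.
final class Currying {
    private Currying() {}

    // Turns f(x, y) into a function of x that returns a function of y.
    static <X, Y, V> Function<X, Function<Y, V>> curry(final Function2<X, Y, V> f) {
        return new Function<X, Function<Y, V>>() {
            public Function<Y, V> apply(final X x) {
                return new Function<Y, V>() {
                    public V apply(Y y) {
                        return f.apply(x, y);
                    }
                };
            }
        };
    }
}
```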

A default implementation, AbstractMap2, stores entries in a set, and retrieval of a value for a pair of keys amounts to scanning through the whole set of entries, which for small sets may not be bad at all. The only abstract method here is Set<Entry<X,Y,V>> entrySet(), which gives you the freedom to store the data any way you choose.
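The scanning approach can be illustrated with the sketch below. It is not the article's AbstractMap2: ScanningMap2, Entry2, and SimpleEntry2 are hypothetical names, and only the read-side methods are shown.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical read-only two-key entry.
interface Entry2<X, Y, V> {
    X getKey1();
    Y getKey2();
    V getValue();
}

// Hypothetical immutable entry implementation.
final class SimpleEntry2<X, Y, V> implements Entry2<X, Y, V> {
    private final X key1;
    private final Y key2;
    private final V value;

    SimpleEntry2(X key1, Y key2, V value) {
        this.key1 = key1;
        this.key2 = key2;
        this.value = value;
    }

    public X getKey1() { return key1; }
    public Y getKey2() { return key2; }
    public V getValue() { return value; }
}

// Everything is derived from entrySet(), the only abstract method.
abstract class ScanningMap2<X, Y, V> {
    abstract Set<Entry2<X, Y, V>> entrySet();

    int size() {
        return entrySet().size();
    }

    boolean containsKeyPair(Object key1, Object key2) {
        return find(key1, key2) != null;
    }

    V get(X key1, Y key2) {
        Entry2<X, Y, V> entry = find(key1, key2);
        return entry == null ? null : entry.getValue();
    }

    // Linear scan over all entries; acceptable for small sets.
    private Entry2<X, Y, V> find(Object key1, Object key2) {
        for (Entry2<X, Y, V> entry : entrySet()) {
            if (Objects.equals(entry.getKey1(), key1)
                    && Objects.equals(entry.getKey2(), key2)) {
                return entry;
            }
        }
        return null;
    }
}
```

A concrete subclass only has to decide where the entries live, for example in a HashSet field returned from entrySet().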

The default implementation is not very efficient, so let’s introduce an indexed map: IndexedMap2. This map maintains two indexes, for X and for Y, and Set<Entry<X,Y,V>> entrySet() remains abstract.
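One way such a pair of indexes could be kept in sync is sketched below. This is not the article's IndexedMap2: TwoIndexMap is a hypothetical, simplified class that stores its data directly in the two indexes, rejects nothing, and returns read-only views from curry1 and curry2.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-key map backed by two nested indexes.
final class TwoIndexMap<X, Y, V> {
    private final Map<X, Map<Y, V>> byKey1 = new HashMap<>();
    private final Map<Y, Map<X, V>> byKey2 = new HashMap<>();
    private int size = 0;

    int size() {
        return size;
    }

    V get(X key1, Y key2) {
        Map<Y, V> row = byKey1.get(key1);
        return row == null ? null : row.get(key2);
    }

    // Updates both indexes in one place and returns the previous value, if any.
    V put(X key1, Y key2, V value) {
        Map<Y, V> row = byKey1.get(key1);
        if (row == null) {
            row = new HashMap<>();
            byKey1.put(key1, row);
        }
        Map<X, V> column = byKey2.get(key2);
        if (column == null) {
            column = new HashMap<>();
            byKey2.put(key2, column);
        }
        boolean isNew = !row.containsKey(key2);
        V previous = row.put(key2, value);
        column.put(key1, value);
        if (isNew) size++;
        return previous;
    }

    // Removes the pair from both indexes and drops secondary maps that become empty.
    V remove(X key1, Y key2) {
        Map<Y, V> row = byKey1.get(key1);
        if (row == null || !row.containsKey(key2)) {
            return null;
        }
        V previous = row.remove(key2);
        if (row.isEmpty()) byKey1.remove(key1);
        // The indexes are always updated together, so the column must exist.
        Map<X, V> column = byKey2.get(key2);
        column.remove(key1);
        if (column.isEmpty()) byKey2.remove(key2);
        size--;
        return previous;
    }

    // Fixes the first key: a read-only map from second keys to values.
    Map<Y, V> curry1(X key1) {
        Map<Y, V> row = byKey1.get(key1);
        return row == null ? Collections.<Y, V>emptyMap() : Collections.unmodifiableMap(row);
    }

    // Fixes the second key: a read-only map from first keys to values.
    Map<X, V> curry2(Y key2) {
        Map<X, V> column = byKey2.get(key2);
        return column == null ? Collections.<X, V>emptyMap() : Collections.unmodifiableMap(column);
    }
}
```

With hunters as the first key and times as the second, curry1 yields one hunter's history and curry2 yields everything the tribe brought home at a given moment, with no scan in either case.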

Two-Parameter Maps
As you have read, a relatively small group of classes solves a rather frequent problem: when you have a cascading map, which index goes first and how do you scan the whole collection? You could adapt the classes you know, Map and Set, by always designating whether hunters go first and own a collection of time-indexed events, or whether time goes first and each moment has a collection of hunter-indexed events. But the better solution is applying Map2, a map with two sets of keys.
