As an integration developer, the programs I write at work generally make a lot of HTTP requests. But a single HTTP request involves many steps behind the scenes, from Layer 7 down to Layer 1 of the OSI model (or its TCP/IP equivalents):

  • Application layer => HTTP request is constructed by the HTTP client (like Axios, Got…).

  • Transport layer => The HTTP client hands the request (data packet) over to the TCP protocol. First, the famous TCP handshake takes place: the client initiates the connection (SYN flag/bit is raised) > the server says “ok I hear you” (SYN-ACK flag is raised) > the client says “oh great I hear you too let’s talk” (ACK flag is raised). The request is then split into multiple TCP segments, because TCP enforces a maximum segment size (TCP is optimized for reliability!).

  • Network layer => The request’s domain is resolved to an IP address through a DNS lookup, then the IP packet is wrapped up: source IP, destination IP and the previous TCP segments are all cosy together. All that is sent to the nearest router (the default gateway), and internet routing does its thing. But careful: at each hop between nodes, the following step is repeated.

  • Data link layer => At each hop, the IP packet is encapsulated inside an Ethernet or Wi-Fi frame containing a source MAC address (the network card of the current node), a destination MAC address (the next node on the path) and, of course, the IP packet. The correspondence between an IP address and a MAC address is established through ARP (Address Resolution Protocol) and the ARP table. The frame is sent over the network, hop after hop, until the destination is reached.

  • Physical layer => We’ve arrived! The frame is converted into a bitstream (0s and 1s), and these bits are transmitted as electrical signals (Ethernet), light pulses (optical fiber) or radio waves (Wi-Fi). The signal travels through the destination’s infrastructure to the targeted machine!

And this is only for the request; all these steps must be done in reverse for the response.
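The upper layers of that journey are actually visible from userland. Here is a minimal Node.js sketch (the local server and all names are my own, standing in for a real remote API) that performs the DNS lookup and TCP connection explicitly, then writes a raw HTTP request on the socket; everything below that is handled by the OS and the network hardware:

```typescript
import { createServer } from "node:http";
import { lookup } from "node:dns/promises";
import { connect } from "node:net";

// Returns the raw HTTP response text received over the socket.
async function demoRequest(): Promise<string> {
  // A local server standing in for the remote API (assumption: no real network).
  const server = createServer((_req, res) => res.end("ok"));
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", resolve));
  const port = (server.address() as { port: number }).port;

  // Network layer concern: DNS lookup resolves the hostname to an IP address.
  const { address } = await lookup("localhost", { family: 4 });

  // Transport layer: connect() triggers the SYN / SYN-ACK / ACK handshake,
  // performed by the OS kernel on our behalf.
  return new Promise<string>((resolve, reject) => {
    const socket = connect(port, address, () => {
      // Application layer: a raw HTTP/1.1 request written by hand.
      socket.write(
        `GET / HTTP/1.1\r\nHost: localhost:${port}\r\nConnection: close\r\n\r\n`
      );
    });
    let response = "";
    socket.on("data", (chunk) => (response += chunk.toString()));
    socket.on("end", () => {
      server.close();
      resolve(response);
    });
    socket.on("error", reject);
  });
}

demoRequest().then((response) => console.log(response.split("\r\n")[0]));
```

In a real program your HTTP client does all of this (and connection pooling) for you, but every request still pays these costs.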

Moreover, the API you’re talking to surely has a rate limiter in place, so the request might take even longer.

So, the fewer API calls you make, the better.

We had performance issues with one of our “connectors”, as we call them. It took 6 hours to complete the data synchronization between the two systems. 6 hours! You can imagine the customer using the integration began to raise questions.

I analyzed the code (we write TypeScript on pure Node.js), did some testing, and quickly figured out two things:

  • We were rate-limited more than usual by the external API.
  • No memoization was done.

Now, I couldn’t do anything about the first point. I wrote an email to our contact to check whether that was normal, but that’s it.
The second point, however, is less frustrating.

What is memoization?

Well, it’s a really simple but powerful technique: a specific form of caching. In other words, it’s a code optimization technique.
The idea is to store the result of an operation and reuse that result whenever the same input(s) occur again. This way you never perform the same operation/calculation twice. The caching happens at runtime, so the cached data is lost once the program finishes its execution.

In the integration program I investigated, for example, an API request was made for each Student ID we had. As there was no caching in place, one student might be fetched 2, 10, 50 times! By applying memoization, each student is fetched one time and one time only.

The implementation is quite simple (here a TypeScript example):

// Typically elsewhere in a type definition file.
type Student = { id: string, name: string, email: string };

// Let's say I'm in a TypeScript class, 
// define whatever data structure you'd like for 
// caching your operation result.
private studentsCache: Map<string, Student> = new Map();

//...

async fetchStudent(studentId: string): Promise<Student> {
  // Read the cache and return early.
  // The benefit of memoization happens here.
  if (studentId && this.studentsCache.has(studentId)) {
    // The non-null assertion is safe: we just checked with has().
    return this.studentsCache.get(studentId)!;
  }

  // Does not exist in cache, 
  // so fetch it using whatever HTTP client you're using in the project.
  // Typically a GET request would be done using studentId as query parameter.
  const { data: student } = await axios(...);

  // Save to cache. The act of memoization happens here.
  // Notice that we save the INPUT (studentId) and not student.id 
  // which is a part of the result itself.
  // Next time, if we encounter a studentId input with the same value, 
  // the API call won't be executed as the function will return early with the previously saved result.
  if (studentId && student) {
    this.studentsCache.set(studentId, student);
  }

  return student;
}

Of course, you might want to wrap the function body inside a try-catch or whatever error handling pattern you like; this is kept as minimal as possible for the sake of clarity.

And of course, the operation performed can be whatever it needs to be. I took the example of an HTTP call, but it could be an expensive math calculation or a huge JSON (de)serialization; it depends on your problem.
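For instance (a toy example, not from the connector), memoizing a recursive Fibonacci turns an exponential computation into a linear one:

```typescript
// Naive recursive Fibonacci is O(2^n): fib(90) would recompute the same
// subproblems an astronomical number of times and effectively never finish.
// With a runtime cache, each value is computed once and reused.
const fibCache = new Map<number, bigint>();

function fib(n: number): bigint {
  if (n <= 1) return BigInt(n);
  const cached = fibCache.get(n);
  if (cached !== undefined) return cached; // reuse the previous result
  const result = fib(n - 1) + fib(n - 2);
  fibCache.set(n, result); // store it for the next caller
  return result;
}

console.log(fib(90)); // finishes instantly thanks to the cache
```

The pattern is exactly the same as the student fetch above: check the cache, compute on a miss, store the result keyed by the input.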

The integration program I’m talking about went from a 6-hour execution time to a 1-hour execution time, just by using runtime caching (aka memoization) alone.

Several expensive operations remained, we are still handling tens of thousands of resources, and we were still heavily rate-limited by the API, but saving 5 hours is a big win for such a simple technique.

Anyway, stay tuned for another optimization technique, and in the meantime, save your results! Cheers.