James Larisch

CS PhD student, learning by writing

Read this first

Synthesis: Time, Clocks, and the Ordering of Events in a Distributed System

Let’s assume you have a distributed system consisting of n nodes. These nodes have perfectly synchronized clocks. If node 1 sends a message to node 2 and attaches a timestamp to the message, node 2 can order this event with respect to local events. In other words, node 2 knows whether node 1 sent the message either before or after node 2 received, say, a message from node 3. Since these messages may be arbitrarily delayed, they may be received in a different order than they were sent. With synchronized timestamps, node 2 can resolve this.

In fact, with synchronized timestamps and a synchronous network (every message is delivered within a known, bounded amount of time and differences in processor speeds are known and bounded), one can achieve consensus.

Unfortunately, identical clocks will drift depending on things like, say, temperature. Protocols exist for the synchronization of

Continue reading →

A Distributed Systems Roadmap

Here lives a map for navigating the basics of distributed systems theory. This roadmap is mainly for programmers and researchers interested in learning enough theory to construct and reason about “real” distributed systems. Important proofs and results help ensure that such people do not attempt to solve problems that cannot be solved. Once you knows what is impossible under certain sets of assumptions, you can manipulate assumptions and designs in creative ways, to fruitful ends.

This is a working document. I have read some of these papers, and merely skimmed others. I plan on creating syntheses for as many of the papers as possible.

I am creating this document for another reason. The different possible failure and threat models in distributed systems theory make it difficult to classify systems. Raft handles fail-stops, but not Byzantine failures. A system with 2t+1 nodes can handle

Continue reading →

The Natural Laws of Paper Deadlines

  1. Yes, no one will read your code. But it’s worth taking the care to design your system, from architecture to logging, properly. Shortcuts compound.
  2. Keep a journal. You will encounter the same problems twice.
  3. Write down your methods, especially after you submit the paper. Who knows, it may get accepted, and you may have to run more experiments.
  4. Collect every statistic you can think of. If you don’t, you’ll have to rerun everything on deadline day.

This is a working list.

Continue reading →

2017 in Books

Here are the books I read (or rather, remember reading) in 2017. I don’t think star and number scales are particularly useful, so I mark each book as one of the following:

  • Not worth reading
  • Worth reading
  • Must read

 Historical Fiction

 Fall of Giants by Ken Follett

Historical fiction is the best way to learn about, well, history. Ken Follet’s trilogy provides an educational yet exhilarating glimpse into the tumultuous 1900s. Fall of Giants follows characters in Wales, Germany, England, USA, and Russia (and its transition into the Soviet Union) through World War I and the events leading up to it.

 Winter of the World by Ken Follett

Winter of the World picks up a few years after where its predecessor left off, with a new generation of characters (the old ones remain, though). The second book covers the rise of fascism in Europe and the subsequent World War II.

 Edge of Eternity by

Continue reading →

The Byzantine Generals Problem

The Byzantine Generals Problem is an abstract generalization for thinking about the failure in computer systems. Processors fail in multi-threaded systems, and nodes fail in distributed network systems. More specifically, processors and nodes can fail in strange and possibly malicious ways. How can we design computer systems to deal with these problems and what are the limitations of such solutions?

 The Original Problem

Several, separate divisions of the Byzantine army are camped outside an enemy city. Each division is commanded by its own general. Some part of the army observes the enemy and wants to communicate the plan of action to the rest of the army. The army wants to either retreat or attack as a whole, but the generals can only communicate with a messenger.

Unfortunately, some of the generals may be traitors. They may tell other generals to retreat when they should have

Continue reading →

Distributed Hash Tables: Introduction

This post assumes you have ~ 1 or more years of programming or Computer Science experience. I’d say if you’re comfortable with hash functions & some basic network ideas, there’s something here for you. This series of posts is designed to explore the purpose, design, and function of Distributed Hash Tables; specifically & eventually Kademlia, which I am implementing here.

 The Problem

Sharing files is not easy. Locating and hosting files across the internet is particularly challenging. As a webmaster, you are free to host files on your HTTP server and serve these files up to any user that visits your website. One might have a few complaints with this situation, however:

  • One can only see the files you choose to upload; the selection is limited.
  • You might take these files down, at which point one can no longer download them.
  • If your website becomes popular, you might have trouble

Continue reading →

What is a Bloom Filter, Really?

 When are sets not enough?

As we’ve seen, sets provide a powerful and efficient solution to specific problems. They allow you to efficiently determine if something belongs in a certain categorization, or not. Have I seen this song before? Have I seen this computer before? Have I seen this SSL certificate before?

Normal sets also store information about anything that has been inserted - this usually means you can enumerate over the inserted elements (in no particular order, of course).

Sets are fast - but normal sets take up quite a bit of space. If you want to view the contents of the set, you must store each element in its entirety. What if you forwent this ability and tried to store a shortened version of each element? Perhaps you only care if the inserted elements are in the set or not. Maybe the first 5 characters? Maybe a hash of the element? Both techniques suffer from the same

Continue reading →

What is a Bloom Filter?


Sets are useful mathematical abstractions. They are also useful and ubiquitous programming abstractions. Typically, we are concerned with the membership of elements within a set. Is the element we’re talking about in the set, or not?

Think about sets as unordered collections of unique elements. You could construct a set of all blue items in the world. The group of all real numbers is a set. The group of all integers is a set. So on and so forth.

Sets provide constant-time lookup operations on average. Let’s say your program builds up a group (read: not list) of a user’s favorite artists based on songs he or she has liked. When the user favorites a song performed by an artist already in the list, you don’t want to duplicate any work, so you use sets to ensure the items in your collection are unique. If you need to check if something is in your set, you can rest assured it will be

Continue reading →

A Pinch of Internet Protocol and a Dash of Routing

After typing in your favorite search engine, duckduckgo.com, into your web browser’s address bar, a complex series of events occurs once you hit the Return key.

It’s too much to cover in one blog post, but I’d like to cover part of the process of communication between your computer and DuckDuckGo’s server.

 IP Addresses

Every computer1 on the internet is assigned an Internet Protocol address, or IP address.

IP addresses take then form Four numbers from 0 - 255, separated by periods.

The range [0, 255] includes 256 numbers. 256 is equal to 2 to the power of 8, or 28.

Screen Shot 2016-05-29 at 7.09.20 PM.png

This means it takes 8 bits to represent all numbers from 0 to 256. Since we have 4 of these numbers, IP addresses are expressed in 32 bits. This means we can express 232 or 4,294,967,296 unique IP addresses using this scheme.

For now, let’s just say your computer is assigned one of these numbers

Continue reading →

What is SSL?

Secure Socket Layer, otherwise known as TLS (Transport Layer Security) keeps the internet safe. I’ll use the two acronyms interchangeably, despite TLS being the correct one.

If you’ve ever heard someone say, “don’t give sensitive information unless you see the green lock in the address bar”, you’ve used (and been instructed to use) SSL. If you’re using SSL, your address bar should say https://....

TLS/SSL facilitates an encrypted connection between a given client and a participating server. Let’s say I host and run a dating website. Technically speaking, this means I run a web server on a computer somewhere, connected to the internet. This web server runs on port 80.

You are a member of my dating website - you’re updating your profile while sitting at a Starbucks, chilling on public WiFi. Since the (TCP) connection between your web browser and my web server is not encrypted, all

Continue reading →