James Larisch

“we walk while asking questions”

Read this first

2018 in Books

Updated regularly.

 Seven Surrenders by Ada Palmer

 The Will to Battle by Ada Palmer

 Fahrenheit 451 by Ray Bradbury

 Dune by Frank Herbert

 Annihilation by Jeff Vandermeer

 Lord of the Flies by William Golding

 The Windup Girl by Paolo Bacigalupi

 Authority by Jeff Vandermeer

 How Not to Be Wrong by Jordan Ellenberg

 Infomocracy by Malka Ann Older

View →

Synthesis: Practical Byzantine Fault Tolerance

The Byzantine Generals Problem paper gave us the number of nodes a system must contain in order to deal with m faulty nodes (3m+1).

The original solution assumes the following:

  1. Every message that is sent is delivered correctly.
  2. The receiver of a messages knows who sent it.
  3. The absence of a message can be detected.

In Practical Byzantine Fault Tolerance, Castro and Liskov claim to provide a state-machine replication protocol that survives Byzantine faults in an asynchronous network. The term “state-machine replication” describes generic replicated computation. If you can replicate a state-machine, you can essentially replicate any ordered sequence of operations. You’ll see that a lot in distributed systems literature. Byzantine faults are completely arbitrary faults: a node might crash, restart, send conflicting messages to different nodes, and lie about information. Other nodes

Continue reading →

Synthesis: Time, Clocks, and the Ordering of Events in a Distributed System

Let’s assume you have a distributed system consisting of n nodes. These nodes have perfectly synchronized clocks. If node 1 sends a message to node 2 and attaches a timestamp to the message, node 2 can order this event with respect to local events. In other words, node 2 knows whether node 1 sent the message either before or after node 2 received, say, a message from node 3. Since these messages may be arbitrarily delayed, they may be received in a different order than they were sent. With synchronized timestamps, node 2 can resolve this.

In fact, with synchronized timestamps and a synchronous network (every message is delivered within a known, bounded amount of time and differences in processor speeds are known and bounded), one can achieve consensus.

Unfortunately, identical clocks will drift depending on things like, say, temperature. Protocols exist for the synchronization of

Continue reading →

The Natural Laws of Paper Deadlines

  1. Yes, no one will read your code. But it’s worth taking the care to design your system, from architecture to logging, properly. Shortcuts compound.
  2. Keep a journal. You will encounter the same problems twice.
  3. Write down your methods, especially after you submit the paper. Who knows, it may get accepted, and you may have to run more experiments.
  4. Collect every statistic you can think of. If you don’t, you’ll have to rerun everything on deadline day.

This is a working list.

View →

2017 in Books

Here are the books I read (or rather, remember reading) in 2017. I don’t think star and number scales are particularly useful, so I mark each book as one of the following:

  • Not worth reading
  • Worth reading
  • Must read

 Historical Fiction

 Fall of Giants by Ken Follett

Historical fiction is the best way to learn about, well, history. Ken Follet’s trilogy provides an educational yet exhilarating glimpse into the tumultuous 1900s. Fall of Giants follows characters in Wales, Germany, England, USA, and Russia (and its transition into the Soviet Union) through World War I and the events leading up to it.

 Winter of the World by Ken Follett

Winter of the World picks up a few years after where its predecessor left off, with a new generation of characters (the old ones remain, though). The second book covers the rise of fascism in Europe and the subsequent World War II.

 Edge of Eternity by

Continue reading →

The Byzantine Generals Problem

The Byzantine Generals Problem is an abstract generalization for thinking about the failure in computer systems. Processors fail in multi-threaded systems, and nodes fail in distributed network systems. More specifically, processors and nodes can fail in strange and possibly malicious ways. How can we design computer systems to deal with these problems and what are the limitations of such solutions?

 The Original Problem

Several, separate divisions of the Byzantine army are camped outside an enemy city. Each division is commanded by its own general. Some part of the army observes the enemy and wants to communicate the plan of action to the rest of the army. The army wants to either retreat or attack as a whole, but the generals can only communicate with a messenger.

Unfortunately, some of the generals may be traitors. They may tell other generals to retreat when they should have

Continue reading →

Distributed Hash Tables: Introduction

This post assumes you have ~ 1 or more years of programming or Computer Science experience. I’d say if you’re comfortable with hash functions & some basic network ideas, there’s something here for you. This series of posts is designed to explore the purpose, design, and function of Distributed Hash Tables; specifically & eventually Kademlia, which I am implementing here.

 The Problem

Sharing files is not easy. Locating and hosting files across the internet is particularly challenging. As a webmaster, you are free to host files on your HTTP server and serve these files up to any user that visits your website. One might have a few complaints with this situation, however:

  • One can only see the files you choose to upload; the selection is limited.
  • You might take these files down, at which point one can no longer download them.
  • If your website becomes popular, you might have trouble

Continue reading →

What is a Bloom Filter, Really?

 When are sets not enough?

As we’ve seen, sets provide a powerful and efficient solution to specific problems. They allow you to efficiently determine if something belongs in a certain categorization, or not. Have I seen this song before? Have I seen this computer before? Have I seen this SSL certificate before?

Normal sets also store information about anything that has been inserted - this usually means you can enumerate over the inserted elements (in no particular order, of course).

Sets are fast - but normal sets take up quite a bit of space. If you want to view the contents of the set, you must store each element in its entirety. What if you forwent this ability and tried to store a shortened version of each element? Perhaps you only care if the inserted elements are in the set or not. Maybe the first 5 characters? Maybe a hash of the element? Both techniques suffer from the same

Continue reading →

What is a Bloom Filter?


Sets are useful mathematical abstractions. They are also useful and ubiquitous programming abstractions. Typically, we are concerned with the membership of elements within a set. Is the element we’re talking about in the set, or not?

Think about sets as unordered collections of unique elements. You could construct a set of all blue items in the world. The group of all real numbers is a set. The group of all integers is a set. So on and so forth.

Sets provide constant-time lookup operations on average. Let’s say your program builds up a group (read: not list) of a user’s favorite artists based on songs he or she has liked. When the user favorites a song performed by an artist already in the list, you don’t want to duplicate any work, so you use sets to ensure the items in your collection are unique. If you need to check if something is in your set, you can rest assured it will be

Continue reading →

A Pinch of Internet Protocol and a Dash of Routing

After typing in your favorite search engine, duckduckgo.com, into your web browser’s address bar, a complex series of events occurs once you hit the Return key.

It’s too much to cover in one blog post, but I’d like to cover part of the process of communication between your computer and DuckDuckGo’s server.

 IP Addresses

Every computer1 on the internet is assigned an Internet Protocol address, or IP address.

IP addresses take then form Four numbers from 0 - 255, separated by periods.

The range [0, 255] includes 256 numbers. 256 is equal to 2 to the power of 8, or 28.

Screen Shot 2016-05-29 at 7.09.20 PM.png

This means it takes 8 bits to represent all numbers from 0 to 256. Since we have 4 of these numbers, IP addresses are expressed in 32 bits. This means we can express 232 or 4,294,967,296 unique IP addresses using this scheme.

For now, let’s just say your computer is assigned one of these numbers

Continue reading →