James Larisch

CS PhD student, learning by writing

Read this first

The Byzantine Generals Problem

The Byzantine Generals Problem is an abstract generalization for thinking about the failure in computer systems. Processors fail in multi-threaded systems, and nodes fail in distributed network systems. More specifically, processors and nodes can fail in strange and possibly malicious ways. How can we design computer systems to deal with these problems and what are the limitations of such solutions?

 The Original Problem

Several, separate divisions of the Byzantine army are camped outside an enemy city. Each division is commanded by its own general. Some part of the army observes the enemy and wants to communicate the plan of action to the rest of the army. The army wants to either retreat or attack as a whole, but the generals can only communicate with a messenger.

Unfortunately, some of the generals may be traitors. They may tell other generals to retreat when they should have

Continue reading →

Distributed Hash Tables: Introduction

This post assumes you have ~ 1 or more years of programming or Computer Science experience. I’d say if you’re comfortable with hash functions & some basic network ideas, there’s something here for you. This series of posts is designed to explore the purpose, design, and function of Distributed Hash Tables; specifically & eventually Kademlia, which I am implementing here.

 The Problem

Sharing files is not easy. Locating and hosting files across the internet is particularly challenging. As a webmaster, you are free to host files on your HTTP server and serve these files up to any user that visits your website. One might have a few complaints with this situation, however:

  • One can only see the files you choose to upload; the selection is limited.
  • You might take these files down, at which point one can no longer download them.
  • If your website becomes popular, you might have trouble

Continue reading →

What is a Bloom Filter, Really?

 When are sets not enough?

As we’ve seen, sets provide a powerful and efficient solution to specific problems. They allow you to efficiently determine if something belongs in a certain categorization, or not. Have I seen this song before? Have I seen this computer before? Have I seen this SSL certificate before?

Normal sets also store information about anything that has been inserted - this usually means you can enumerate over the inserted elements (in no particular order, of course).

Sets are fast - but normal sets take up quite a bit of space. If you want to view the contents of the set, you must store each element in its entirety. What if you forwent this ability and tried to store a shortened version of each element? Perhaps you only care if the inserted elements are in the set or not. Maybe the first 5 characters? Maybe a hash of the element? Both techniques suffer from the same

Continue reading →

What is a Bloom Filter?


Sets are useful mathematical abstractions. They are also useful and ubiquitous programming abstractions. Typically, we are concerned with the membership of elements within a set. Is the element we’re talking about in the set, or not?

Think about sets as unordered collections of unique elements. You could construct a set of all blue items in the world. The group of all real numbers is a set. The group of all integers is a set. So on and so forth.

Sets provide constant-time lookup operations on average. Let’s say your program builds up a group (read: not list) of a user’s favorite artists based on songs he or she has liked. When the user favorites a song performed by an artist already in the list, you don’t want to duplicate any work, so you use sets to ensure the items in your collection are unique. If you need to check if something is in your set, you can rest assured it will be

Continue reading →

A Pinch of Internet Protocol and a Dash of Routing

After typing in your favorite search engine, duckduckgo.com, into your web browser’s address bar, a complex series of events occurs once you hit the Return key.

It’s too much to cover in one blog post, but I’d like to cover part of the process of communication between your computer and DuckDuckGo’s server.

 IP Addresses

Every computer1 on the internet is assigned an Internet Protocol address, or IP address.

IP addresses take then form Four numbers from 0 - 255, separated by periods.

The range [0, 255] includes 256 numbers. 256 is equal to 2 to the power of 8, or 28.

Screen Shot 2016-05-29 at 7.09.20 PM.png

This means it takes 8 bits to represent all numbers from 0 to 256. Since we have 4 of these numbers, IP addresses are expressed in 32 bits. This means we can express 232 or 4,294,967,296 unique IP addresses using this scheme.

For now, let’s just say your computer is assigned one of these numbers

Continue reading →

What is SSL?

Secure Socket Layer, otherwise known as TLS (Transport Layer Security) keeps the internet safe. I’ll use the two acronyms interchangeably, despite TLS being the correct one.

If you’ve ever heard someone say, “don’t give sensitive information unless you see the green lock in the address bar”, you’ve used (and been instructed to use) SSL. If you’re using SSL, your address bar should say https://....

TLS/SSL facilitates an encrypted connection between a given client and a participating server. Let’s say I host and run a dating website. Technically speaking, this means I run a web server on a computer somewhere, connected to the internet. This web server runs on port 80.

You are a member of my dating website - you’re updating your profile while sitting at a Starbucks, chilling on public WiFi. Since the (TCP) connection between your web browser and my web server is not encrypted, all

Continue reading →

The Serial Intern

I worked at HubSpot for 7 months and have currently been working at Gem for almost 6 months. I have yet to graduate.

I, like almost every student at Northeastern University, am a serial intern.

Northeastern, like other co-op schools, sends students to work for 2-3 periods of 6 months within their 5 year stints as a ‘student’. Software engineering work and computer science study blend to perfect synergy. Looking past the mind-numbingly high price-tag of NEU, this medley of experience has been spectacular.

I think everyone should embrace the intern mentality - it’s learning at its purest.


Often the first few months at a job are the most stressful. But they also tend to be absolute catapult-periods in terms of knowledge and skill-acquisition.

You’ll almost certainly be

  • Learning a new style of project-management and process
  • Learning new technologies, languages, and

Continue reading →

Real Life™ + UI programming = FRP (or why Elm is awesome)


A piece of software that requires and reacts to user interaction is akin to a car. I’m no car guy - but when I press on the accelerator, my car’s velocity increases. When I hit the brakes, the brakepad hit my tires and the car begins to slow down.

I imagine slamming the brakes in a car is direct, short trip for the electrical system. When brakes are in ‘slammed’ state, the brakepad are hitting the tires. This makes sense & there’s no mental hurdle to jump to understand it.

In simple terms of input: I as the user am inputting ‘brake’. I get a car that is slowing down. In other words - whenever the brakes are down, the car is slowing down. Whenever the steering wheel is turned to the left, the wheels are turned to the left. It’s tempting to say that my input determines the state of the car, but I don’t think that’s ambitious enough.

The input is the state of the car. They

Continue reading →

Understanding Bitcoin mining & difficulty for fun and no profit

This post is for understanding the particulars of mining a block from the Bitcoin network. It assumes you have read Ken Shirrif’s post on the subject and have a vague understanding of it. If some of the details behind the Python mining simulator seemed a bit rushed, that’s what this post will attempt to ramify.

I will also attempt to make some of the details in the difficulty wiki article a bit more clear, so it’s worth giving that a skim.

Ken posted a small Python program to simulate how mining works here. He doesn’t really go into the details, so I did, and I’d like to share.

I will go through the code line by line, explaining what it does. After that I plan to convert the code to Ruby, using (I think) better naming conventions and function abstraction.

import hashlib, struct

ver = 2
prev_block = "000000000000000117c80378b8da0e33559b5997f2ad55e2f7d18ec1975b9717"
mrkl_root =

Continue reading →

Fix: Including modules on singleton objects in Ruby

Two weeks ago I posted this.

The idea behind adding methods to a single object still hold, but I made a rather egregious error.

I am pulling a row out of the database, (abstracted by ActiveRecord) and then extending that Ruby object with a module to add methods to it.

I was operating under the false assumption that that object, let’s call it x, will persist those methods after dying.

x is pulled out of the database. Methods a and b are added to that object (row). Then, the code moves on, or the user logs out. The data represented by x still exists in the database, but x itself does not. So, those methods get thrown away.

In my example, whenever a Membership was saved, it was extended. But again, that extends becomes useless after that particular object is no longer referenced.

A solution to this, the one I’m taking, is to extend that module whenever the Coinbase membership is

Continue reading →