Russell O’Connor’s Blog

On Building Consensus and Speedy Trial

2021-06-15T19:14:22Z

Last week, Taproot activation locked-in within the “Speedy Trial” activation parameters. Speedy Trial is often described as a kind of “try-and-see” or “fail-fast” approach to activation. While this is a fair description, it is not how Speedy Trial was designed. I would like to write about how Speedy Trial was designed and why it is the way it is. This article assumes the reader is already familiar with Speedy Trial, other Taproot activation proposals, and Bitcoin soft-fork activation in general.

While there is broad community consensus to deploy Taproot, there was not and there is not broad consensus on the entirety of soft-fork deployments. There is broad consensus that miner assisted activation is preferred if and when it is available. In miner assisted activation, miners signal to indicate that they are willing, able, and planning to enforce new consensus rules in accordance with a particular activation plan. The economic majority of users then upgrade their nodes to reject new soft-fork invalid blocks in accordance with the activation plan, which holds the miners to their promise. In turn, when a super-majority of the miners reject soft-fork invalid blocks, that protects the minority of users who have not yet upgraded their nodes. This process elegantly keeps the blockchain unified, preventing soft-fork activation induced reorgs and any fraudulent double-spends that such reorgs might allow.

The issues without consensus surround what to do if the miners are unwilling to volunteer to enforce new consensus rules despite broad community support for a soft-fork. While there is a diverse set of people with a diverse set of opinions surrounding these matters, I found that there were three broad factions representing various concerns. In no particular order, they are as follows.

The “no-miner-veto” faction’s concern is that miners will simply never volunteer to enforce new rules either through laziness or outright malice. Miners are short-term thinkers, the argument goes, and the path of least resistance for miners is simply to do nothing. It only takes a small fraction of such miners to prevent reaching a 95% or 90% signalling threshold.

This faction notes that miner signalling is not a critical part of the activation process. Even without miners volunteering to signal, a clear and committed economic super-majority can still reject soft-fork invalid blocks, and the miners will, at least eventually, learn that the coinbase rewards produced in such invalid blocks are worthless and, eventually, stop mining soft-fork invalid blocks. This faction would generally support a flag day activation, with or without a preceding period of miner assisted activation. They may support a “LOT=true” setting where miner signalling is forced at the end of a miner assisted activation. They would not necessarily oppose a “LOT=false” deployment setting, one in which activation fails upon signalling timeout, because such clients are compatible with “LOT=true” which could be used as a follow-up client or with a configuration flag / alternative client. A successful forced signalling imposed by “LOT=true” clients would activate those “LOT=false” clients, bringing them along, in a certain limited sense, as part of their economic super-majority.

Some members of this faction believe that a super-majority of economic nodes were already willing to run some mandatory activation client, even without broad consensus for it. I dispute the claim that such a super-majority actually exists. I do not see the evidence that this super-majority exists even though one person keeps claiming that the evidence is all around and is so “obvious” that I must be blind not to see it. I would think that if the evidence were so obvious, it would be easy to compile and present it. I have not seen any such presentation of this “obvious” evidence, but I have seen evidence to the contrary. But that is for another post.

The “devs-do-not-decide” faction’s concern is regarding the appearance of Bitcoin developers deciding the rules of Bitcoin. Of course, developers cannot dictate the rules of Bitcoin simply by virtue of the fact that developers have no power to force anyone to run any new version of Bitcoin software. However, this faction is concerned that Bitcoin developers merely appearing to be deciding the rules of Bitcoin may put those developers at risk of lawsuits or other threats from those looking to force a change in the rules of Bitcoin for their own benefit. This faction opposes any flag day or any other forced activation sequence such as “LOT=true”. This faction would be fine with users building their own alternative client for forced activation, or a configuration flag for enabling some kind of forced activation that is not enabled by default. There is a possibility that this faction could be okay with a forced activation attempt after a failed miner assisted activation if developers can demonstrate that a super-majority of users did actively indicate their desire for the soft-fork by observing most nodes being upgraded to a soft-fork supporting node, though this question has not really been explored.

The “no-divergent-consensus-rules” faction is concerned about different users running incompatible consensus rules on the Bitcoin network. Of course, users are not forced to run any particular piece of software, and users can run whatever code they want, compatible or not with other clients. Nonetheless, this group is opposed to activation rules that would, from their perspective, encourage the running of incompatible clients, including clients that are only semi-compatible. This faction opposes adding configuration parameters that change consensus behaviour. This faction opposes any “LOT=false” client that may encourage other users to deploy a semi-compatible “LOT=true” client (see the “no-miner-veto” faction). This faction would accept flag-day activation, as it has no miner signalling that can be abused by a “LOT=true” client. This faction would, in principle, accept a “LOT=true” client since, by itself, it does not encourage clients with alternate incompatible consensus rules. That said, it makes very little sense to deploy a “LOT=true” client without the existence of any other “LOT=false” clients whose activation they would be forcing. (Yes, I am referring to the nonsensical design of a certain Taproot alternate client that allegedly has an economic super-majority that is supposedly supposed to be running it.)

While I am not an expert on the theory of governing by consensus, my understanding is that broad consensus is achieved when all reasonable raised concerns have been addressed. Raised concerns must be technical in nature, so that they at least have the possibility of being addressable. There are a couple of ways of dealing with a concern that is raised. Everyone can agree that the concern is valid and work together to find a solution or solutions that address the concern. Alternatively, people can try to argue that the raised concern is not actually an issue. Some may argue that the concern raised already exists prior to the new proposal and the proposal does not make it worse, or that the raised concern cannot actually happen for some reason, or otherwise convince the group raising the concern that their concern is not really an issue.

There is a third way of proceeding: One can address a concern without conceding that the concern is valid. This can be done if the concern can be addressed without otherwise harming the proposal. This is particularly useful for concerns about soft-fork activation because activation code is transient in nature and can be removed after activation is buried. It can be far easier to accommodate concerns that you do not find valid than it is to convince those raising the concerns that they are wrong. Speedy Trial is an activation proposal that was designed to fit the constraints imposed by the concerns raised by all three factions. This proposal can be accepted by all factions without anyone conceding that the concerns raised by those other factions are valid concerns. While no one will find Speedy Trial to be an ideal activation scheme, hopefully they all at least find Speedy Trial acceptable.

At first glance, the three factions all seem hopelessly incompatible with each other. The “devs-do-not-decide” faction opposes any sort of flag day or “LOT=true” client. The “no-divergent-consensus-rules” faction opposes “LOT=false” clients because of the existence of the “no-miner-veto” faction that is happy to use any “LOT=false” activation client as leverage and release their own semi-compatible “LOT=true” client. However, a more careful examination of the constraints imposed by the three factions reveals a narrow island of acceptable activation parameters that all three groups can be satisfied with. This island is where Speedy Trial lies.

Speedy Trial’s design is not based on any sort of activation philosophy about failing fast. Speedy Trial is designed to satisfy the constraints of many different activation philosophies, even though those philosophies are themselves at odds with each other. Speedy Trial is a “LOT=false” client. This plainly satisfies the “devs-do-not-decide” requirements. Speedy Trial’s short signaling period is meant to satisfy the “no-divergent-consensus-rules” faction because it is no longer reasonable to produce a “LOT=true” client that would add mandatory signalling to the end of this short signalling period. Mandatory signalling requires a long time interval for the super-majority of clients to upgrade first, much longer than the three months of Speedy Trial’s signaling period. Of course, this does not prevent users from running their own alternative activation client, which we noted before is impossible to prevent, but at least Speedy Trial does nothing to encourage such clients. Speaking of alternative activation clients, the segment of the “no-miner-veto” faction hell-bent on releasing their own activation client without achieving broad consensus (but with their alleged economic super-majority support) is somewhat satisfied in the sense that they can incorporate Speedy Trial activation parameters in a semi-compatible way so that they can have the same signalling start time, and same minimum-activation-height, which keeps their client in consensus in the case where Speedy Trial is successful. While this group views the Speedy Trial attempt as largely worthless, at least Speedy Trial parameters do not obstruct their ability to design a mandatory activation client.
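The pieces named above (a signalling window, a lock-in threshold, failure rather than forced signalling at timeout, and a minimum activation height) can be pictured as a small state machine. What follows is a minimal sketch and not the code any node actually runs: every name in it is hypothetical, and the 1815-of-2016 threshold is an assumption derived from the 90% figure mentioned earlier.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DEFINED = "defined"
    STARTED = "started"
    LOCKED_IN = "locked_in"
    ACTIVE = "active"
    FAILED = "failed"


PERIOD = 2016     # blocks per retargeting period
THRESHOLD = 1815  # assumed: 90% of PERIOD, rounded up


@dataclass
class Params:
    start_time: int             # MTP at which signalling may begin
    timeout: int                # MTP after which a "LOT=false" attempt fails
    min_activation_height: int  # earliest height at which the rules are enforced


def next_state(state, params, mtp, signalling_blocks, next_height):
    """Return the state for the next period.

    mtp and signalling_blocks describe the period that just ended;
    next_height is the height of the first block of the next period.
    """
    if state is State.DEFINED:
        return State.STARTED if mtp >= params.start_time else State.DEFINED
    if state is State.STARTED:
        if signalling_blocks >= THRESHOLD:
            return State.LOCKED_IN
        if mtp >= params.timeout:
            return State.FAILED  # no mandatory signalling: the attempt simply ends
        return State.STARTED
    if state is State.LOCKED_IN:
        if next_height >= params.min_activation_height:
            return State.ACTIVE
        return State.LOCKED_IN
    return state  # ACTIVE and FAILED are terminal
```

In this sketch a “LOT=true” variant would differ only in the timeout branch, which is the part that a short signalling period makes impractical to retrofit.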

The “no-divergent-consensus-rules” and the “no-miner-veto” factions would likely prefer a flag-day tacked on to the end of Speedy Trial. The “devs-do-not-decide” and the “no-miner-veto” factions may prefer some sort of mandatory activation configuration flag to be included. But each of these “improvements” would violate a constraint that is imposed by some other faction. While no one finds Speedy Trial ideal, Speedy Trial does capture activation parameters that are broadly considered acceptable.

There was a wide range of predictions as to whether miners would likely signal within Speedy Trial’s short activation window. I myself originally thought it was unlikely that a 90% signalling threshold would be met, through a combination of miner laziness and the fact that some miners have made statements suggesting that they prefer Bitcoin to fail in order to redirect their SHA-256 miners to some other SHA-256 altcoin that they would rather support. However, people who actually talked with miners, or otherwise seemed to be more well-informed than I was, indicated that there was very broad support for activating Taproot among miners. I thought it was at least plausible that Speedy Trial could activate Taproot within its short window, which made it worth trying. I do not think I would have taken an even odds bet for Speedy Trial working, but if a bet paid out 2-to-1 I would have probably taken it.

Still, some folks within the “no-miner-veto” faction were very adamant that Speedy Trial would fail. One wrote that Speedy Trial’s failure was “likely” and another said that Speedy Trial would at least teach everyone the “obvious” fact that miners will not signal (in adequate numbers) if they are not forced to. These folks would do themselves a favour by going back to reassess the reasoning that caused them to have such strong beliefs, beliefs that turned out to be so wrong. Perhaps they could even reexamine other strongly held beliefs they might have that may be similarly wrong. (In Bayesian terms, this process is called updating one’s priors.)

While I believe that Speedy Trial did achieve broad consensus before being merged, it was not without dissent. Rusty NACKed the Speedy Trial proposal, and I never did come to an understanding of why. I do understand that he prefers not to leave the question of what to do if miners fail to signal unanswered. But I fail to understand how not running Speedy Trial helps resolve this question. Nothing in Speedy Trial prevents people from continuing that discussion. At worst Speedy Trial may demotivate such discussion, but the idea of holding back Taproot deployment to force such discussions strikes me as manipulative, so I expect that this is not what Rusty meant. As I stated above, raised concerns must be technical in nature in order for them to be addressable. Having never figured out the technical nature of Rusty’s objection to Speedy Trial, I was not able to address whatever his objection was. “I prefer we do something else” does not count as a technical objection.

There was another objection in the technical minutiae of the chosen Speedy Trial activation parameters, where the start and end times of the signaling period were specified in terms of Bitcoin’s MTP time rather than based on height. The decision to use MTP for the signalling period was itself another example of achieving broad consensus by addressing concerns raised. There were concerns about testnet and signet signalling periods, since these alternate networks operate at a variety of different heights. On the other hand, there were concerns about how MTP time could cause the transition points in the activation’s state machine to shift by an entire retargeting period if MTP thresholds end up on the boundary of a retargeting period and reorgs happen around that boundary. This was solved, not by a virtual coin-flip, as is often alleged, but by addressing both concerns. The MTP concerns are most acute for the final activation transition, while reorgs extending or shrinking the signalling period are not really problematic. On the other hand, testnet and signet concerns are only related to the signalling period. These networks can use a minimum-activation-height of 0 since there is no concern about the economic majority needing time to upgrade, as there is no economy on these test networks. Thus the solution of minimum-activation being height-based, and the signalling period being based on MTP, addressed all the concerns.
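To make the MTP concern concrete, here is a small sketch of why a time-based threshold can move when a reorg replaces a block near a boundary, while a height-based one cannot. Median time past is the median of the timestamps of the previous 11 blocks; the helper names and all timestamps below are hypothetical illustrations.

```python
def median_time_past(timestamps):
    """Median of the timestamps of the last 11 blocks (fewer near genesis)."""
    window = sorted(timestamps[-11:])
    return window[len(window) // 2]


def threshold_reached(timestamps, threshold_time):
    """True if the chain tip's MTP has reached an MTP-based threshold."""
    return median_time_past(timestamps) >= threshold_time


# Two competing tips at the same height: the chains share ten blocks and
# differ only in the timestamp of the most recent one.
shared = [1000 + 600 * i for i in range(10)]
chain_a = shared + [7000]  # latest block carries a late timestamp
chain_b = shared + [500]   # replacement block carries an early timestamp

# Same height on both chains, yet only one of them crosses the threshold.
crossed_on_a = threshold_reached(chain_a, 4000)
crossed_on_b = threshold_reached(chain_b, 4000)
```

Since state transitions are only evaluated at retargeting boundaries, a disagreement like this at a boundary is what shifts a transition by a whole period.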

However, there was still an objection raised to this solution. It was alleged that there was already consensus to have everything height based, and that doing anything else would be going against that consensus. This objection, again, had no technical component to it. It was simply a statement that the Speedy Trial proposal does not have consensus because some other thing supposedly already had consensus. Firstly, having consensus around one design does not preclude the possibility that another design also has consensus. Secondly, if there is, in fact, no consensus with the proposed solution, then it must be due to some unaddressed technical concern. If that is the case, they can simply state what that technical concern is. However, no such technical objection was forthcoming. Thirdly, the claim that the height-based activation already had consensus is plainly false based on the fact that above we see that technical concerns do exist regarding testnet and signet activation. You could argue that those raised concerns are not actual problems. Or, instead of arguing, you could build consensus by addressing the raised concerns without conceding that those concerns are valid.

This group of people objecting appears to have no interest in actually building broad consensus by addressing raised concerns that they do not think are valid. Instead, one person lobbed accusations that the testnet and signet arguments were raised in bad-faith in order to somehow undermine this alternate Taproot client. I found those accusations disgusting. Even if I personally do not concede that the testnet/signet issue is an actual problem, I still can recognize that testing and testability are legitimate design considerations. And how exactly does the choice of MTP over height based signaling periods undermine this alternate Taproot client? The alternate client could have easily chosen to incorporate an MTP based signaling period over a height based signaling period if being semi-compatible was important to them. If semi-compatibility was not very important to them, well, since they allegedly had a super-majority of the economic nodes that would run their nodes, the divergent Speedy Trial design would be Speedy Trial’s problem, not theirs.

I could write much more on the issue of that alternate Taproot client, its nonsensical design, and the lack of evidence for their alleged super-majority of economic users (e.g. 16627 /Satoshi:0.21.1/ nodes versus 21 /Satoshi:0.21.0/withTaproot:0.1/ nodes) but I will have to save that rant for another day.

Carbon Tax: Running my Numbers

2020-12-13T02:30:15Z

I am so excited to read today that the Liberal government is planning to raise the tax on carbon up to $170 per tonne by 2030. The biggest problem with the previous carbon pricing program was that $50 per tonne was way too low. The price needs to be large enough such that capturing carbon is preferable to paying the tax. This announcement would finally put us into that ballpark.

Last fall I was talking with someone who claimed that the carbon tax was just a tax grab and did not believe that the Liberal government was really going to pay the funds back out through the Climate Action Incentive tax credit. I did believe it would be paid out, but to be certain, I figured I should run the numbers.

The major carbon tax payments for my household are (1) natural gas for heating, and (2) gasoline for driving. Between April 2019 and March 2020, I paid $61.83 in carbon taxes for natural gas heating. During the same period I bought approximately 651.612 litres of gasoline. At a price of 4.42 cents per litre, I estimate I paid about $28.80 in carbon taxes for that gasoline. That brings me to a total of $90.63 paid in carbon taxes for that period.

On my 2018 tax return I received a $231 Climate Action Incentive tax credit for my household. That leaves me with a net benefit of $140.37 as a reward for emitting less carbon than average. This is to be expected because average household carbon emissions are much higher than median household emissions due to relatively few high emitters.
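For anyone who wants to run their own numbers, the arithmetic above can be reproduced with a short script. This is only a sketch: the function names are hypothetical, and the inputs are simply the figures quoted in this post, to be replaced with your own.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def gasoline_carbon_tax(litres, rate_per_litre):
    """Carbon tax paid on gasoline, rounded to the nearest cent."""
    return (litres * rate_per_litre).quantize(CENT, rounding=ROUND_HALF_UP)


def net_benefit(credit, heating_tax, gasoline_tax):
    """Tax credit received minus the carbon taxes paid."""
    return credit - (heating_tax + gasoline_tax)


# Figures quoted in the post.
heating_tax = Decimal("61.83")                 # natural gas, April 2019 to March 2020
gasoline_tax = gasoline_carbon_tax(
    Decimal("651.612"),                        # litres of gasoline bought
    Decimal("0.0442"),                         # 4.42 cents per litre
)
total_paid = heating_tax + gasoline_tax
benefit = net_benefit(Decimal("231.00"), heating_tax, gasoline_tax)
```

Using `Decimal` rather than floats keeps the cents exact, which matters when the point of the exercise is to check a dollar figure.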

I am omitting carbon tax charges for other incidentals, most notably for airfare. If anyone has any methods for calculating the carbon tax on airfares, please let me know. However, I have little doubt in my mind that this $140.37 more than covers any other outstanding carbon tax costs. Also keep in mind that the Climate Action Incentive tax credit was paid in advance with my 2018 income tax return, before I paid any carbon taxes at all.

Some people on Twitter were wrongly arguing that the Climate Action Incentive tax credit is income based. I filled out my income tax return and the $231 value was based on the size of my household and was independent of my income.

The carbon tax program is great because it lets market forces determine how best to reduce carbon emissions. Those few households that are emitting most of the carbon are financially incentivized to change their consumption patterns. For reasons that I do not understand, both the federal and Ontario Conservative parties hate market solutions. Their "solution" to the carbon problem is to have their governments pick winners and losers themselves. No doubt that way they can make sure their cronies just happen to be among the winners.

One could argue that I am being selfish, because the more the tax on carbon increases, the more rewards I earn. On the other hand, perhaps those Canadians who are arguing against the carbon tax should run their own numbers and see how much they benefit from this carbon program.

It Is Never a Compiler Bug Until It Is

2021-07-08T15:23:01Z

Last week I was trying to add some testing code to libsecp256k1 and I was pulling out my hair trying to get it to work. No amount of printf was working to illuminate what I was doing wrong. Finally, out of desperation, I thought I would do a quick check to see if there are any compiler bugs related to memcmp, and lo and behold, I found GCC bug #95189: memcmp being wrongly stripped like strcmp.

Honestly this was a pretty horrifying bug to read about. Under some circumstances GCC 9 and 10 will cause memcmp to return an incorrect value when one of the inputs is a statically known array that contains NULL bytes. As I rushed to recompile my computer system using GCC 8, I contemplated what the vast consequences of such a bug could be, and pondered how it was possible that computers could function at all.

However, over the week, with the help of my colleagues, we managed to get a better understanding of the scope of the bug. The bug can only convert non-zero values to zero values. The static array needs to have a NULL byte within the first 4 bytes. Most importantly, the memcmp result must not immediately be compared to 0 for equality or inequality, or any equivalent test. A different code path is taken in the compiler in that case. That explained why computers were still functioning. I expect the vast majority of the uses of memcmp do an immediate test for equality with 0.
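The symptom can be modelled in a few lines. The sketch below is only a model of the behaviour described above, written in Python for illustration; it is not GCC's logic and does not reproduce the exact triggering conditions. Both function names are hypothetical. The faulty variant stops at the first zero byte, as a string comparison would, so it can turn a non-zero result into zero but never the reverse.

```python
def memcmp(a: bytes, b: bytes, n: int) -> int:
    """Reference behaviour: compare exactly n bytes; zero bytes are ordinary data."""
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return x - y
    return 0


def memcmp_treated_like_strcmp(a: bytes, b: bytes, n: int) -> int:
    """Faulty model: the comparison ends early at the first shared zero byte."""
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return x - y
        if x == 0:
            return 0  # bytes after the terminator are never examined
    return 0


table = b"ab\x00X"   # stands in for a statically known array with an early zero byte
other = b"ab\x00Y"

correct = memcmp(table, other, 4)                     # non-zero: the buffers differ
faulty = memcmp_treated_like_strcmp(table, other, 4)  # zero: difference is missed
```

When the buffers differ before the zero byte, or do not differ at all, the two functions agree, which matches the observation that only non-zero results are affected.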

I still wondered though, how much code was being affected. My colleague Tim suggested that it would be possible to instrument GCC to emit a message when it was about to miscompile a program. Together we came up with a patch to GCC 9 and 10 that would print a debugging message. Once again, I recompiled my entire system, to see what GCC was miscompiling. This is what I found:

On my entire system I only found 10 lines of code that were miscompiled. Three lines are tests. All of the lines could be rewritten as a comparison to 0. None of the lines looked that serious. I am not sure which one is worse: the reduced message integrity code(?) from some ARCFOUR implementation or the something something from an ATM driver?

The mplayer miscompilation is the most mysterious. The code surrounding that function all appears to immediately compare memcmp with 0. And given that my debug message refused to point to exactly what line is being miscompiled in that function, I fear some set of optimizations has happened to allow this code to be miscompiled in some way.

With more hardware I could do a more thorough investigation of the consequences of this GCC bug. Until then I am going to stick with GCC 8 until GCC 9 and 10 have new point releases.

Update: Thanks go to Marc ‘risson’ Schmitt, who had more hardware. Please check out his results.

Stochastic Elections Canada 2019 Results

2019-12-09T20:31:22Z

It is time to announce the results from Stochastic Elections Canada for the 43rd General Election.

Every vote counts with the stochastic election process, so we had to wait until all election results were validated before we could announce our results. However, stochastic election results are not very sensitive to small changes to the number of votes counted. The distributions for each candidate are typically only slightly adjusted.

Now we can announce our MP selection.

2019 Stochastic Election Simulation Results
Party	Seats	Seat Percentage	Vote Percentage
Liberal	116	34.3%	33.1%
Conservative	102	30.2%	34.4%
NDP-New Democratic Party	61	18.0%	15.9%
Bloc Québécois	25	7.40%	7.69%
Green Party	23	6.80%	6.50%
People’s Party	6	1.78%	1.64%
Christian Heritage Party	1	0.296%	0.105%
Parti Rhinocéros	1	0.296%	0.0535%
Independent	3	0.888%

Results by province and by riding are available (electoral districts on page 2).

The results were generated from Elections Canada data. One hundred and eighty-one elected candidates differ from the actual 2019 election outcome.

The People’s Party holds the balance of power in this parliament. Assuming a Liberal party member becomes speaker of the house, that means the Liberals together with the Bloc Québécois and Green Party have 163 votes and the Conservative and NDP together have 163 votes. The People’s Party’s 6 votes are enough to decide which side reaches 169.
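The seat arithmetic can be checked directly from the table above. This is a minimal sketch: the grouping into two blocs is the one assumed in the paragraph above, and the variable names are hypothetical.

```python
# Seat counts from the simulation results table.
seats = {
    "Liberal": 116,
    "Conservative": 102,
    "NDP": 61,
    "Bloc Québécois": 25,
    "Green Party": 23,
    "People's Party": 6,
    "Christian Heritage Party": 1,
    "Parti Rhinocéros": 1,
    "Independent": 3,
}

total_seats = sum(seats.values())
voting_members = total_seats - 1      # assumes a Liberal speaker who does not vote
majority = voting_members // 2 + 1

bloc_a = (seats["Liberal"] - 1) + seats["Bloc Québécois"] + seats["Green Party"]
bloc_b = seats["Conservative"] + seats["NDP"]

# Everyone who is in neither bloc and not in the People's Party.
others = total_seats - 1 - bloc_a - bloc_b - seats["People's Party"]

# Either bloc reaches a majority with the People's Party, but not with
# all of the remaining members combined.
with_peoples_party = bloc_a + seats["People's Party"]
with_all_others = bloc_a + others
```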

The rise in the Green Party’s popular vote allowed them to gain more seats this election. The Green Party has close to the same number of seats as the Bloc Québécois which reflects the fact that they have close to the same popular vote, even though the Green Party’s votes are more diluted throughout Canada. This illustrates how sortition is a form of proportional electoral system.

Many proportional election systems require candidates to run under a party, or at least it is advantageous to run under a party. One notable advantage of sortition is that independent or unaffiliated candidates are not disadvantaged. While we did not select Jody Wilson-Raybould for her riding, Jane Philpott was elected to Markham—Stouffville. Also Archie MacKinnon was elected to Sydney—Victoria. And, with sortition, even the 396 residents of Miramichi—Grand Lake get a turn to have their choice of Mathew Grant Lawson to represent them in parliament.

This is only one example of the results of a stochastic election. Because of the stochastic nature of the election process, actual results may differ.

In Canada’s election process, it is sometimes advantageous to not vote for one’s preferred candidate. The stochastic election system is the only system in which it is always best to vote for your preferred candidate. Therefore, if the 2019 election were actually using a stochastic election system, people would be free to vote for their true preferences. The outcome could be somewhat different than what this simulation illustrates.


Stochastic Elections Canada 2019 Update

2019-10-30T03:53:52Z

The rule of the people has the fairest name of all, isonomia, and does none of the things that a monarch does. The lot determines offices, power is held accountable, and deliberation is conducted in public. — Herodotus

In Athenian democracy, sortition was used to select their magistrates in order to avoid the oligarchs buying their way into the office. What would happen if we used a form of sortition to select our parliament? Since most people are too busy and unprepared to sit in parliament, I propose the next best thing: the drawing of lots in a riding to select a person to choose the representative for the riding. What would happen?

The resulting system is a unique system that provides local representation and approximately proportional representation. Each party gets a chance to represent a riding roughly in proportion to the amount of support they have in the riding. Democracy means “rule of people”, not “rule of the majority” (nor “rule of the plurality”). Not only is it perfectly democratic for the minority to get an opportunity to be represented in parliament, it is more democratic than what we have in Canada now.

Of course, directly selecting a random person in a riding is fraught with difficulties, so instead one would vote, as we do now, for one’s preferred candidate. Then, once the votes are tallied, a candidate is selected randomly with probability proportional to the vote they received. In this system it is always best to vote for your preferred candidate. There will be no more strategic votes or vote splitting. Voting participation would go up since every vote increases the chances of your preferred candidate being selected. The resulting parliament will be close to the proportion of the number of votes received for each party without having MPs selected from a party list.
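The selection rule described above is easy to sketch. The code below is an illustration under stated assumptions and not the program used to produce any results on this blog: the function names and the vote counts are hypothetical. It also shows why the expected number of seats for a party is just the sum of its vote shares across ridings.

```python
import random


def select_winner(votes, rng):
    """Pick one candidate with probability proportional to votes received."""
    candidates = list(votes)
    weights = [votes[c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def expected_seats(ridings):
    """Expected seats per party: the sum over ridings of the party's vote share."""
    totals = {}
    for votes in ridings:
        cast = sum(votes.values())
        for party, count in votes.items():
            totals[party] = totals.get(party, 0.0) + count / cast
    return totals


# Hypothetical tallies for three ridings.
ridings = [
    {"Party A": 20000, "Party B": 15000, "Party C": 5000},
    {"Party A": 10000, "Party B": 30000},
    {"Party A": 12000, "Party C": 12000, "Independent": 6000},
]

rng = random.Random(2019)  # seeded only so that a run can be repeated
parliament = [select_winner(votes, rng) for votes in ridings]
expectation = expected_seats(ridings)
```

Every additional vote raises a candidate's weight, so in this scheme there is never a reason to vote for anyone other than one's preferred candidate.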

Imagine a world where we have Stochastic Elections Canada. Stochastic Election law requires that all counts be validated and recounted, if requested, before seat selection takes place. Because every vote influences the outcome, we must await the return of the writs, scheduled by electoral law for Monday, November 11, 2019. For now, we can bring you our seat expectation chart based on preliminary 2019 election results:

Expected Seat Distribution
Party	Expected Number of Seats (95% confidence)	Distribution Shape
Animal Protection Party	0 – 1
Bloc Québécois	17 – 33
CFF - Canada’s Fourth Front	0
Christian Heritage Party	0 – 2
Communist	0 – 1
Conservative	99 – 130
Green Party	14 – 31
Liberal	97 – 130
Libertarian	0 – 1
ML	0 – 1
National Citizens Alliance	0
Nationalist	0
NDP-New Democratic Party	43 – 68
Parti Rhinocéros Party	0 – 1
PC Party	0 – 1
People’s Party	1 – 10
Pour l’Indépendance du Québec	0
Radical Marijuana	0
Stop Climate Change	0
UPC	0
VCP	0 – 1
Independent	0 – 4
No Affiliation	0 – 1


Counterfactual Definiteness and the EPR paradox

2019-08-25T18:38:57Z

Many articles have been written on the EPR paradox and Bell’s inequality. I want to write down, for my own reference, what the crux of the paradox is, how it relates to counterfactual definiteness, what the various philosophical resolutions are, and why I feel that Everett’s many worlds interpretation is the least objectionable. By and large, I will be following Guy Blaylock’s paper “The EPR paradox, Bell’s inequality, and the question of locality”, and you probably ought to be reading that paper instead of this blog post.

Counterfactual definiteness is the claim that experiments that were not performed but could have been performed would have had definite outcomes if they had been performed. For most people, counterfactual definiteness is intuitive; after all, science is all about making predictions about the outcomes of experiments that may or may not actually be performed. However, counterfactual definiteness is problematic in the face of predictions made by quantum mechanics and special relativity, as we shall see.

Let us set up a standard EPR thought experiment. Suppose Alice and Bob are placed very far apart from each other and at rest relative to each other and have synchronized their clocks. They are each sent a stream of photons entangled with the other party’s stream; let us say a thousand pairs of entangled photons. While this experiment could be analyzed with just a single pair of entangled photons, the paradox is clearer with a stream of entangled photons. Alice and Bob simultaneously choose an angle to measure their photon streams at, and then simultaneously measure the polarization of their stream of photons at their chosen angle. Let us say they end up choosing the same angle, which we will label as measuring at 0°. We postulate Alice and Bob are far enough apart that all of Alice’s measurements are performed in a space-like separated manner from all of Bob’s measurements. Alice and Bob record the results of their measurements and travel to meet up afterward to compare notes.

Let us say Alice tabulated the following results for her measurements: +---+--+-+--+--+…, where + means the photon was measured as parallel to the alignment of her detector and - means the photon was measured perpendicular to the alignment of her detector. Bob will have recorded the following result: +---+--+-+--+--+…. Their results are identical because they both performed measurements of entangled photons at the same measurement angle. Nothing surprising here.

But suppose, counterfactually, Alice had decided to measure her photons at an angle of 41.4° instead. What would have happened at Alice and Bob’s meeting? Presumably since Bob’s experiment has not changed, and his experiment was space-like separated from Alice’s experiment, his results do not depend on what experiment Alice decides to perform, so his notes would still record the result: +---+--+-+--+--+…. Quantum mechanics predicts that counterfactual Alice and Bob’s results should differ in about 25% of the entries in this counterfactual scenario. So counterfactual Alice’s notes would have perhaps recorded something like -+--+--+--+-+--+…, or perhaps something different. But whatever she recorded, it would be something that differed in about 25% of the entries when compared to Bob’s result. So far so good.

Now suppose, counterfactually, Bob decided to measure at an angle of -41.4° and it was Alice who kept her measurement at 0°. What would have happened at Alice and Bob’s meeting in this case? By the same logic, Alice’s measurements at 0° would still get the result +---+--+-+--+--+…, and it is counterfactual Bob whose reported measurements differ in 25% of the entries. Because counterfactual Bob measures at a negative angle, we don’t expect his notes to necessarily agree with the previous notes of counterfactual Alice. Maybe counterfactual Bob’s notes would have recorded something like ++--++-+-+--+--+…, or perhaps something different, or perhaps they might even agree with the notes of counterfactual Alice. Everything is still okay, but maybe we are getting a little nervous.

Finally, let us suppose, counterfactually, Alice had decided to measure her photons at an angle of 41.4° and Bob had decided to measure his photons at an angle of -41.4°. What would have happened at Alice and Bob’s meeting in this case? Alice and Bob’s experiments are space-like separated so neither of their choices should influence the outcomes of each other’s experiments. Presumably Alice’s notes would be the same as what we wrote above for counterfactual Alice’s notes: -+--+--+--+-+--+…. Similarly Bob’s notes would be the same as what we wrote above for counterfactual Bob’s notes: ++--++-+-+--+--+…. Here is the crux of the EPR paradox. Quantum mechanics predicts that counterfactual Alice and counterfactual Bob’s notes ought to differ in approximately 87.5% of the entries in this scenario! But no matter how we rearrange counterfactual Alice and counterfactual Bob’s notes, they can only differ in between 0% and 50% of their entries on average, because each of them differs from the 0° record in only about 25% of its entries. This is what it means for Bell’s inequality to be violated.
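
The 50% ceiling is nothing more than a triangle inequality on mismatch counts. Here is a minimal Python sketch of that counting argument, using the illustrative 16-entry records written above; the helper name `mismatches` is my own and the records are only examples, not experimental data.

```python
def mismatches(a, b):
    """Count the positions where two equally long records differ."""
    assert len(a) == len(b)
    return sum(1 for x, y in zip(a, b) if x != y)

# The illustrative records from the text (first 16 entries only).
zero_degrees = "+---+--+-+--+--+"   # what both parties record at 0°
cf_alice     = "-+--+--+--+-+--+"   # counterfactual Alice at 41.4°
cf_bob       = "++--++-+-+--+--+"   # counterfactual Bob at -41.4°

d_alice = mismatches(cf_alice, zero_degrees)
d_bob = mismatches(cf_bob, zero_degrees)
d_both = mismatches(cf_alice, cf_bob)

# Wherever the two counterfactual records disagree with each other,
# at least one of them must disagree with the 0° record.
assert d_both <= d_alice + d_bob
print(d_alice, d_bob, d_both)
```

Whatever records we write down, the last count can never exceed the sum of the first two, which is why fixed, pre-existing notes cannot reach the 87.5% that quantum mechanics predicts.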

Clearly something is wrong in our naive description of the hypothetical experiments above. What are some proposed philosophical resolutions to this EPR paradox?

One possible resolution is that Alice’s choice in her measurement does somehow affect the outcome of Bob’s experiment! The problem with this is that Alice and Bob’s experiments are space-like separated. This implies that an observer traveling rapidly towards Bob and away from Alice will observe that Bob’s experiments conclude before Alice even begins her experiment, when she makes her choice of whether to measure at angle 0° or 41.4°. According to this resolution, this observer sees Alice’s choices affecting the outcome of already completed experiments!

A symmetric possible resolution is that it is Bob’s choice in his measurement that affects the outcome of Alice’s experiment. But we have the same issue as above. There still exists an observer, this time traveling towards Alice and away from Bob, who observes that Alice completes her experiments before Bob begins his experiment.

Non-local interpretations of quantum mechanics, including the Copenhagen interpretation and hidden variable interpretations such as pilot wave theory, resolve the EPR paradox in the above manner. They suggest there is some special global reference frame that is used to absolutely decide which of Alice and Bob’s experiments is performed first, and whichever experiment comes first in this special reference frame is the one whose outcome affects the other’s experiment. They suggest that the rest of the laws of physics conspire to keep all agents in the dark about which reference frame is this special global reference frame, as there are no experiments that can determine which reference frame is the special one. In particular, we cannot actually perform an experiment where we go back in time to see what would have happened if Alice or Bob had chosen a different angle of measurement.

Furthermore, in general relativity, I suspect it is more difficult, and probably impossible, to come up with any globally consistent universal reference frame to resolve the order of all events.

Of course, it could also be the case that both Alice and Bob’s choices affect the outcome of each other’s experiments. But this only makes the problem worse as it would mean that in every reference frame there are future events affecting past outcomes.

Another resolution to the EPR paradox is that Alice and Bob could not have chosen different angles of polarization; if they both measure at angle 0° then that is the only choice they could have made and Alice and Bob do not have free choices in the matter. This resolution is called superdeterminism. We can make our thought experiment more extreme by taking Alice and Bob’s free choice out of the picture. Instead we have Alice first measure the polarization of a CMB photon coming from the constellation Leo and choose her measurement setting, 0° or 41.4°, based on the outcome of that measurement. We have Bob measure the polarization of a CMB photon from Aquarius, on the opposite side of the visible universe, to choose his setting, 0° or -41.4°. Now superdeterminism requires that the universe has been conspiring since near the beginning of time: the plasma of the early universe had to release two photons that would travel for 13 billion years to a point where life had developed and was setting up a quantum correlation experiment, and then pass through the measuring devices in just such a way as to force Alice and Bob to align their measurement settings to get exactly the correlation in their records that is predicted by quantum mechanics.

Furthermore, in a superdeterministic world there could be arbitrarily extreme violations of Bell inequalities, even beyond the violations predicted by quantum mechanics. Yet, for some reason, the cosmic conspiracy chooses never to produce statistical results that exceed the Bell-style violations predicted by quantum mechanics.

A third resolution to the EPR paradox is to say that the question of what would have happened if Alice or Bob had done a different experiment is not a well-formed question. This is the resolution captured by the “shut-up and calculate” interpretation of quantum mechanics. There is not much else to say about this resolution beyond saying that I do not find the rejection of the very question to be particularly satisfying.

Lastly we come to Everett’s many worlds interpretation. This interpretation resolves the EPR paradox by saying that all possible experimental outcomes of Alice and Bob’s experiment happen and they exist together in a superposition. The phase of Alice and Bob’s superposition changes based on the polarization they choose to make their measurements with, but no matter their measurement choice, all 2¹⁰⁰⁰ possible outcomes of Alice’s experiment happen and exist together, and similarly for Bob. Later, when Alice and Bob meet to compare their notes, the superposition of Alices and the superposition of Bobs interfere with each other and split up in such a way that "most" of the Alices meet up with a version of Bob whose recorded outcomes have the correlations predicted by quantum mechanics (or in the case of perfect phase alignment "all" of the Alices meet up with the corresponding Bob who has identical recorded notes).

This resolution violates counterfactual definiteness because it does not predict any specific outcome for counterfactual Alice. It predicts a similar superposition of Alices but in a different phase. In that situation, the various Bobs in superposition could have met up with any number of possible different counterfactual Alices when they interfered. Furthermore, the different phase that the counterfactual Bobs would have been in would definitely influence this interference when meeting up with the superpositions of counterfactual Alices. It is not the case that Bob’s experimental choices affect Alice’s results, but his choices do affect the interference that happens when Alice and Bob meet, and do influence which version of the superposition of Alices he (or rather they) meet up with.

The many worlds interpretation is not without its own problems. If multiple worlds are all equally real, why is it that we assign less probability to those worlds with the lesser probability amplitudes? After all, those worlds are, in some sense, just as real as the worlds associated with larger probability amplitudes. A better way of phrasing the problem might be: why is it rational to behave as if we expect outcomes with probability in accordance to the probability amplitudes of quantum mechanics?

How can basic arithmetic make a self-referential sentence?

2019-02-23T16:16:25Z

On Hacker News, imh asks,

I never understood the step about how a system that can do basic arithmetic can express the "I am not provable in F" sentence. Does anyone have an ELI30 version of that?

I think this is a great question and I would like to try to answer it.

First of all, it is important to understand that we have quite a bit more than just basic arithmetic. Let us define basic arithmetic as a language with symbols for the constants 0 and 1, the operations for + and ×, and the = relation (anything similar to this will work as well).

On top of basic arithmetic we add the language of first-order logic. That is, we add logical operations for and, or, not, etc., we add an infinite number of logical variables x, y, etc., and we add universal quantification, ∀, and existential quantification, ∃.

We provide logical and arithmetic axioms that define these symbols to be our usual interpretation of them. The domain of the quantifiers is implicit in first-order logic. We intend our domain to be the natural numbers, and our axioms are compatible with this interpretation. Technically there will be other valid interpretations of our axioms, but that will not matter for our purposes today.

Our language of first-order logic over basic arithmetic is vastly more expressive than basic arithmetic alone. Building a formula in this language that expresses “I am not provable in F” requires three key ingredients.

The first ingredient is the ability to define data structures. To do this, I will be using the Cantor pairing function to define a functional relation that maps ordered pairs of natural numbers to a single natural number:

Pair(x, y, p) := p = ((x + y + 1) × (x + y)) ÷ 2 + y

To encode a pair of numbers, say 2 and 3, we plug them into the x, y parameters. Then the variable p is forced to be the value 18, which represents the ordered pair, ⟨2, 3⟩. To decode a number, say 7, we plug it into the p parameter. Then the variables x and y, which are natural numbers, are forced to be the values 2 and 1, which is the ordered pair that 7 represents.
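
To make the encoding and decoding concrete, here is a small Python sketch of the pairing function and its inverse. The function names `pair` and `unpair` are my own, and the closed-form inverse is a standard way to undo the Cantor pairing rather than anything the argument depends on.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing: encode the ordered pair <x, y> as one natural number."""
    return (x + y + 1) * (x + y) // 2 + y

def unpair(p):
    """Recover the unique (x, y) with pair(x, y) == p."""
    w = (isqrt(8 * p + 1) - 1) // 2   # w = x + y, the largest w with w(w+1)/2 <= p
    y = p - w * (w + 1) // 2
    return w - y, y

print(pair(2, 3))   # 18
print(unpair(7))    # (2, 1)
```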

We have a problem that our definition of Pair does not quite fit our definition of basic arithmetic because of the ÷ operator. However, this is easily fixed by providing an equivalent definition that does not use ÷.

Pair(x, y, p) := 2 × p = (x + y + 1) × (x + y) + 2 × y

where 2 := 1 + 1 (in general we will let n denote the sum of n occurrences of the numeral 1).
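
As a sanity check that the division-free definition really is a functional relation, the following Python sketch tests it by brute force. The bounded search in `solutions` is purely my own illustration; it relies on the fact that x and y can never exceed p.

```python
def Pair(x, y, p):
    """The relation from the text, using only +, × and = (no division)."""
    return 2 * p == (x + y + 1) * (x + y) + 2 * y

def solutions(p):
    """Every (x, y) related to p; searching up to p is enough since x, y <= p."""
    return [(x, y) for x in range(p + 1) for y in range(p + 1) if Pair(x, y, p)]

print(solutions(18))   # [(2, 3)]
print(solutions(7))    # [(2, 1)]
```

Every p has exactly one solution, which is what lets the same formula serve for both encoding and decoding.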

Using first-order logic we can compose this ‘Pair’ relation to encode and decode nested pairs and start building the fancy data structures that you find in computer programming. In particular, we can build data structures like lists or structures for the language of basic arithmetic or the language of first-order logic. For example, we may choose to represent a formula from basic arithmetic with the following recursive structure.

⸢0⸣ := ⟨0, 0⟩
⸢1⸣ := ⟨1, 0⟩
⸢X + Y⸣ := ⟨2, ⟨⸢X⸣, ⸢Y⸣⟩⟩
⸢X × Y⸣ := ⟨3, ⟨⸢X⸣, ⸢Y⸣⟩⟩
⸢X = Y⸣ := ⟨4, ⟨⸢X⸣, ⸢Y⸣⟩⟩

By using the Cantor pairing function, we can encode and decode these structures, letting us represent any basic arithmetic formula as a single natural number.
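
Here is a rough Python sketch of this encoding, following the five rules above. All of the function names are my own choices for illustration, and the decoder assumes it is handed a number that really is the code of a formula.

```python
from math import isqrt

def pair(x, y):
    return (x + y + 1) * (x + y) // 2 + y

def unpair(p):
    w = (isqrt(8 * p + 1) - 1) // 2
    y = p - w * (w + 1) // 2
    return w - y, y

ZERO = pair(0, 0)                                # code of the term 0
ONE = pair(1, 0)                                 # code of the term 1
def plus(a, b):   return pair(2, pair(a, b))     # code of X + Y from the codes of X and Y
def times(a, b):  return pair(3, pair(a, b))     # code of X × Y
def equals(a, b): return pair(4, pair(a, b))     # code of X = Y

def decode(n):
    """Turn a code back into readable text (valid codes only)."""
    tag, body = unpair(n)
    if tag == 0:
        return "0"
    if tag == 1:
        return "1"
    left, right = unpair(body)
    symbol = {2: "+", 3: "×", 4: "="}[tag]
    return f"({decode(left)} {symbol} {decode(right)})"

two = plus(ONE, ONE)
print(two)                                       # 25
print(decode(equals(two, times(two, ONE))))      # ((1 + 1) = ((1 + 1) × 1))
```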

I should note that Gödel did not use the Cantor pairing function, and instead used prime decomposition. However, the computer scientist in me prefers the pairing function, and it has a simpler arithmetic relation that defines the encoding.

The second ingredient is Gödel’s β function. While the Cantor pairing function can be used to build lists, the problem is that it is only possible to access fixed locations of that list. The β function encodes lists in such a way that it is easy to access variable locations within the list. The β function is defined as

β(x, y, i) := x mod (1 + (i + 1) × y)

Gödel’s β function lets us encode a list of numbers, a₀, a₁, ..., a_n, with a pair of numbers x and y in such a way that for all i up to and including n, β(x, y, i) = a_i. It is possible to choose y’s value to find arbitrarily large and arbitrarily long arithmetic sequences of mutually coprime numbers. Then, given a sufficient y value, the Chinese remainder theorem lets us encode the content of the list as a single value x.

Below we define β as a functional arithmetic relation.

BETA(x, y, i, a) := ∃q. x = q × (1 + (i + 1) × y) + a and ∃z. a + z = (i + 1) × y

BETA is defined so that BETA(x, y, i, a) holds if and only if β(x, y, i) = a.
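
The following Python sketch shows one way to actually find x and y for a given list. The choice of y as a factorial is a simple but wasteful assumption of mine that guarantees the moduli are mutually coprime; the helper names are also my own, and the three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
from math import factorial

def beta(x, y, i):
    """Gödel's β function."""
    return x % (1 + (i + 1) * y)

def BETA(x, y, i, a):
    """The relation from the text: some q gives x = q × m + a, and a <= (i + 1) × y."""
    m = 1 + (i + 1) * y
    return x >= a and (x - a) % m == 0 and a <= (i + 1) * y

def encode_list(items):
    """Find x and y with beta(x, y, i) == items[i] for every index i."""
    # This factorial is a multiple of every possible index difference, which makes
    # the moduli 1 + (i + 1) * y pairwise coprime, and it is at least max(items),
    # which keeps every item below its modulus.
    y = factorial(max([len(items)] + items))
    x, product = 0, 1
    for i, a in enumerate(items):
        m = 1 + (i + 1) * y
        # Chinese remainder step: add a multiple of `product` so that x ≡ a (mod m)
        # without disturbing the remainders that were already fixed.
        t = ((a - x) * pow(product, -1, m)) % m
        x += product * t
        product *= m
    return x, y

items = [3, 1, 4, 1, 5]
x, y = encode_list(items)
print([beta(x, y, i) for i in range(len(items))])   # [3, 1, 4, 1, 5]
```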

Using the Cantor pairing function we could pair up the values x and y and n into a single number that represents any list of numbers. Using this new ability we can now define a structure that represents a trace for arbitrary computations. For example we can define a relation that defines the substitutions for our encoding of formulas of first-order logic.

SUB(f, i, t, a) := “there exists a trace of the computation of substituting the arithmetic term encoded by t
into the variable labeled by i within the formula f and the result of that computation is a.”

The result is a functional relation SUB(⸢φ⸣, i, ⸢t⸣, a) that holds if and only if a equals ⸢φ[x_i↦t]⸣.

Similarly we can define a functional relation CODE(x, y) such that it holds if and only if y equals ⸢1 + 1 + … + 1⸣ where there are x occurrences of 1. We can also define a predicate PROVABLE_IN_F(x) that defines what it means for an encoding of a formula to be provable in some axiomatic system F, as long as it is decidable what the axioms of F are.

We now have a basic idea of how we can specify data structures, and computations on those data structures, using arithmetic. But how do we build a self-referential formula? While we can define arbitrary computations using first-order logic, it is not like we can write a computer program with a variable that contains a copy of the entire program itself. Or can we?

A Quine is a program that prints out its own source code. However, it is just as easy to write a program that creates a variable that contains the entire source code of the program itself; a self-referential program! The technique to do this is the same as found in the proof of Kleene’s recursion theorems, and the implementation of the Y Combinator. In this way we can build self-referential arithmetic formulas.
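
As a small illustration of such a self-referential program, here is a Python sketch of my own. The two assignments are the whole program, and after they run the variable `source` holds the exact text of those two statements (the comments are not part of it). The names `template` and `source` are arbitrary.

```python
# The program proper is the next two statements.
template = 'template = %r\nsource = template %% template\n'
source = template % template

# `source` now contains the text of those two statements.
print(source, end="")
```

The trick is that `template` appears twice: once as the pattern being filled in and once, quoted by `%r`, as the value filling the hole.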

Theorem. For any formula ψ(y), there exists a formula φ such that φ is equivalent to ψ(⸢φ⸣).

In other words, if we have a formula with a free variable y, we can build a self-referential formula by substituting that variable with the encoding of the resulting formula. Like the Y Combinator, the proof of this theorem is both short and confusing.

Proof. Let θ := ∃y. ψ(y) and ∃z. SUB(x₀, 0, z, y) and CODE(x₀, z). Let φ := θ[x₀↦⸢θ⸣]. Then φ is equal to ∃y. ψ(y) and ∃z. SUB(⸢θ⸣, 0, z, y) and CODE(⸢θ⸣, z). By the property of CODE, the variable z must be equal to ⸢⸢θ⸣⸣, and by the property of SUB, the variable y must be equal to ⸢θ[x₀↦⸢θ⸣]⸣, which is equal to ⸢φ⸣. Thus φ is equivalent to ψ(⸢φ⸣). Qed.
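
The shape of this proof can be mimicked with strings instead of numbers. In the loose Python analogy below, which is my own and not part of the proof, `repr` plays the role of CODE, `str.replace` plays the role of SUB, the placeholder text `x0` plays the role of the variable x₀, and `psi` stands for an arbitrary property of source text.

```python
def diag(theta):
    """Substitute the quotation of theta for the placeholder x0 inside theta."""
    return theta.replace("x0", repr(theta))

seen = []

def psi(y):
    """A stand-in property of source text: record the argument, report its length."""
    seen.append(y)
    return len(y)

theta = "psi(diag(x0))"     # a template with the free placeholder x0
phi = diag(theta)           # theta with x0 replaced by theta's own quotation

eval(phi)                   # evaluating phi applies psi to ...
print(seen[0] == phi)       # True: ... the text of phi itself
```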

Now we can finally build the self-referential Gödel formula by applying ψ(y) := not (PROVABLE_IN_F(y)) to the above theorem to get a formula φ such that φ is equivalent to not (PROVABLE_IN_F(⸢φ⸣)).

Why Is It Taking 20 Minutes to Mine This Bitcoin Block?

2018-02-25T16:05:48Z

Does this sound familiar?

You have just made a Bitcoin transaction and you are eager to see if it appears in the next block. You know that the expected time between Bitcoin blocks is 10 minutes. You check the log of your Bitcoin node. It has been 7 minutes since the previous block. You recall that block occurrences in Bitcoin are a Poisson process, which is memoryless. Even though it has been 7 minutes since the previous block, you still expect to wait another 10 minutes.

Five minutes pass. No new blocks have appeared. You have been staring at your Bitcoin node’s log this entire time. It has now been 12 minutes since the previous block. All your waiting has not changed anything. Even though you have been waiting for 5 minutes, the math says that you are still expected to wait 10 minutes before the next block will appear. A Poisson process is memoryless.

After staring at your Bitcoin node’s log for a further 8 minutes, you finally see a new block. “I swear that this always happens to me,” you say to yourself. “Whenever I’m waiting for my transaction to be confirmed, it always seems that the particular block I’m waiting for takes like 20 minutes to mine.”

My friend, if this has happened to you, you are not alone. This phenomenon is real.

Under the simplifying assumption that Bitcoin’s hashrate is constant, we know that a new block is mined once every 10 minutes on average, and this mining process can be well modeled by a Poisson process. Because Poisson processes are memoryless, at any given time we always expect that the next block will appear, on average, in 10 minutes. This holds no matter how long we have already been waiting. This memorylessness property applies just as well backwards in time as it does forwards in time. That is, if you pick a random point in time, on average, the previous block will have been mined 10 minutes earlier.

This is clear because if you sample a series of events from a Poisson process and take a second sample and reverse the occurrences of that series of events, the two samples will be indistinguishable. Therefore, by this symmetry, it must be the case that when you pick a random point in time, the expected time until the next event is the same as the expected time since the previous event.

“Wait a minute. You are saying that, if I pick a random point in time, we expect the previous block to have been mined 10 minutes in the past, and we expect that the next block will be mined 10 minutes in the future. Doesn’t that mean that we expect a total of 20 minutes between blocks?”

Correct, that is exactly what I am saying. If you pick a random point in time, you expect 20 minutes between the previous block and the next block on average.

“That cannot be true because we know that there are, on average, 10 minutes between blocks, not 20 minutes.”

This apparent paradox is essentially the same as the hitchhiker’s paradox. To resolve this paradox we need to understand that the question, “What is the expected time between blocks?” is underspecified. To compute an expected value we need to know which distribution we are computing the expected value with respect to.

Suppose we observe the Bitcoin blockchain for a while, and we make a list of the time between each successive block. When we average this list of numbers, we will get a value that is close to 10 minutes. Averaging this way corresponds to a distribution where each block interval is sampled with equal probability.

More precisely, the pdf for this distribution of non-negative interval durations is the exponential distribution pdf₁(t) = N₁ e^(−λt), where λ is 0.1 min⁻¹, Bitcoin’s block rate, and where N₁ is a normalization constant (which in this case is also 0.1 min⁻¹). The expected value of this distribution is ∫t pdf₁(t) dt = 10 min.

pdf₁: exponential distribution

Suppose we observe the Bitcoin blockchain for a while, and every day we write down the duration of the block whose interval crosses the 9:00 am mark. When we average this list of numbers, we will get a value that is close to 20 minutes. Averaging this way corresponds to a distribution where each block interval is sampled, not with equal probability, but proportional to how long the interval lasts. For example, we are twice as likely to sample an interval that lasts for 14 minutes as we are to sample an interval that lasts for 7 minutes, simply by virtue of the fact that 14 minute intervals last twice as long as 7 minute intervals.
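
The two ways of averaging are easy to compare in a simulation. The Python sketch below generates a pretend chain under the constant-hashrate assumption and then averages the intervals both ways; the sample size, the seed, and the use of uniformly random instants in place of a daily 9:00 am mark are all my own simplifications.

```python
import bisect
import random

random.seed(1)
RATE = 0.1        # blocks per minute
N = 100_000

# Simulated inter-block intervals and the resulting block timestamps.
intervals = [random.expovariate(RATE) for _ in range(N)]
times = []
clock = 0.0
for duration in intervals:
    clock += duration
    times.append(clock)

# Method 1: every interval is counted once.
mean_per_interval = sum(intervals) / N

def covering_interval(instant):
    """Length of the block interval that contains the given instant."""
    k = min(bisect.bisect_right(times, instant), N - 1)   # first block after the instant
    previous = times[k - 1] if k > 0 else 0.0
    return times[k] - previous

# Method 2: pick random instants and record the interval covering each one.
samples = [covering_interval(random.uniform(0.0, times[-1])) for _ in range(N)]
mean_per_instant = sum(samples) / N

print(round(mean_per_interval, 1), round(mean_per_instant, 1))
```

The first average comes out near 10 minutes and the second near 20 minutes, even though both are computed from the same simulated chain.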

We can take the pdf for the exponential distribution above and multiply it by a linear factor to reweight the probabilities in accordance with how long the interval is. After normalization, the resulting pdf for this distribution is the gamma distribution (with shape parameter 2) pdf₂(t) = N₂ t e^(−λt) (whose normalization constant N₂ is 0.01 min⁻²). The expected value of this distribution is ∫t pdf₂(t) dt = 20 min.
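
For readers who want to check the constants, the normalization and the expected value follow from two standard integrals, with λ = 0.1 min⁻¹ as above:

```latex
\begin{align*}
\int_0^\infty t\, e^{-\lambda t}\, dt &= \frac{1}{\lambda^2}
  \quad\Longrightarrow\quad N_2 = \lambda^2 = 0.01\ \mathrm{min}^{-2}, \\
\int_0^\infty t \cdot \mathrm{pdf}_2(t)\, dt
  &= \lambda^2 \int_0^\infty t^2 e^{-\lambda t}\, dt
   = \lambda^2 \cdot \frac{2}{\lambda^3}
   = \frac{2}{\lambda} = 20\ \mathrm{min}.
\end{align*}
```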

pdf₂: gamma distribution with shape parameter 2

We can double-check this result by recalling the time reversing symmetry argument above. When we pick a random point in time, the time until the next block is some random variable X whose pdf is pdf₁, and the time since the previous block is a random variable Y whose pdf is also pdf₁. Therefore, the total time between the last block and the next block is the random value X + Y. We can compute the distribution for this sum by taking the convolution of pdf₁ with itself, and we indeed get pdf₂ as a result.
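
Written out, and assuming that X and Y are independent (which is what memorylessness gives us), the convolution is a one-line computation:

```latex
\begin{align*}
(\mathrm{pdf}_1 * \mathrm{pdf}_1)(t)
  &= \int_0^t \lambda e^{-\lambda s} \cdot \lambda e^{-\lambda (t - s)}\, ds \\
  &= \lambda^2 e^{-\lambda t} \int_0^t ds \\
  &= \lambda^2\, t\, e^{-\lambda t} = \mathrm{pdf}_2(t).
\end{align*}
```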

The bias towards picking longer block intervals by using the second sampling method accounts for the discrepancy between the two different results when computing average block interval durations. However, the word “bias” is not meant to be pejorative. This other sampling method is not incorrect or with prejudice; it is simply a different way of sampling. The distribution of intervals you need to use depends on the application you are using it for. If you want to compute the throughput of Bitcoin, you will need to use the exponential distribution. If you want to know “why does it take 20 minutes to mine the Bitcoin block with my transaction in it?”, you need to use this gamma distribution.

Verifying Bech32 Checksums with Pen and Paper

2018-01-06T16:40:28Z

Today we are going to learn how to verify a Bech32 checksum using only pen and paper. This is useful in those cases where you need to urgently validate the checksum of a Bech32 address, but the electricity has gone out in your home or office and you have lost your smartphone.

We are going to do a worked example of verifying the checksum of BC1SW50QA3JX3S, which is one of the test vectors from the Bech32 specification. However, before we begin, we need to make some preparations. We will need three tables.

The table of power
`a`	`xe86fe`
`c`	`wt5v4t`
`d`	`4vljgv`
`e`	`ukpcrk`
`f`	`0reszr`
`g`	`a7vy57`
`h`	`k5glc5`
`j`	`7xmfyx`
`k`	`yfatwf`
`l`	`t2ymv2`
`m`	`39zex9`
`n`	`vmwajm`
`p`	`ja45ka`
`q`	`qqqqqq`
`r`	`lwk4nw`
`s`	`n4cgp4`
`t`	`zs638s`
`u`	`5yjwly`
`v`	`832x73`
`w`	`2zf8mz`
`x`	`hu9r0u`
`y`	`60xz20`
`z`	`dnrp9n`
`0`	`clundl`
`2`	`sd093d`
`3`	`pgduhg`
`4`	`m8t7a8`
`5`	`f672t6`
`6`	`rchdsc`
`7`	`eh306h`
`8`	`9pshep`
`9`	`gjnkuj`

The matrix of wisdom
	`acde`	`fghj`	`klmn`	`pqrs`	`tuvw`	`xyz0`	`2345`	`6789`
`a`	`q9sy`	`5420`	`tzxw`	`ua7d`	`kp3n`	`melj`	`hvgf`	`8r6c`
`c`	`9q4p`	`3s02`	`w8rt`	`ecmg`	`ny5k`	`7u6h`	`jfdv`	`zxla`
`d`	`s4q5`	`y96l`	`mjk7`	`vdwa`	`x3pr`	`tf0z`	`8uce`	`hn2g`
`e`	`yp5q`	`s3wt`	`0xz2`	`ce6f`	`j94h`	`lamk`	`ngvd`	`r87u`
`f`	`53ys`	`qp7m`	`lkj6`	`gf2e`	`z498`	`0dtx`	`rcua`	`nhwv`
`g`	`4s93`	`pql6`	`7hnm`	`fgtc`	`r5yx`	`wv28`	`zeau`	`jk0d`
`h`	`206w`	`7lq9`	`pgvy`	`kh58`	`utme`	`3n4c`	`axzr`	`dfsj`
`j`	`02lt`	`m69q`	`ydfp`	`nj3z`	`ew7u`	`5ksa`	`cr8x`	`gv4h`
`k`	`twm0`	`l7py`	`qfd9`	`hk4x`	`a26c`	`sj5e`	`u8rz`	`vg3n`
`l`	`z8jx`	`khgd`	`fqyv`	`7lu0`	`5rn3`	`emas`	`4w2t`	`9pc6`
`m`	`xrkz`	`jnvf`	`dyqg`	`6mct`	`s8h4`	`ale5`	`32w0`	`p9u7`
`n`	`wt72`	`6myp`	`9vgq`	`jnsr`	`c0la`	`4h3u`	`ezx8`	`fd5k`
`p`	`uevc`	`gfkn`	`h76j`	`qpz3`	`2ad0`	`89rw`	`ts54`	`mlxy`
`q`	`acde`	`fghj`	`klmn`	`pqrs`	`tuvw`	`xyz0`	`2345`	`6789`
`r`	`7mw6`	`2t53`	`4ucs`	`zrqn`	`gl0d`	`98pv`	`fjkh`	`eayx`
`s`	`dgaf`	`ec8z`	`x0tr`	`3snq`	`mvu7`	`k5jl`	`6p9y`	`2wh4`
`t`	`knxj`	`zrue`	`a5sc`	`2tgm`	`qh89`	`d0fy`	`p67l`	`34vw`
`u`	`py39`	`45tw`	`2r80`	`aulv`	`hqsj`	`6c7n`	`kdfg`	`xzme`
`v`	`35p4`	`9ym7`	`6nhl`	`dv0u`	`8sqz`	`2gwr`	`xaec`	`kjtf`
`w`	`nkrh`	`8xeu`	`c34a`	`0wd7`	`9jzq`	`g2vp`	`ylm6`	`5sft`
`x`	`m7tl`	`0w35`	`sea4`	`8x9k`	`d62g`	`qzyf`	`vhnj`	`ucpr`
`y`	`eufa`	`dvnk`	`jmlh`	`9y85`	`0cg2`	`zqxt`	`w43s`	`76rp`
`z`	`l60m`	`t24s`	`5ae3`	`rzpj`	`f7wv`	`yxqd`	`gnhk`	`cu98`
`0`	`jhzk`	`x8ca`	`es5u`	`w0vl`	`ynrp`	`ftdq`	`976m`	`43g2`
`2`	`hj8n`	`rzac`	`u43e`	`t2f6`	`pkxy`	`vwg9`	`qml7`	`s5d0`
`3`	`vfug`	`cexr`	`8w2z`	`s3jp`	`6dal`	`h4n7`	`mqy9`	`t0k5`
`4`	`gdcv`	`uaz8`	`r2wx`	`54k9`	`7fem`	`n3h6`	`lyqp`	`0tjs`
`5`	`fved`	`aurx`	`zt08`	`45hy`	`lgc6`	`jskm`	`79pq`	`w2n3`
`6`	`8zhr`	`njdg`	`v9pf`	`m6e2`	`3xk5`	`u7c4`	`st0w`	`qyal`
`7`	`rxn8`	`hkfv`	`gp9d`	`l7aw`	`4zjs`	`c6u3`	`50t2`	`yqem`
`8`	`6l27`	`w0s4`	`3cu5`	`x8yh`	`vmtf`	`pr9g`	`dkjn`	`aeqz`
`9`	`cagu`	`vdjh`	`n67k`	`y9x4`	`weft`	`rp82`	`05s3`	`lmzq`

The list of courage
`bc1`	`rzqrrp`
`tb1`	`z5qrrp`

Print out these tables and keep them with your emergency supplies so that you can find them when you need them.

Now we can begin. Split the message BC1SW50QA3JX3S into its prefix, BC1, and suffix SW50QA3JX3S. Take the suffix and write it vertically on a piece of paper, leaving a gap after each letter and then a line.

S

─────────────
W

─────────────
5

─────────────
0

─────────────
Q

─────────────
A

─────────────
3

─────────────
J

─────────────
X

─────────────
3

─────────────
S

─────────────

Find the prefix in the list of courage and write the associated word after the first letter, S, placing a diagonal line between each letter. For new prefixes, you may need to add them to the list of courage beforehand.

S\r\z\q\r\r\p

─────────────
W

Take the last letter, which is p, and look it up in the table of power to find its associated word, which is ja45ka. Write this word in the gap under the Srzqrrp, extending the diagonal lines between each letter.

S\r\z\q\r\r\p
\j\a\4\5\k\a\
─────────────
W

For each letter of this power word, we need to use the matrix of wisdom to add it to the letter above and to the left of it. For example, we look up row S and column j in the matrix of wisdom and we find the letter z. Write z after the W below the line, separating it with a diagonal line again.

S\r\z\q\r\r\p
\j\a\4\5\k\a\
─────────────
W\z

We look up row r and column a in the matrix of wisdom to find the number 7. We add 7 after the z, and keep doing this until every pair of letters is done. The matrix of wisdom is symmetric, so you do not have to worry about whether you are looking up by row/column or column/row.

S\r\z\q\r\r\p
\j\a\4\5\k\a\
─────────────
W\z\7\h\5\4\7
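
If the electricity happens to be on, the two lookups can be reproduced in a few lines of Python. In this sketch the matrix of wisdom is a bitwise XOR of the letters' 5-bit values and the table of power comes from the generator constants of the Bech32 specification; the function names `add` and `power_word` are my own.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"   # Bech32 alphabet; position = 5-bit value
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]   # generator constants

def add(a, b):
    """Matrix of wisdom: XOR the 5-bit values of two letters."""
    return CHARSET[CHARSET.index(a) ^ CHARSET.index(b)]

def power_word(letter):
    """Table of power: XOR the generator constants selected by the letter's bits,
    written as six letters with the least significant 5 bits first."""
    bits = CHARSET.index(letter)
    value = 0
    for i in range(5):
        if (bits >> i) & 1:
            value ^= GEN[i]
    return "".join(CHARSET[(value >> (5 * k)) & 31] for k in range(6))

print(power_word("p"))   # ja45ka
print(add("s", "j"))     # z
print(add("r", "a"))     # 7
```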

We repeat this process with the next line. First we look up 7 in the table of power to find eh306h and write it underneath.

W\z\7\h\5\4\7
\e\h\3\0\6\h\
─────────────
5

Then, for each pair of letters, we add them using the matrix of wisdom.

W\z\7\h\5\4\7
\e\h\3\0\6\h\
─────────────
5\h\4\0\c\w\z

We keep doing this until we go through all the letters of the suffix.

S\r\z\q\r\r\p
\j\a\4\5\k\a\
─────────────
W\z\7\h\5\4\7
\e\h\3\0\6\h\
─────────────
5\h\4\0\c\w\z
\d\n\r\p\9\n\
─────────────
0\e\y\k\w\a\a
\x\e\8\6\f\e\
─────────────
Q\f\q\r\v\8\y
\6\0\x\z\2\0\
─────────────
A\6\x\x\p\x\g
\a\7\v\y\5\7\
─────────────
3\q\y\2\z\4\c
\w\t\5\v\4\t\
─────────────
J\l\t\s\x\h\7
\e\h\3\0\6\h\
─────────────
X\t\g\6\l\u\q
\q\q\q\q\q\q\
─────────────
3\x\t\g\6\l\u
\5\y\j\w\l\y\
─────────────
S\9\z\e\x\9\m
\3\9\z\e\x\9\
─────────────
 \p\q\q\q\q\q

The final result should be pqqqqq, where q is the most powerful letter and p is the wisest letter. If you did not get this result, start over from the beginning because you probably made a mistake. Remember to mind your p's and q's.
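
For comparison, here is the same computation as a Python sketch based on the checksum algorithm in the Bech32 specification. The helpers `as_word` and `final_word` are my own additions for printing the state in the letter order used on paper, and the sketch deliberately skips the other validity checks (length, mixed case, and so on) that real software performs.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"   # position in this string = 5-bit value
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]

def polymod(values):
    """Checksum state after absorbing a list of 5-bit values."""
    chk = 1
    for v in values:
        top = chk >> 25                        # the letter looked up in the table of power
        chk = ((chk & 0x1ffffff) << 5) ^ v     # shift the row along and bring in the next letter
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]                  # add the power word
    return chk

def hrp_expand(hrp):
    """Spread the human-readable prefix into 5-bit values."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def as_word(state):
    """Write a 30-bit state as six letters, least significant 5 bits first."""
    return "".join(CHARSET[(state >> (5 * k)) & 31] for k in range(6))

def final_word(address):
    """Run the whole procedure and return the six letters left at the end."""
    address = address.lower()
    split = address.rfind("1")
    hrp, suffix = address[:split], address[split + 1:]
    values = hrp_expand(hrp) + [CHARSET.index(c) for c in suffix]
    return as_word(polymod(values))

print(as_word(polymod(hrp_expand("bc"))))   # rzqrrp, the list of courage entry for bc1
print(final_word("BC1SW50QA3JX3S"))         # pqqqqq
```

The list of courage is simply the state after the prefix has been absorbed, which is why new prefixes need their own entry.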

After a couple years of practice doing this by hand, the operations become natural. For example, you learn that x and y equals z, and so forth.

Exercise for the reader: Create a variant of this procedure for computing Bech32 checksums.

P.S. This article is not meant to be taken seriously.

Functor-Oriented Programming

2017-10-10T00:17:46Z

My style of Haskell programming has been evolving over the 15 years that I have been working with it. It is turning into something that I would like to call “functor oriented programming”. The traditional use of typed functional programming focuses on data types. One defines data types to model the data structures that your program operates on, and one writes functions to transform between these structures. One of the primary goals in this traditional methodology is to create data structures that exclude erroneous states to the extent that is reasonably possible. As long as one ensures that pattern matching is complete, then the type system catches many errors that would otherwise lead to these erroneous states, which have been crafted to be unrepresentable.

Functor oriented programming is a refinement of this traditional focus on data types. I was reminded of this concept recently when I was working with wren’s fantastic unification-fd library. With functor oriented programming, one divides data structures into layers of functors that, when composed together, form the data structures that your program operates on. Instead of writing transformations between data structures, one writes natural transformations between functors, where a natural transformation between functors F and G is a polymorphic function of type forall a. F a -> G a. While traditional functions often operate on products of multiple inputs and/or outputs, with functor oriented programming one will often see functions operating on compositions of functors, including but not limited to distribution functions of type forall a. F (G a) -> G (F a) and half-way distribution functions forall a. F (G a) -> G (H a), and many others.

By dividing data structures up into layers of functors, one can create a separation of concerns that does not occur in traditional functional programming. With functor oriented programming, polymorphism is not necessarily about using functions polymorphically. Instead, polymorphism provides correctness guarantees by ensuring that a particular function can only touch the specific layers of functors in its type signature and is independent of the rest of the data structure. One benefits from polymorphism even when a function is only ever invoked at a single type.

The appearance of many natural transformations is one hallmark of functor oriented programming. Higher-order natural transformations will invoke Rank2Types and RankNTypes, which is another hallmark of functor oriented programming. Other hallmarks of functor oriented programming include open recursive types, which allow one to divide up recursive types into their layers of recursion and create natural transformations that operate on a single layer at a time. Open recursive types play an important role in wren’s unification library.

As fine a language as Haskell is, it is not actually that suitable for functor oriented programming. The problem is that, under normal circumstances, there are no reductions or equivalence classes at the type level. For example, the identity functor does not transparently disappear during composition, the Compose functor is not transparently associative, and the Swap functor composed with itself does not reduce to the identity functor. To cope with this one must litter one’s code with newtype wrappers and unwrappers to make all these natural transformations explicit. In principle, these transformations should have no run-time consequences, but when they are used in higher-order ways, unfortunately they sometimes do. Despite the problems, I am not aware of any other practical language that better supports this style of programming. I think Haskell’s higher-kinded type classes and the progression of Monad, Applicative, Foldable, Traversable, etc. classes have been instrumental in leading to the development of this style of programming as they further motivate the division of one’s data structures into these layers of functors.

I have been thinking about writing this post for a few years now, and wanted to write something convincing; however, I do not think I am up to the task. Instead of trying to persuade the reader, I have elected to try to simply name and describe this style of programming so that the reader might notice it themselves when reading and writing code. Hopefully someone more capable than me can evangelize this approach, and perhaps even create a practical language suitable for this style of programming.