On April 5, 2023, the Moonbeam Network experienced a brief interruption to block production as an unintended consequence of the approval of Referendum 88. The referendum was approved on-chain shortly before a runtime upgrade, but its call was scheduled to execute shortly after the upgrade. This article provides a detailed post-mortem analysis of the incident, outlining the sequence of events that led to the network halt and the subsequent steps taken to resolve the issue and prevent its recurrence.

Summary

Referendum 88 (https://moonbeam.subscan.io/referenda/88), which included a system.remark call, was approved through community governance on block 3276000 (https://moonbeam.subscan.io/block/3276000?tab=event&event=3276000-0) and scheduled for execution on block 3291300 (https://moonbeam.subscan.io/block/3276000?tab=event&event=3276000-2).

Shortly before Referendum 88 was due to execute, at block 3290853 (https://moonbeam.subscan.io/extrinsic/3290853-0?event=3290853-0), the runtime upgrade RT2201 was successfully applied. This new runtime included a low-level change in Substrate (https://github.com/paritytech/substrate/pull/12451) that altered the call index of system.remark, so that the index previously assigned to system.remark now resolved to system.setHeapPages.

Due to this change, the scheduled system.remark call was unintentionally switched to a system.setHeapPages call. The new call had an invalid value, which prevented collators from producing blocks and ultimately led to the network halt.

The last block before the halt, block 3291299 (https://moonbeam.subscan.io/block/3291299), was produced on April 5, 2023, at 14:43:24 UTC. The subsequent block, block 3291300, could not be produced because it included dispatching the scheduled call with the new, incorrectly configured HEAP_PAGES parameter.
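
As a quick sanity check on the sequence of events, the block numbers quoted above can be compared directly. This is only a minimal sketch: the variable names are made up for illustration, and only the block numbers themselves come from this article.

```python
# Block numbers quoted in this article (the names are illustrative only).
APPROVED_AT = 3_276_000   # Referendum 88 approved
UPGRADE_AT = 3_290_853    # RT2201 applied
LAST_BLOCK = 3_291_299    # last block before the halt
EXECUTE_AT = 3_291_300    # scheduled dispatch of the referendum call

# Blocks between approval and scheduled execution.
enactment_delay = EXECUTE_AT - APPROVED_AT

# Blocks between the runtime upgrade and the scheduled execution.
blocks_after_upgrade = EXECUTE_AT - UPGRADE_AT

# The upgrade landed inside the window between approval and execution.
upgrade_in_window = APPROVED_AT < UPGRADE_AT < EXECUTE_AT

print(enactment_delay, blocks_after_upgrade, upgrade_in_window)
```

The upgrade falls strictly between approval and execution, which is the condition that made the stored call vulnerable to the index change.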

Due to the prompt investigation from Moonbeam engineering contributors and Parity, a new client was published and made available to all nodes. This enabled the network to resume producing blocks after a downtime of approximately 4 hours.

Root Cause

Runtime 2201 (https://github.com/moonbeam-foundation/moonbeam/releases/tag/runtime-2201) included a low-level change in Substrate (https://github.com/paritytech/substrate/pull/12451) that altered the call index of system.remark, so that the index previously assigned to system.remark now resolved to system.setHeapPages. Under normal circumstances, this is not a problem, because a call submitted on the new runtime is already encoded with the new call index.

Referendum 88 (https://moonbeam.subscan.io/referenda/88) included a system.remark call and was initiated on RT2100. On that runtime, the call was assigned a call index of 1. When the referendum was approved, the network automatically scheduled the call to be dispatched on block 3291300 (https://moonbeam.subscan.io/block/3276000?tab=event&event=3276000-2). By the time that block was reached, however, the chain was running RT2201 (https://github.com/moonbeam-foundation/moonbeam/releases/tag/runtime-2201).
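
The mechanism can be illustrated with a small model in which a scheduled call is stored as an index plus arguments and only resolved to a call name at dispatch time. This is a simplified sketch, not Substrate's actual encoding. The only index taken from this article is system.remark at index 1 on RT2100; every other index, the placeholder call name, and the helper functions are assumptions made for illustration.

```python
# Simplified call tables for the system pallet.
# Only "remark at index 1 on the old runtime" comes from the article;
# the remaining entries are assumed for the sake of the example.
OLD_RUNTIME_CALLS = {0: "placeholder_call", 1: "remark", 2: "set_heap_pages"}
NEW_RUNTIME_CALLS = {0: "remark", 1: "set_heap_pages"}


def encode_call(call_table, name, args):
    """Store a call as (index, args), resolving the name at scheduling time."""
    index = next(i for i, n in call_table.items() if n == name)
    return (index, args)


def decode_call(call_table, encoded):
    """Resolve a stored (index, args) pair against a runtime's call table."""
    index, args = encoded
    return (call_table[index], args)


# The referendum's call is encoded while the old runtime is active...
scheduled = encode_call(OLD_RUNTIME_CALLS, "remark", b"hello")

# ...and means what was intended as long as the old runtime decodes it.
before = decode_call(OLD_RUNTIME_CALLS, scheduled)

# After the upgrade, the same stored bytes resolve to a different call.
after = decode_call(NEW_RUNTIME_CALLS, scheduled)

print(before)  # ('remark', b'hello')
print(after)   # ('set_heap_pages', b'hello')
```

The stored data never changes. Only its interpretation does, which is why the remark payload ended up being treated as a heap pages value.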

When collators tried to produce block 3291300, executing the remapped system.setHeapPages call set a non-critical on-chain configuration value, HEAP_PAGES, to an invalid setting, which left collators unable to produce blocks. Consequently, on April 5, 2023, at 14:43:24 UTC, the network stopped producing blocks.

Runtime upgrades go through several test networks, where they are thoroughly tested before reaching Moonbeam mainnet. The issue was not caused by the runtime upgrade itself, but by a call that was scheduled on one runtime and executed on another, with the call indices changing in between.

Resolution

The Moonbeam team released a new client, version 0.30.3 (https://github.com/moonbeam-foundation/moonbeam/releases/tag/v0.30.3), to address the issue. The updated client ignores the incorrect HEAP_PAGES value stored on-chain, allowing collators to resume block production.

At 18:55:48 UTC, approximately 4 hours, 12 minutes, and 24 seconds after the initial issue, block production resumed with the creation of block 3291300.
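
The downtime figure follows directly from the two timestamps given in this article. A short check, using only Python's standard library (the variable names are illustrative):

```python
from datetime import datetime, timezone

# Timestamps quoted in this article, both in UTC.
last_block_time = datetime(2023, 4, 5, 14, 43, 24, tzinfo=timezone.utc)
resumed_time = datetime(2023, 4, 5, 18, 55, 48, tzinfo=timezone.utc)

# Time between the last block before the halt and block 3291300.
downtime = resumed_time - last_block_time

print(downtime)  # 4:12:24
```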

As collators updated to the new client (v0.30.3), the network gradually began producing blocks at a regular cadence. The speedy upgrades by community collators to the new client were hugely important in helping the network return to its normal block production.

Next Steps

The Moonbeam Network halt, caused by the approval of Referendum 88 and the subsequent unintended switch from a system.remark call to a system.setHeapPages call, serves as an essential learning experience for the community.

The swift response of the Moonbeam engineering contributors to release a new client that addressed the issue demonstrates the project’s commitment to maintaining a secure and reliable network. The team also received invaluable help from Basti (https://twitter.com/bkchr), a member of the Parity team. The incident highlights the importance of thoroughly testing both the runtime upgrades themselves and situation-based on-chain governance scenarios.

An already-implemented solution has since been merged (https://github.com/paritytech/substrate/pull/12891) to prevent such changes in call indices in future runtime releases. For future runtime upgrades, two main points will be addressed:

  • A checklist of release conditions, reviewed by all technical teams at least a day before any update to the client or runtime
  • Improved testing tools that verify pending referenda against new clients and runtimes

Going forward, the Moonbeam team and the community will continue to work together to enhance the network’s resilience and ensure its robust performance.
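
One way to picture the testing improvement mentioned above is a pre-upgrade check that resolves every pending scheduled call against both the current and the candidate runtime and flags any call whose meaning would change. The sketch below is hypothetical and does not describe Moonbeam's actual tooling: the function name, the data layout, and the call tables are all assumptions, and only the block number and the remark index on the old runtime come from this article.

```python
def find_remapped_calls(scheduled, old_calls, new_calls):
    """Return (block, old_name, new_name) for calls that change meaning.

    `scheduled` maps a dispatch block to a stored (call_index, args) pair.
    `old_calls` and `new_calls` map call indices to call names.
    A missing index in the new table is reported as None.
    """
    flagged = []
    for block, (index, _args) in sorted(scheduled.items()):
        old_name = old_calls.get(index)
        new_name = new_calls.get(index)
        if old_name != new_name:
            flagged.append((block, old_name, new_name))
    return flagged


# Hypothetical call tables; only index 1 -> remark on the old runtime
# is taken from the article.
old_calls = {0: "placeholder_call", 1: "remark", 2: "set_heap_pages"}
new_calls = {0: "remark", 1: "set_heap_pages"}

# One pending call, scheduled for the block named in the article.
pending = {3_291_300: (1, b"hello")}

report = find_remapped_calls(pending, old_calls, new_calls)
print(report)  # [(3291300, 'remark', 'set_heap_pages')]
```

A non-empty report before an upgrade would signal that a pending referendum needs attention before the new runtime is enacted.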

Event Summary