Checklist for staging, experimenting with, and shipping WebAssembly features

This document provides checklists of engineering requirements for staging, experimenting with, and shipping WebAssembly features in V8. These checklists are meant as a guideline and may not be applicable to all features. The actual launch process is described in the V8 Launch process.

Overview #

A feature can be anything from a visible addition to the WebAssembly API which is driven by a W3C WebAssembly community group proposal to a larger architectural change that improves performance, stability or user experience.

For W3C WebAssembly proposals, we always follow this process, even if the proposal is comparatively small. In that case, the trials can be skipped if there is enough confidence in the design, but all other requirements are mandatory. For non-proposals, whether this process applies depends on the complexity and risk of the change. For example, a simple compiler optimization would not require going through the steps, while adding a new compiler altogether certainly would. As a rule of thumb, if a feature is complex enough to require adding a feature flag during development, then it's likely worth following this process. If an optimization can be merged in a few CLs during one milestone development phase, it's small enough to ship directly.
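The rule of thumb above can be condensed into a small decision helper. The following Python sketch is only an illustration of the reasoning; the function name and its parameters are hypothetical and are not part of V8 or its tooling.

```python
def should_follow_launch_process(is_w3c_proposal: bool,
                                 needs_feature_flag: bool) -> bool:
    """Return True if a change should go through the launch process.

    Hypothetical helper that encodes the rule of thumb from this section.
    """
    # W3C WebAssembly proposals always follow the process, however small.
    if is_w3c_proposal:
        return True
    # For everything else, needing a feature flag during development is the
    # signal that the change is complex or risky enough.
    return needs_feature_flag


# A small optimization that lands in a few CLs without a flag ships directly.
print(should_follow_launch_process(is_w3c_proposal=False,
                                   needs_feature_flag=False))
```

In practice the decision is a judgment call about complexity and risk; the boolean inputs here only stand in for that assessment.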

Features of this complexity start off behind an experimental flag. The flag enables the feature for developers who would like to try it out and provide feedback, and it allows us to test the feature in a limited capacity. As these features require explicit command line arguments, we don't expect users to enable them; if they do, it's at their own risk.

Once we consider a feature stable enough for experimentation or even shipping, we (pre-)stage it. This enables the feature on our fuzzers and on our test and benchmarking infrastructure, and allows us to detect issues early on. Once it has proven sufficiently stable (usually after ~2 weeks without major incidents), we open it to the Vulnerability Reward Program (VRP) so that external security researchers can test it too and file bugs on it.

Some features might ship directly from this phase, if we don't expect to gain any insights from further experimentation. Others will go through one or more phases of experimentation, e.g. developer trial, origin trial or Finch trial where we collect data from partners or in-the-wild usage.

An overview of the shipping phases, together with their respective requirements, is shown here:

Overview of WebAssembly shipping phases

Flags #

We usually define one or more command line flags that guard the feature from being active in production environments before it's ready for general use. These flags allow fine-grained control for testing and debugging and can be kept beyond the release of a feature to switch it off when needed. Keeping a flag is usually unnecessary and not worth the cost of maintaining the alternative code path, but it can sometimes be useful (e.g. we kept the flags for lazy compilation and dynamic tiering).

Wasm feature flags vs. V8 flags #

In WebAssembly, we have the option of using a Wasm feature flag (--experimental-wasm-*) which is defined via a macro in src/wasm/wasm-feature-flags.h (different macros for different phases of development). These flags are usually used for new functionality, e.g. related to a new WebAssembly proposal.

Alternatively, one can use a regular V8 flag as defined in src/flags/flag-definitions.h. These flags are commonly used for architectural changes or optimizations. In early stages, you should use DEFINE_EXPERIMENTAL_FEATURE().

Flags for (pre-)staging #

There are also common flags which bundle multiple experimental flags together through implications. --experimental-fuzzing is for enabling experimental features on our fuzzers in the pre-staging phase. Wasm feature flags defined in the FOREACH_WASM_PRE_STAGING_FEATURE_FLAG macro are automatically implied by this flag. V8 flags for pre-staged features require an explicit implication in src/flags/flag-definitions.h.

Wasm feature flags also require a use counter to be added (or explicitly disabled using kIntentionallyNoUseCounter). It's generally advisable to add a use counter to track adoption. You can pick a WebFeature or a WebDXFeature for your implementation. If the feature is linked to a W3C WebAssembly proposal, WebDXFeature is recommended. Otherwise, a WebFeature can be used, which requires no approval process.

For staged features that are ready for public evaluation (including the VRP) before their launch, we have the --wasm-staging flag. It implies all Wasm feature flags defined in the FOREACH_WASM_STAGING_FEATURE_FLAG macro and covers new functionality that is about to be launched in the near future. For changes that add no new functionality, such as optimizations, one can add an explicit implication from --future. The --future flag is also used for benchmarking the performance of upcoming V8 versions.
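The way these umbrella flags pull in individual feature flags can be pictured as a small implication graph. The Python sketch below is a toy model only: the feature flag names are hypothetical placeholders, and V8's real implications are declared in C++ in src/wasm/wasm-feature-flags.h and src/flags/flag-definitions.h rather than in a table like this.

```python
# Toy model of flag implications. All feature flag names are hypothetical.
IMPLICATIONS = {
    # Pre-staging: Wasm feature flags in the pre-staging macro are implied
    # automatically; regular V8 flags need an explicit implication.
    "experimental-fuzzing": [
        "experimental-wasm-hypothetical-proposal",
        "hypothetical-prestaged-v8-flag",
    ],
    # Staging: implies all Wasm feature flags in the staging macro.
    "wasm-staging": ["experimental-wasm-hypothetical-staged-proposal"],
    # Optimizations and architectural changes get an explicit implication.
    "future": ["hypothetical-wasm-optimization"],
}


def enabled_flags(command_line):
    """Return every flag enabled directly or through implications."""
    enabled = set()
    worklist = list(command_line)
    while worklist:
        flag = worklist.pop()
        if flag in enabled:
            continue
        enabled.add(flag)
        # Follow implications transitively.
        worklist.extend(IMPLICATIONS.get(flag, []))
    return enabled


print(sorted(enabled_flags(["wasm-staging"])))
```

Moving a feature from pre-staged to staged then corresponds to moving its entry from one umbrella flag's list to another.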

Phases #

Inception #

This is the phase in which implementation in V8 is starting, but there might not be a Chrome feature entry or even a proper name for the feature. Code might be in local branches only or submitted to the main branch, guarded behind a feature flag.

Developer trial (optional) #

We can optionally ask external partners for feedback on the scope, interface or performance of the feature. During the developer trial, they can only test locally, because the feature can only be enabled by passing the feature flag on the command line. A developer trial may start before staging and can continue until shipping.

(Pre-)Staged #

Once we believe the feature is mature enough to consider user testing or even shipping, we stage it for at least one milestone. This increases coverage on our test and fuzzing infrastructure. The pre-staging phase is enabled by adding the feature flag as an implication to --experimental-fuzzing.

After a short time in this stage, we will move the implication to --wasm-staging or --future depending on whether it's a feature or an optimization/architectural change respectively. This will open it for the VRP to encourage external researchers to find issues with the code. During this phase, we usually hold a shipping review where the development team assesses the test and fuzzer coverage and decides on requirements for the following phases.

Origin/field trial #

If we need more data to decide on the readiness of a feature, we can schedule a trial. This can either be an origin trial in tight collaboration with partners or a broader field trial (Finch). Origin trials tend to run for longer than field trials, but complex features might also spend several months in a field trial until they are sufficiently mature.

Shipped #

Once a feature is stable, complete and fully spec'd (phase 4 in the WebAssembly Community Group), we can ship it. This enables the feature for all users, even though only a tiny fraction of websites might use it in the beginning. We keep the flag around for 1-2 more milestones to be able to switch the feature off in case of unexpected side-effects.

Clean up #

After 1-2 milestones, we can remove the flag and any outdated code, and do other clean-up work. For some features, it might be worth keeping the flag around to allow easier debugging, A/B comparisons, etc.
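The phases above form a simple progression, which the following Python sketch models as an enum with allowed transitions. This is an illustrative reading of the text, not an artifact of the V8 code base: the names are hypothetical, pre-staged and staged are modeled as separate steps, and the optional developer trial is left out because it runs in parallel (it may start before staging and continue until shipping).

```python
from enum import Enum


class Phase(Enum):
    INCEPTION = "inception"
    PRE_STAGED = "pre-staged"
    STAGED = "staged"
    TRIAL = "origin/field trial"
    SHIPPED = "shipped"
    CLEAN_UP = "clean up"


# Allowed forward transitions, as described in this section.
NEXT_PHASES = {
    Phase.INCEPTION: {Phase.PRE_STAGED},
    Phase.PRE_STAGED: {Phase.STAGED},
    # Some features ship directly from staging; others go through trials.
    Phase.STAGED: {Phase.TRIAL, Phase.SHIPPED},
    # A feature may go through more than one round of experimentation.
    Phase.TRIAL: {Phase.TRIAL, Phase.SHIPPED},
    Phase.SHIPPED: {Phase.CLEAN_UP},
    Phase.CLEAN_UP: set(),
}


def can_advance(current: Phase, target: Phase) -> bool:
    """Return True if `target` is a valid next phase after `current`."""
    return target in NEXT_PHASES[current]


print(can_advance(Phase.STAGED, Phase.SHIPPED))
print(can_advance(Phase.INCEPTION, Phase.SHIPPED))
```

The model deliberately has no backward edges; disabling a shipped feature is handled separately and is not treated as a phase transition here.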

Staging #

When to stage a WebAssembly feature #

The staging of a WebAssembly feature defines the end of its implementation phase. The implementation phase is finished when the following checklist is done:

Note that the stage of the feature proposal in the standardization process does not matter for staging the feature in V8. The proposal should, however, be mostly stable.

How to stage a WebAssembly feature #

Staging Wasm feature flags #

Pre-stage the feature to collect fuzzer coverage for two weeks

After two weeks of fuzzer coverage, we can open the feature to the VRP to encourage external bug reporting.

Staging other feature flags #

Pre-stage the feature to collect fuzzer coverage for two weeks

After two weeks of fuzzer coverage, we can open the feature to the VRP to encourage external bug reporting.
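Both staging paths share the same timing rule: roughly two weeks of fuzzer coverage without major incidents before the feature is opened to the VRP. A minimal Python sketch of that check follows; the function names and the incident counter are hypothetical, and the two-week period is a guideline rather than a hard limit.

```python
from datetime import date, timedelta

# Guideline from this document: about two weeks of fuzzer coverage.
FUZZING_PERIOD = timedelta(weeks=2)


def earliest_vrp_date(pre_staged_on: date) -> date:
    """Earliest date on which the feature could be opened to the VRP."""
    return pre_staged_on + FUZZING_PERIOD


def ready_for_vrp(pre_staged_on: date, today: date,
                  major_incidents: int) -> bool:
    """True once the fuzzing period has passed without major incidents."""
    return major_incidents == 0 and today >= earliest_vrp_date(pre_staged_on)


# Pre-staged on 8 January 2024: the VRP can open on 22 January 2024.
print(earliest_vrp_date(date(2024, 1, 8)))
```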

Experimentation (optional) #

There are multiple ways of experimenting with a new feature and gathering information on its stability and viability. The successful completion of the staging phase ensures that our users are not exposed to experimental code that might be harmful to them. However, full stability is not always guaranteed, which is why such experimentation must be executed with great care.

Developer trial #

This is the easiest trial to run. It often does not require any changes to the code; developers are simply encouraged to try the feature out. This can happen via the existing command line flag, by adding a Chrome flag that developers can enable via chrome://flags, or by staging a Wasm feature flag, which automatically adds it to the existing Experimental WebAssembly option there (chrome://flags#enable-experimental-webassembly-features). Because the latter option might be switched on by users accidentally (e.g. because they tried another feature earlier and forgot to disable it afterwards), the bar for adding features there is higher, and one should carefully evaluate whether the feature meets the criteria for staging before choosing this option.

Steps to enable a developer trial #

Origin trial #

Features that web developers want to try out with their own users are ideal for an origin trial. This is often a new WebAssembly proposal that requires feedback from real-world scenarios to evaluate its shape and potential readiness for publication. Developers can set up their own trials where they compare different populations that have the feature enabled or disabled. Sometimes, even different versions of an API can be compared against each other.

The feedback can be collected from partners or via Chrome's metric collection. It is usually reported back to the W3C WebAssembly community group and to the Blink API owners.

Steps to launch an origin trial #

To get the experiment going, do the following

To get an extension (up to 3 months/milestones)

Finch trial #

When a feature does not require any changes to user code, Chrome can decide to run a trial without partner engagement. Such trials are ideal for performance improvements or larger architectural changes. Chrome's metric collection can then be used to compare different configurations and their impact on common performance and stability metrics.

Steps to launch a Finch trial #

The longer experimentation time at 10% of stable users is meant to accommodate manually detected bugs and reports, which tend to have a longer lead time than signals gathered from metrics and automated testing. At 10%, the impact of the experiment is still limited while providing good visibility for partners to identify issues.

Shipping #

When is a WebAssembly feature ready to be shipped? #

How to ship a WebAssembly feature #

Prerequisites #

Ship Wasm feature flags #

Ship other feature flags #

After enabling the feature #

Disabling an already shipped feature #

If there are any issues during early stages, a reverse Finch trial can disable the feature if the flag has not been removed yet and the Finch config is still there. After a prolonged time, this might not be a viable option anymore even if the feature flag is still active, because the alternative code path is no longer tested.