With the JSON ⊂ ECMAScript proposal, JSON becomes a syntactic subset of ECMAScript. If you’re surprised that this wasn’t already the case, you’re not alone!
The old ES2018 behavior #
In ES2018, ECMAScript string literals couldn’t contain unescaped U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR characters, because they are considered to be line terminators even in that context:
const LS = '
';
const PS = eval('"\u2029"');
This is problematic because JSON strings can contain these characters. As a result, developers had to implement specialized post-processing logic when embedding valid JSON into ECMAScript programs to handle these characters. Without such logic, the code would have subtle bugs, or even security issues!
The new behavior #
In ES2019, string literals can now contain raw U+2028 and U+2029 characters, removing the confusing mismatch between ECMAScript and JSON.
const LS = '
';
const PS = eval('"\u2029"');
This small improvement greatly simplifies the mental model for developers (one less edge case to remember!), and reduces the need for specialized post-processing logic when embedding valid JSON into ECMAScript programs.
Embedding JSON into JavaScript programs #
As a result of this proposal, JSON.stringify can now be used to generate valid ECMAScript string literals, object literals, and array literals. And because of the separate well-formed JSON.stringify proposal, these literals can safely be represented in UTF-8 and other encodings (which is helpful if you’re trying to write them to a file on disk). This is super useful for metaprogramming use cases, like dynamically creating JavaScript source code and writing it to disk.
Here’s an example of creating a valid JavaScript program embedding a given data object, taking advantage of the JSON grammar now being a subset of ECMAScript:
const data = {
LineTerminators: '\n\r
',
};
const jsObjectLiteral = JSON.stringify(data);
const program = `const data = ${ jsObjectLiteral };`;
saveToDisk(filePath, program);
The above script produces the following code, which evaluates to an equivalent object:
const data = {"LineTerminators":"\n\r
"};
Embedding JSON into JavaScript programs with JSON.parse #
As explained in the cost of JSON, instead of inlining the data as a JavaScript object literal, like so:
const data = { foo: 42, bar: 1337 };
…the data can be represented in JSON-stringified form, and then JSON-parsed at runtime, for improved performance in the case of large objects (10 kB+):
const data = JSON.parse('{"foo":42,"bar":1337}');
Here’s an example implementation:
const data = {
LineTerminators: '\n\r
',
};
const json = JSON.stringify(data);
const jsStringLiteral = JSON.stringify(json);
const program = `const data = JSON.parse(${ jsStringLiteral });`;
saveToDisk(filePath, program);
The above script produces the following code, which evaluates to an equivalent object:
const data = JSON.parse("{\"LineTerminators\":\"\\n\\r
\"}");
Google’s benchmark comparing JSON.parse with JavaScript object literals leverages this technique in its build step. The Chrome DevTools “copy as JS” functionality has been simplified significantly by adopting a similar technique.
A note on security #
JSON ⊂ ECMAScript reduces the mismatch between JSON and ECMAScript in the case of string literals specifically. Since string literals can occur within other JSON-supported data structures such as objects and arrays, it also addresses those cases, as the above code examples show.
However, U+2028 and U+2029 are still treated as line terminator characters in other parts of the ECMAScript grammar. This means there are still cases where it’s unsafe to inject JSON into JavaScript programs. Consider this example, where a server injects some user-supplied content into an HTML response after running it through JSON.stringify():
<script>
</script>
Note that the result of JSON.stringify is injected into a single-line comment within the script.
When used like in the above example, JSON.stringify() is guaranteed to return a single line. The problem is that what constitutes a “single line” differs between JSON and ECMAScript. If ua contains an unescaped U+2028 or U+2029 character, we break out of the single-line comment and execute the rest of ua as JavaScript source code:
<script>
</script>
<script>
alert('XSS');
</script>
Note: In the above example, the raw unescaped U+2028 character is represented as <U+2028> to make it easier to follow.
JSON ⊂ ECMAScript doesn’t help here, since it only impacts string literals — and in this case, JSON.stringify’s output is injected in a position where it does not produce a JavaScript string literal directly.
Unless special post-processing for those two characters is introduced, the above code snippet presents a cross-site scripting vulnerability (XSS)!
Note: It’s crucially important to post-process user-controlled input to escape any special character sequences, depending on the context. In this particular case, we’re injecting into a <script> tag, so we must (also) escape </script, <script, and <!--.
JSON ⊂ ECMAScript support #