
Handle large (>4GB) SARIF results files #735

@aeisenberg

Description


When trying to open the interpreted results of a query run that has produced a SARIF results file larger than 4GB, we get an error like this:

[2021-01-28 18:21:22] CSV_IMB_QUERIES: Query,edges#query#ffffffffffffff nodes#query#fffffffff #select#query#ffffffffffffffffffffff,padlockws2-2.ql,26,Success,291.651,407918,291939
Exception during results interpretation: Reading output of interpretation failed: RangeError [ERR_FS_FILE_TOO_LARGE]: File size (6638382197) is greater than possible Buffer: 4294967295 bytes. Will show raw results instead.

Node limits the size of strings and buffers to 4294967295 (2^32 − 1) bytes, even on machines that have enough RAM to support more.

The parsed version of the SARIF results could fit in memory, even if the string cannot. It's possible that a streaming JSON parser, like JSONStream, could work, but I need to explore this library in more detail and make sure it is safe and stable before we can use it.

I don't think it is a good idea to roll our own streaming parser if there is a suitable OSS one available since there would be a fair amount of work involved and getting the edge cases to work is tricky.


Suggested breakdown:

  • Get an example (from the team) of a large SARIF file
  • Add JSONStream as a dependency
  • Use JSONStream when reading the SARIF file produced by results interpretation
    • either do this unconditionally
    • or use it as a fallback only when we hit the RangeError
  • Ensure we have tests for both regular and large SARIF files

Metadata

Labels

VSCode, bug (Something isn't working)
