Handle large (>4GB) SARIF results files #735
When trying to open the interpreted results of a query run that produced a SARIF results file larger than 4 GB, we get an error like this:
```
[2021-01-28 18:21:22] CSV_IMB_QUERIES: Query,edges#query#ffffffffffffff nodes#query#fffffffff #select#query#ffffffffffffffffffffff,padlockws2-2.ql,26,Success,291.651,407918,291939
Exception during results interpretation: Reading output of interpretation failed: RangeError [ERR_FS_FILE_TOO_LARGE]: File size (6638382197) is greater than possible Buffer: 4294967295 bytes. Will show raw results instead.
```
Node limits the size of strings and buffers to 4294967295 bytes, even on machines that have enough RAM to hold more.
The parsed version of the SARIF results could fit in memory even if the string cannot. A streaming JSON parser like JSONStream could work, but I need to explore this library in more detail and make sure it is safe and stable before we can use it.
I don't think it is a good idea to roll our own streaming parser if a suitable OSS one is available: there would be a fair amount of work involved, and getting the edge cases right is tricky.
Suggested breakdown:
- Get an example (from the team) of a large SARIF file
- Add `JSONStream` as a dependency
- Use `JSONStream` when reading the SARIF file produced by results interpretation:
  - either do this unconditionally,
  - or use it as a fallback only when we hit the `RangeError`
- Ensure we have tests for both regular and large SARIF files