feat: [OSS-22] `Suite`: run batches of scenarios with dynamic binding by henchaves · Pull Request #2283 · Giskard-AI/giskard-oss

henchaves · 2026-02-27T16:58:20Z

Description

- Refactored the `Interact` class so that `outputs` defaults to `NOT_PROVIDED`; an error is raised if `outputs` is still `NotProvided` when the interaction is generated.
- Refactored `Scenario` and `Scenario.run` to include a `target` attribute, which replaces the non-provided outputs at run time.
- Created a new `Suite` class (and `SuiteResult`) to run a batch of scenarios.
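A minimal, self-contained sketch of the sentinel pattern described above. It assumes a singleton `NOT_PROVIDED` of type `NotProvided`. `InteractSketch` and its `generate(target=...)` method are hypothetical stand-ins, not the giskard-checks `Interact` API. Explicit outputs win, a bound target fills in missing ones, and generation fails if neither is available.

```python
from dataclasses import dataclass
from typing import Any


class NotProvided:
    """Sentinel type marking a value the user did not supply (assumed design)."""

    _instance = None

    def __new__(cls):
        # Singleton: every NotProvided() call returns the same object
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self) -> str:
        return "NOT_PROVIDED"


NOT_PROVIDED = NotProvided()


@dataclass
class InteractSketch:
    """Hypothetical stand-in for an interaction step (not the library class)."""

    inputs: str
    # A sentinel default (instead of None) keeps None usable as a real output
    outputs: Any = NOT_PROVIDED

    def generate(self, target: Any = NOT_PROVIDED) -> Any:
        outputs = self.outputs
        if isinstance(outputs, NotProvided):
            # Fall back to the dynamically bound target (the SUT)
            outputs = target
        if isinstance(outputs, NotProvided):
            raise ValueError("Interact outputs are NOT_PROVIDED and no target is bound")
        # Callables are treated as the SUT; anything else is a fixed output
        return outputs(self.inputs) if callable(outputs) else outputs
```

With this shape, `InteractSketch("hi").generate(target=str.upper)` returns `"HI"`, an explicit `outputs="fixed"` ignores the target, and `InteractSketch("hi").generate()` raises `ValueError`.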

Related Issue

OSS-22 (available on Linear)

Type of Change

📚 Examples / docs / tutorials / dependencies update
🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
🔐 Security fix

linear · 2026-02-27T16:58:23Z

OSS-22 Suite: run suites of scenarios with dynamic binding


gemini-code-assist · 2026-02-27T16:58:52Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the flexibility and scalability of scenario execution by introducing the concept of a Suite for running batches of scenarios. It also enables dynamic binding of a System Under Test (SUT) to scenarios and individual interactions, allowing for more adaptable and reusable scenario definitions. These changes streamline the process of testing against different SUTs and provide clearer aggregated results for multiple test runs.

Highlights

New Suite Class for Batch Scenario Execution: Introduced a new Suite class that allows running multiple Scenario instances together. This enables batch execution of scenarios and provides a mechanism for sharing a common System Under Test (SUT) across a collection of scenarios.
Dynamic Target Binding for Scenarios and Interactions: Implemented dynamic binding of a 'target' SUT (System Under Test) at various levels: Scenario, Suite, and during the run() call. This allows Interact components within scenarios to have their outputs dynamically provided by the bound target if they were initially NOT_PROVIDED.
Refactored Interact Class for Flexible Output Handling: The Interact class was refactored to set outputs as NOT_PROVIDED by default. If outputs remain NOT_PROVIDED during interaction generation and no target is bound, a ValueError is now raised, ensuring explicit handling of interaction outputs.
New SuiteResult for Aggregated Metrics: Added a SuiteResult class to aggregate results from multiple scenario executions within a Suite. It provides summary statistics such as passed_count, failed_count, errored_count, skipped_count, and pass_rate for the entire suite.
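A minimal sketch of how the four counters and `pass_rate` listed above could be derived from per-scenario statuses. `SuiteResultSketch` and `Status` are hypothetical stand-ins, not the giskard-checks classes. Two assumptions are built in: the rate is computed over all scenarios (skipped ones included), and an empty suite reports 0.0, which matches the reviewer suggestion made later in this thread.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    # Hypothetical status values mirroring the four counters named above
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"
    SKIP = "skip"


@dataclass
class SuiteResultSketch:
    """Aggregates per-scenario statuses into suite-level metrics (illustrative only)."""

    statuses: List[Status] = field(default_factory=list)

    def _count(self, status: Status) -> int:
        return sum(1 for s in self.statuses if s is status)

    @property
    def passed_count(self) -> int:
        return self._count(Status.PASS)

    @property
    def failed_count(self) -> int:
        return self._count(Status.FAIL)

    @property
    def errored_count(self) -> int:
        return self._count(Status.ERROR)

    @property
    def skipped_count(self) -> int:
        return self._count(Status.SKIP)

    @property
    def pass_rate(self) -> float:
        # Assumption: rate over all scenarios; an empty suite is reported as 0.0
        if not self.statuses:
            return 0.0
        return self.passed_count / len(self.statuses)
```

For statuses `[PASS, PASS, FAIL, SKIP]` this sketch gives `passed_count == 2` and `pass_rate == 0.5`; excluding skipped scenarios from the denominator would give 2/3 instead, so the actual definition in `result.py` is worth checking.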


Changelog

libs/giskard-checks/CODEMAP.md
- Updated Scenario description to include target SUT support.
- Added Suite and SuiteResult to the core abstractions documentation.
- Updated workflow section to reflect new Suite usage and result inspection.
libs/giskard-checks/README.md
- Added a new section 'Running Multiple Scenarios with Suite' with code examples.
- Updated 'Concepts' section to define Suite and clarify Scenario's target attribute.
- Introduced a 'Binding a Target SUT' section explaining target precedence at scenario, suite, and run levels.
libs/giskard-checks/src/giskard/checks/__init__.py
- Imported SuiteResult and Suite to be exposed at the top level.
libs/giskard-checks/src/giskard/checks/core/__init__.py
- Imported SuiteResult to be exposed from the core module.
libs/giskard-checks/src/giskard/checks/core/interaction/interact.py
- Imported NOT_PROVIDED and NotProvided for explicit output handling.
- Modified outputs field to default to NOT_PROVIDED and accept NotProvided type.
- Updated _validate_injection_mappings to conditionally create output value provider.
- Added validation in generate to raise an error if outputs are NotProvided and no target is bound.
libs/giskard-checks/src/giskard/checks/core/result.py
- Added SuiteResult class for aggregating scenario results, including pass/fail counts and pass rate.
libs/giskard-checks/src/giskard/checks/core/scenario.py
- Imported NOT_PROVIDED and NotProvided.
- Added target attribute to the Scenario model for scenario-level SUT binding.
- Modified run method to accept an optional target parameter, passing it to the runner.
libs/giskard-checks/src/giskard/checks/scenarios/__init__.py
- Imported Suite to be exposed from the scenarios module.
libs/giskard-checks/src/giskard/checks/scenarios/builder.py
- Imported NOT_PROVIDED and NotProvided.
- Added target attribute to ScenarioBuilder.
- Modified interact method to allow outputs to be NOT_PROVIDED by default.
- Added with_target method for setting scenario-level target.
- Updated build method to pass the target to the Scenario constructor.
- Modified run method to accept and pass an optional target to the built scenario.
- Updated scenario helper function to accept an optional target parameter.
libs/giskard-checks/src/giskard/checks/scenarios/runner.py
- Imported NOT_PROVIDED, NotProvided, and Interact.
- Modified run method to accept an optional target parameter.
- Implemented logic to dynamically bind the target to Interact components if their outputs are NOT_PROVIDED, respecting target precedence.
libs/giskard-checks/tests/core/test_scenario.py
- Added Equals import for new test cases.
- Added fixtures echo_sut and echo_upper_sut for testing dynamic binding.
- Added TestScenarioDynamicBinding class with tests for scenario-level, run-level, and override target binding.
- Added a test case to verify error handling when no target is provided for interactions with NOT_PROVIDED outputs.
libs/giskard-checks/tests/core/test_suite.py
- Added a new test file test_suite.py to cover Suite functionality.
- Added fixtures sut1, sut2, sut3, and identity_sut for testing.
- Included tests for suite target precedence, run target precedence, mixed targets, and SuiteResult aggregation.
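The runner.py entry describes binding a target to `Interact` components whose outputs are `NOT_PROVIDED`, and a later review asks for that logic without `cast` or private method calls. The sketch below shows one way to do it under stated assumptions: `InteractStep`, `CheckStep`, `resolve_target`, and `bind_target` are all hypothetical names, not giskard-checks APIs. The candidate order passed to `resolve_target` is left to the caller; the real precedence between scenario, suite, and run targets is defined in the README's "Binding a Target SUT" section.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Sequence


class NotProvided:
    """Sentinel type for 'no value supplied' (assumed design)."""

    def __repr__(self) -> str:
        return "NOT_PROVIDED"


NOT_PROVIDED = NotProvided()


@dataclass(frozen=True)
class InteractStep:
    """Hypothetical interaction step whose outputs may be left unset."""

    inputs: str
    outputs: Any = NOT_PROVIDED


@dataclass(frozen=True)
class CheckStep:
    """Hypothetical non-interaction step; never rebound."""

    name: str


def resolve_target(candidates: Sequence[Any]) -> Any:
    """Return the first candidate that is set, scanning in priority order."""
    for candidate in candidates:
        if not isinstance(candidate, NotProvided):
            return candidate
    return NOT_PROVIDED


def bind_target(steps: Sequence[Any], target: Any) -> List[Any]:
    """Return new steps where unset Interact outputs point at the target.

    The input steps are left untouched, so the same scenario can be rerun
    against a different target.
    """
    if isinstance(target, NotProvided):
        return list(steps)
    bound: List[Any] = []
    for step in steps:
        # isinstance narrows the type for checkers, so typing.cast is unnecessary
        if isinstance(step, InteractStep) and isinstance(step.outputs, NotProvided):
            step = replace(step, outputs=target)
        bound.append(step)
    return bound
```

Returning copies instead of mutating the steps keeps a scenario reusable across targets. When no target resolves, unset steps pass through unchanged and the error surfaces later, at generation time.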

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a Suite class for running batches of scenarios, along with dynamic target binding for Scenarios. The changes are well-structured and include corresponding documentation and tests.

I've found a couple of minor issues:

- A documentation error in CODEMAP.md regarding the pass_rate attribute.
- A discrepancy between the implementation and documentation of the Suite.append method, which could lead to runtime errors.

My detailed feedback is in the review comments. Overall, this is a great addition to the library.

libs/giskard-checks/src/giskard/checks/scenarios/suite.py

libs/giskard-checks/CODEMAP.md

henchaves · 2026-02-27T17:08:02Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a Suite class for running batches of scenarios, along with dynamic target binding for Scenario and Interact. The changes are well-structured and include corresponding documentation and tests.

My review includes a few suggestions:

- Improving the Suite.append method to automatically handle ScenarioBuilder instances, which aligns with the intent described in the documentation.
- Updating the README.md example to reflect this improved usability.
- Refactoring a piece of logic in the ScenarioRunner to avoid using cast and private method calls, improving maintainability.
- Adjusting the pass_rate calculation for empty suites to provide a safer default metric.

gemini-code-assist · 2026-02-27T17:10:07Z

libs/giskard-checks/src/giskard/checks/core/result.py

+        if not self.results:
+            return 1.0


Returning 1.0 for an empty suite can be misleading. An empty test suite should generally not be considered 100% successful. This could lead to CI gates passing incorrectly if they check for a 100% pass rate. Returning 0.0 would be a safer and more conservative default.

Suggested change

-        if not self.results:
-            return 1.0
+        if not self.results:
+            return 0.0

Well, I think Gemini is not wrong here: the usual convention is that empty = failure. It's the same for pytest, for example: if no tests ran, it exits with a non-zero status code.

It's a bit contradictory with the scenario logic, though:

```python
@computed_field
@property
def status(self) -> ScenarioStatus:
    """The status of the scenario."""
    if not self.steps:
        return ScenarioStatus.PASS
```
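One way to reconcile the two conventions is to keep "nothing ran" distinct from "everything passed" at the point where a CI gate decides, whatever `pass_rate` returns for an empty suite. The sketch mirrors pytest, which exits with status 5 when no tests are collected. The function name, the string statuses, and the `min_pass_rate` threshold are assumptions for illustration, not giskard-checks APIs.

```python
from typing import Sequence


def suite_exit_code(statuses: Sequence[str], min_pass_rate: float = 1.0) -> int:
    """Map suite outcomes to a process exit code (hypothetical CI gate)."""
    if not statuses:
        # Mirror pytest: exit code 5 means "no tests were collected"
        return 5
    passed = sum(1 for s in statuses if s == "pass")
    # 0 = gate passed, 1 = pass rate below the threshold
    return 0 if passed / len(statuses) >= min_pass_rate else 1
```

With this split, an empty scenario can still report PASS internally, while an empty suite never silently satisfies a "100% pass rate" gate.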

libs/giskard-checks/src/giskard/checks/scenarios/runner.py

libs/giskard-checks/src/giskard/checks/scenarios/suite.py

libs/giskard-checks/README.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

davidberenstein1957

left some comments

libs/giskard-checks/src/giskard/checks/core/result.py

davidberenstein1957 · 2026-03-02T06:17:41Z

libs/giskard-checks/src/giskard/checks/scenarios/suite.py

+            The scenario to add to the suite. For scenario builders, use `scenario.build()` first.
+        """
+
+        self.scenarios.append(scenario)


Should we add a check to see if the scenario has been built?

We could also implement auto-build; it might be easier for users.

libs/giskard-checks/src/giskard/checks/scenarios/suite.py

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

mattbit

Looks OK to me. There are some improvements that we can plan for later:

- cleaning up the code for the binding logic
- allowing a list of scenarios to be added to the Suite
- making sure we are tolerant if a ScenarioBuilder is passed instead of a Scenario
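Commit 9a058b2 allows passing `ScenarioBuilder` instances to `Suite.append`, while adding a list of scenarios is listed here only as a planned improvement. The sketch below shows both ideas together under stated assumptions: `ScenarioStub`, `ScenarioBuilderStub`, and `SuiteSketch` are hypothetical stand-ins, and the only thing assumed about a builder is that it exposes `build()`, as the review thread describes.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Union


@dataclass
class ScenarioStub:
    """Hypothetical stand-in for a built Scenario."""

    name: str


@dataclass
class ScenarioBuilderStub:
    """Hypothetical stand-in for a ScenarioBuilder exposing build()."""

    name: str

    def build(self) -> ScenarioStub:
        return ScenarioStub(self.name)


ScenarioLike = Union[ScenarioStub, ScenarioBuilderStub]


@dataclass
class SuiteSketch:
    """Illustrative suite that accepts scenarios, builders, or lists of either."""

    scenarios: List[ScenarioStub] = field(default_factory=list)

    @staticmethod
    def _as_scenario(item: ScenarioLike) -> ScenarioStub:
        # Auto-build builders so callers don't have to remember .build()
        if isinstance(item, ScenarioBuilderStub):
            return item.build()
        if isinstance(item, ScenarioStub):
            return item
        raise TypeError(f"expected a scenario or builder, got {type(item).__name__}")

    def append(self, item: ScenarioLike) -> None:
        self.scenarios.append(self._as_scenario(item))

    def extend(self, items: Iterable[ScenarioLike]) -> None:
        # Convert every item first so one bad item leaves the suite unchanged
        converted = [self._as_scenario(item) for item in items]
        self.scenarios.extend(converted)
```

Raising `TypeError` for anything else keeps the tolerance explicit: builders are accepted on purpose, and unrelated objects still fail at insertion time rather than during the run.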

henchaves added 8 commits February 27, 2026 16:14

make Interact.outputs as NOT_PROVIDED by default

ba38edb

add target attribute to Scenario

6d919d8

make possible to pass target when calling Scenarion.run

1fe1d5e

add tests for scenario dynamic binding

8ad82c5

add Suite and SuiteResult

3e37573

add tests for Suite

68f02e3

update README and CODEMAP

2d9dc12

fix warning in Suite class definition

6afa5b6

github-actions bot added the Scope: Checks label Feb 27, 2026

henchaves changed the title ~~feat: [OSS-22] Suite: run suites of scenarios with dynamic binding~~ feat: [OSS-22] Suite: run batches of scenarios with dynamic binding Feb 27, 2026

henchaves self-assigned this Feb 27, 2026

Merge branch 'main' into feature/oss-22-suite-run-suites-of-scenarios-with-dynamic-binding

e785ffc

gemini-code-assist bot reviewed Feb 27, 2026


libs/giskard-checks/src/giskard/checks/scenarios/suite.py (outdated, resolved)

libs/giskard-checks/CODEMAP.md (outdated, resolved)

henchaves added 3 commits February 27, 2026 18:04

fix Suite docstrings

f45dd9c

fix CODEMAP

1bc0a05

fix Suite.append docstring

8688b43

henchaves requested a review from mattbit February 27, 2026 17:08

gemini-code-assist bot reviewed Feb 27, 2026


henchaves and others added 5 commits February 27, 2026 19:21

fix readme

79d1945

Update libs/giskard-checks/src/giskard/checks/scenarios/runner.py

25bdd53

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

add proper typing for target

2062b07

add __rich_console__ to SuiteResult

bd77262

improve SuiteResult.__rich_console__

86425f4

davidberenstein1957 reviewed Mar 2, 2026


henchaves and others added 3 commits March 2, 2026 10:46

Update libs/giskard-checks/src/giskard/checks/core/result.py

57ac8d5

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

Update libs/giskard-checks/src/giskard/checks/core/result.py

4044c6b

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

allow passing instances of ScenarioBuilder to Suite.append

9a058b2

henchaves requested a review from davidberenstein1957 March 2, 2026 11:11

fix docstrings of Result classes

9bcae01

mattbit approved these changes Mar 3, 2026


mattbit merged commit 4ddcd79 into main Mar 3, 2026
23 checks passed

mattbit deleted the feature/oss-22-suite-run-suites-of-scenarios-with-dynamic-binding branch March 3, 2026 16:12

