feat: [OSS-22] Suite: run batches of scenarios with dynamic binding #2283

Merged
mattbit merged 21 commits into main from
feature/oss-22-suite-run-suites-of-scenarios-with-dynamic-binding
Mar 3, 2026

@henchaves
Member

Description

  • refactored the Interact class so that outputs default to NOT_PROVIDED; an error is raised if they are still NotProvided when the interaction is generated
  • refactored Scenario and Scenario.run to include a target attribute, which is used at run time to supply any outputs that were not provided
  • created a new Suite class (and SuiteResult) to run a batch of scenarios
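
To illustrate the first two bullets, here is a minimal, standalone sketch of the sentinel-default pattern. It is not the giskard-checks code: `NotProvided` and `NOT_PROVIDED` are local stand-ins defined in the snippet, and `resolve_outputs` is a hypothetical helper invented for this example.

```python
from typing import Any, Callable, Optional


class NotProvided:
    """Local stand-in sentinel: 'no value given', distinct from an explicit None."""

    def __repr__(self) -> str:
        return "NOT_PROVIDED"


NOT_PROVIDED = NotProvided()


def resolve_outputs(
    inputs: Any,
    outputs: Any = NOT_PROVIDED,
    target: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Hypothetical helper: explicit outputs win, else ask the target, else fail."""
    if not isinstance(outputs, NotProvided):
        # Outputs were given explicitly (None counts as an explicit value).
        return outputs
    if target is None:
        # Nothing to fall back on, so fail loudly instead of guessing.
        raise ValueError("outputs were not provided and no target is bound")
    return target(inputs)
```

Using a dedicated sentinel rather than `None` keeps `outputs=None` available as a legitimate explicit value.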

Related Issue

OSS-22 (available on Linear)

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

@linear

linear bot commented Feb 27, 2026

@henchaves henchaves changed the title from "feat: [OSS-22] Suite: run suites of scenarios with dynamic binding" to "feat: [OSS-22] Suite: run batches of scenarios with dynamic binding" Feb 27, 2026
@henchaves henchaves self-assigned this Feb 27, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the flexibility and scalability of scenario execution by introducing the concept of a Suite for running batches of scenarios. It also enables dynamic binding of a System Under Test (SUT) to scenarios and individual interactions, allowing for more adaptable and reusable scenario definitions. These changes streamline the process of testing against different SUTs and provide clearer aggregated results for multiple test runs.

Highlights

  • New Suite Class for Batch Scenario Execution: Introduced a new Suite class that allows running multiple Scenario instances together. This enables batch execution of scenarios and provides a mechanism for sharing a common System Under Test (SUT) across a collection of scenarios.
  • Dynamic Target Binding for Scenarios and Interactions: Implemented dynamic binding of a 'target' SUT (System Under Test) at various levels: Scenario, Suite, and during the run() call. This allows Interact components within scenarios to have their outputs dynamically provided by the bound target if they were initially NOT_PROVIDED.
  • Refactored Interact Class for Flexible Output Handling: The Interact class was refactored to set outputs as NOT_PROVIDED by default. If outputs remain NOT_PROVIDED during interaction generation and no target is bound, a ValueError is now raised, ensuring explicit handling of interaction outputs.
  • New SuiteResult for Aggregated Metrics: Added a SuiteResult class to aggregate results from multiple scenario executions within a Suite. It provides summary statistics such as passed_count, failed_count, errored_count, skipped_count, and pass_rate for the entire suite.
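
The multi-level binding described in the highlights can be pictured as a "first bound target wins" lookup. The sketch below is a standalone illustration, not the giskard-checks implementation: `NotProvided`, `NOT_PROVIDED`, `pick_target`, `echo`, and `shout` are all hypothetical names defined in the snippet. The order in which run-, scenario-, and suite-level targets are passed is left to the caller here; the actual precedence is whatever the README section 'Binding a Target SUT' documents.

```python
from typing import Any, Callable, Union


class NotProvided:
    """Local stand-in sentinel meaning 'nothing was bound at this level'."""


NOT_PROVIDED = NotProvided()

Target = Callable[[str], Any]


def pick_target(*candidates: Union[Target, NotProvided]) -> Union[Target, NotProvided]:
    """Return the first bound target, scanning candidates in priority order."""
    for candidate in candidates:
        if not isinstance(candidate, NotProvided):
            return candidate
    # No level bound a target; the caller decides whether that is an error.
    return NOT_PROVIDED


def echo(text: str) -> str:
    """Toy system under test that returns its input unchanged."""
    return text


def shout(text: str) -> str:
    """Toy system under test that upper-cases its input."""
    return text.upper()


# Highest-priority level is unbound, so the next bound candidate is used.
chosen = pick_target(NOT_PROVIDED, shout, echo)
```

Here `chosen` is `shout`, because it is the first candidate that is not the sentinel.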

Changelog
  • libs/giskard-checks/CODEMAP.md
    • Updated Scenario description to include target SUT support.
    • Added Suite and SuiteResult to the core abstractions documentation.
    • Updated workflow section to reflect new Suite usage and result inspection.
  • libs/giskard-checks/README.md
    • Added a new section 'Running Multiple Scenarios with Suite' with code examples.
    • Updated 'Concepts' section to define Suite and clarify Scenario's target attribute.
    • Introduced a 'Binding a Target SUT' section explaining target precedence at scenario, suite, and run levels.
  • libs/giskard-checks/src/giskard/checks/__init__.py
    • Imported SuiteResult and Suite to be exposed at the top level.
  • libs/giskard-checks/src/giskard/checks/core/__init__.py
    • Imported SuiteResult to be exposed from the core module.
  • libs/giskard-checks/src/giskard/checks/core/interaction/interact.py
    • Imported NOT_PROVIDED and NotProvided for explicit output handling.
    • Modified outputs field to default to NOT_PROVIDED and accept NotProvided type.
    • Updated _validate_injection_mappings to conditionally create output value provider.
    • Added validation in generate to raise an error if outputs are NotProvided and no target is bound.
  • libs/giskard-checks/src/giskard/checks/core/result.py
    • Added SuiteResult class for aggregating scenario results, including pass/fail counts and pass rate.
  • libs/giskard-checks/src/giskard/checks/core/scenario.py
    • Imported NOT_PROVIDED and NotProvided.
    • Added target attribute to the Scenario model for scenario-level SUT binding.
    • Modified run method to accept an optional target parameter, passing it to the runner.
  • libs/giskard-checks/src/giskard/checks/scenarios/__init__.py
    • Imported Suite to be exposed from the scenarios module.
  • libs/giskard-checks/src/giskard/checks/scenarios/builder.py
    • Imported NOT_PROVIDED and NotProvided.
    • Added target attribute to ScenarioBuilder.
    • Modified interact method to allow outputs to be NOT_PROVIDED by default.
    • Added with_target method for setting scenario-level target.
    • Updated build method to pass the target to the Scenario constructor.
    • Modified run method to accept and pass an optional target to the built scenario.
    • Updated scenario helper function to accept an optional target parameter.
  • libs/giskard-checks/src/giskard/checks/scenarios/runner.py
    • Imported NOT_PROVIDED, NotProvided, and Interact.
    • Modified run method to accept an optional target parameter.
    • Implemented logic to dynamically bind the target to Interact components if their outputs are NOT_PROVIDED, respecting target precedence.
  • libs/giskard-checks/tests/core/test_scenario.py
    • Added Equals import for new test cases.
    • Added fixtures echo_sut and echo_upper_sut for testing dynamic binding.
    • Added TestScenarioDynamicBinding class with tests for scenario-level, run-level, and override target binding.
    • Added a test case to verify error handling when no target is provided for interactions with NOT_PROVIDED outputs.
  • libs/giskard-checks/tests/core/test_suite.py
    • Added a new test file test_suite.py to cover Suite functionality.
    • Added fixtures sut1, sut2, sut3, and identity_sut for testing.
    • Included tests for suite target precedence, run target precedence, mixed targets, and SuiteResult aggregation.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a Suite class for running batches of scenarios, along with dynamic target binding for Scenarios. The changes are well-structured and include corresponding documentation and tests.

I've found a couple of minor issues:

  • A documentation error in CODEMAP.md regarding the pass_rate attribute.
  • A discrepancy between the implementation and documentation of the Suite.append method, which could lead to runtime errors.

My detailed feedback is in the review comments. Overall, this is a great addition to the library.

@henchaves
Member Author

/gemini review

@henchaves henchaves requested a review from mattbit February 27, 2026 17:08
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a Suite class for running batches of scenarios, along with dynamic target binding for Scenario and Interact. The changes are well-structured and include corresponding documentation and tests.

My review includes a few suggestions:

  • Improving the Suite.append method to automatically handle ScenarioBuilder instances, which aligns with the intent described in the documentation.
  • Updating the README.md example to reflect this improved usability.
  • Refactoring a piece of logic in the ScenarioRunner to avoid using cast and private method calls, improving maintainability.
  • Adjusting the pass_rate calculation for empty suites to provide a safer default metric.

Comment on lines +467 to +468
    if not self.results:
        return 1.0
Contributor

medium

Returning 1.0 for an empty suite can be misleading. An empty test suite should generally not be considered 100% successful. This could lead to CI gates passing incorrectly if they check for a 100% pass rate. Returning 0.0 would be a safer and more conservative default.

Suggested change
-   if not self.results:
-       return 1.0
+   if not self.results:
+       return 0.0

Member

Well, I think Gemini is not wrong here: the usual convention is that empty = failure. It's the same for pytest, for example: if no tests ran, it exits with a non-zero status code.

Member Author

it's a bit contradictory with the scenario logic

    @computed_field
    @property
    def status(self) -> ScenarioStatus:
        """The status of the scenario."""
        if not self.steps:
            return ScenarioStatus.PASS
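
The two conventions discussed in this thread can be compared side by side with a small standalone sketch. It makes several assumptions that are not confirmed by the PR: `Status` and `SuiteResultSketch` are hypothetical stand-ins, the pass rate is taken to be passed results divided by all results, and `empty_pass_rate` is an invented knob that exists only to show the difference between returning 1.0 and 0.0 for an empty suite.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PASS = "pass"
    FAIL = "fail"
    ERROR = "error"
    SKIP = "skip"


@dataclass
class SuiteResultSketch:
    """Hypothetical aggregate over per-scenario statuses."""

    results: list[Status] = field(default_factory=list)
    # Assumption: 1.0 mirrors the code under review, 0.0 mirrors the suggestion.
    empty_pass_rate: float = 1.0

    def count(self, status: Status) -> int:
        """Number of results with the given status."""
        return sum(1 for result in self.results if result is status)

    @property
    def pass_rate(self) -> float:
        """Share of passed results; falls back to empty_pass_rate when empty."""
        if not self.results:
            return self.empty_pass_rate
        return self.count(Status.PASS) / len(self.results)
```

With `empty_pass_rate=1.0` an empty suite satisfies a "100% pass rate" CI gate, which is the risk raised in the review; with `0.0` it fails the gate, matching the "empty = failure" convention but diverging from the empty-scenario behaviour quoted above.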

Member

@davidberenstein1957 davidberenstein1957 left a comment

left some comments

        The scenario to add to the suite. For scenario builders, use `scenario.build()` first.
    """

    self.scenarios.append(scenario)
Member

should we add a check to see if the scenario has been built?

Member Author

we can also implement auto-build, might be easier for users
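
One way the auto-build idea could look is sketched below. This is not the merged implementation (the PR as reviewed still asks callers to build first): `ScenarioSketch`, `ScenarioBuilderSketch`, and `SuiteSketch` are hypothetical stand-ins, and returning `self` from `append` is an assumption made only to allow chaining in the example.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ScenarioSketch:
    """Hypothetical stand-in for a built scenario."""

    name: str


@dataclass
class ScenarioBuilderSketch:
    """Hypothetical stand-in for a builder that produces a scenario."""

    name: str

    def build(self) -> ScenarioSketch:
        return ScenarioSketch(name=self.name)


@dataclass
class SuiteSketch:
    scenarios: list[ScenarioSketch] = field(default_factory=list)

    def append(
        self, scenario: Union[ScenarioSketch, ScenarioBuilderSketch]
    ) -> "SuiteSketch":
        """Add a scenario, building it first if a builder was passed."""
        if isinstance(scenario, ScenarioBuilderSketch):
            scenario = scenario.build()
        elif not isinstance(scenario, ScenarioSketch):
            # Reject anything else early rather than failing later at run time.
            raise TypeError(f"expected a scenario or builder, got {type(scenario).__name__}")
        self.scenarios.append(scenario)
        return self


suite = SuiteSketch().append(ScenarioSketch("built")).append(ScenarioBuilderSketch("unbuilt"))
```

Both entries end up stored as `ScenarioSketch` instances, so the run loop never has to distinguish built scenarios from builders.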

henchaves and others added 3 commits March 2, 2026 10:46
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
Member

@mattbit mattbit left a comment

Looks OK to me. There are some improvements that we can plan for later:

  • cleaning up the code for the binding logic
  • allowing a list of scenarios to be added to the Suite
  • making sure we are tolerant if a ScenarioBuilder is passed instead of a Scenario

@mattbit mattbit merged commit 4ddcd79 into main Mar 3, 2026
23 checks passed
@mattbit mattbit deleted the feature/oss-22-suite-run-suites-of-scenarios-with-dynamic-binding branch March 3, 2026 16:12

3 participants