fix(cli): add Copilot variant to CliProviderType enum #3

Open
htekdev wants to merge 20 commits into main from fix-copilot
Conversation

@htekdev htekdev commented Mar 31, 2026

Summary

Adds the missing Copilot variant to the CliProviderType enum and its as_str() match arm, completing Copilot provider support in the CLI layer.

Related Issue

Follow-up to the Copilot provider addition (commit eff88b7). The provider backend (registry, discovery, policy) was fully wired, but the CLI enum was missing Copilot, causing:

  • provider create --type copilot to be rejected by the CLI argument parser
  • sandbox create -- copilot to fail with "missing provider type copilot"

Changes

  • Added Copilot variant to CliProviderType enum in crates/openshell-cli/src/main.rs
  • Added Self::Copilot => "copilot" match arm in CliProviderType::as_str()

Testing

  • cargo build --workspace — passes
  • cargo test -p openshell-providers — all 13 tests pass
  • Manual: sandbox create -- copilot successfully infers copilot provider type and proceeds to sandbox creation

Checklist

  • Conventional commit format
  • Build passes
  • Tests pass

latenighthackathon and others added 20 commits March 29, 2026 16:24
…muggling (CWE-444) (NVIDIA#663)

* fix(l7): reject duplicate Content-Length headers to prevent request smuggling

Both parse_body_length() in rest.rs and try_parse_http_request() in
inference.rs silently accepted multiple Content-Length headers,
overwriting with the last value seen. Per RFC 7230 Section 3.3.3,
a message with multiple Content-Length headers with differing values
must be rejected to prevent HTTP request smuggling (CWE-444).

An attacker could send conflicting Content-Length values causing the
proxy and downstream server to disagree on message boundaries.

Fix:
- rest.rs: detect duplicate CL headers with differing values and
  return an error before forwarding
- inference.rs: add ParseResult::Invalid variant; detect duplicate
  CL headers and return Invalid with a descriptive reason
- proxy.rs: handle ParseResult::Invalid by sending HTTP 400 and
  denying the connection

Closes NVIDIA#637

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

* fix(l7): address review feedback on Content-Length smuggling defense

- inference.rs: reject unparseable Content-Length values instead of
  silently defaulting to 0 via unwrap_or(0)
- rest.rs: reject unparseable Content-Length values so a valid+invalid
  duplicate pair cannot bypass the differing-values check
- rest.rs: fix Transfer-Encoding substring match (.contains("chunked")
  → split/trim exact match) to align with inference.rs and prevent
  false positives on values like "chunkedx"
- proxy.rs: log parsing details server-side via tracing::warn and
  return generic "Bad Request" body instead of leaking internal
  parsing reasons to sandboxed code
- Add tests for all new rejection paths in inference.rs and rest.rs

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

* style(l7): apply cargo fmt formatting

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

---------

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
Co-authored-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
* refactor(l7): export evaluate_l7_request for cross-module use

Make evaluate_l7_request() public so the forward proxy path can
evaluate individual requests against L7 policy without going
through the full relay_with_inspection() loop.

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

* fix(proxy): add L7 inspection to forward proxy path

The forward proxy previously rejected all requests to endpoints with
L7 rules (blanket 403), forcing clients through the CONNECT tunnel.
This meant policies like read-only (allow GET, block POST) had no
effect on plain http:// requests through the forward proxy.

Replace the blanket rejection with actual L7 evaluation:
- Query L7 config for the endpoint (same as before)
- Clone the OPA engine and evaluate the request method/path
- Allow if L7 policy permits, deny with 403 if enforcement is enforce
- Audit mode: log but allow (matching CONNECT path behavior)
- Fail-closed: deny on evaluation errors

The forward proxy uses Connection: close (one request per connection),
so a single evaluation suffices — no relay loop needed.

Update e2e tests to validate the new behavior:
- GET /allowed → 200 (L7 policy allows)
- POST /allowed → 403 (L7 policy denies, enforcement: enforce)

Update security-policy.md to reflect the new forward proxy L7 behavior.

Closes NVIDIA#643

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

* style(proxy): apply cargo fmt formatting

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

---------

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
Co-authored-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
Fork PRs receive a read-only GITHUB_TOKEN that cannot push to gh-pages,
causing the deploy step to fail with a 403. Skip the deploy for fork PRs
while still running the docs build to validate changes compile cleanly.
…VIDIA#686)

The base sandbox image installs Python via uv at
/sandbox/.uv/python/*/bin/python*, but the proxy resolves binary
identity via /proc/PID/exe (the real path, not the symlink). The test
policy only listed /usr/bin/python* and /usr/local/bin/python*, so OPA
denied the connection at L4 before L7 evaluation could run.

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…ser (CWE-444) (NVIDIA#671)

The CL/TE desynchronisation guard added in NVIDIA#663 for the REST path was
not applied to the inference request parser.  A request containing both
Content-Length and Transfer-Encoding headers could be interpreted
differently by the proxy and the upstream server, enabling HTTP request
smuggling (CWE-444, RFC 7230 Section 3.3.3).

Add the same rejection check and tests mirroring the REST parser
coverage, including TE substring validation.

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
Co-authored-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
…ntire ruleset (NVIDIA#677)

* fix(sandbox): handle per-path Landlock errors instead of abandoning entire ruleset

A single missing path (e.g., /app in containers without that directory)
caused PathFd::new() to propagate an error out of the entire Landlock
setup closure. Under BestEffort mode, this silently disabled all
filesystem restrictions for the sandbox.

Changes:
- Extract try_open_path() and classify_path_error() helpers that handle
  PathFd failures per-path instead of per-ruleset
- BestEffort mode: skip inaccessible paths with a warning, apply
  remaining rules
- HardRequirement mode: fail immediately on any inaccessible path
- Add zero-rule safety check to prevent applying an empty ruleset that
  would block all filesystem access
- Pre-filter system-injected baseline paths (e.g., /app) in enrichment
  functions so missing paths never reach Landlock
- Add unit tests for try_open_path, classify_path_error, and error
  classification for ENOENT, EACCES, ELOOP, ENAMETOOLONG, ENOTDIR
- Update user-facing docs and architecture docs with Landlock behavior
  tables, baseline path filtering, and compatibility mode semantics
- Fix stale ABI::V1 references in docs (code uses ABI::V2)

Closes NVIDIA#664

* fix(sandbox): use debug log for NotFound in Landlock best-effort mode

NotFound errors for stale baseline paths (e.g. /app persisted in the
server-stored policy but absent in this container) are expected in
best-effort mode. Downgrade from warn! to debug! so the message does
not leak into SSH exec stdout (the pre_exec hook inherits the tracing
subscriber whose writer targets fd 1).

Genuine errors (permission denied, symlink loops, etc.) remain at warn!
for operator visibility.

Also move custom_image e2e marker from /opt to /etc (a Landlock baseline
read-only path) since the security fix now properly enforces filesystem
restrictions.

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
The "--from openclaw" input parameter was missing.

Without it, the sandbox container did not install "openclaw":
'Stop with: openshell forward stop 18789 eager-dragonfly
'/bin/bash: line 1: openclaw-start: command not found
* feat(sandbox): add L7 query parameter matchers

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>

* fix(sandbox): decode + as space in query params and validate glob syntax

Three improvements from PR NVIDIA#617 review:

1. Decode + as space in query string values per the
   application/x-www-form-urlencoded convention. This matches Python's
   urllib.parse, JavaScript's URLSearchParams, Go's url.ParseQuery, and
   most HTTP frameworks. Literal + should be sent as %2B.

2. Add glob pattern syntax validation (warnings) for query matchers.
   Checks for unclosed brackets and braces in glob/any patterns. These
   are warnings (not errors) because OPA's glob.match is forgiving,
   but they surface likely typos during policy loading.

3. Add missing test cases: empty query values, keys without values,
   unicode after percent-decoding, empty query strings, and literal +
   via %2B encoding.

* fix(sandbox): add missing query_params field in forward proxy L7 request info

* style(sandbox): fix formatting in proxy L7 query param parsing

---------

Signed-off-by: John Myers <9696606+johntmyers@users.noreply.github.com>
Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…ution (NVIDIA#555)

* perf(sandbox): streaming SHA256, spawn_blocking for identity resolution

Key changes:
- Replace full file read + SHA256 with streaming 64KB-buffered hash
  (saves 124MB allocation for node binary)
- Wrap evaluate_opa_tcp in spawn_blocking to prevent blocking tokio
  runtime during heavy /proc I/O and SHA256 computation
- Add file-based perf logging for profiling proxy latency phases

Profiling data (node binary, 124MB):
- Cold TOFU: ~890ms (read+hash), warm: 0ms (cache hit)
- evaluate_opa_tcp: cold=1002ms, warm=11ms
- OPA evaluation: 1ms
- DNS+TCP connect: 166-437ms

Made-with: Cursor
Signed-off-by: Rafael Koike <koike.rafael@gmail.com>

* refactor(sandbox): replace perf_log with tracing::debug

Replace the custom file-based perf_log() helper with standard
tracing::debug!() macros as requested in PR review. This removes
the custom log file writes to /var/log/openshell-perf.log and
routes all performance timing through the tracing framework at
DEBUG level, consistent with the rest of the codebase.

Made-with: Cursor

* refactor(sandbox): reduce tracing to 6 key diagnostic logs

Address PR review feedback:

1. Remove ~20 inner-phase timing logs, keeping only the 6 that tell
   the full diagnostic story:
   - evaluate_opa_tcp TOTAL (proxy.rs)
   - dns_resolve_and_tcp_connect (proxy.rs)
   - file_sha256 (procfs.rs)
   - verify_or_cache CACHE HIT / CACHE MISS / TOTAL cold (identity.rs)

2. Restore intent-describing comments that were replaced by timing logs:
   - "TOFU verify the immediate binary" (proxy.rs)
   - "Walk the process tree upward to collect ancestor binaries" (proxy.rs)
   - "Collect cmdline paths for script-based binary detection." (proxy.rs)
   - "First: scan descendants of the entrypoint process" (procfs.rs)
   - "Fallback: scan all of /proc in case the process isn't in the tree" (procfs.rs)
   - "Skip PIDs we already checked" (procfs.rs)

3. Preserve file path in file_sha256 read errors instead of losing
   context via into_diagnostic().

4. Tests: 293 passed, 1 pre-existing failure (drop_privileges), 1 ignored.

Made-with: Cursor

* style(sandbox): apply rustfmt formatting to debug macros

---------

Signed-off-by: Rafael Koike <koike.rafael@gmail.com>
Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
* feat(inference): add timeout

* feat(inference): fix dynamic timeout change

* feat(inference): update docs

* feat(inference): fix formatting
…VIDIA#687)

Replace flat pty_master/input_sender/pty_request fields in SshHandler
with a HashMap<ChannelId, ChannelState> so each channel tracks its own
PTY resources independently. This fixes window_change_request resizing
the wrong PTY when multiple channels are open simultaneously.

Also fixes ioctl UB in set_winsize (pass &winsize not winsize by value)
and adds warn! logging for unknown channels across all handlers.

Resolves NVIDIA#543
…DIA#495)

* feat(bootstrap): switch GPU injection to CDI where supported

Use an explicit CDI device request (driver="cdi", device_ids=["nvidia.com/gpu=all"])
when the Docker daemon reports CDI spec directories via GET /info (SystemInfo.CDISpecDirs).
This makes device injection declarative and decouples spec generation from consumption.

When the daemon reports no CDI spec directories, fall back to the legacy NVIDIA device
request (driver="nvidia", count=-1) which relies on the NVIDIA Container Runtime hook.
Failure modes for both paths are equivalent: a missing or stale NVIDIA Container Toolkit
installation will cause container start to fail.

CDI spec generation is out of scope for this change; specs are expected to be
pre-generated out-of-band, for example by the NVIDIA Container Toolkit.

---------

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Co-authored-by: Piotr Mlocek <pmlocek@nvidia.com>
* feat(sandbox): switch device plugin to CDI injection mode

Configure the NVIDIA device plugin to use deviceListStrategy=cdi-cri so
that GPU devices are injected via direct CDI device requests in the CRI.
Sandbox pods now only require the nvidia.com/gpu resource request —
runtimeClassName is no longer set on GPU pods.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
The provider backend (registry, discovery, policy) already supported
copilot, but the CLI CliProviderType enum was missing the Copilot
variant. This caused 'provider create --type copilot' to be rejected
and 'sandbox create -- copilot' to fail at the CLI layer.
…sic auth encoding

Add two credential injection capabilities to the L7 proxy's SecretResolver:

1. Query parameter rewriting: resolve placeholder tokens in URL query
   parameter values (e.g. ?api_key=openshell:resolve:env:KEY) with proper
   percent-encoding of the resolved secret.

2. Basic Authorization header encoding: decode base64 Basic auth tokens,
   resolve placeholder tokens in the decoded username:password string,
   and re-encode before forwarding upstream.

Both features operate within the existing rewrite_http_header_block flow
and require no changes to the network policy file spec or proto schema.

Closes NVIDIA#630