
improvement(models): tighten model metadata and crawl discovery#3942

Merged
waleedlatif1 merged 7 commits into staging from feat/models-page-fix on Apr 4, 2026

Conversation

@waleedlatif1
Collaborator

Summary

  • improve model metadata around published output limits and structured output support
  • add explicit crawl discovery coverage with sitemap updates and an llms.txt route

Type of Change

  • Improvement

Testing

Tested manually

  • bun run lint
  • bun run test
  • bun run type-check (currently fails in existing auth/better-auth code, unrelated to this branch)

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Made with Cursor

@gitguardian

gitguardian bot commented Apr 4, 2026

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 29606901
  • Status: Triggered
  • Secret: Generic High Entropy Secret
  • Commit: 35c939e
  • Filename: apps/sim/providers/utils.test.ts
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely, following best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@cursor

cursor bot commented Apr 4, 2026

PR Summary

Medium Risk
Adjusts model capability/limit derivation and expands published maxOutputTokens data, which can affect displayed limits and any runtime logic that depends on output-token defaults. Also changes crawl discovery outputs (sitemap/llms.txt), which is low-risk but externally visible.

Overview
Improves the public models catalog metadata by making bestFor optional/conditional, tightening “structured outputs” labeling (treats it as supported for non-deepResearch models and preserves native wording where applicable), and changing max-output display/copy to only show published limits (otherwise “Not published”).

Expands provider model definitions with many explicit maxOutputTokens values (and updates some existing ones, e.g. Claude Sonnet 4.6), refactors getMaxOutputTokensForModel to prefer exact ID matches before suffix/prefix matching, and adds new unit tests for catalog capability facts and max-output-token lookups.
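The exact-before-prefix lookup order described above can be sketched as follows; the table entries and values here are illustrative stand-ins, not the actual provider definitions from apps/sim/providers/models.ts:

```typescript
// Hypothetical limits table; the real values live in apps/sim/providers/models.ts.
const MAX_OUTPUT_TOKENS: Record<string, number> = {
  'claude-sonnet-4.6': 64000,
  'gpt-5': 128000,
  'gpt-5-mini': 128000,
}

// Check exact model-ID matches before prefix matches, so a dated variant such
// as 'gpt-5-mini-2026-01' resolves via its own family entry rather than being
// shadowed by whichever broader key happens to come first in iteration order.
function getMaxOutputTokensForModel(modelId: string): number | null {
  if (modelId in MAX_OUTPUT_TOKENS) {
    return MAX_OUTPUT_TOKENS[modelId]
  }
  const prefix = Object.keys(MAX_OUTPUT_TOKENS)
    .sort((a, b) => b.length - a.length) // longest prefix wins
    .find((key) => modelId.startsWith(key))
  return prefix !== undefined ? MAX_OUTPUT_TOKENS[prefix] : null
}
```

Sorting candidate keys by length before prefix matching is one way to remove the iteration-order ambiguity the refactor targets.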

Enhances crawl discovery by updating sitemap.ts to include hub pages (including /partners) and set more meaningful lastModified values for provider pages, and revises the llms.txt route content (and makes it synchronous) to point crawlers at preferred URLs and canonical docs links.
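A minimal sketch of the synchronous llms.txt route handler described above; the actual content lives in apps/sim/app/llms.txt/route.ts, and the URLs below are placeholders rather than the real ones from the PR:

```typescript
// Synchronous Next.js route handler serving an llms.txt discovery file.
// Body content and URLs are illustrative assumptions for this sketch.
export function GET(): Response {
  const body = [
    '# Example Site',
    '',
    '- Preferred model pages: https://example.com/models',
    '- Canonical docs: https://docs.example.com',
  ].join('\n')
  return new Response(body, {
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      // 24-hour cache policy, matching the route described in the reviews.
      'Cache-Control': 'public, max-age=86400',
    },
  })
}
```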

Reviewed by Cursor Bugbot for commit 569031c.

@vercel

vercel bot commented Apr 4, 2026

The latest updates on your projects.

1 Skipped Deployment
  • docs: Skipped (Apr 4, 2026, 6:45pm UTC)


@greptile-apps
Contributor

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR tightens model metadata across the catalog (publishing explicit maxOutputTokens for GPT-4o, GPT-5 variants, Gemini 2.5 Pro, Claude models, and others), fixes the overly-broad "Best for production workflows" copy by scoping it back to nativeStructuredOutputs only, and adds SEO/crawler discovery via a new /llms.txt route and expanded sitemap coverage for provider and model pages.

Key changes:

  • getEffectiveMaxOutputTokens now returns null (shown as "Not published") instead of silently falling back to a generic label when no limit is declared
  • buildBestForLine and computeModelRelevanceScore both scoped back to nativeStructuredOutputs only — directly addressing the previous review concern about near-universal "Best for typed outputs" copy
  • getMaxOutputTokensForModel refactored to test exact model ID matches before prefix matches, eliminating a subtle ambiguity in iteration order
  • New /llms.txt route returns a text/markdown discovery file with a 24-hour cache policy
  • sitemap.ts extended with per-provider and per-model catalog pages for complete crawl coverage
  • Test coverage added in both providers/utils.test.ts and a new models/utils.test.ts to pin the newly-published token limits and capability facts
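The "Not published" behavior from the first bullet can be sketched as follows; the helper name mirrors the summary above, while the surrounding type and formatting are assumptions for illustration:

```typescript
interface CatalogModel {
  maxOutputTokens?: number // only set when the provider publishes a limit
}

// Return the published limit, or null when none is declared; callers render
// null as "Not published" instead of silently falling back to a generic label.
function getEffectiveMaxOutputTokens(model: CatalogModel): number | null {
  return model.maxOutputTokens ?? null
}

// Hypothetical display helper showing how the null case surfaces in the UI.
function formatMaxOutputStat(model: CatalogModel): string {
  const limit = getEffectiveMaxOutputTokens(model)
  return limit === null ? 'Not published' : `${limit} tokens`
}
```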

Confidence Score: 5/5

Safe to merge — all changes are additive metadata improvements with no runtime risk.

The only finding is a P2 style issue (relative import in a new test file). All substantive logic changes are well-tested, and the prior P1 concern about overbroad structured-output copy has been fully addressed. No data loss, security, or correctness risk present.

apps/sim/app/(landing)/models/utils.test.ts — single relative import to fix

Important Files Changed

  • apps/sim/app/(landing)/models/utils.ts: Adds getEffectiveMaxOutputTokens (returns null when not published), tightens buildBestForLine to use nativeStructuredOutputs instead of the overly broad prompt-injection check, and keeps computeModelRelevanceScore scoped to native structured outputs, directly addressing the prior review concern.
  • apps/sim/app/(landing)/models/utils.test.ts: New test file covering getEffectiveMaxOutputTokens, buildModelCapabilityFacts, and bestFor heuristics. One relative import (./utils) violates the project's absolute-import rule.
  • apps/sim/providers/models.ts: Adds maxOutputTokens to several models (GPT-4o, GPT-5 variants, Gemini 2.5 Pro, Claude models, etc.) and refactors getMaxOutputTokensForModel to prefer exact matches before falling back to prefix matching, eliminating a potential wrong-model-matched edge case.
  • apps/sim/app/(landing)/models/[provider]/[model]/page.tsx: Uses getEffectiveMaxOutputTokens for the Max output stat card, guards bestFor with conditional rendering, and removes stale compact props; clean UI tidying.
  • apps/sim/app/llms.txt/route.ts: New route serving an llms.txt discovery file with the correct text/markdown content type and a 24-hour cache policy. Content is factual and references valid Sim URLs.
  • apps/sim/app/sitemap.ts: Adds per-provider and per-model pages (sourced from the catalog) to the sitemap. The Math.max spread is safe because MODEL_PROVIDERS_WITH_CATALOGS only contains providers with at least one model.
  • apps/sim/providers/utils.test.ts: Extends the existing test suite with explicit assertions for newly populated maxOutputTokens values across several key models; also removes the now-incorrect gpt-4o → 4096 expectation.
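The sitemap expansion noted for apps/sim/app/sitemap.ts might look roughly like this; the catalog data, URL shapes, and entry type are illustrative assumptions, not the actual source:

```typescript
// Illustrative stand-in for MODEL_PROVIDERS_WITH_CATALOGS; the invariant noted
// above (every catalog provider has at least one model) is what makes the
// Math.max spread safe, since Math.max() over an empty spread yields -Infinity.
const CATALOG = [
  {
    provider: 'openai',
    models: [
      { id: 'gpt-5', updatedAt: new Date('2026-03-01') },
      { id: 'gpt-4o', updatedAt: new Date('2026-01-15') },
    ],
  },
]

interface SitemapEntry {
  url: string
  lastModified: Date
}

function buildCatalogEntries(baseUrl: string): SitemapEntry[] {
  return CATALOG.flatMap(({ provider, models }) => {
    // Provider hub pages take the newest model's date as lastModified.
    const newest = new Date(Math.max(...models.map((m) => m.updatedAt.getTime())))
    return [
      { url: `${baseUrl}/models/${provider}`, lastModified: newest },
      ...models.map((m) => ({
        url: `${baseUrl}/models/${provider}/${m.id}`,
        lastModified: m.updatedAt,
      })),
    ]
  })
}
```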

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Model capabilities] --> B{deepResearch?}
    B -- yes --> C[bestFor: null\nstructuredOutputs: Not supported]
    B -- no --> D{reasoningEffort\nor thinking?}
    D -- yes --> E[bestFor: reasoning-heavy tasks]
    D -- no --> F{contextWindow\n≥ 1M tokens?}
    F -- yes --> G[bestFor: long-context workflows]
    F -- no --> H{nativeStructuredOutputs?}
    H -- yes --> I[bestFor: typed outputs\n+4 relevance score]
    H -- no --> J{cheap pricing?}
    J -- yes --> K[bestFor: cost-sensitive workloads]
    J -- no --> L[bestFor: null]

    A --> M{maxOutputTokens set?}
    M -- yes --> N[Display published limit\ngetEffectiveMaxOutputTokens]
    M -- no --> O[Display: Not published]
```

Reviews (3): Last reviewed commit: "models"

@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

Restored the section, but only as an optional high-signal field. It now shows up for clearly differentiated models (for example deep research, reasoning-heavy, long-context, native structured outputs, or unusually cheap models) and stays hidden for generic models instead of falling back to repetitive copy.
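Those stricter criteria might look roughly like this; the capability names follow the review summaries, and the thresholds and copy are illustrative rather than taken from the actual source:

```typescript
interface Capabilities {
  deepResearch?: boolean
  reasoningEffort?: boolean
  thinking?: boolean
  contextWindow?: number
  nativeStructuredOutputs?: boolean
  cheapPricing?: boolean
}

// Only clearly differentiated models get a bestFor line; generic models
// return null so the UI hides the section instead of showing repetitive copy.
function buildBestForLine(c: Capabilities): string | null {
  if (c.deepResearch) return 'deep research'
  if (c.reasoningEffort || c.thinking) return 'reasoning-heavy tasks'
  if ((c.contextWindow ?? 0) >= 1_000_000) return 'long-context workflows'
  if (c.nativeStructuredOutputs) return 'typed outputs'
  if (c.cheapPricing) return 'cost-sensitive workloads'
  return null // generic model: hide the section entirely
}
```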

@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile


@cursor cursor bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 569031c.

@waleedlatif1 waleedlatif1 merged commit 4a9439e into staging Apr 4, 2026
15 of 20 checks passed
@waleedlatif1 waleedlatif1 deleted the feat/models-page-fix branch April 4, 2026 18:53
waleedlatif1 added a commit that referenced this pull request Apr 4, 2026
* improvement(models): tighten model metadata and crawl discovery

Made-with: Cursor

* revert hardcoded FF

* fix(models): narrow structured output ranking signal

Made-with: Cursor

* fix(models): remove generic best-for copy

Made-with: Cursor

* fix(models): restore best-for with stricter criteria

Made-with: Cursor

* fix

* models
