Privacy: ToS and Privacy Policy contradict each other on model training for free tier users #145

@dxvidparham

Summary

There is a direct contradiction between the Terms of Service and the Privacy Policy (both last updated April 13, 2026) regarding whether free tier user data is used for model training. This creates ambiguity that users — especially developers indexing proprietary codebases — cannot resolve without clarification from the team.


The Contradiction

Terms & Conditions §3.1.1 states:

"Under the Free Tier, the Company collects all Usage Data, including but not limited to queries and any other input provided by the User. This data is stored in its native form and may be used for a variety of purposes, including, without limitation, model training, analytics, service improvement, and research activities."

Privacy Policy §2.3 (Free Tier) states:

"Such data is used for service improvement, analytics, and research. We do not use any data from Free Tier users for model training."

These two statements cannot both be accurate, and users have no way to determine which one reflects actual practice.


Why This Matters

mgrep watch uploads the full contents of a user's codebase to Mixedbread's cloud storage. This can include:

  • Proprietary source code
  • Internal business logic
  • Configuration files with sensitive structure

Under the free tier, this data is retained indefinitely with no guaranteed deletion on request (ToS §3.1.2, Privacy Policy §5). If that data is also used for model training, users indexing private or confidential repositories are unknowingly consenting to training a third-party commercial model on their intellectual property.

This is a meaningful risk that is currently undisclosed at the point of installation or first use (mgrep login, mgrep watch). The README's caution block mentions background sync but makes no mention of data retention or potential training use.


Requests

  1. Resolve the contradiction — update either the ToS or Privacy Policy so they agree on whether free tier data is used for model training.
  2. Surface the policy at onboarding — consider printing a one-line notice during mgrep login or mgrep watch that links to the relevant policy section, so users are informed before any data is uploaded.
  3. Clarify deletion — either provide a reliable deletion mechanism or clearly state that none exists, so users can make an informed choice before indexing sensitive repos.
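
To make request 2 concrete, here is a rough sketch of what a one-time notice could look like. This is only an illustration: `buildUploadNotice`, `printUploadNotice`, the `NoticeOptions` shape, and the `alreadyShown` flag are all hypothetical names that I made up, not existing mgrep code, and the policy URL is a placeholder to be filled in by the team.

```typescript
// Hypothetical sketch: none of these names exist in mgrep today.
type Tier = "free" | "paid";

interface NoticeOptions {
  tier: Tier;
  policyUrl: string; // placeholder; the real policy URL is up to the maintainers
  alreadyShown: boolean; // assumed to be read from some local config or state file
}

// Returns the one-line notice, or null when it has already been shown.
function buildUploadNotice(opts: NoticeOptions): string | null {
  if (opts.alreadyShown) return null;
  return (
    `Notice (${opts.tier} tier): mgrep watch uploads file contents to Mixedbread. ` +
    `Data retention and training terms: ${opts.policyUrl}`
  );
}

// Intended to run before any upload starts; returns true if the notice was printed.
function printUploadNotice(
  opts: NoticeOptions,
  write: (line: string) => void = console.error,
): boolean {
  const notice = buildUploadNotice(opts);
  if (notice === null) return false;
  write(notice);
  return true;
}
```

Calling something like `printUploadNotice({ tier: "free", policyUrl: "<policy link>", alreadyShown: false })` at the start of mgrep login or mgrep watch would print the line once. Writing to stderr by default is just my assumption, chosen so the notice does not mix into any output that gets piped elsewhere.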

Additional Note

ToS §3.2.3 states that if a user upgrades from free to paid, previously collected free-tier data remains governed by free-tier rules. This means users who upgrade after indexing do not get the stronger paid-tier protections retroactively. This should be called out explicitly in the upgrade flow.


I'm genuinely interested in this tool and want to continue using it. Resolving this ambiguity would make it much easier to recommend mgrep to teams working with sensitive codebases. Thanks for considering this.
