GitHub Enables Copilot Data Collection for AI Training by Default With Opt-Out Setting

GitHub has announced that interactions with GitHub Copilot will be used to train its AI models, with all personal account users enrolled by default. This change applies to Copilot Free, Copilot Pro, and Copilot Pro+ accounts.

Users can choose to disable data collection through their account settings. According to GitHub, the data collected includes input and output data, code snippets, comments, documentation, file names, and repository structure.

The company states that the purpose is to improve the performance of its models for all Copilot users. GitHub Copilot is available across Visual Studio Code, the GitHub website, the Copilot CLI, and other GitHub services.

Who Is Affected By GitHub Copilot Training Data Changes

The automatic enrollment applies to personal Copilot accounts: Free, Pro, and Pro+. Copilot Business and Copilot Enterprise accounts are not subject to the same default data collection, according to the announcement.

Users who have never used any Copilot feature are not affected. Users who have used code completion in Visual Studio Code, asked Copilot questions on the GitHub website, or interacted with any related AI feature may have interactions and code snippets included in training data going forward.
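
The enrollment rules described above can be summarized as a small decision function. This is only an illustrative sketch of the article's description, not GitHub code or a GitHub API: the CopilotAccount class, its fields, and the function name are hypothetical, and treating Business and Enterprise as fully excluded is a simplification of "not subject to the same default data collection."

```python
from dataclasses import dataclass

# Plan names come from the article; the rest of this model is hypothetical.
PERSONAL_PLANS = {"Free", "Pro", "Pro+"}
ORGANIZATION_PLANS = {"Business", "Enterprise"}


@dataclass
class CopilotAccount:
    plan: str                                # one of the plan names above
    has_used_copilot: bool                   # completions, chat, or any related AI feature
    training_setting_disabled: bool = False  # True once the user has opted out


def may_contribute_training_data(account: CopilotAccount) -> bool:
    """Return True if, per the article's description, interactions may be used for training."""
    if account.plan not in PERSONAL_PLANS | ORGANIZATION_PLANS:
        raise ValueError(f"unknown plan: {account.plan!r}")
    if account.plan in ORGANIZATION_PLANS:
        # Simplifying assumption: excluded from the default collection.
        return False
    if account.training_setting_disabled:
        # The user switched the setting to "Disabled".
        return False
    # Personal plan, still on the default: only actual Copilot usage produces data.
    return account.has_used_copilot
```

For example, a Pro account that has used Copilot and left the default in place evaluates to True, while the same account evaluates to False once the setting is disabled.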

How To Opt Out Of GitHub Copilot Training Data Use

GitHub offers an option to turn off data collection in your account settings. The option sits on the Copilot features page, under the Privacy heading. To opt out:

1. Log in to your GitHub account and go to your account settings.
2. From there, navigate to the Copilot features page.
3. Find the option labeled "Allow GitHub to use my data for AI model training" under Privacy.
4. Set the dropdown to "Disabled" to turn off data collection.

If you have multiple GitHub accounts, you'll need to repeat this process for each one, as the setting applies individually to each account.

Why GitHub Says It Is Using Copilot Data For Training

GitHub says its initial Copilot models were built using publicly available data and carefully selected code samples. The company reported performance improvements after adding data from Microsoft employees and now plans to expand this approach to a wider user base.

GitHub says this practice aligns with "established industry practices" and that the updates will lead to more accurate code suggestions, better detection of potential bugs, and a deeper understanding of development workflows. These are the company's own claims.

Scope Of Copilot Data Collection And What GitHub Hasn’t Clarified

The announcement doesn’t specify a minimum interaction threshold or explain how data is anonymized before it’s used for training. Beyond the opt-out setting, GitHub hasn’t provided details on what technical controls are in place to prevent sensitive code or proprietary logic from being used in model training.

Users on Copilot Business and Copilot Enterprise plans are excluded from the default data collection, but GitHub hasn’t elaborated on this in the announcement. The company also hasn’t stated when data collection began or whether interactions before the announcement are included.

The post GitHub Enables Copilot Data Collection for AI Training by Default With Opt-Out Setting appeared first on gHacks.