Slack uses customer data to train its non-generative AI models
We missed this earlier: On May 17, a user posted on the developer community forum Hacker News that the business communication platform Slack is using customer data to train its services, according to a report by TechCrunch. When the user opted out, Slack told them that it has “platform-level machine learning models for things like channel and emoji recommendations and search results. We do not build or train these models in such a way that they could learn, memorize, or be able to reproduce some part of customer data.”
Based on Slack’s privacy policy, the company uses customer data to develop non-generative artificial intelligence/machine learning (AI/ML) models for features such as emoji and channel recommendations. This customer data includes messages, content, and files shared on Slack, as well as other information such as usage data. The policy states that users who want their data excluded from Slack’s global models can opt out.
Key causes for concern with Slack’s data usage:
The silent opt-in: The main cause for concern for most users on the community forum (and later on X, once the news spread) was that they were automatically opted in to sharing their data with Slack. They never explicitly consented to their data being used by the company. This silent opt-in deprives users of agency, transparency, and control over how their potentially sensitive data is used.
How does the opt-out work? It is unclear whether opting out also excludes information shared on the platform before the opt-out from the model training data. One concern previously raised about sensitive data leaking into AI models is that once the information is part of a training dataset, there is no way to get it deleted. If that is the case with Slack as well, users have no recourse for the data its models have already used.
Use of sensitive communication in model training: Slack says that users’ data will not leak outside their workspace and that it does not build its models in a way that would let them learn, memorize, or reproduce any part of customer data. Even so, the training process itself could expose sensitive information to those involved in training the models. There is also the future risk of Slack’s models being misused or hacked, which raises privacy concerns for Slack’s users.
[Note: We have reached out to Slack and its parent company Salesforce for comments on the situation. The story will be updated to reflect their response.]
How does Slack AI factor into all of this?
Based on the information shared with the user, this data isn’t used to train the large language models (LLMs) that power Slack AI. Slack AI, rolled out in February this year, can summarize long conversation threads and answer users’ questions with citations to relevant Slack messages.
Slack clarified (both to the original user and on social media afterwards) that Slack AI is “a separately purchased add-on that uses Large Language Models (LLMs) but does not train those LLMs on customer data.” The tool uses LLMs hosted within Slack’s Amazon Web Services (AWS) infrastructure so that customer data remains in-house.
The general perception of Slack’s privacy policies: