User data from social media platform Bluesky was reportedly scraped, compiled into a dataset, and published on AI platform Hugging Face. The dataset contained a million Bluesky posts along with user information, per a report by 404 Media.

AI researcher Daniel van Strien obtained the data through Bluesky's Firehose API and then put the dataset on a public repository, the report added. He wanted to use the data to develop AI models and analyse social media trends, content moderation and patterns of posting.

The posts came with metadata that included the users' decentralised identifiers (DIDs), and the dataset offered a search function to look for content from specific users.

While Bluesky has said that it doesn’t train AI models with user data, it still doesn’t have a solution to stop third parties from doing so.

“Bluesky won’t be able to enforce this consent outside of our systems. It will be up to outside developers to respect these settings. We’re having ongoing conversations with engineers & lawyers and we hope to have more updates to share on this shortly!” the company said in a post. 

The report noted that Bluesky’s API, along with the public and decentralised Authenticated Transfer (AT) Protocol it is built on, keeps content accessible to third-party developers.

After taking over Twitter, Elon Musk started charging for access to X’s API to halt free data scraping. A month ago, it was reported that X had hiked the fee further.

Published - November 28, 2024 11:27 am IST