Unauthorized Data Set for AI Models Removed by BREIN
BREIN, a copyright enforcement group, has taken down a large language dataset that was being used without authorization to train AI models. The dataset contained text gathered from tens of thousands of books and news sites, as well as Dutch-language subtitles from films and TV series. BREIN's director, Bastiaan van Ramshorst, said it is unclear whether AI companies had already used the dataset. The group acted promptly to remove it in order to prevent potential legal issues in the future.
Regulatory Implications for AI Firms
The European Union’s AI Act requires AI companies to disclose the datasets used to train their models. Lawsuits have already been filed over the unauthorized use of copyrighted material in AI training, including a case in the U.S. involving Microsoft-backed OpenAI. AI firms need to comply with copyright rules to avoid legal repercussions.