Amazon is asking its employees to create multiple GitHub accounts so that the company can gather data from the platform faster. The strategy aims to bypass GitHub’s rate limits on data collection requests, accelerating the scraping that feeds the training of Amazon’s new AI models. According to an internal memo, the company’s Artificial General Intelligence (AGI) Group is spearheading the initiative and has stressed the critical need for both quantitative and qualitative metadata from GitHub repositories. The move is part of Amazon’s broader effort to develop its “most ambitious” AI model yet, a project led by Rohit Prasad, the head scientist of the AGI group. The method raises ethical and legal questions, however, especially regarding potential violations of GitHub’s terms and the rights of open-source developers. Microsoft, which owns GitHub, is likely to be displeased with Amazon’s aggressive data collection tactics. The incident underscores the growing competition among tech giants to acquire high-quality data for AI training, which often leads to controversial practices.

Amazon’s Controversial Data Harvesting Tactics for AI Development
Amazon asks employees to create GitHub accounts to speed up AI data collection.