The Blackbox Dataset, maintained by the University of Kent, has been collecting data through the BlueJ pedagogic programming IDE for several years and continuously logs data from over 250,000 programmers at thousands of institutions worldwide. In addition to compilation events, the database contains a large amount of naturally accumulating programming process data, including location, source code, line edits to source code, errors, method invocations (with parameters and return values), IDE actions and other metadata. As of 2015 the dataset contained over 40 million compilation events. It is a trove of data that can be used to build pictures and profiles of how students learn and develop code, and as of 2016 it remains largely unexplored and unexploited. Knowledge gained from this data could uncover trends including error frequency, the evolution of errors and behaviour over time, and country-specific behaviours and differences. Such knowledge could help improve the efficacy of both students and educators. In addition, insights into how students learn to program on a global scale can help inform the global community working to address the current shortage of students in STEM disciplines; to combat the global problem of high computer science and ICT dropout rates; and to alleviate the lack of enrolment by underrepresented groups, including the small percentage of women undertaking CS/ICT programmes at third level.

A potential project would be to develop a dashboard application that provides customised access to this data. The dashboard would answer queries over data already logged and would also offer a real-time view of the activity of the network of IDEs logging data to the database. This could be stand-alone software, web-based, or a mixture of both.
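As a rough illustration of the kind of query such a dashboard might answer, the sketch below counts compiler errors per country over a handful of in-memory records. It is only a sketch under stated assumptions: the `CompilationEvent` record, its `country` and `error_message` fields, the function names and the sample values are all hypothetical and invented for illustration. They do not reflect the actual Blackbox schema or any real data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class CompilationEvent:
    """Hypothetical, simplified record; NOT the real Blackbox schema."""
    country: str                  # assumed country code for the session
    error_message: Optional[str]  # None when the compilation succeeded


def error_frequency_by_country(
    events: Iterable[CompilationEvent],
) -> Dict[str, Dict[str, int]]:
    """Count how often each error message occurs in each country."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        if event.error_message is not None:
            counts[event.country][event.error_message] += 1
    return {country: dict(counter) for country, counter in counts.items()}


def top_errors(
    events: Iterable[CompilationEvent], n: int = 3
) -> List[Tuple[str, int]]:
    """Return the n most frequent error messages across all countries."""
    totals = Counter(
        e.error_message for e in events if e.error_message is not None
    )
    return totals.most_common(n)


# Made-up sample records, for illustration only.
SAMPLE_EVENTS = [
    CompilationEvent("IE", "cannot find symbol"),
    CompilationEvent("IE", "';' expected"),
    CompilationEvent("IE", "cannot find symbol"),
    CompilationEvent("UK", None),
    CompilationEvent("UK", "cannot find symbol"),
    CompilationEvent("US", "';' expected"),
]

if __name__ == "__main__":
    print(error_frequency_by_country(SAMPLE_EVENTS))
    print(top_errors(SAMPLE_EVENTS, n=2))
```

In a real dashboard the records would come from database queries rather than a Python list, and the same aggregation could be pushed down into the query itself. The in-memory version is used here only to keep the example self-contained.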