There are many definitions that attempt to clarify the term 'Big Data'. Many of them center on the notion of bigness, where the volume of data exceeds available storage and computing capacity. It is somewhat like the overwhelming feeling of trying to take a drink of water from a fire hose.
Within higher education, one definition that I find both interesting and concerning focuses on combining data from multiple sources to improve student success. At first glance, who would argue against creating an environment that enables students to achieve their goals? Probably no one! But on closer examination, what are the ethical responsibilities for managing and using student data? In other words, what matters is not the amount of data, but what the school does with it.
Some of the earliest attempts to manage data emerged with data warehousing and executive information systems. These applications succeeded operationally by enabling better decisions, greater operational efficiency, and improved educational effectiveness. Transactional data from registrar and admissions offices, along with legacy data from student information systems, were released to academic and administrative leaders across campus. These data-rich resources supported predictive analytics for admission decisions, course offering patterns constrained by limited resources, efforts to improve graduation rates, comparisons of best practices with peer institutions, and a host of other student-life practices.
Expanded Data Collection
The growth of technology-driven applications has expanded opportunities for data collection. For example, a student’s interaction with the learning management system gives the course instructor pertinent information about engagement and participation in course activities. Card-swipe systems show campus administrators the places the student frequents (classrooms, residence halls, food service, bus transportation, recreation, etc.). Social media interactions with school services and applications generate data about the student’s interests. Interaction with many school-endorsed, third-party hosted services (e.g., scholarship searches, career advisory networks, advising/counseling services) results in the collection of both personally identifiable data and data pertinent to the service. [IMPORTANT NOTE: Credentials Solutions does not maintain students’ personally identifiable information. A simple record of the transaction is maintained, but detailed information about the student is not.]
Because You Can, Should You?
The proliferation of data sources creates opportunities to build big data, and the temptation to use these sources to improve student success is great. Technically, combining the data may be very simple. The more important question is whether all this data, which was never intended to be connected, should be connected. Are the student’s interests being served, or the school’s? Who is accountable? Do students know how their data is being used? The school has ethical responsibilities when managing and commingling big data, including commitments to honor the integrity, discretion, and privacy of the student.
Further complicating matters is the emerging discussion that distinguishes a “student” from a “learner”. Students are generally considered to be within a more structured educational environment: they sign an admission application, pay tuition, register for courses, earn credits, receive transcripts and diplomas, and so on. Learners are less defined and structured: they participate loosely in a learning program, receive no credentials, and may never be asked to establish their personal identity. Given these distinctions, do learners have the same protection in the world of big data, or less?
As discussed in previous articles, the context of “the permanent record” is changing. Historically, each student had one record at each school; the school held the record exclusively and in perpetuity. Available data was limited and difficult to integrate with other data. The change now occurring is the rapid proliferation of providers, such as MOOCs, certificate providers, and social media platforms. Schools are losing complete control, and the opportunities to create big data are moving beyond campus boundaries. As students assume control over their records, the school’s ability to create big data becomes less clear-cut.
From a public policy perspective, big data is a critically important issue. As enrollment patterns grow more complex, with students enrolling at multiple schools, policymakers face a greater challenge in managing public funding. The ability to track a student’s enrollment from start to finish is both compelling and reasonable as a way of defining student success.
Responsible Use of Student Data
Most would agree that the real benefit of big data lies with the group, not the individual. The Stanford University Center for Advanced Research through Online Learning (CAROL) has been developing principles for the responsible use of student data in the context of big data. These emerging principles center on four basic beliefs:
1. Shared Understanding. All parties involved in generating data from student interactions with school officials should be given a clear explanation of the existence and use of this data.
2. Transparency. Students are entitled to complete explanations of how collected data is used to assess them.
3. Informed Improvement. All student data used in research, including research to improve operational efficiency and effectiveness, is subject to the school’s governance process.
4. Open Futures. Education should enable opportunity. Instructional, advisement, and assessment systems must be used in ways that enable students to demonstrate achievements.
For more information on these developing policies, feel free to visit the Stanford CAROL project page.
It is fair to say that computing capacity and storage no longer limit large data sets. The era of big data is here, and it is likely to expand. The questions still to be answered will center on the moral, ethical, and privacy implications of building and using big data.