Data Science Instruction for All Disciplines

 
 

Summary of the Report

 

1.      The importance of data science

Data science involves collecting, storing, and analyzing data and developing tools for this work. The data comes from a broad range of academic disciplines and commercial applications. Many developments in recent years have contributed to the surge in data science, including the ubiquitous use of social networks, developments in the field of databases (big data), cloud storage, and machine learning in various fields, especially machine vision and language analysis. Data science is increasingly important for society and the state. The interaction between the state and its citizens is becoming digital, generating data that can be used to draw conclusions and make predictions. The data is sometimes needed to defend the state, monitor crime, and contend with epidemics. In industry, especially in the high-tech sector, there is a high demand for professional personnel with strong abilities in data science, and the current shortage is expected to grow.

2.      Forming the committee

The importance of data science led the Israel Academy of Science to establish this committee to assess the need for data science training in academic studies and to identify ways of providing it. The committee believes that Israeli academia is committed to imparting basic data science skills to its graduates, just as it is committed to introducing them to various scientific approaches and teaching them to embrace critical thinking. The committee recommends teaching data science in all academic institutions in Israel while allowing each institution the flexibility to implement the program in accordance with its academic needs and the nature of its curriculum. This recommendation offers an opportunity for all academic institutions in Israel to innovate and mobilize resources for the emerging curriculum.

3.      The objectives of teaching data science on campus according to the target population

  • Undergraduates in data science

This type of program is generally offered in one or more of the following departments: computer science, statistics, industrial engineering and management, electrical engineering, management and information systems, and information science. The current report does not address this group.

  • Students in departments relevant to data science but not in a data science program

This group may include students in departments where the analytical and computational skills of data science are degree requirements and are essential for professional work in the field. For these students, the undergraduate degree serves two purposes: 1. to enable employment in the business and public sectors in positions that involve data analysis; 2. to facilitate the transition to advanced studies in data science with relatively little additional study.

  • Undergraduate students across the rest of the campus

For these students, the instructional objectives are: 1. imparting the ability to identify a need for data and the ability to use data for study, research, and work; 2. imparting an understanding of the advantages and limitations of data-intensive methods, as well as a critical approach to results; 3. developing sensitivity to, and a critical perspective on, the processes of data research and the ethical problems that arise.

4.      The content and scope of teaching

The conventional way of acquiring a comprehensive picture of the content and scope of data science is through the data cycle - a schematic model that characterizes any data-based process of research or decision-making. The data cycle, as its name indicates, is a process that repeats itself.

The data cycle is described graphically in the diagram below. Data scientists agree on its components, but the number of stages ranges from six to nine, depending on the definition. A relatively simple structure is displayed here:

[Diagram: the data cycle]

 

 

The full report describes the tools applied in each of the stages of the data cycle. All students should become familiar with these tools and learn how to use some of them. We distinguish between the minimal tools that each student on campus should be capable of using and more advanced tools intended for students with the appropriate background.
 

5.      Ethics and critical thinking in data analysis

In the process of data analysis, ethical questions arise vis-à-vis the individual and society; some of these questions have legal implications. The training of data scientists requires awareness of these questions and familiarity with the ways of addressing them. The question of privacy should be addressed when the data includes personal information, such as medical or legal procedures. The committee calls for maintaining a balance between the societal advantages of full disclosure of data and potential harm to the individual. In addition, the advantage that large commercial companies gain should be weighed against the interests of society and the individual. Collecting information also raises copyright issues. It is imperative to ensure the transparency of the process and access to the data. Finally, researchers’ ethical conduct includes proper disclosure of conflicts of interest, without bias or distortion of the process.

6.      Recommendations for flexible implementation in various disciplines

The committee recommends exposing every student on campus to all of the data cycle topics during their undergraduate studies. This familiarity may be achieved by developing an Introduction to Data Science course or by enriching existing courses in research methods and statistics so that they cover all of the topics in the data cycle. It is also possible to combine these two alternatives.

The committee recommends flexibility in adopting the recommendations of this report in the different academic units, according to their respective resources and needs and the nature of the curriculum in each. However, the system should also offer options to students in all fields who are interested in pursuing further study of data science.

The committee recommends that all students learn the data cycle model that corresponds to their respective fields of study. It also recommends that the course dedicated to this objective be modular, starting with a fundamental, generic core based on the data cycle, followed by topics appropriate to the needs of the various fields of research and study.

6.1.  The core component

Scope: 2 hours of lectures per week, 2 practice hours per week

Course topics:

  1. The data cycle
  2. A survey of tools that support the various stages of the data cycle
  3. Types of data in the digital space: numerical, textual, structured, and unstructured
  4. Basic implementation of data collection, integration of data from different sources, data analysis, algorithmic learning, and visualization of data
  5. Critical thinking throughout the data cycle
  6. Sharing data (open data, data repositories) and rules for their academic citation

6.2.  Disciplinary components

The full report presents examples of a syllabus for the introductory course for the following disciplines: humanities, social sciences and management, law, life sciences and medicine, and exact sciences and engineering.

7.      Collaboration in teaching data science

University research centers for data science: Facilitated by steps taken by the Planning and Budgeting Committee of the Council for Higher Education, research centers for data science were established at the research universities in Israel. The success of the current initiative largely depends on collaboration with these university centers.

Developing and providing access to special data files for Israel: The teaching of data science, as recommended in this report, requires sufficient resources in each of the fields. Creating these resources is a national imperative, as part of the establishment of national science infrastructures. The report discusses resources in the local languages (Hebrew and Arabic), legal data, and social science data. Each of the sections surveys the current situation and proposes directions for future development.

Synergy with academic libraries: Library services include developing and providing access to information infrastructures, as well as support for learning and research processes. In the past decade, there has been particularly rapid growth in the fields of data science and big data; libraries must be prepared to respond. This growing demand is an opportunity for libraries to launch a new surge of activity and innovation. The report recommends that libraries establish training centers for data science and develop physical infrastructure and areas with advanced equipment to meet research and teaching needs.

Data science in pre-university education: The correct way to inculcate the ability to work with data is not through lessons devoted to data science, but through everyday tasks of data use in each of the subjects taught in schools, from history to mathematics, at all ages. Schools that have moved to learning through multidisciplinary projects can include data science skills in the projects. Like all fundamental skills, the use of data should not be left for intensive study in high school before the matriculation exams. To incorporate data science in the education system, teachers (both those who are already part of the system and those who are now being trained in teachers’ colleges) need to be trained in this subject. The Ministry of Education formed a professional committee on data science, which is currently developing a high school specialization in this field. Collaboration with the ministry’s committee is recommended.