Course Descriptions

Find out more about Data Science Academy courses by browsing the description for each course below.

In this course, students will learn to articulate the benefits and potential risks of biomedical data sharing. They will explore the evolution of data sharing policy, from initial conception to current and future considerations. We will discuss the FAIR principles of data sharing, as well as the technical and ethical barriers to sharing present in many existing data sets. Students will examine how private information, such as personal identifying information (PII) and personal health information (PHI), is safeguarded in data sharing, and learn how data are collected and deposited and how access to those data sets is managed in federal repositories. They will discuss the differences between domestic and foreign data sharing policies and how those differences impact research and international collaboration. Finally, students will investigate the role of the private sector in data sharing from research, discovery and legal perspectives. The final project will give students the opportunity to put those ideas into practice.

No prerequisites required.

Interested in using data science to study written information? This course will introduce students to a data science technique called topic modeling, which is used in many different contexts to analyze data composed of words. Students will learn to preprocess words and texts for machine learning models by turning words into numbers that an algorithm can take in and process. Then students will learn to use models that can help them decide how many clusters are appropriate for a particular dataset. Students will characterize clusters by identifying the main intents (actions) in each cluster and the most common entities (words) in that cluster. Finally, students will use models to make decisions about how to label each cluster.

All students are welcome – no prerequisites required.
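As a rough illustration of what "turning words into numbers" can mean in the topic modeling course above, the sketch below builds simple word-count vectors (often called a bag-of-words representation). This is only one possible preprocessing approach and is not taken from the course materials; the function names and the two sample sentences are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Return a list of word counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical sample documents, used only for illustration.
documents = ["The cat sat on the mat", "The dog chased the cat"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]

print(list(vocabulary))  # the column order of each vector
print(vectors)           # one row of counts per document
```

Each document becomes a row of numbers with one column per vocabulary word, which is the kind of input a clustering or topic model can work with.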

Many disciplines have their own specialized language or jargon. It’s easy to fall into the habit of using that jargon to communicate data analysis results or concepts to the public. It’s also easy to “cherry-pick” or selectively present data or results to convey a specific idea. This course will help fine-tune the student’s awareness regarding the use of discipline-specific language, and introduce skills to present scientific results and concepts clearly, unambiguously and with a minimum of jargon.

No prerequisites required.

This course will provide a framework to analyze privacy and control of information and big data through the lens of the ethical implications of data collection and management. Students will examine datasets and relevant case studies to evaluate the broader impact of data science on government policy and society using principles of fairness, accountability and open data. Students will integrate web scraping and textual analysis to examine the need for transparency while also learning best practices for responsible data management.

No prerequisites required.

A data physicalization (or simply physicalization) is a physical artifact whose materiality encodes data. Data physicalization engages its audience and communicates data using tangible data representations. This course covers topics such as visualization aesthetics, the data-object, data sculptures, critical making and wearable/art technology. Students will analyze current examples of data physicalization, discuss visualization in the context of cultural and historical practice, and evaluate scholarship that recognizes intersections among physicalization, record keeping and data literacy.

No prerequisites required.

In 2021 alone, a cyber attack occurred once every 39 seconds on average. Data Science for Cybersecurity will introduce students to the use of data to discover, explore, and address relevant cybersecurity use cases. Students will become familiar with fundamental approaches to tackle common cybersecurity problems using Python in this introductory-level course.

Some familiarity with Python recommended.

Data is a fundamental part of learning more about the most effective ways to improve communities, including learning about what policies work and why. Data Science for Policy will introduce students to the role of data as evidence in the policy process, including identifying cause and effect in complex social environments. Students will discuss the fundamental problem of causal inference, and explore the ways statistical modeling can assist policymakers in identifying effective public policy.

No prerequisites required.

Data science offers powerful tools for addressing a multitude of societal challenges, yet it is no panacea and will require collaboration and commitment from across society to fulfill its promise. Data Science for Social Good will introduce students to the growing use of data science in the social impact space, drawing from real-world examples aligned with the United Nations’ Sustainable Development Goals. These examples will span practice areas and approaches, including machine learning, natural language processing, and image recognition. Students will discuss the challenges of implementing data science for social good solutions, including considerations of community involvement, bias, and equity, and will identify best practices.

No programming experience required.

Data science and sustainability are two buzzwords that dominate industry, academia and social sectors. This course will explore the intersection of data science and sustainability to solve existential problems facing the modern world. Data Science for Sustainability will introduce issues like missing data, data availability and small data sets as they relate to climate change, plastic waste, public health and related topics.

No previous programming is required, but it will be fine to use SQL if you have some experience.

It is often said that 80% of the time spent analyzing data goes to finding, cleaning and preparing data for analysis. This course will focus on how to format data sets for subsequent analysis, tools for manipulating and cleaning data sets, methods for reading data from tables on web pages, and techniques for merging multiple data sets. Students should have basic knowledge of a programming language, including appropriate use of data structures (such as lists and matrices) and flow control mechanisms (such as loops).

Requires some basic programming experience.
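To give a sense of what "merging multiple data sets" involves in the data preparation course above, here is a minimal sketch of an inner join written in plain Python. It is an illustrative assumption rather than the course's own method or tooling; the `inner_join` helper, the column names, and the values are all hypothetical.

```python
def inner_join(left_rows, right_rows, key):
    """Combine rows from both lists that share the same value for `key`."""
    # Index the right-hand rows by key so each lookup is fast.
    right_by_key = {}
    for row in right_rows:
        right_by_key.setdefault(row[key], []).append(row)

    merged = []
    for left in left_rows:
        # Rows with no match on the other side are dropped (inner join).
        for right in right_by_key.get(left[key], []):
            merged.append({**left, **right})
    return merged


# Hypothetical data sets, used only for illustration.
populations = [
    {"county": "A", "population": 100},
    {"county": "B", "population": 250},
]
clinics = [
    {"county": "A", "clinics": 4},
    {"county": "C", "clinics": 1},
]

merged = inner_join(populations, clinics, "county")
print(merged)
```

Only county "A" appears in both inputs, so it is the only row in the result; deciding what to do with unmatched rows is a typical data-cleaning question.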

In this course, participants will experience a practical approach to employing design thinking, computational thinking and problem solving through data science. Students will enhance their 21st-century skills (communication, collaboration, critical thinking and creativity) and learn to apply problem-solving frameworks to global challenges that impact society.

No prerequisites required.

In this course, students will manipulate and analyze large data sets to understand the unequal distribution of disease, illness, injuries, disability and death within a population. Epidemiology is a data-driven field that uses the past, current and potential future history of a population to develop resilient public health and other public policies. To achieve these objectives, participants will conduct an elementary epidemiological study of COVID-19 in North Carolina. Students will learn about data sources, data retrieval and data scrubbing. They will then be introduced to causality and data interpretation using the basic quantitative analyses of epidemiology. Using these skills, they will then create their own study on a topic of interest, such as gun violence or climate change disparities. Students will use Excel and have the option to use other tools as well.

Prerequisites: Students should have experience with basic statistical concepts such as scatter plots, median, mean and variance.

Exploratory data analysis (EDA) focuses on summarizing the main characteristics of data sets, often using visualization methods. The goal is not formal modeling or hypothesis testing, but understanding and exploring data to formulate hypotheses for further investigation. This course will present the techniques of EDA and generalize those to large data sets. Students should have basic knowledge of a programming language, including appropriate use of data structures (such as lists and matrices) and flow control mechanisms (such as loops).

Requires some basic programming experience.

This course develops the introductory skills in R and Python that students need for data science. Topics include data types, data structures, control structures, good coding practices and reproducible coding. Students will become acquainted with basic data science algorithms and their implementations in R and Python. 

No prerequisites required.

Machine learning (ML) is the “field of study that gives computers the ability to learn without being explicitly programmed.” In this course we will deconstruct the fundamental ideas behind popular ML algorithms, such as logistic regression or k-means, using a project-centered approach. We will draw our projects from successful ML use cases like image recognition and anomaly detection. Each week students will be encouraged to build and tailor the ML projects discussed to their specific domains of interest. Students should have basic knowledge of a programming language, including appropriate use of data structures (such as lists and matrices) and flow control mechanisms (such as loops). Students should also be familiar with matrix-vector multiplication and the norm of a vector.

Requires some basic programming experience.

This course will explore the methods that are useful for analyzing text as a data source. The course will survey the different goals and questions relating to text, including areas like text processing, morphological analysis, syntactic analysis, lexical analysis, semantics, discourse analysis and text summarization. Students should have basic knowledge of a programming language, including appropriate use of data structures (such as lists and matrices) and flow control mechanisms (such as loops).

No prerequisites required.

A large part of data science is being able to manipulate the data you have for the analysis you wish to do. The goal of this course is to teach you how to format data, clean the data, extract relevant information from the data and manipulate it for your analysis. We will also focus on visualizations, how different types of visualizations convey different meanings, and how to pick the most accurate way to represent your analysis that will interest the intended audience.

No prerequisites required.

Data analytics represents one of the most competitive and thriving fields within the social sciences. Employers from public, private and nonprofit organizations across the country need workers with expertise in this area. In this course, you will learn basic statistical concepts and principles in an applied setting using R, one of the most common statistical software packages. You will run basic statistical analyses using R and, in doing so, you will also learn how to program in R and how to use R for effective data analysis. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, and debugging and organizing R code. Topics in statistical data analysis will provide working examples. The class will have mini-projects that lead to a final project.

No programming experience is expected but students are expected to have an understanding of basic descriptive statistics.

There are many layers to consider when trying to guarantee reproducibility of a given program (can you still run a program you wrote years ago without errors?). Package managers (apt-get, pip, etc.) solve part of that problem, but the next step is making a portable recreation of (part of) the OS environment. Docker images package software and its dependencies into just such portable containers. Containerization and container orchestration have become critical tools in software engineering, including machine learning engineering, and are heavily used in industry. In this course we’ll learn the basics of containerization and gain experience running containers locally and in the cloud.

Prerequisites: Familiarity with the Linux command line and bash scripting.

Python is a high-level, interpreted language that has emerged as a powerful tool for scientific computing. This 1-credit course includes exposure to tools in three different areas:

  1. General software development tools, including terminal commands and version control.
  2. Python programming basics, including syntax, object-oriented structures, modules and exception handling.
  3. Scientific computing in Python.

Topics include

  • Basics (Variables/Loops/Conditionals/Data Structures)
  • Object-oriented programming
  • Plotting (matplotlib)
  • Scientific computing packages (NumPy/SciPy/Matplotlib/Pandas)

Some higher level programming background (e.g., C++/MATLAB) is desirable.

Social networking sites have quickly become some of the most visited sites on the internet and wield political and economic power that surpasses that of many traditional media institutions. Although these services have democratized expression and provided digital space to build virtual communities, they have also fundamentally modified media consumption and social behavior, potentially exposing users and non-users alike to myriad risks. Students will discuss the socio-cultural impacts of social media and explore the ways in which individual agency is influenced by social media systems and practices.

No prerequisites required.

Visualizations can be one of the most effective means to communicate quantitative information. This course will cover the principles of effective visualization and how to interpret data displays. Students will evaluate current examples in the media and learn tools (such as Excel, Tableau and Gephi) for creating static, interactive and dynamic data displays.

No prerequisites required.

A Note About Courses

DSA courses are listed as DSC 495 Special Topics in Data Science in the Class Search and Registration Wizard tools. The descriptions on this page give more information than may be available in those tools. These descriptions are for all courses offered by the Data Science Academy.

Please note that not all courses are offered every semester.