
Previous Courses

Below is a list of previously offered DSA courses, followed by their course descriptions.

DSA Spring 2025 Sections

Introduction to R/Python for Data Science
Introduction to Data Visualization
Data Communication
Data Science for Social Good
Fusion Fitness for Big Data
Measuring Success
Citizen Science Data Analytics
Introduction to AI Ethics
Data Science for Cybersecurity
Data Wrangling and Web Scraping
Exploratory Data Analysis for Big Data
Exploring Machine Learning
Introduction to APACHE Spark Using Big Datasets
Sports Analytics and Forecasting Using R
Bayesian Computations for Machine Learning
Predictive Analytics for Improving Services

DSA Spring 2024 Sections

Introduction to R/Python for Data Science
Introduction to Data Visualization
Data Communication
Data Science for Social Good
Fusion Fitness for Big Data
Measuring Success
Introduction to AI Ethics
Data Science for Cybersecurity
Data Wrangling and Web Scraping
Exploratory Data Analysis for Big Data
Exploring Machine Learning
Principles of Quantum Computing Algorithms
Introduction to APACHE Spark Using Big Datasets
R for Biological Research
Neural Networks and Reinforcement Learning

DSA Fall 2023 Sections

Advanced Social Network Analysis
Data Internship Preparation for Social Impact
Data Science for Cybersecurity
Data Science for Social Good
Data Wrangling and Web Scraping
Exploratory Data Analysis for Big Data
Exploring Machine Learning
Fusion Fitness for Big Data
Introduction to Data Simulation, Permutation and Augmentation
Introduction to Data Visualization
Introduction to R/Python for Data Science
Machine Learning for Computer Vision
Predictive Analytics for Improving Services
R for Biological Research
Topic Modeling: Clustering Data Through Machine Learning

In Spring 2023, the DSA offered 15 topics:

Biomedical Data Sharing DSC-495-604
Data Communication DSC-295-004
Data in Motion DSC-295-301
Data Internships for Social Good DSC-495-006
Data Science for Cybersecurity DSC-295-603
Data Science for Social Good DSC-295-601
Data Visualization DSC-295-602
Data Wrangling and Web Scraping DSC-495-016
Epidemiology: BIG Data for Disease and Disparities DSC-495-606
Exploratory Data Analysis for Big Data DSC-495-017
Introduction to R/Python (4 sections) DSC-295-002,003,005,006
Machine Learning for Computer Vision DSC-495-013
R for Data Science and Visualization DSC-295-001
R for Social Sciences DSC-295-604
Text Analytics Using Intermediate Python DSC-595-601

In Fall 2022, the DSA offered 17 topics:

Biomedical Data Sharing DSC-495-605
Clustering Data Through Machine Learning DSC-495-600
Data Communication DSC-495-012
Data Science for Cybersecurity DSC-495-602
Data Science for Social Good DSC-495-603
Data Science for Sustainability DSC-495-003
Data Wrangling and Web Scraping* DSC-495-004
Exploratory Data Analysis for Big Data* DSC-405-002
Epidemiology: BIG Data for Disease and Disparities DSC-495-607
Machine Learning for Practitioners* DSC-495-010
Reproducibility, Containers and the Cloud* DSC-495-020
Introduction to R/Python (5 sections) DSC-495-001, DSC-495-006, DSC-495-009, DSC-495-016, and DSC-495-017
R for Data Science and Visualization DSC-495-007
R for Social Sciences DSC-495-606
Scientific Programming with Python (2 sections) DSC-495-014, DSC-495-015
Social Media: Data, Ethics and Theory DSC-495-601
Visualization: Tools and Techniques DSC-495-005

In Spring 2022, the DSA offered 8 courses:

Big Data*
Data for Policy
Ethics
Machine Learning for Practitioners*
NLP (Natural Language Processing)*
Data Physicalization
Introduction to R/Python (3 sections)
Wrangling/Scraping*

In Fall 2021, the DSA offered 5 courses:

Big Data*
Data for Policy
Data Visualization
R/Python
Wrangling/Scraping*

All Course Descriptions

Some courses listed above are still being offered in the upcoming semester. Below are the descriptions of both current and previously offered courses.

Currently offered courses:

Advanced Social Network Analysis

Course Section: DSC 595-001

Day/Time: Tuesdays, 3 p.m. to 3:50 p.m.

Delivery: In person

Prerequisites: Students should have a basic understanding of probability and statistical inference. Experience working with the R programming environment is beneficial, though not required in advance of the course.

Description: Social network analysis (SNA) refers to the study of connections among and between social units: people, events, organizations, communities, and other groups. This course provides an overview of the primary statistical tools used to analyze network data. This includes the estimation of various network measurements, the identification of cohesive subgroups, and the application of various inferential statistics to network data.

Data Internship Preparation for Social Impact

Course Section: DSC 410-001

Day/Time: Wednesdays, 10:40 a.m. to 11:30 a.m.

Delivery: In person

Prerequisites: Some elementary data science experience that could be applied in an internship.

Description: Students will prepare to apply for internships for social impact in nonprofits, governmental organizations, and community organizations. As part of this preparation, students will become familiar with tools (such as a data maturity questionnaire) that can help organizations assess their own use of data, and use assessment results to initiate conversations about the organization’s data practices and goals. Students will learn about the appropriate scope of projects for an internship, and practice some basic data management, analysis, and visualization through a mini-project utilizing data from real organizations with a focus on social impact. Additional emphases include developing and refining interviewing skills, professional and personal networks, job applications, and job selection.

Data Science for Cybersecurity

Course Section: DSC 295-602

Day/Time: Mondays, 6 p.m. to 6:50 p.m.

Delivery: Online

Prerequisites: None

Description: In the past year, a cyberattack occurred once every 39 seconds on average. Students will investigate the use of data to discover, explore, and address relevant cybersecurity use cases. Students will become familiar with fundamental approaches to tackle common cybersecurity problems using Python. No prior programming experience required.

Data Science for Social Good

Course Section: DSC 295-612

Day/Time: Mondays, 4:30 p.m. to 5:20 p.m.

Delivery: Online

Prerequisites: None

Description: Data science offers powerful tools for addressing a multitude of societal challenges, yet it is no panacea and will require collaboration and commitment from across society to fulfill its promise. Students will investigate the growing use of data science in the social impact space, drawing from real-world examples aligned with the United Nations' Sustainable Development Goals. These examples will span practice areas and approaches, including machine learning, natural language processing, and image recognition. Students will discuss the challenges of implementing data science for social good solutions, including considerations of community involvement, bias, and equity, and identify best practices.

Data Wrangling and Web Scraping

Course Section: DSC 405-001

Day/Time: Thursdays, 4:30 p.m. to 5:20 p.m.

Delivery: In person

Prerequisites: Students should enter the course with basic knowledge of a programming language, such as R or Python.

Description: Finding, cleaning, and preparing data is often required prior to conducting any data analysis. Data wrangling often accounts for the majority of the time spent working with data, and learning these concepts is fundamental to the data science process. Students will learn how to manipulate and clean data for analyses and visualizations, read data from web pages, and merge multiple data sets of reasonable sizes.

Exploratory Data Analysis for Big Data

Course Section: DSC 406-601

Day/Time: Wednesdays, 4:30 p.m. to 5:20 p.m.

Delivery: Online

Prerequisites: Students should enter the course with some basic programming experience, such as experience using and familiarity with R or Python.

Description: Exploratory data analysis (EDA) focuses on summarizing the main characteristics of data sets, often using visualization methods. The goal is not formal modeling or hypothesis testing, but understanding and exploring data to formulate hypotheses for further investigation. Students will use techniques of EDA and generalize those approaches to large data sets.

Exploring Machine Learning

Course Section: DSC 412-601

Day/Time:

Delivery: Online

Prerequisites: Students should have basic knowledge of a programming language (e.g., R, Python, or others), experience with appropriate use of data structures (e.g., lists and matrices), and flow control mechanisms, such as loops. Students should also be familiar with matrix-vector multiplication and the norm of a vector.

Description: Machine learning (ML) is a fundamental component of artificial intelligence. Students will deconstruct the basic ideas behind popular ML algorithms, such as logistic regression or K-means, using a project-centered approach. Students will create projects from successful ML use cases tailored to their specific domains of interest.
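
As a rough self-check for the linear algebra named in the prerequisites above, the following sketch computes a matrix-vector product and a Euclidean norm in plain Python. The function names `mat_vec` and `norm` and the sample values are illustrative assumptions, not course material.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def norm(vector):
    """Euclidean (L2) norm: square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for x in vector))


# Hypothetical sample values chosen only for illustration.
A = [[1, 2],
     [3, 4]]
v = [3, 4]

print(mat_vec(A, v))  # [11, 25]
print(norm(v))        # 5.0
```

Each entry of the product is the dot product of one matrix row with the vector, and the norm is the length of the vector.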

Fusion Fitness for Big Data

Course Section: DSC 295-008

Day/Time: Tuesdays and Thursdays, 10:40 a.m. to 11:30 a.m.

Delivery: In person

Prerequisites: None

Description: A fusion of aerobics, strength, and flexibility with data analysis. Students will participate in health and fitness activities and collect data on their physiological responses to assess their progress throughout the semester. Parameters include 20 measures such as urinalysis test results, training heart rate, recovery heart rate, temperature, blood pressure, components of the Functional Movement Screening Assessments, site body fat indicators, and more. This data will be collected at regular intervals throughout the semester and used in a project to investigate health trends and communicate the data-informed stories of students’ fitness progress.

Introduction to Data Simulation, Permutation and Augmentation

Course Section: DSC 495-013

Day/Time: Wednesdays, 1:55 p.m. to 2:45 p.m.

Delivery: In person

Prerequisites: Students should have basic coding proficiency in loops, conditional statements, and matrix data structures.

Description: Students will be introduced to a range of techniques from computational statistics frequently encountered in big data and machine learning. Through case studies, students will learn to identify sampling biases commonly encountered in real-world data sets. Students will then explore how to overcome these data deficits to extract reliable data inferences through permutation testing, nonparametric imputation, and parametric data simulation and bias correction techniques. Course materials will emphasize a conceptual understanding of these topics. Course content will be delivered in R.
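
To give a flavor of the permutation testing mentioned above, here is a minimal sketch of a two-sided permutation test for a difference in group means. The course itself is delivered in R; Python is used here only for illustration, and the function name `permutation_test`, its parameters, and the sample measurements are all hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values reassigns group labels at random.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up measurements for two groups, for illustration only.
group_a = [12.1, 11.8, 13.0, 12.6, 12.9]
group_b = [10.2, 10.9, 11.1, 10.5, 10.8]
print(permutation_test(group_a, group_b))
```

The idea is that if group membership did not matter, shuffling the labels should produce differences as large as the observed one fairly often; a small p-value suggests it does not.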

Introduction to Data Visualization

Course Section: DSC 202-601

Day/Time: Wednesdays, 11:45 a.m. to 12:35 p.m.

Delivery: Online

Prerequisites: None

Description: Visualizations can be one of the most effective means to communicate quantitative information. Students will cover the principles of effective visualization and how to interpret data displays. Students will evaluate current examples in the media and learn tools for creating static, interactive, and dynamic data displays.

Introduction to R/Python for Data Science

Course Section: DSC 201

DSC 201-001: In Person, Wednesdays, 9:25 a.m. to 10:25 a.m.
DSC 201-002: In Person, Tuesdays, 11:45 a.m. to 12:35 p.m.
DSC 201-003: In Person, Thursdays, 12:50 p.m. to 1:40 p.m.
DSC 201-601: Online, Thursdays, 6 p.m. to 6:50 p.m.

Prerequisites: None

Description: Students will develop introductory skills in R and Python needed for data science. Topics include data types, data structures, control structures, good coding practices, and reproducible coding. Students will become acquainted with basic data science techniques and their implementations in R and Python. Skills acquired in this course serve as a foundation for many of the Data Science Academy classes that suggest some experience with R or Python.

Machine Learning for Computer Vision

Course Section: DSC 495-001

Day/Time: Tuesdays, 1:55 p.m. to 2:45 p.m.

Delivery: In person

Prerequisites: Students should have basic knowledge of a programming language, including appropriate use of data structures (e.g., lists and matrices) and flow control mechanisms, such as loops. Students should also be familiar with matrix-vector multiplication and the norm of a vector.

Description: Students will deconstruct the fundamental ideas behind popular ML algorithms used in computer vision using a project-centered approach. Students will create projects from successful ML use cases in image recognition and learn how to deploy ML algorithms in embedded hardware platforms. Each week students will be encouraged to build and tailor the ML projects discussed to their specific domains of interest.

Predictive Analytics for Improving Services

Course Section: DSC 495

Day/Time: Thursdays, 3 p.m. to 3:50 p.m.

Delivery: Online

Prerequisites: Students should have familiarity with RStudio and some experience using R to manipulate data and run basic descriptive statistics.

Description: For-profit companies, nonprofit service providers and government agencies often use predictive analytics to improve their services. Predictive analytics harnesses historical data and may incorporate machine learning to model future outcomes. Students will explore the value, limitations and ethical considerations of predictive analytics when used to improve services. Using their own dataset or one provided by the instructor, students will learn and apply a practical approach for planning, implementing and assessing a predictive analytics project. The instruction will highlight topics related to data preparation, model training and selection, validation, fairness, transparency and communication of results.

R for Biological Research

Course Section: DSC 595-002

Day/Time: Mondays, 4:30 p.m. to 5:20 p.m.

Delivery: In person

Prerequisites: Familiarity with biology and biological research.

Description: An introduction to the R programming environment geared towards the biological sciences. Topics include installation and software setup, programming, and data exploration and analysis, with a heavy focus on data visualization (graphics) within R. Students will learn analyses with a focus on shareability and reproducibility within the context of a project or mini-projects. Students may bring their own data; use-case datasets will also be available.

Topic Modeling: Clustering Data Through Machine Learning

Course Section: DSC 495-600

Day/Time:

Delivery: Online

Prerequisites: Basic familiarity with Python.

Description: Interested in using data science to study written information? This course will introduce students to a data science technique called topic modeling that is used in many different contexts to analyze data that is composed of words. Students will learn to preprocess words and texts to prepare them for machine learning models by turning words into numbers that an algorithm can take in and process. Then students will learn to use models that can help them decide how many clusters are appropriate for a particular dataset. Students will characterize clusters by identifying the main intents (actions) in each cluster and the most common entities (words) in that cluster. Finally, students will use models to make decisions about how to label each cluster.
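
One simple way of "turning words into numbers", as described above, is a bag-of-words count vector. The sketch below is an illustration under stated assumptions, not the course's own method: the helper names `tokenize` and `bag_of_words` and the three sample sentences are hypothetical, and real topic-modeling pipelines typically add steps such as stop-word removal.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return a sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing words count as 0
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Made-up sample texts, for illustration only.
docs = ["Reset my password", "I forgot my password", "Cancel my order"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cancel', 'forgot', 'i', 'my', 'order', 'password', 'reset']
print(vectors)  # [[0, 0, 0, 1, 0, 1, 1], [0, 1, 1, 1, 0, 1, 0], [1, 0, 0, 1, 1, 0, 0]]
```

Each document becomes a row of word counts over a shared vocabulary, which is the kind of numeric input a clustering model can process.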

Previously offered courses:

Biomedical Data Sharing

Students will learn to articulate the benefits and potential risks of biomedical data sharing and explore the evolution of data sharing policy, from initial conception to current and future considerations. Students will discuss the FAIR principles of data sharing and the technical and ethical barriers to sharing present in many existing data sets; examine how private information, such as personal identifying information (PII) and personal health information (PHI), is safeguarded in data sharing; and learn how data are collected and deposited and how access to those data sets is managed in federal repositories. Students will discuss the differences between domestic and foreign data sharing policies and their impacts on research and international collaboration. Students will investigate the role of the private sector in data sharing from research, discovery, and legal perspectives. A final project puts these ideas into practice.

Clustering Data Through Machine Learning

Interested in using data science to study written information? This course will introduce students to a data science technique called topic modeling that is used in many different contexts to analyze data that is composed of words. Students will learn to preprocess words and texts to prepare them for machine learning models by turning words into numbers that an algorithm can take in and process. Then students will learn to use models that can help them decide how many clusters are appropriate for a particular dataset. Students will characterize clusters by identifying the main intents (actions) in each cluster and the most common entities (words) in that cluster. Finally, students will use models to make decisions about how to label each cluster.

Data and Ethics

This course will provide a framework for analyzing privacy and control of information and big data through the lens of the ethical implications of data collection and management. Students will examine datasets and relevant case studies to evaluate the broader impact of data science on government policy and society using principles of fairness, accountability and open data. Students will integrate web scraping and textual analysis to examine the need for transparency while also learning best practices for responsible data management.

Data in Motion

What kind of data interests you? How can you communicate it in unexpected ways? To investigate these questions, students will communicate data and other information through movement. No prior dance experience is required. Students will collaborate with their peers – a mix of dancers and people interested in data, communication and moving. Students will both create movement and participate in movement created by others.

Data Science for Social Good

Data science offers powerful tools for addressing a multitude of societal challenges, yet it is no panacea and will require collaboration and commitment from across society to fulfill its promise. Students will investigate the growing use of data science in the social impact space, drawing from real-world examples aligned with the United Nations' Sustainable Development Goals. These examples will span practice areas and approaches, including machine learning, natural language processing, and image recognition. Students will discuss the challenges of implementing data science for social good solutions, including considerations of community involvement, bias, and equity, and identify best practices.

Data Physicalization

A data physicalization (or simply physicalization) is a physical artifact whose materiality encodes data. Data physicalization engages its audience and communicates data using tangible data representations. This course covers topics such as visualization aesthetics, the data-object, data sculptures, critical making and wearable/art technology. Students will analyze current examples of data physicalization, discuss visualization in the context of cultural and historical practice, and evaluate scholarship that recognizes intersections among physicalization, record keeping and data literacy.

Data Science for Policy

Data is a fundamental part of learning more about the most effective ways to improve communities, including learning about what policies work and why. Data Science for Policy will introduce students to the role of data as evidence in the policy process, including identifying cause and effect in complex social environments. Students will discuss the fundamental problem of causal inference, and explore the ways statistical modeling can assist policymakers in identifying effective public policy.

Data Science for Sustainability

Data science and sustainability are two buzzwords that dominate industry, academia and social sectors. This course will explore the intersection of data science and sustainability to solve existential problems facing the modern world. Data Science for Sustainability will introduce issues like missing data, data availability and small data sets as they relate to climate change, plastic waste, public health and related topics.

Design Thinking, Computational Thinking and Problem-Solving through Data Science

In this course, participants will experience a practical approach to employing design thinking, computational thinking and problem solving through data science. Students will be able to enhance their 21st-century skills (communication, collaboration, critical thinking and creativity) and be able to incorporate problem-solving frameworks to solve global challenges that impact society.

Machine Learning for Practitioners

Machine learning (ML) is the “field of study that gives computers the ability to learn without being explicitly programmed.” In this course we will deconstruct the fundamental ideas behind popular ML algorithms, such as logistic regression or k-means, using a project-centered approach. We will draw our projects from successful ML use cases like image recognition and anomaly detection. Each week students will be encouraged to build and tailor the ML projects discussed to their specific domains of interest. Students should have basic knowledge of a programming language, including appropriate use of data structures (e.g., lists and matrices) and flow control mechanisms, such as loops. Students should also be familiar with matrix-vector multiplication and the norm of a vector.

Natural Language Processing

This course will explore the methods that are useful for analyzing text as a data source. The course will survey the different goals and questions relating to text, including areas like text processing, morphological analysis, syntactic analysis, lexical analysis, semantics, discourse analysis and text summarization. Students should have basic knowledge of a programming language, including appropriate use of data structures (e.g., lists and matrices) and flow control mechanisms, such as loops.

R for Data Science and Visualization

A large part of data science is being able to manipulate the data you have for the analysis you wish to do. Students will format and clean data, extract relevant information from the data, and manipulate data for analysis. Students will also explore visualizations, how different types of visualizations convey different meanings, and how to pick the most accurate way to represent analyses that will interest the intended audience.

Scientific Programming with Python

Python is a high-level, interpreted language that has emerged as a powerful tool for scientific computing. This 1-credit course includes exposure to tools in three different areas:

General software development tools, including terminal commands and version control.
Python programming basics, including syntax, object-oriented structures, modules and exception handling.
Scientific computing in Python.

Topics include:

Basics (Variables/Loops/Conditionals/Data Structures)
Object-oriented programming
Plotting (matplotlib)
Scientific computing packages (NumPy/SciPy/Matplotlib/Pandas)

Some higher level programming background (e.g., C++/MATLAB) is desirable.

Social Media: Data, Ethics and Theory

Social networking sites have quickly become some of the most visited sites on the internet and wield political and economic power that surpasses that of many traditional media institutions. Although these services have democratized expression and provided digital space to build virtual communities, they have also fundamentally modified media consumption and social behavior, potentially exposing users and non-users alike to myriad risks. Students will discuss the socio-cultural impacts of social media and explore the ways in which individual agency is influenced by social media systems and practices.

Text Analytics Using Intermediate Python

This course will provide a broad overview of text analysis and natural language processing (NLP), including a significant amount of introductory material but with extensions to state-of-the-art methods. All aspects of the text analysis pipeline will be covered, including data preprocessing, converting text to numeric representations (from simple aggregation methods to more complex embeddings), and training supervised and unsupervised learning methods for standard text-based tasks such as named entity recognition (NER), sentiment analysis, and topic modeling. The course will alternate between presentation and hands-on exercises in Python. Translations from Python to R will be provided for students more comfortable in that language, and students can create a project in the language of their choosing; however, all illustrative examples will be in Python. Students should be familiar with Python (preferably), R, or both and have a basic understanding of statistics and/or machine learning concepts. In particular, students should have experience applying supervised and/or unsupervised methods, such as regression, classification, dimension reduction, and clustering, and understand how to assess model performance using appropriate metrics. Students will gain the practical skills necessary to begin using text analysis tools for their tasks, an understanding of the strengths and weaknesses of these tools, and an appreciation for the ethical considerations of using these tools in practice.

Visualization: Tools and Techniques

Visualizations can be one of the most effective means to communicate quantitative information. This course will cover the principles of effective visualization and how to interpret data displays. Students will evaluate current examples in the media and learn tools (such as Excel, Tableau and Gephi) for creating static, interactive and dynamic data displays.

Previous Courses

DSA Spring 2025 Sections

DSA Spring 2024 Sections

DSA Fall 2023 Sections

DSA Spring 2023 Sections

DSA-495 Fall 2022 Sections

DSA-495 Spring 2022 Sections

DSA-495 Fall 2021 Sections

All Course Descriptions

Advanced Social Network Analysis

Data Internship Preparation for Social Impact

Data Science for Cybersecurity

Data Science for Social Good

Data Wrangling & Web Scraping

Exploratory Data Analysis for Big Data

Exploring Machine Learning

Fusion Fitness for Big Data

Introduction to Data Simulation, Permutation & Augmentation

Introduction to Data Visualization

Introduction to R/Python for Data Science

Machine Learning for Computer Vision

Predictive Analytics for Improving Services

R for Biological Research

Topic Modeling: Clustering Data Through Machine Learning

Biomedical Data Sharing

Clustering Data Through Machine Learning

Data and Ethics

Data in Motion

Data Science for Social Good

Data Physicalization

Data Science for Policy

Data Science for Sustainability

Design Thinking, Computational Thinking and Problem-Solving through Data Science

Machine Learning for Practitioners

Natural Language Processing

R for Data Science and Visualization

Scientific Programming with Python

Social Media: Data, Ethics and Theory

Text Analytics Using Intermediate Python

Visualization: Tools and Techniques