DSC-2026-03 | Applied Text Mining Using Python
When?
8 & 9 April
09:30-12:30 & 14:00-17:00
Where?
Campus
MZH | Room 4290
Trainer
Dr. Maryam Movahedifar
Data Science Center, Universität Bremen
Number of participants: max. 20
Language: English
Why is the topic important?
Given the rapid rate at which text data are being digitally gathered in many domains of science, there is a growing need for automated tools that can analyze, classify, and interpret this kind of data. Text mining techniques can be applied to create a structured representation of text, making its content more accessible for researchers. Applications of text mining are everywhere: social media, web search, advertising, emails, customer service, healthcare, marketing, etc.
This course offers an extensive exploration of text mining with Python. It has a strong practical, hands-on focus: students will gain experience in applying text mining methods and interpreting the results on real data from fields such as the social sciences and healthcare. Through lectures and practicals, students will learn the skills needed to design, implement, and understand their own text mining pipeline.
Workshop Goal
By the end of the course, participants will be able to:
- Review the fundamental approaches to text mining;
- Understand and apply current methods for analyzing texts;
- Define a text mining pipeline given a practical data science problem;
- Implement all steps in a text mining pipeline: feature extraction, feature selection, model learning, model evaluation;
- Understand and apply state-of-the-art methods in text mining.
The course starts by reviewing basic concepts of text mining and then moves on to implementing more advanced concepts in natural language processing.
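As an illustration of the four pipeline steps named in the goals above (feature extraction, feature selection, model learning, model evaluation), here is a minimal sketch in plain Python. It is not taken from the course material: the function names, the toy documents, the document-frequency threshold used for feature selection, and the choice of a multinomial Naive Bayes classifier are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def select_features(docs, min_df=2):
    """Feature selection: keep tokens found in at least min_df documents."""
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok for tok, n in doc_freq.items() if n >= min_df}


def extract_features(doc, vocab):
    """Feature extraction: bag-of-words counts restricted to the vocabulary."""
    return Counter(tok for tok in tokenize(doc) if tok in vocab)


def train_naive_bayes(docs, labels, vocab):
    """Model learning: multinomial Naive Bayes with add-one smoothing."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(extract_features(doc, vocab))
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(token_counts[label].values()) + len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_likelihood = {
            tok: math.log((token_counts[label][tok] + 1) / total) for tok in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict(model, doc, vocab):
    """Return the label with the highest log posterior score."""
    feats = extract_features(doc, vocab)
    scores = {
        label: log_prior + sum(n * log_lik[tok] for tok, n in feats.items())
        for label, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


def accuracy(model, docs, labels, vocab):
    """Model evaluation: share of documents labelled correctly."""
    hits = sum(predict(model, d, vocab) == y for d, y in zip(docs, labels))
    return hits / len(docs)


# Toy data, invented for this sketch (not real course data).
train_docs = [
    "the patient reported chest pain and fever",
    "the doctor prescribed medicine for the fever",
    "patient recovered after the medicine",
    "voters discussed the election on social media",
    "the election results spread on social media",
    "media coverage of the election campaign",
]
train_labels = ["health"] * 3 + ["politics"] * 3
test_docs = ["the patient has a fever", "election news on social media"]
test_labels = ["health", "politics"]

vocab = select_features(train_docs, min_df=2)
model = train_naive_bayes(train_docs, train_labels, vocab)
print("vocabulary:", sorted(vocab))
print("test accuracy:", accuracy(model, test_docs, test_labels, vocab))
```

On real data, each step would typically be handled by a library and evaluated on a properly held-out test set; the sketch only shows how the steps connect.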
Workshop Content
DAY 1: Introduction to Applied Text Mining
- Part 1:
  - What is text mining?
  - Text preprocessing
  - Vector space model
  - Practical
- Part 2:
  - Classification basics
  - Text classification algorithms
  - Evaluating classifiers
  - Practical
DAY 2: Text Classification and Feature Selection in Text
- Part 3:
  - How to do feature selection (FS) for text data?
  - Is PCA a FS method for text?
  - Other methods?
  - Practical
- Part 4:
  - What is text clustering?
  - What are the applications?
  - How to cluster text data?
  - Practical
Target Audience & Prior Knowledge
This course is ideal for learners who are comfortable with Python programming, wish to acquire skills in text mining approaches, and have a foundational understanding of machine learning. Participants from various fields such as sociology, psychology, education, human development, marketing, business, biology, medicine, political science, and communication sciences will find this course beneficial.
Technical Requirements
- Participants are asked to bring their own laptop for the lab sessions and to make sure they have an internet connection so that they can use Python on JupyterHub.
- Participants should have a basic knowledge of data science and programming and a motivation for scripting and programming in Python.
About the Trainer
Dr. Maryam Movahedifar is a data scientist for training and consulting at the DSC.

Maryam Movahedifar holds a PhD in Statistics and has extensive experience in Interpretable Machine Learning. With a strong foundation in statistical methods and practical experience in applying these techniques to real-world problems, she is well-equipped to teach complex machine learning concepts. Her expertise includes making advanced models understandable and accessible.

