Python Interview Questions for Data Analysts: What Interviewers Expect in 2026
These Python interview questions for data analysts cover everything from core language fundamentals to pandas, NumPy, EDA, and visualisation — exactly what you will be tested on in technical screenings. Python has become the second must-know language for data analysts after SQL. Companies use Python for automating data pipelines, cleaning large datasets, building exploratory analyses, and creating visualisations. In interviews, Python proficiency is tested through live coding, conceptual questions about libraries, and scenario-based problems that assess your ability to manipulate DataFrames efficiently.
This guide covers the 50 most important Python interview questions for data analyst roles, with concise code examples and explanations. Whether you are preparing for a role at a startup or a top consulting firm, these questions will sharpen your readiness.
Section 1 — Python Fundamentals for Analysts
Q1. What is the difference between a list, tuple, and dictionary in Python?
A list is an ordered, mutable collection that allows duplicates, defined with square brackets. A tuple is ordered and immutable, useful for fixed data like coordinates. A dictionary is a collection of key-value pairs for fast lookups; since Python 3.7, dictionaries preserve insertion order. For data analysis, lists store sequences of values; dictionaries represent records; tuples are used for immutable dataset rows passed between functions.
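A short sketch of how the three collections might appear in analyst code (all values here are illustrative):

```python
# Tuple: fixed, immutable record, e.g. a dataset row
row = ("2024-01-15", "London", 120)

# List: ordered, mutable, allows duplicates
sales = [120, 95, 240, 95]
sales.append(310)  # lists can grow in place

# Dictionary: key-value pairs for fast lookup by name
record = {"date": "2024-01-15", "city": "London", "sales": 120}
city = record["city"]
```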
Q2. What is a lambda function?
A lambda function is an anonymous, single-expression function defined with the lambda keyword. Syntax: lambda arguments: expression. They are commonly used with pandas apply(), map(), and filter() functions. Example: df['tax'] = df['salary'].apply(lambda x: x * 0.3 if x > 50000 else x * 0.2) — applies a tax rate based on a threshold to every value in the salary column.
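The inline example above can be run end to end; the salary figures are made up for illustration:

```python
import pandas as pd

# Hypothetical salary data for the conditional tax-rate example
df = pd.DataFrame({"salary": [40000, 60000]})

# 30% tax above the 50,000 threshold, 20% otherwise
df["tax"] = df["salary"].apply(lambda x: x * 0.3 if x > 50000 else x * 0.2)
```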
Q3. What is the difference between deep copy and shallow copy?
A shallow copy creates a new object but inserts references to the same nested objects as the original. Modifying nested elements affects both the copy and the original. A deep copy recursively copies all nested objects, creating a fully independent clone. In pandas, use df.copy(deep=True) (the default) to avoid unintentionally modifying the original DataFrame when working on a copy.
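A small sketch with plain Python lists makes the difference concrete:

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)      # new outer list, same inner lists
deep = copy.deepcopy(original)     # fully independent clone

original[0].append(99)             # mutate a nested object
# shallow[0] now also contains 99; deep[0] is unchanged
```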
Q4. Explain list comprehension with an example.
List comprehension provides a concise way to create lists. Syntax: [expression for item in iterable if condition]. Example: squares = [x**2 for x in range(10) if x % 2 == 0] generates a list of squares of even numbers from 0 to 9. In data analysis, list comprehensions are used to transform columns, filter values, and create new derived features more concisely than traditional for loops.
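As a runnable sketch, the squares example plus a typical analyst use, deriving a flag from raw values (the salary numbers are illustrative):

```python
# Squares of even numbers from 0 to 9
squares = [x**2 for x in range(10) if x % 2 == 0]

# Deriving a categorical flag from numeric values
salaries = [40000, 60000, 52000]
high_earner = ["yes" if s > 50000 else "no" for s in salaries]
```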
Q5. What are Python decorators?
Decorators are functions that modify the behaviour of another function without changing its source code. They are applied with the @ syntax above a function definition. Common uses in data work include caching results with @functools.lru_cache, timing functions for performance monitoring, and adding logging to data pipeline functions. While not heavily tested in junior analyst interviews, knowing decorators demonstrates Python depth valued in senior roles and data engineering discussions.
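A minimal sketch of both uses mentioned above; the function names are hypothetical:

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def normalise_code(code):
    # Stand-in for an expensive lookup; repeated calls hit the cache
    return code.strip().upper()

def timed(func):
    """Decorator that records how long each call took."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    return wrapper

@timed
def load_data():
    return [1, 2, 3]  # stand-in for a pipeline step
```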
Section 2 — Pandas & Data Manipulation
Q6. What is a pandas DataFrame and how is it different from a Series?
A DataFrame is a 2-dimensional labelled data structure with rows and columns — like a spreadsheet or SQL table. A Series is a 1-dimensional labelled array — like a single column. When you select one column from a DataFrame (e.g. df['age']), you get a Series. DataFrames support merging, grouping, reshaping, and applying complex transformations across rows and columns using vectorised operations.
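A quick sketch of the relationship, using made-up data:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [31, 27]})

col = df["age"]                  # selecting one column returns a Series
kind = type(col).__name__        # "Series"
shape = df.shape                 # (2, 2): 2 rows, 2 columns
```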
Q7. How do you handle missing values in pandas?
Detect: df.isnull().sum() shows null counts per column. Remove rows: df.dropna(). Fill with a constant: df.fillna(0). Fill with column mean: df['col'].fillna(df['col'].mean()). Forward fill (time series): df.ffill() (the older df.fillna(method='ffill') is deprecated in recent pandas). Interpolate: df['col'].interpolate(). The right approach depends on the column type, the percentage of missingness, and whether data is missing randomly or systematically.
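The main approaches side by side, on a hypothetical price series with gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical price series with two missing values
df = pd.DataFrame({"price": [10.0, np.nan, 14.0, np.nan]})

nulls = df.isnull().sum()                             # nulls per column
filled_mean = df["price"].fillna(df["price"].mean())  # mean imputation
filled_ffill = df["price"].ffill()                    # forward fill
```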
Q8. What is groupby() in pandas and how does it work?
groupby() splits a DataFrame into groups based on one or more columns, applies an aggregate or transformation function, and combines the results. Example: df.groupby('department')['salary'].mean() calculates the average salary per department. You can chain multiple aggregations using .agg(): df.groupby('city').agg({'sales':'sum','orders':'count'}). groupby() is the pandas equivalent of SQL GROUP BY and is one of the most used functions in data analysis.
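The split-apply-combine pattern in full, with illustrative department data:

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["Sales", "Sales", "HR"],
    "salary": [50000, 70000, 45000],
    "orders": [3, 5, 2],
})

avg_salary = df.groupby("department")["salary"].mean()
summary = df.groupby("department").agg({"salary": "mean", "orders": "sum"})
```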
Q9. What is the difference between merge() and concat() in pandas?
merge() combines two DataFrames based on a common key column (like SQL JOIN); it aligns rows by matching key values. concat() stacks DataFrames vertically (row-wise, axis=0) or horizontally (column-wise, axis=1); vertical concat does not match keys at all, while horizontal concat aligns on the index rather than a key column. Use merge() to join datasets with a shared identifier; use concat() to append rows from multiple DataFrames with the same schema or to place columns from index-aligned DataFrames side by side.
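Both operations on a pair of toy DataFrames:

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})
orders = pd.DataFrame({"id": [1, 1, 2], "amount": [20, 35, 50]})

# SQL-style join on the shared key column
joined = customers.merge(orders, on="id", how="inner")

# Row-wise stacking of same-schema DataFrames; keys are ignored
stacked = pd.concat([customers, customers], ignore_index=True)
```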
Q10. How do you apply a function to every row or column in pandas?
apply() applies a function along an axis: axis=0 (column-wise) or axis=1 (row-wise). applymap() (now map() in newer pandas) applies an element-wise function to every cell in a DataFrame. map() on a Series applies a function or dictionary mapping to each element. For large datasets, vectorised operations using NumPy or pandas built-in methods (e.g. str.upper(), dt.year) are significantly faster than apply() loops.
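A small comparison of the element-wise and vectorised routes to the same result:

```python
import pandas as pd

df = pd.DataFrame({"city": ["london", "paris"]})

upper_map = df["city"].map(lambda s: s.upper())  # element-wise via map
upper_vec = df["city"].str.upper()               # vectorised string method
# Both give the same result; the vectorised form is faster on large data
```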
Section 3 — Data Cleaning & EDA Questions
Q11. What is Exploratory Data Analysis (EDA)?
EDA is the process of analysing datasets to summarise their main characteristics, detect patterns, spot anomalies, and check assumptions before formal modelling. Key EDA steps: (1) check shape, dtypes, and nulls with df.info() and df.describe(); (2) visualise distributions with histograms; (3) check correlations with df.corr() and a heatmap; (4) identify outliers with box plots; (5) explore categorical value counts with df['col'].value_counts(). EDA informs data cleaning decisions and feature engineering.
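The first few EDA steps above can be sketched on a small hypothetical dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical dataset for a quick first EDA pass
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41],
    "segment": ["a", "b", "a", "a"],
})

shape = df.shape                       # (rows, columns)
nulls = df.isnull().sum()              # nulls per column
counts = df["segment"].value_counts()  # categorical frequencies
summary = df.describe()                # numeric summary statistics
```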
Q12. What is the difference between loc[] and iloc[] in pandas?
loc[] selects rows and columns by label (index name or column name). iloc[] selects by integer position (0-based index). Example: if the DataFrame has a string index like employee names, df.loc['Alice', 'salary'] gets Alice's salary by label. df.iloc[0, 3] gets the value in the first row, fourth column by position. When the index is an integer, loc[] and iloc[] can return different results if the index does not start at 0.
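The string-index example from the answer, runnable end to end:

```python
import pandas as pd

df = pd.DataFrame(
    {"salary": [50000, 60000]},
    index=["Alice", "Bob"],   # string labels instead of 0..n
)

by_label = df.loc["Alice", "salary"]   # label-based selection
by_position = df.iloc[0, 0]            # first row, first column by position
```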
Q13. How do you detect and handle duplicate rows in pandas?
Detect: df.duplicated().sum() counts duplicate rows; df[df.duplicated()] shows them. Remove: df.drop_duplicates() removes all duplicates keeping the first occurrence. df.drop_duplicates(subset=['email'], keep='last') removes duplicates based on email only, keeping the latest entry. Always inspect duplicates before dropping — they may indicate a merge error, a legitimate multi-row event (like multiple orders), or a data pipeline bug.
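The subset-based deduplication from the answer, on hypothetical order data:

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@x.com", "a@x.com", "b@x.com"],
    "order_id": [1, 2, 3],
})

# One repeated email; keep the latest row per email
n_repeat_emails = df.duplicated(subset=["email"]).sum()
latest_per_email = df.drop_duplicates(subset=["email"], keep="last")
```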
Q14. What is vectorisation in NumPy and why is it faster?
Vectorisation means applying operations to entire arrays at once instead of looping element-by-element. NumPy operations like arr * 2 or np.sqrt(arr) execute in compiled C code, bypassing Python's interpreter overhead. This makes vectorised operations 10x to 100x faster than Python for loops on large arrays. In pandas, the same principle applies — use df['col'] * 2 instead of iterating with iterrows() for best performance.
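A rough timing sketch; the exact speedup depends on hardware and array size:

```python
import time
import numpy as np

arr = np.arange(1_000_000)

start = time.perf_counter()
looped = [x * 2 for x in arr]       # Python-level loop over every element
loop_time = time.perf_counter() - start

start = time.perf_counter()
vectorised = arr * 2                # one call executed in compiled C code
vec_time = time.perf_counter() - start
# vec_time is typically one to two orders of magnitude smaller than loop_time
```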
Q15. How do you visualise data in Python for an analyst presentation?
Core libraries: Matplotlib for foundational charts (line, bar, scatter, histogram). Seaborn (built on Matplotlib) for statistical plots (heatmaps, violin plots, pair plots, regression plots) with better default aesthetics. Plotly for interactive charts embeddable in dashboards and notebooks. For analyst presentations, Seaborn heatmaps for correlations, Matplotlib bar charts for comparisons, and Plotly line charts for time series trends are the most common choices.
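A minimal Matplotlib sketch of a presentation-ready bar chart; the revenue figures and file name are made up:

```python
import matplotlib
matplotlib.use("Agg")          # non-interactive backend for scripts
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
revenue = [120, 150, 170]      # hypothetical values

fig, ax = plt.subplots()
ax.bar(months, revenue)
ax.set_title("Monthly revenue")
ax.set_ylabel("Revenue (thousands)")
fig.savefig("revenue.png")     # export for a slide deck
```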
Interview Tip: Be prepared to write pandas code live. Common tasks: groupby + agg, merge on a key, fill nulls with group means, and filter rows based on multiple conditions. Practise these daily.
Related Free Resources