Pandas Interview Questions for Data Science
Today, I will cover some Pandas interview questions that I have sourced from different websites and that are useful for data science interviews.
Q1: How to create new columns derived from existing columns in Pandas?
- We create a new column by assigning the output to the DataFrame, with the new column name inside square brackets ([]).
- Let's say we want to create a new column 'C' whose values are the product of column 'B' and column 'A'. The operation is easy to implement and is element-wise, so there's no need to loop over rows.
- Other arithmetic operators (+, -, *, /) and comparison operators (<, >, ==, …) also work element-wise. If we need more advanced logic, we can run arbitrary Python code via apply().
- Depending on the case, we can use rename with a dictionary or a function to rename row labels or column names.
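A minimal sketch of the points above, assuming a small made-up DataFrame with columns 'A' and 'B'. The 'B_gt_15' and 'label' columns, the "big"/"small" rule, and the "row" prefix are hypothetical names chosen only for illustration.

```python
import pandas as pd

# Hypothetical example data; 'A' and 'B' follow the column names in the text
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# New column 'C' from an element-wise product, no loop over rows
df["C"] = df["B"] * df["A"]

# Comparison operators are element-wise too and yield a boolean column
df["B_gt_15"] = df["B"] > 15

# apply(..., axis=1) runs arbitrary Python code once per row;
# it is slower than vectorized operations, so use it only when needed
df["label"] = df.apply(lambda row: "big" if row["C"] > 30 else "small", axis=1)

# rename() with a dictionary maps old column names to new ones
df = df.rename(columns={"C": "A_times_B"})

# rename() with a function transforms each row label (0 -> "row0", ...)
df = df.rename(index=lambda i: f"row{i}")

print(df)
```

Vectorized expressions like df["B"] * df["A"] are generally preferred; apply() is the fallback when the logic cannot be expressed with element-wise operators.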
Q2: How are iloc and loc different?
DataFrame.iloc is an integer position-based indexer (positions run from 0 to length-1 of the axis) for retrieving data from a DataFrame, although it may also be used with a boolean array. It accepts an integer, a list or array of integers, a slice object with integers, a boolean array, or a callable (function).
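A short sketch of the iloc input types listed above, on a hypothetical four-row DataFrame whose string index labels deliberately differ from the integer positions:

```python
import pandas as pd

# Hypothetical data; string labels make positions and labels clearly different
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [90, 75, 60, 85]},
    index=["w", "x", "y", "z"],
)

first_row = df.iloc[0]                   # integer -> one row as a Series
picked = df.iloc[[3, 0]]                 # list of integers -> rows in that order
sliced = df.iloc[1:3]                    # slice -> positions 1 and 2 (stop excluded)
cell = df.iloc[2, 1]                     # (row position, column position)
masked = df.iloc[[True, False, True, False]]                # boolean array
every_other = df.iloc[lambda d: list(range(0, len(d), 2))]  # callable

print(sliced)
```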
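DataFrame.loc selects rows (and/or columns) by label. It accepts a single label, a list or array of labels, a slice object with labels, a boolean array, or a callable. Unlike a positional slice with iloc, a label slice with loc includes both the start and the stop label.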
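The same kind of hypothetical DataFrame can illustrate loc. The label slice is the clearest practical difference from iloc, since its stop label is included:

```python
import pandas as pd

# Same hypothetical data as in the iloc sketch
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [90, 75, 60, 85]},
    index=["w", "x", "y", "z"],
)

row_x = df.loc["x"]                      # single label -> one row as a Series
picked = df.loc[["z", "w"]]              # list of labels
by_label = df.loc["x":"z"]               # label slice: "z" IS included
by_position = df.iloc[1:3]               # positional slice for comparison: stop excluded
cell = df.loc["y", "score"]              # (row label, column label)
high_names = df.loc[df["score"] > 80, "name"]  # boolean mask plus a column label

print(by_label)
```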
Q3: What are the operations that the Pandas groupby method is based on?
- Splitting the data into groups based on some criteria.
- Applying a function to each group independently.
- Combining the results into a data structure.
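To make the three steps concrete, here is a sketch on made-up data that performs the split, apply, and combine steps by hand and then with a single groupby call. The 'team' and 'points' columns are hypothetical.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({
    "team": ["red", "blue", "red", "blue", "green"],
    "points": [3, 5, 4, 1, 7],
})

# 1. Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
groups = df.groupby("team")

# 2. Apply: compute a value for each group independently
partial = {team: sub["points"].sum() for team, sub in groups}

# 3. Combine: collect the per-group results into one Series
manual = pd.Series(partial).sort_index()

# groupby performs all three steps in one expression (keys sorted by default)
built_in = df.groupby("team")["points"].sum()

print(built_in)
```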
Q4: How to check whether a Pandas DataFrame is empty?
You can use the attribute df.empty, which is True if the DataFrame is entirely empty (no items), meaning that any of its axes has length 0.
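A minimal sketch of df.empty on a few hypothetical frames. Note that a frame holding only NaN values is not considered empty, because NaN still counts as an item:

```python
import numpy as np
import pandas as pd

empty_df = pd.DataFrame()                    # no rows, no columns
no_rows = pd.DataFrame(columns=["A", "B"])   # columns but zero rows
nan_only = pd.DataFrame({"A": [np.nan]})     # one row holding NaN

print(empty_df.empty)           # True
print(no_rows.empty)            # True: one axis of length 0 is enough
print(nan_only.empty)           # False: NaN still counts as an item
print(nan_only.dropna().empty)  # True once the NaN row is dropped
```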
Q5: How does the groupby() method work in Pandas?
- In the first stage of the process, data contained in a pandas object, whether a Series, a DataFrame, or otherwise, is split into groups based on one or more keys that we provide.
- The splitting is performed on a particular axis of an object. For example, a DataFrame can be grouped on its rows (axis=0) or its columns (axis=1). Recent pandas versions deprecate axis=1 in groupby in favor of transposing first (df.T.groupby(...)).
- Once this is done, a function is applied to each group, producing a new value. Finally, the results of all those function applications are combined into a result object. The form of the resulting object usually depends on what is being done to the data.
- In the figure below, this process is illustrated for a simple group aggregation.
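As a code counterpart to a simple group aggregation, the sketch below uses hypothetical sales data to show grouping by more than one key and how the form of the result depends on the operation: an aggregation returns one value per group, while transform returns one value per original row.

```python
import pandas as pd

# Hypothetical example data
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "product": ["x", "y", "x", "x", "y"],
    "units": [10, 4, 6, 8, 2],
})

# Split on two keys; aggregating reduces each group to one value,
# so the result has one entry per (region, product) pair, with a MultiIndex
per_pair = sales.groupby(["region", "product"])["units"].sum()

# transform() returns one value per original row, aligned to sales.index
sales["region_total"] = sales.groupby("region")["units"].transform("sum")

# agg() can apply several functions to each group at once
summary = sales.groupby("region")["units"].agg(["sum", "mean", "count"])

print(summary)
```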
Q6: What is a time series in pandas?
A time series is an ordered sequence of data points that represents how some quantity changes over time. pandas has extensive capabilities for working with time series data from all domains.
pandas supports:
- Parsing time series information from various sources and formats
- Generating sequences of fixed-frequency dates and time spans
- Manipulating and converting datetimes with time zone information
- Resampling or converting a time series to a particular frequency
- Performing date and time arithmetic with absolute or relative time increments
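A compact sketch touching each capability in the list, using made-up dates and values. The day/month/year input format, the 'America/New_York' time zone, and the weekly ('W') resampling rule are arbitrary choices for illustration.

```python
import pandas as pd

# Parsing: strings in a day/month/year format become Timestamps
parsed = pd.to_datetime(["05/01/2024", "06/01/2024"], format="%d/%m/%Y")

# Fixed-frequency dates: 14 consecutive days starting Monday 2024-01-01
idx = pd.date_range("2024-01-01", periods=14, freq="D")
ts = pd.Series(range(14), index=idx)

# Time zones: attach UTC to the naive index, then convert to another zone
ts_utc = ts.tz_localize("UTC")
ts_ny = ts_utc.tz_convert("America/New_York")

# Resampling: aggregate daily values into weekly totals (weeks ending Sunday)
weekly = ts.resample("W").sum()

# Arithmetic: absolute (Timedelta) vs calendar-relative (DateOffset) increments
shifted = idx + pd.Timedelta(hours=6)
next_month = idx[0] + pd.DateOffset(months=1)
month_end = idx[0] + pd.offsets.MonthEnd()

print(weekly)
```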