Exploring Lambda Functions in Python: A Data Processing Guide
In this follow-up to my previous article, "What’s in a Lambda?", I will guide you through a practical data processing example utilizing lambda functions in Python. This sequel was inspired by the positive reception of the original piece, where I discussed how lambda functions can streamline data transformation using Pandas.
Previously, I detailed the syntax of lambda functions. Instead of repeating all the specifics, I will present a more complex lambda function as a precursor to our data processing examples.
Consider the following function that squares each element in a numeric list:
def square_list(nums):
    squared_list = []
    for item in nums:
        squared_list.append(item * item)
    return squared_list
Can this multi-line function be rewritten as a lambda function? Absolutely! Using the same lambda syntax together with a list comprehension, we can simplify it as follows:
square_list = lambda nums: [item * item for item in nums]
And just like that, we’ve transformed our function into a concise, readable line. As a quick recap: the argument nums appears to the left of the colon, while the return value—our entire list comprehension that generates a new squared list—resides on the right. We assign the function name square_list using variable assignment.
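As a quick sanity check, here is a minimal sketch that places the two versions side by side and confirms they agree. The names square_list_def and square_list_lambda, and the sample list, are introduced here only for illustration so that both versions can live in the same script.

```python
# Loop-based version, as defined earlier in the article
def square_list_def(nums):
    squared_list = []
    for item in nums:
        squared_list.append(item * item)
    return squared_list

# Lambda version built on a list comprehension
square_list_lambda = lambda nums: [item * item for item in nums]

sample = [1, 2, 3, 4]  # illustrative input, not from the article
print(square_list_def(sample))     # [1, 4, 9, 16]
print(square_list_lambda(sample))  # [1, 4, 9, 16]
```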
Now, let's delve into a concrete example where we apply lambda functions for data processing in a Pandas DataFrame. Suppose you have a DataFrame named my_df containing summary statistics that your employer wants you to analyze further:
   mean  median  standard deviation
0    22      23                 1.7
1    33      25                 2.8
2    44      40                 4.9
3    55      55                 2.0
4    66      78                 1.0
As you review this DataFrame, you decide to square the standard deviation column to assess the variances more conveniently. With lambda functions, this task can be accomplished in one line:
my_df['standard deviation'] = my_df['standard deviation'].apply(lambda x: x * x)
What’s happening here? The .apply() method receives a lambda function and calls it on each value in the selected column, squaring it. The result is assigned back to the column, so each row’s standard deviation is replaced by its square. It’s advisable to rename the column to ‘variance’ to reflect the change, although that step is omitted here.
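For readers who want to run this end to end, here is a minimal sketch that rebuilds my_df from the values shown above, squares the column, and then performs the rename. The DataFrame construction and the rename call are my additions for illustration; the one-line .apply() call is unchanged.

```python
import pandas as pd

# Rebuild the summary-statistics DataFrame from the table above
my_df = pd.DataFrame({
    "mean": [22, 33, 44, 55, 66],
    "median": [23, 25, 40, 55, 78],
    "standard deviation": [1.7, 2.8, 4.9, 2.0, 1.0],
})

# Square each standard deviation to obtain the variance
my_df["standard deviation"] = my_df["standard deviation"].apply(lambda x: x * x)

# Rename the column so its label matches its new contents
my_df = my_df.rename(columns={"standard deviation": "variance"})
print(my_df)
```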
Next, suppose you want to convert the mean column to float values but can't recall the function to do so (hint: it’s my_df['mean'].astype(float)). And, as often happens, you might be too lazy to look it up, or perhaps you’re offline. Lambda functions come to the rescue once more:
my_df['mean'] = my_df['mean'].apply(lambda x: float(x))
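If you want to convince yourself that the lambda and .astype(float) end up in the same place, the following sketch compares them on the mean column alone. The variable names via_apply and via_astype are hypothetical and exist only for this comparison.

```python
import pandas as pd

# Only the integer 'mean' column is needed for this comparison
my_df = pd.DataFrame({"mean": [22, 33, 44, 55, 66]})

via_apply = my_df["mean"].apply(lambda x: float(x))  # the lambda route
via_astype = my_df["mean"].astype(float)             # the built-in route

print(via_apply.dtype, via_astype.dtype)
print(via_apply.equals(via_astype))
```

Both routes should yield a float column with identical values; .astype(float) is the more conventional choice when you do remember it.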
At this point, you may wonder if the only application of lambdas is with the .apply() method. The answer is no! Let's explore a more complex scenario using the following DataFrame, which lists the grades of two students across various assignments:
     name  letter grade  score
0   Kayla             A     92
1   Kayla             A     94
2   Kayla             A     97
3   Kayla             B     81
4   Kayla             B     83
5   Kayla             B     85
6    Arif             A     93
7    Arif             B     86
8    Arif             A     99
9    Arif             B     80
10   Arif             A     94
11   Arif             B     88
As the semester concludes, we aim to organize this data more clearly and calculate the average score for each letter grade per student. However, according to the syllabus, each student’s lowest score per letter grade is dropped, so only the top two scores will be considered for averaging. This requirement complicates the process, as we can’t simply use the built-in mean() function.
Fortunately, we can achieve this in a single statement by passing a lambda function to the DataFrame's pivot_table method:
grades_df.pivot_table(index='name', columns='letter grade', values='score',
                      aggfunc=lambda series: (sorted(list(series))[-1] + sorted(list(series))[-2]) / 2)
The resulting pivot table will look like this:
letter grade     A     B
name
Arif          96.5  87.0
Kayla         95.5  84.0
Let’s break this down step-by-step:
- We create a pivot table with index set to 'name' and columns set to 'letter grade', meaning each row represents a student’s name and each column corresponds to a letter grade.
- The values parameter is set to 'score', instructing Pandas to examine the score values for each combination of name and letter grade.
- Finally, the aggfunc parameter utilizes our lambda function, which sorts each series of scores, selects the top two, and averages them.
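The steps above can be reproduced with the following self-contained sketch. It rebuilds grades_df from the table shown earlier and stores the result in top_two_avg, a name I am introducing here for illustration; the pivot_table call itself is the same as in the article.

```python
import pandas as pd

# Rebuild the grades DataFrame from the table above
grades_df = pd.DataFrame({
    "name": ["Kayla"] * 6 + ["Arif"] * 6,
    "letter grade": ["A", "A", "A", "B", "B", "B",
                     "A", "B", "A", "B", "A", "B"],
    "score": [92, 94, 97, 81, 83, 85, 93, 86, 99, 80, 94, 88],
})

# Average the two highest scores for each (name, letter grade) pair
top_two_avg = grades_df.pivot_table(
    index="name",
    columns="letter grade",
    values="score",
    aggfunc=lambda series: (sorted(list(series))[-1] + sorted(list(series))[-2]) / 2,
)
print(top_two_avg)
```

For example, Kayla's A scores are 92, 94, and 97; dropping the 92 leaves (94 + 97) / 2 = 95.5, which matches the pivot table.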
Pandas’ ability to execute this task in a single line is indeed impressive. However, this example is meant to illustrate the use of lambda functions in Pandas and is not necessarily the most efficient solution: the lambda sorts each series twice. With larger groups of scores, a traditional multi-line function that sorts only once may be more efficient.
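For comparison, here is one way the sort-once alternative might look. The function name top_two_mean is hypothetical, and the sketch assumes every name and letter grade combination has at least two scores, as in the data above.

```python
import pandas as pd

def top_two_mean(series):
    """Average the two highest values in a series, sorting only once."""
    ordered = sorted(series)  # single sort, ascending
    # Assumes at least two scores per group, as in the example data
    return (ordered[-1] + ordered[-2]) / 2

grades_df = pd.DataFrame({
    "name": ["Kayla"] * 6 + ["Arif"] * 6,
    "letter grade": ["A", "A", "A", "B", "B", "B",
                     "A", "B", "A", "B", "A", "B"],
    "score": [92, 94, 97, 81, 83, 85, 93, 86, 99, 80, 94, 88],
})

# Same pivot table as before, with the named function as aggfunc
result = grades_df.pivot_table(
    index="name",
    columns="letter grade",
    values="score",
    aggfunc=top_two_mean,
)
print(result)
```

The named function trades the one-liner for a docstring and a single sort. With only three scores per group the difference is negligible, so this is a readability and scaling consideration rather than a fix.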
In this case, I prioritized code simplicity over minor time improvements. Always consider the specific data requirements of your use case and decide on the best approach accordingly.
Final Thoughts
While I’ve covered just two scenarios in this article, Pandas offers a multitude of operations that accept optional functions as lambdas. I hope the detailed examples provided here have enhanced your understanding of how to implement lambda functions for data processing. Keep in mind the ultimate goal: to write cleaner, more Pythonic code. By focusing on that, you’ll be well on your way to success.
Until next time, everyone!
My name is Murtaza Ali, and I am a PhD student at the University of Washington studying human-computer interaction. I enjoy writing about education, programming, life, and the occasional random musing.