Pandas sum() (With Examples) (2024)

The sum() method in Pandas calculates the sum of the values in a DataFrame along a specified axis: down each column by default, or across each row.

Example

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# calculate the sum of each column
column_sum = df.sum()

print(column_sum)

'''
Output
A     6
B    15
dtype: int64
'''

sum() Syntax

The syntax of the sum() method in Pandas is (defaults as of pandas 2.x):

df.sum(axis=0, skipna=True, numeric_only=False, min_count=0)

sum() Arguments

The sum() method takes the following arguments:

  • axis (optional) - the axis along which the sum is computed: 0 or 'index' sums each column (the default), 1 or 'columns' sums each row
  • skipna (optional) - whether to exclude missing values (NaN) from the sum; True by default
  • numeric_only (optional) - whether to include only numeric columns in the computation
  • min_count (optional) - the minimum number of non-missing values required to perform the operation; with fewer, the result is NaN

sum() Return Value

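When called on a DataFrame, sum() returns a Series containing the sum of the values along the specified axis. When called on a single column (a Series), it returns a scalar.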
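As a brief illustration of the two return types, the sketch below uses a small made-up DataFrame (the names col_totals, a_total and grand_total are only illustrative). Summing the DataFrame gives a Series, summing one column gives a scalar, and chaining two calls gives the total of every value.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# DataFrame.sum() returns a Series indexed by column label
col_totals = df.sum()

# Series.sum() returns a single value
a_total = df['A'].sum()

# summing the per-column totals gives the total of all values
grand_total = df.sum().sum()

print(type(col_totals).__name__)  # Series
print(a_total)                    # 6
print(grand_total)                # 21
```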

Example 1: Compute Sum Along Different Axes

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of each column
column_sum = df.sum()

# calculate the sum of each row
row_sum = df.sum(axis=1)

print("Sum of each column:")
print(column_sum)
print("\nSum of each row:")
print(row_sum)

Output

Sum of each column:
A     6
B    15
C    24
dtype: int64

Sum of each row:
0    12
1    15
2    18
dtype: int64

In the above example,

  1. column_sum = df.sum() - calculates the sum of values in each column of the df DataFrame. Default axis=0 means it operates column-wise.
  2. row_sum = df.sum(axis=1) - calculates the sum of values in each row of df by setting axis=1, meaning it operates row-wise.

Note: We can also pass axis=0 inside sum() to compute the sum of each column.
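The axis argument also accepts the labels 'index' and 'columns' in place of 0 and 1, which some readers find easier to remember. A minimal sketch with the same data as above (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# axis='index' is the same as axis=0: one sum per column
by_index_label = df.sum(axis='index')

# axis='columns' is the same as axis=1: one sum per row
by_columns_label = df.sum(axis='columns')

print(by_index_label.equals(df.sum(axis=0)))    # True
print(by_columns_label.equals(df.sum(axis=1)))  # True
```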

Example 2: Calculate Sum of a Specific Column

import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of column 'A'
sum_A = df['A'].sum()

# calculate the sum of column 'B'
sum_B = df['B'].sum()

print("sum of column A:", sum_A)
print("sum of column B:", sum_B)

Output

sum of column A: 6
sum of column B: 15

In this example, df['A'] selects column A of the df DataFrame, and sum() calculates the sum of its values. The same is done for column B.
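Selecting with a list of labels returns a smaller DataFrame, so the same idea extends to several columns at once. The sketch below is one way to do this, assuming we only care about columns A and C (subset_sum and row_sum_AC are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# sum columns 'A' and 'C' only; the result has one entry per selected column
subset_sum = df[['A', 'C']].sum()

# add 'A' and 'C' together row by row
row_sum_AC = df[['A', 'C']].sum(axis=1)

print(subset_sum)
print(row_sum_AC)
```

Here subset_sum holds 6 for A and 24 for C, and row_sum_AC holds 8, 10 and 12 for the three rows.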

Example 3: Use of numeric_only Argument in sum()

import pandas as pd

# create a DataFrame with both numeric and non-numeric columns
data = {
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
}
df = pd.DataFrame(data)

# sum only the numeric columns
summed = df.sum(numeric_only=True)

print(summed)

Output

A    100.0
B     11.0
D     12.0
dtype: float64

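Here, with numeric_only=True, the sum is calculated only for columns A, B, and D; column C is excluded because it contains string data. The result is float64 because column D holds floats, so the integer sums for A and B are shown as 100.0 and 11.0.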

If we had not passed numeric_only, as in

summed_all = df.sum()

the strings in column C would be concatenated and the output would be:

A     100
B      11
C    abcd
D    12.0
dtype: object
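An alternative way to leave out non-numeric columns is to filter them explicitly with select_dtypes() before summing. This is only a sketch of an equivalent approach, not a replacement for numeric_only, and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
})

# keep only the numeric columns, then sum them
via_select = df.select_dtypes(include='number').sum()

# same columns and values as numeric_only=True
via_flag = df.sum(numeric_only=True)

print(via_select)
print(via_select.tolist() == via_flag.tolist())  # True
```

Filtering first can be handy when the numeric subset is reused for other calculations as well.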

Example 4: Effect of skipna Argument on Calculating sum

import pandas as pd

# create a DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# calculate the sum of each column, ignoring NaN values
sum_skipna_true = df.sum()

# calculate the sum of each column, including NaN values
sum_skipna_false = df.sum(skipna=False)

print("sum with skipna=True (default):")
print(sum_skipna_true)
print("\nsum with skipna=False:")
print(sum_skipna_false)

Output

sum with skipna=True (default):
A     4.0
B     9.0
C    24.0
dtype: float64

sum with skipna=False:
A     NaN
B     NaN
C    24.0
dtype: float64

In this example, the None entries are stored as NaN in the numeric columns.

  • With skipna=True - the missing values are skipped, so the sums of columns A, B, and C are 4.0, 9.0, and 24.0, respectively.
  • With skipna=False - the sums of columns A and B are NaN because each contains a missing value, while C is 24.0.
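Because True counts as 1 when summed, sum() is also commonly combined with isna() to count the missing values in each column, which can help when deciding how to set skipna. A small sketch with the same data as above (missing_per_column and strict_sum are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# isna() marks missing cells as True; summing counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# the columns with at least one missing value are the ones
# whose sum becomes NaN when skipna=False
strict_sum = df.sum(skipna=False)
print(strict_sum.isna())
```

Columns A and B each report one missing value, and those are exactly the columns whose strict sum is NaN.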

Example 5: Calculate Sums With Minimum Value Counts

import pandas as pd

# create a DataFrame with some missing values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [None, None, 9]
})

# calculate the sum of each column with min_count set to 1
sum_min_count_1 = df.sum(min_count=1)

# calculate the sum of each column with min_count set to 2
sum_min_count_2 = df.sum(min_count=2)

# calculate the sum of each column with min_count set to 3
sum_min_count_3 = df.sum(min_count=3)

print("sum with min_count=1:\n", sum_min_count_1)
print("\nsum with min_count=2:\n", sum_min_count_2)
print("\nsum with min_count=3:\n", sum_min_count_3)

Output

sum with min_count=1:
 A    4.0
B    9.0
C    9.0
dtype: float64

sum with min_count=2:
 A    4.0
B    9.0
C    NaN
dtype: float64

sum with min_count=3:
 A   NaN
B   NaN
C   NaN
dtype: float64

Here,

  • When min_count=1, the sum is calculated if there is at least one non-missing value in the column. All columns meet this criterion.
  • When min_count=2, the sum is calculated if there are at least two non-missing values in the column. Columns A and B each have two, but column C has only one, so its result is NaN.
  • When min_count=3, the sum is calculated if there are at least three non-missing values in the column. None of the columns meets this criterion, so all results are NaN.
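One consequence of the default min_count=0 is that a column with no valid values sums to 0 rather than NaN. The sketch below, using a made-up all-NaN Series, shows how min_count=1 changes that. It also uses count() to show how many non-missing values each column of the example DataFrame has, which is the number min_count is compared against. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

# a Series with no valid values at all
all_missing = pd.Series([np.nan, np.nan])

default_sum = all_missing.sum()             # 0.0 with the default min_count=0
guarded_sum = all_missing.sum(min_count=1)  # NaN, since no valid value exists

print(default_sum)
print(guarded_sum)

df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [None, None, 9]
})

# count() returns the number of non-missing values per column
valid_counts = df.count()
print(valid_counts)
```

Setting min_count=1 is a simple way to tell "the values add up to zero" apart from "there was nothing to add".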