The sum() method in Pandas is used to calculate the sum of a DataFrame along a specific axis.
Example
import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# calculate the sum of each column
column_sum = df.sum()

print(column_sum)

'''
Output
A     6
B    15
dtype: int64
'''
sum() Syntax
The syntax of the sum()
method in Pandas is:
df.sum(axis=0, skipna=True, numeric_only=False, min_count=0)
sum() Arguments
The sum() method takes the following arguments:
- axis (optional) - specifies the axis along which the sum will be computed
- skipna (optional) - determines whether to include or exclude missing values
- numeric_only (optional) - specifies whether to include only numeric columns in the computation
- min_count (optional) - the minimum number of valid values required to perform the operation
sum() Return Value
The sum() method returns the sum of the values along the specified axis.
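The type of the returned object depends on what sum() is called on. The sketch below uses a small made-up DataFrame (the names col_totals and single_total are only illustrative) to suggest that summing a DataFrame gives a Series indexed by column labels, while summing a single column gives a scalar.

```python
import pandas as pd

# illustrative data, not taken from a real dataset
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# summing a DataFrame returns a Series indexed by the column labels
col_totals = df.sum()

# summing a single column (a Series) returns a scalar
single_total = df['A'].sum()

print(type(col_totals).__name__)
print(col_totals.index.tolist())
print(single_total)
```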
Example 1: Compute Sum Along Different Axis
import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of each column
column_sum = df.sum()

# calculate the sum of each row
row_sum = df.sum(axis=1)

print("Sum of each column:")
print(column_sum)

print("\nSum of each row:")
print(row_sum)
Output
Sum of each column:
A     6
B    15
C    24
dtype: int64

Sum of each row:
0    12
1    15
2    18
dtype: int64
In the above example,
- column_sum = df.sum() - calculates the sum of values in each column of the df DataFrame. The default axis=0 means it operates column-wise.
- row_sum = df.sum(axis=1) - calculates the sum of values in each row of df by setting axis=1, meaning it operates row-wise.
Note: We can also pass axis=0 inside sum() to compute the sum of each column.
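As a quick check of the note above, the following sketch compares the default call with an explicit axis=0. It also uses the string aliases 'index' and 'columns', which pandas accepts in place of 0 and 1. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# column sums: default, explicit axis=0, and the 'index' alias
default_sum = df.sum()
explicit_sum = df.sum(axis=0)
named_sum = df.sum(axis='index')

# row sums: axis=1 and the 'columns' alias
row_sum = df.sum(axis=1)
named_row_sum = df.sum(axis='columns')

print(default_sum.equals(explicit_sum))
print(default_sum.equals(named_sum))
print(row_sum.equals(named_row_sum))
```

The aliases name the axis that is collapsed: 'index' sums down the rows of each column, and 'columns' sums across the columns of each row.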
Example 2: Calculate Sum of a Specific Column
import pandas as pd

# create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# calculate the sum of column 'A'
sum_A = df['A'].sum()

# calculate the sum of column 'B'
sum_B = df['B'].sum()

print("sum of column A:", sum_A)
print("sum of column B:", sum_B)
Output
sum of column A: 6
sum of column B: 15
In this example, df['A'] selects column A of the df DataFrame, and sum() calculates the sum of its values. The same is done for column B.
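The same idea extends to several columns at once. As a minimal sketch, assuming the same DataFrame as in this example, a list of column labels selects a sub-DataFrame, and sum() then returns one total per selected column. The names subset_sums and grand_total are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# select columns A and B together, then sum each of them
subset_sums = df[['A', 'B']].sum()

# summing the resulting Series gives a single total over both columns
grand_total = subset_sums.sum()

print(subset_sums)
print("total of A and B:", grand_total)
```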
Example 3: Use of numeric_only Argument in sum()
import pandas as pd

# create a DataFrame with both numeric and non-numeric columns
data = {
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
}

df = pd.DataFrame(data)

# sum only the numeric columns
summed = df.sum(numeric_only=True)

print(summed)
Output
A    100.0
B     11.0
D     12.0
dtype: float64
Here, when using numeric_only=True, the sum is calculated only for columns A, B, and D. Column C is excluded because it contains string data.
If we had not specified a value for numeric_only, as in
summed_all = df.sum()
the output would be:
A     100
B      11
C    abcd
D    12.0
dtype: object
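Another way to restrict the sum to numeric columns is to filter them first with select_dtypes(). The sketch below, which reuses the data from this example, suggests that both approaches give the same totals. The names by_flag and by_dtype are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5]
})

# option 1: let sum() skip the non-numeric columns
by_flag = df.sum(numeric_only=True)

# option 2: keep only numeric columns first, then sum
by_dtype = df.select_dtypes(include='number').sum()

print(by_dtype)
print(by_flag.index.tolist() == by_dtype.index.tolist())
```

Filtering first can be convenient when the same numeric subset is reused for several calculations.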
Example 4: Effect of skipna Argument on Calculating sum
import pandas as pd

# create a DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# calculate the sum of each column, ignoring NaN values
sum_skipna_true = df.sum()

# calculate the sum of each column, including NaN values
sum_skipna_false = df.sum(skipna=False)

print("sum with skipna=True (default):")
print(sum_skipna_true)

print("\nsum with skipna=False:")
print(sum_skipna_false)
Output
sum with skipna=True (default):
A     4.0
B     9.0
C    24.0
dtype: float64

sum with skipna=False:
A     NaN
B     NaN
C    24.0
dtype: float64
In this example,
- With skipna=True - sums of columns A, B, and C are 4.0, 9.0, and 24.0, respectively, ignoring None values.
- With skipna=False - sums of columns A and B are NaN due to None values, while C is 24.0.
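The skipna argument can be combined with axis. As a minimal sketch, assuming the same DataFrame as in this example, the code below sums each row instead of each column. The names row_skip and row_strict are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9]
})

# row sums that ignore missing values (default behaviour)
row_skip = df.sum(axis=1)

# row sums that become NaN if any value in the row is missing
row_strict = df.sum(axis=1, skipna=False)

print(row_skip)
print(row_strict)
```

Only the first row has no missing values, so it is the only row whose strict sum is not NaN.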
Example 5: Calculate sums With Minimum Value Counts
import pandas as pd

# create a DataFrame with some missing values
df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [None, None, 9]
})

# calculate the sum of each column with min_count set to 1
sum_min_count_1 = df.sum(min_count=1)

# calculate the sum of each column with min_count set to 2
sum_min_count_2 = df.sum(min_count=2)

# calculate the sum of each column with min_count set to 3
sum_min_count_3 = df.sum(min_count=3)

print("sum with min_count=1:\n", sum_min_count_1)
print("\nsum with min_count=2:\n", sum_min_count_2)
print("\nsum with min_count=3:\n", sum_min_count_3)
Output
sum with min_count=1:
 A    4.0
B    9.0
C    9.0
dtype: float64

sum with min_count=2:
 A    4.0
B    9.0
C    NaN
dtype: float64

sum with min_count=3:
 A   NaN
B   NaN
C   NaN
dtype: float64
Here,
- When min_count=1, the sum is calculated if there is at least one non-missing value in the column. All columns meet this criterion.
- When min_count=2, the sum is calculated if there are at least two non-missing values in the column. Columns A and B meet this criterion, but column C has only one non-missing value, so its result is NaN.
- When min_count=3, the sum is calculated if there are at least three non-missing values in the column. None of the columns meets this criterion, so all results are NaN.
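One case where min_count matters in practice is a column with no valid values at all. The sketch below uses made-up data (columns X and Y are illustrative) to show that with the default min_count=0 such a column sums to 0.0, while min_count=1 turns the result into NaN.

```python
import pandas as pd

nan = float('nan')

# X has no valid values, Y has exactly one
df = pd.DataFrame({
    'X': [nan, nan],
    'Y': [1.0, nan]
})

# default min_count=0: an all-NaN column sums to 0.0
default_sum = df.sum()

# min_count=1: at least one valid value is required, so X becomes NaN
strict_sum = df.sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Setting min_count=1 can help distinguish a column that truly sums to zero from one that simply has no data.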