
Comparing Financial Analysis with Excel and Python/Pandas

Analysts naturally gravitate toward Microsoft Excel when comparing forest investment projects. Early in my career, I became very interested in mastering Excel and learned to build elaborate spreadsheets using Excel’s built-in Visual Basic for Applications (VBA) language. As I gained experience with Excel, however, I saw that it was not the best fit for every situation. Some of the drawbacks I noticed include issues with sharing Workbooks, inflexible Worksheet configurations, and too much time spent troubleshooting formula errors.

In this article, I’d like to compare Excel with coding in Python using a Pandas DataFrame. I’ll show examples of both analyses using a Discounted Cash Flow (DCF) method. The first example will use the more traditional Excel Workbook approach, and the second will use the Python/Pandas approach. Afterward, I’ll highlight some of the pros and cons of each. Analysts should resist the tendency to use one tool for every purpose, a bias also known as “Maslow’s Hammer”.

This article is probably best suited for an audience of forest or financial analysts, but I hope everyone is able to learn something from the two approaches presented.

Python and Pandas

Most people are familiar with Microsoft Excel, but for those who are unfamiliar with Python and Pandas, here is a brief introduction. Python is an object-oriented programming language created by Guido van Rossum in the late 1980s that has surged in popularity in recent years, due largely to its diverse set of data analysis and machine learning packages. Python is used for data analysis, geospatial analysis, web development, desktop (GUI) applications, machine learning, artificial intelligence, cyber security, and much more.

Pandas was created in 2008 by Wes McKinney at AQR Capital Management. It is an open-source, cross-platform library designed to work with data quickly and efficiently using two-dimensional structures called DataFrames. In just a few short years, Pandas has grown to become one of the most popular data analysis packages for Python.

Example Data – Cash Flow Table

For the DCF example, we will use a typical pine plantation scenario culminating in final harvest at age 28; this will be our investment horizon. The table below shows the cash flows for this scenario in real (constant) dollars, assuming a constant 2% rate of inflation and no real price appreciation for stumpage.
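For readers following along without the table image, a schedule of this kind can be sketched directly in Pandas. Note that the activities, years, and dollar amounts below are hypothetical placeholders, not the actual values from the table:

```python
import pandas as pd

# Hypothetical constant-dollar cash flows for a 28-year pine rotation.
# These activities and amounts are illustrative placeholders, not the
# values from the article's cash flow table.
schedule = pd.DataFrame({
    'Year': [0, 1, 14, 22, 28],
    'Activity': ['Site Prep & Plant', 'Herbaceous Weed Control',
                 'First Thinning', 'Second Thinning', 'Final Harvest'],
    'Cash_Flow': [-250.0, -60.0, 300.0, 450.0, 2800.0],
})

print(schedule)
```

Each row pairs a stand activity with the year it occurs and its per-acre cash flow, which is the same shape the CSV file used later in this article takes.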

Forest Finance Example using Excel

An example of the Excel version of the DCF analysis is provided in the image below. The associated Worksheet contains a VBA Macro which calculates additional metrics about the cash flows including the Payback Period, Net Present Value, Net Future Value, and Internal Rate of Return. Land Expectation Value was not included in this Worksheet, but could easily be added.

This is a typical spreadsheet solution that an analyst might use to model a forest investment project. All of the information is displayed in a concise manner, with nice visual formatting. The required inputs to the model are the Alternative ROR (aka the Discount Rate), the Annual Inflation Rate, the project year used as the Constant Dollar Basis, and the Age of Final Harvest. The six columns on the right side of the tabular display, starting with “Revenue/Acre”, are the formula cells. These cells have the highest potential to introduce errors, and they can be locked to prevent tampering. Locking means that the user will need a password from the Worksheet author to edit or even view the formula calculations.

DCF Analysis with Excel

Observations about the Excel Workbook

Excel is a popular software tool for working with structured (i.e., tabular) data, and I use it often in my work and personal projects. Excel is a wonderful and powerful tool for performing complex calculations and data analysis, but in some circumstances it has serious drawbacks.

Below are some of the most common drawbacks I’ve seen in my twenty-year career of working with complex Excel spreadsheets.

  1. Excel Workbooks aren’t easily shared. Granted, most working professionals have access to Excel or a similar spreadsheet program, but one cannot assume that spreadsheets are ubiquitous. Sharing Workbooks can cause problems when multiple versions are floating around, or when they contain personal or proprietary information. Novice users will often attempt to hide cells and/or Worksheets that contain sensitive information, hoping that no one will notice. This can be troublesome for an individual or a company.
  2. Excel Workbooks can be inflexible. What I mean by this is if someone shares a Workbook with you, depending on how complex it is, it may be difficult to edit the format and formulas without risking errors. As indicated above, there are ways to protect cells, ranges, and worksheets from editing, but this limits the usefulness and verifiability of the calculations.
  3. Excel macros can cause security concerns. Macros pose valid security concerns for an organization because they can contain malicious code. If you’ve ever opened a Workbook containing macros, you may have seen the popup warning that you should proceed only if the Workbook is from a trusted source. I believe this risk is fairly low in most business settings, but macros can access and edit the Windows Registry and therefore can cause significant damage to the local machine’s operating system.
  4. Excel formulas are confusing and/or error prone. To be fair, formulas are trustworthy when given the correct parameters, but formulas with broken or incorrect references are error prone. For example, what if a cell formula uses VLOOKUP where HLOOKUP was intended, or is copied to a cell where it doesn’t belong? Sometimes overlooked formula errors are non-critical, but in extreme cases they can severely impact your business. Secondly, cell formulas can be difficult to understand, especially when they are deeply nested, not to mention verifying that the order of operations is correct given the bracket placement. Couple this with ambiguous cell references (e.g., “A5:B10”), and troubleshooting formulas written by others can be tricky at best. Newer versions of Excel provide visual aids for this purpose, but with complex formulas this can still be overwhelming.

Python and Pandas Solution

The first step in the Python/Pandas approach is to load the required modules, set the initial model parameters (in this case our rates), and then load the data from the CSV file. The CSV file contains three columns named “Year”, “Activity”, and “Cash_Flow”. Each row in the CSV file represents a single cash flow input to the model from our cash flow table above.

# Import the required Modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Establish the model rates
discount_rate = 0.045
inflation_rate = 0.02

# import the dataset from the provided CSV located in the current working directory
file_path = 'dcf_data.csv'
data = pd.read_csv(file_path)

The following output shows the Pandas DataFrame created by the Pandas read_csv() method, which handled the import of our CSV file.
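Since the rendered DataFrame isn’t reproduced here, the mechanics of read_csv() can be demonstrated with an in-memory stand-in for dcf_data.csv — the rows below are invented for illustration, not the actual file contents:

```python
import pandas as pd
from io import StringIO

# A small in-memory stand-in for dcf_data.csv (rows are hypothetical)
csv_text = """Year,Activity,Cash_Flow
0,Site Prep & Plant,-250.0
14,First Thinning,300.0
28,Final Harvest,2800.0
"""

# read_csv() parses the header into column names and infers the dtypes
data = pd.read_csv(StringIO(csv_text))
print(data)
print(data.dtypes)
```

Pandas infers an integer dtype for Year, object (string) for Activity, and float for Cash_Flow, which is exactly what the calculations below need.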

The next code set is the “heavy lifting”: it demonstrates how to transform the cash flows and how to create the Cumulative Cash Flow, NPV, and Cumulative NPV columns. Note how this provides a transparent glimpse into all of the model functionality in fewer than 40 lines of code, without having to search through endless cell formulas. The Excel version, in addition to its cell formulas, requires a macro of approximately 70 lines of code to produce similar output. Also note that this is much easier to audit and debug than the Excel approach.

# inflate the real (constant-dollar) cash flows at the inflation rate
# to express them in nominal dollars
data['Cash_Flow_Real'] = data['Cash_Flow'] * (1 + inflation_rate) ** data['Year']

# add a cumulative cash flow column
data['Cumulative_Cash_Flow'] = data['Cash_Flow_Real'].cumsum()

# add an NPV column
data['NPV'] = data['Cash_Flow_Real'] / (1 + discount_rate) ** data['Year']

# add a cumulative NPV column
data['Cumulative_NPV'] = data['NPV'].cumsum()

# Code Source: Brad Solomon's answer on StackOverflow at
# https://stackoverflow.com/questions/46203735/calculating-variable-cash-flow-irr-in-python-pandas
from scipy.optimize import fsolve

def npv(i, cfs, yrs):
    return np.sum(cfs / (1. + i) ** yrs)

def irr(cfs, yrs, x0, **kwargs):
    return fsolve(npv, x0=x0, args=(cfs, yrs), **kwargs).item()

cf = data['Cash_Flow_Real']
yrs = data['Year']
irate = irr(cf, yrs, x0=0.001) * 100


# Present a summary
total_npv = data['NPV'].sum()
payback = data[data['Cumulative_NPV'] > 0]['Year'].values[0]

print(f'OUTPUT:\nNPV: ${total_npv:.2f}\n'
      f'Payback Period: {payback} years\nIRR: {irate:.2f}%')
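One nice property of the fsolve approach is that it is easy to verify: by definition, the NPV of the cash flows discounted at the IRR should be (numerically) zero. Here is a self-contained sketch of that check using hypothetical cash flows, not the actual values from the model above:

```python
import numpy as np
from scipy.optimize import fsolve

def npv(i, cfs, yrs):
    # NPV of cash flows cfs occurring in years yrs at discount rate i
    return np.sum(cfs / (1.0 + i) ** yrs)

def irr(cfs, yrs, x0=0.001):
    # Find the rate where NPV crosses zero
    return fsolve(npv, x0=x0, args=(cfs, yrs)).item()

# Hypothetical sparse cash flows: plant in year 0, thin in year 14,
# clearcut in year 28 (illustrative values only)
yrs = np.array([0, 14, 28])
cfs = np.array([-250.0, 300.0, 2800.0])

rate = irr(cfs, yrs)
print(f'IRR: {rate:.4%}')
print(f'NPV at the IRR: {npv(rate, cfs, yrs):.8f}')  # ~0 by definition
```

This also shows why the sparse year/cash-flow representation is handy: fsolve only needs the years that actually have cash flows, with no zero-filled rows in between.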
 

Now when we output the Pandas DataFrame, it looks similar to the Excel tabular form.

Lastly, we will add a horizontal bar chart of the Cumulative NPV. This helps visualize the NPV over time and the Payback Period (i.e., the point where cumulative NPV becomes positive). The plotting is done with the Matplotlib library.

# Create the Cumulative NPV Bar Chart
fig, ax = plt.subplots(figsize=(8,6))

ax.barh(data['Year'], data['Cumulative_NPV'], color='orange')
ax.set_xlabel('NPV (Dollars/Acre)')
ax.set_ylabel('YEAR')
ax.set_title('Cumulative Net Present Value - 28 Year Rotation Example')
plt.grid(which='major', axis='x')
plt.show()

The chart quickly shows that our cumulative NPV lands between $850 and $900 per acre, and that our Payback Period falls between 20 and 25 years (22, to be exact).

Observations about the Python/Pandas Approach

First off, even with Python’s growing popularity, it is not the be-all and end-all software tool for data analysis. As I mentioned in the introduction, analysts should have multiple tools to draw from. Personally, I like what Python brings to the table, and I prefer it over Excel in most situations. But I am comfortable working with programming languages, so the complexity of Python does not overwhelm me. For analysts who are non-coders, Python is probably not the best option to reach for.

  1. Python adds another level of complexity – The Python language adds more complexity to the analysis, but it’s also more powerful. It’s understandable that coding isn’t for everyone, though I think everyone can and should learn at least a little Python. Python’s syntax is user friendly (compared to other languages), and Python can help analysts become more productive. Some of the complexity for end users could be hidden by adding basic user input functionality and code encapsulation.
  2. Python is more dynamic and portable – because Python is open source and cross-platform (it runs on Windows, Linux, and macOS), with a little setup it can run in just about any environment desired. Python is also available in ArcGIS, which makes it especially useful there because models can be integrated with existing ArcGIS workflows. There is even talk of Python replacing Excel’s macro language (i.e., VBA). Python developers can deploy the DCF code as a standard package, making it easier to share and access without fear of breaking the calculations or the visible output.
  3. Python provides testing and debugging features – many popular IDEs (Integrated Development Environments) provide debugging capabilities for Python; a couple that come to mind are PyCharm and VS Code. These tools allow developers to step through code line by line and see what’s going on behind the scenes. Python also provides several unit test frameworks, such as unittest and pytest. These frameworks help developers isolate broken code, and they can be automated to run each time changes are made and committed.
  4. Python / Pandas is more efficient – once you get the hang of the syntax, working in Python / Pandas is often much faster than building a new Excel spreadsheet, and more can be done with a few lines of code than with a graphical interface (e.g., mouse clicks and menus). I’ve found that it’s often quicker to fire up a Python REPL for simple arithmetic than to reach for a calculator. As noted above, the Python version required fewer lines of code overall, and the process can now be replicated many times over. It’s simple to create a Python module, point it at a data directory where several datasets are stored, and let it loop over them all in sequence. This is an extremely efficient workflow.
  5. Python is fun to learn – if you are like me and enjoy learning something new and challenging, then give Python a try.

This wraps up our DCF Analysis comparison using Excel and Python/Pandas. I hope that you learned something useful about the two approaches, and that you will explore adding Python/Pandas to your forestry analysis workflow in the future.

Categories: Forestry Apps Pandas DataFrame Python

JwL

