# Machine Learning in Forestry

Machine Learning (ML) is one of the newer buzzwords in the field of Data Science. I see machine learning articles, tutorials, and blog posts on websites and LinkedIn almost daily. So what is it, and does it have realistic applications for the forest industry now, or is it just analytical hype? Can machine learning improve forest management or wood supply chains, or give us a better understanding of forest ecology? What about improving forest operations, yields, or sustainability? These are big questions, but as an industry I think we need to be aware of these statistical tools and consider the impact that machine learning could have. Only then can we decide whether we are ready to invest time, energy, and capital in machine learning solutions.

## What is Machine Learning?

In this section, I will provide a brief definition and background of machine learning.

How do we define machine learning? I prefer the definition given by the SAS Institute, Inc., which states:

Machine learning is a method of data analysis that automates analytical model building. It’s a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

(source: SAS Institute, Inc. (2020), https://expertsystem.com/machine-learning-definition/)

Note the phrase “with minimal human intervention.” Automation is a major reason machine learning has become popular so quickly as it is integrated into our daily work lives.

Jason Brownlee explains in his eBook “Mastering Machine Learning Algorithms” (source: J. Brownlee, 2019) that machine learning aims to make predictions from data by learning a function (f) that maps input variables (X) to an output variable (Y). Brownlee also states, “We don’t know what the function (f) looks like or its form. If we did, we would use it directly and would not need to learn it from data …” Machine learning can approximate this function, which may be linear or non-linear in form. The target function that most closely maps the input variable(s) to an output variable is often written as follows:

Y = f(X) + e

where Y is the output, X is the input, and e is an error term.
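As a minimal sketch of what learning f from data can look like, the Python snippet below fits a straight line by ordinary least squares. The tree diameters and heights are synthetic values generated for illustration only, and the names `fit_line`, `TRUE_INTERCEPT`, and `TRUE_SLOPE` are assumptions made for this example, not anything taken from Brownlee's book. Because the data come from a known line plus random noise (the e term), we can see how closely the learned function recovers the one we pretend not to know.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data (assumption): height in metres as a noisy linear
# function of stem diameter in centimetres. Not real measurements.
rng = random.Random(42)
TRUE_INTERCEPT, TRUE_SLOPE = 5.0, 0.6
diameters = [rng.uniform(10, 50) for _ in range(200)]
heights = [TRUE_INTERCEPT + TRUE_SLOPE * d + rng.gauss(0, 1.0) for d in diameters]

# "Learning" here means estimating f from the (X, Y) pairs alone.
intercept, slope = fit_line(diameters, heights)
print(f"estimated f(X) = {intercept:.2f} + {slope:.2f} * X")
```

A straight line is only one possible form for f. The algorithms listed later in this post differ mainly in the shapes of function they are able to approximate.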

Rarely does the estimated function predict the output perfectly, but this is acceptable as long as the prediction is accurate enough for the circumstances. For example, we may not need to know exactly how many trees are growing on a plot of land; we just need to be confident that the prediction falls within some acceptable standard (e.g., +/- 5% of the actual value).
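An acceptance standard like this can be checked with a few lines of Python. The sketch below is illustrative only: the function name `within_tolerance` and the plot counts are made up for the example, and the 5% threshold is simply the figure used above.

```python
def within_tolerance(predicted, actual, tolerance=0.05):
    """Return True when predicted is within +/- tolerance (a fraction) of actual."""
    if actual == 0:
        # Relative error is undefined for a zero actual value,
        # so only an exact match is accepted.
        return predicted == 0
    return abs(predicted - actual) <= tolerance * abs(actual)


# Hypothetical (predicted, actual) tree counts for four plots.
plots = [(385, 400), (430, 400), (198, 200), (90, 100)]

results = [within_tolerance(p, a) for p, a in plots]
share_acceptable = sum(results) / len(results)
print(results, share_acceptable)
```

In practice the share of plots meeting the standard is often more informative than any single prediction, since it describes how the model behaves across the whole inventory.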

In order for these systems to learn and make useful predictions, they need a sufficient amount of data. Compiling and processing these large datasets beforehand is one of the first steps involved in the machine learning process.

Finally, machine learning problems can be categorized as Supervised, Unsupervised, or Semi-Supervised (a hybrid of the previous two). Supervised models learn from data with labels, while unsupervised models learn from data without labels. According to Brownlee (2019), the goal of supervised machine learning is to approximate the function so well that when the model sees new input data, it can predict the output variables for that data. Brownlee states that the goal of unsupervised learning is to model the underlying structure of the data in order to learn more about it. In this form of machine learning, there is “…no correct answers and there is no teacher”.
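To make the distinction concrete, here is a small sketch in plain Python. It uses a nearest-neighbor classifier as the supervised example and a basic k-means routine as the unsupervised one. The six trees, their measurements, and the species labels are invented for illustration, and the function names are assumptions for this example rather than any library's API.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbor_predict(train_points, train_labels, query):
    """Supervised: return the label of the closest labelled example."""
    best = min(range(len(train_points)),
               key=lambda i: squared_distance(train_points[i], query))
    return train_labels[best]


def k_means(points, initial_centroids, max_iterations=100):
    """Unsupervised: group points around centroids without using labels."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so we have converged
        assignments = new_assignments
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return assignments, centroids


# Hypothetical trees: (height in m, crown width in m). Invented values.
trees = [(25, 4), (27, 5), (26, 4.5), (15, 10), (14, 11), (16, 9)]
species = ["pine", "pine", "pine", "oak", "oak", "oak"]

# Supervised: the labels act as the "teacher".
predicted_species = nearest_neighbor_predict(trees, species, (24, 5))

# Unsupervised: same measurements, no labels. Starting centroids are
# simply the first and last tree, an arbitrary choice for this sketch.
clusters, centers = k_means(trees, [trees[0], trees[-1]])
print(predicted_species, clusters, centers)
```

The supervised model returns a species name because it was given names to learn from. The unsupervised model can only report that the trees fall into two groups; deciding what those groups mean is left to the forester.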

## Common Machine Learning Algorithms

I will spare you the technical details of these algorithms, but the list below includes some of the more common algorithms used in machine learning applications. Foresters will likely recognize a few of these from their coursework in Statistics. If you would like to know more about how these methods work, Jason Brownlee has many resources available on his website, Machine Learning Mastery.

• Linear Regression
• Logistic Regression
• Decision Trees
• Random Forests
• Support Vector Machines
• Naive Bayes
• K-nearest Neighbors
• K-means Clustering
• Gradient Boosting

## Applications of Machine Learning in Forestry

In my experience, very few applications are in operational use in forestry today, but many potential applications have yet to be developed. The following is a list of applications that I’m currently aware of, or that I found through research. Machine learning is a fairly new and developing science, so many of these applications have not yet been fully adopted.

• Automation of forest composition, mapping, and inventory using remotely sensed data in the form of satellite imagery. Companies currently working on solutions in this area include 20tree.ai and SilviaTerra.
• Near real-time timber harvest monitoring and forest health assessments (e.g., drought, insect, disease, and storm damage). Companies currently working on solutions in this area include SwiftGeospatial and 20tree.ai.
• Forest carbon estimation and analysis. Companies currently working on solutions in this area include aitree.ltd and SilviaTerra.
• Combining supervised machine learning models and geographic information systems (GIS) to predict future wildfire spread (source: D. Radke, A. Hessler, and D. Ellsworth (2019). FireCast: Leveraging Deep Learning to Predict Wildfire Spread, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)).
• Autonomous machinery performing forest maintenance tasks such as tree harvesting, and direct seeding and pod planting with drones. This is more in the realm of Artificial Intelligence than machine learning, but before this level of automation is possible, machine learning models are required to accurately predict forest inventory or other site conditions. Organizations working in this area include Dendra Systems and the Finnish Forest Centre (source: S. McQueen (Dec 16, 2019). How Artificial Intelligence, Robots Enhance Forest Sustainability in Finland, ESRI blog post).

These are just a few examples of companies applying machine learning to forestry problems. In the United States, forest companies are adjusting to a shortage of graduate foresters, so fewer people are managing more resources than in previous decades. Machine learning tools offer some hope of improving efficiency and relieving some of the burden of repetitive tasks. In addition, these models can improve our understanding of forest ecosystem dynamics and shed light on our overall performance as an industry.

## Challenges with Machine Learning in Forestry

I’d like to close by addressing some challenges to machine learning in forestry. As Liu et al. indicated in a recent journal article titled “Applications of machine learning methods in forest ecology”, one of the biggest obstacles to the widespread application of machine learning in forest ecology is the lack of suitable data. These models require more data than traditional process-based models, which increases the cost of collection, storage, and processing. The increased availability of high-resolution satellite imagery has helped in this respect, and such imagery has been shown to be a reliable source of data for machine learning models. Regardless, data continues to be expensive to collect and maintain.

Another challenge presented by Liu et al. is the lack of interest in, or understanding of, machine learning algorithms among forestry professionals. An understanding of machine learning models and algorithms typically requires a strong background in mathematics and computer science (source: Z. Liu et al. (2018). Applications of machine learning methods in forest ecology: recent progress and future challenges. Canadian Science Publishing, Environmental Reviews).

I’d like to add that the complex interactions within forest ecosystems can be a significant deterrent to applying machine learning models. Our industry relies on cooperatives and academic research to produce advanced models and analytic tools, but not all companies have access to this information. A cooperative effort is needed between private, state, and academic entities to further develop and refine machine learning methods that are open to all. Only then can we learn what works and what doesn’t, and better equip foresters to utilize machine learning models on an operational scale.