Machine Learning (ML) is one of the new buzzwords in the field of Data Science. I see machine learning articles, tutorials, and blog posts on websites and LinkedIn almost daily. So what is it, and does it have realistic applications for the forest industry now, or is it just a bunch of analytical hype? Can machine learning improve forest management or wood supply chains, or give us a better understanding of forest ecology? What about improving forest operations, yields, or sustainability? These are some big questions, but as an industry I think we need to be aware of these statistical tools and consider the possible impact that machine learning can have on our industry. Only then can we decide whether or not we are ready to invest time, energy and capital into new machine learning solutions for forestry.
What is Machine Learning?
In this section, I will give a brief definition and background of machine learning.
First, how do we define machine learning? I prefer the definition given by the SAS Institute, Inc. which states:
“Machine learning is a method of data analysis that automates analytical model building. It’s a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.” (source: SAS Institute, Inc. (2020), https://expertsystem.com/machine-learning-definition/)
Did you catch that last part, “with minimal human intervention”? This is the reason machine learning has become so popular: through automation, it can be integrated into daily work processes.
Jason Brownlee explains in his eBook “Mastering Machine Learning Algorithms” (source: J. Brownlee, 2019) that machine learning aims to make predictions from data by learning a function that maps input variables (X) to an output variable (Y). Brownlee also states, “We don’t know what the function (f) looks like or its form. If we did, we would use it directly and would not need to learn it from data …”. Machine learning is capable of approximating this function, which could be linear or non-linear in form. The target function that most closely maps the input variable(s) to an output variable is often written as follows:
Y = f(X) + e
where Y is the output, X is the input, and e is an error term.
Rarely does the estimated function predict the output perfectly, but this is okay as long as it gives a prediction that is accurate enough given the circumstances. For example, maybe we don’t need to know exactly how many trees are growing on a plot of land; we just need to be certain that the prediction is within some acceptable standard (e.g., ±5% of the actual value).
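As a minimal sketch of this idea, the Python snippet below simulates some plot data, fits a straight line to approximate f, and checks what share of the predictions falls within ±5% of the actual value. The simulated data, the variable names (basal_area, tree_count), and the choice of a linear form are all assumptions made for illustration; they do not come from Brownlee or from a real inventory.

```python
import numpy as np

# Hypothetical, simulated data: X = basal area per plot, Y = tree count per plot.
rng = np.random.default_rng(seed=42)
basal_area = rng.uniform(10.0, 40.0, size=200)    # input variable X
noise = rng.normal(0.0, 2.0, size=200)            # error term e
tree_count = 50.0 + 12.0 * basal_area + noise     # Y = f(X) + e

# Learn an approximation of f from the data (assumed here to be linear).
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(basal_area, tree_count, deg=1)
predicted = intercept + slope * basal_area

# Share of predictions that land within +/- 5% of the actual value.
relative_error = np.abs(predicted - tree_count) / tree_count
share_within_5pct = float(np.mean(relative_error <= 0.05))

print(f"estimated f(X) = {intercept:.2f} + {slope:.2f} * X")
print(f"share of plots within 5%: {share_within_5pct:.1%}")
```

In practice the form of f is unknown, so the model would be evaluated on data it has not seen during fitting rather than on the same plots used to estimate it.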
In order for these systems to learn and make useful predictions, they need a sufficient amount of data that is clean and tidy. Compiling and processing these large datasets beforehand is one of the first steps involved in the machine learning process.
Finally, machine learning problems can be categorized as Supervised, Unsupervised, or Semi-Supervised (a hybrid of the previous two). Models are capable of learning from data with labels (Supervised) or data without labels (Unsupervised). According to Brownlee (2019), the goal of supervised machine learning is to approximate the function so well that when the model sees new input data, it is able to predict the output variables for that data. Brownlee states that the goal of unsupervised learning is to accurately model the underlying data in order to learn more about it. In this form of machine learning, there are “…no correct answers and there is no teacher”.
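To make the distinction concrete, here is a small sketch using only NumPy. The two stand types, their features, and the helper name nearest_centroid are hypothetical. The nearest-centroid classifier and the k-means clustering are simple stand-ins that I picked for illustration, not methods taken from Brownlee.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical features per plot: [mean tree height (m), canopy cover (%)].
pine = rng.normal(loc=[18.0, 55.0], scale=1.5, size=(50, 2))
hardwood = rng.normal(loc=[28.0, 80.0], scale=1.5, size=(50, 2))
X = np.vstack([pine, hardwood])
y = np.array([0] * 50 + [1] * 50)   # labels: 0 = pine, 1 = hardwood

def nearest_centroid(points, centroids):
    """Return the index of the closest centroid for each point."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Supervised: the labels y are used to learn one centroid per class,
# and the learned centroids then predict labels for new, unseen plots.
class_centroids = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
new_plots = np.array([[17.5, 54.0], [29.0, 82.0]])
predicted_labels = nearest_centroid(new_plots, class_centroids)

# Unsupervised: k-means groups the same plots without ever seeing y.
centroids = X[[0, -1]].copy()       # simple initialisation from two data points
for _ in range(10):
    clusters = nearest_centroid(X, centroids)
    centroids = np.array([X[clusters == k].mean(axis=0) for k in (0, 1)])

print("supervised predictions:", predicted_labels)
print("cluster sizes:", np.bincount(clusters, minlength=2))
```

The supervised half needs the labels in y to learn anything, while the k-means loop only ever looks at X. The clusters it finds have no names attached, so a person still has to interpret what each group represents.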
Applications of Machine Learning in Forestry
To be honest, very few “actual” applications in forestry exist today, but there are many “potential” applications yet to be discovered. The following is a list of applications that I’m currently aware of, or that I found through a Google web search. Machine learning is a fairly new and developing science, so many of these applications are still not fully developed.
- Automation of forest composition, mapping and inventory using remotely sensed data in the form of satellite imagery. Companies currently working on solutions in this area include 20tree.ai and SilviaTerra.
- Near real-time timber harvest monitoring and forest health assessments (e.g., drought, insect, disease and storm damage). Companies currently working on solutions in this area include SwiftGeospatial and 20tree.ai.
- Forest carbon estimation and analysis. Companies currently working on solutions in this area include aitree.ltd and SilviaTerra.
- Combining supervised machine learning models and geographic information (GIS) to predict future wildfire spread. (source: D. Radke, A. Hessler, and D. Ellsworth (2019). Firecast: Leveraging Deep Learning to predict wildfire spread, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19).)
- Autonomous machinery performing forest maintenance tasks such as tree harvesting and direct seeding and pod planting with drones. This is more in the realm of Artificial Intelligence than machine learning, but before this level of automation is possible, machine learning models are required to accurately predict forest inventory or other site conditions. Companies working in this area include Dendra Systems, and the Finnish Forest Centre (source: S. McQueen (Dec 16, 2019). How Artificial Intelligence, Robots Enhance Forest Sustainability in Finland, ESRI blog post).
These are just some examples of companies that are making an impact with machine learning, all of which are important for the progression of the industry. In the United States, forest companies are adjusting to a shortage of qualified and trained forest workers, leading to fewer people managing more resources than in previous decades. Machine learning tools offer some hope of improving efficiency, thereby eliminating some of the burden of repetitive tasks. To boot, these models can help our understanding of forest ecosystem dynamics, and shed light on our overall performance as an industry.
Challenges with Machine Learning in Forestry
I’d like to close by speaking a little about the challenges to machine learning in forestry. As Liu et al. indicated in a recent journal article titled “Applications of machine learning methods in forest ecology”, one of the biggest drawbacks is the lack of suitable data, which prevents the widespread application of machine learning in forest ecology. These models require more data than traditional process-based models, which increases the cost of collection, storage and processing. The increased availability of high-resolution satellite imagery has helped in this capacity, and has proven to be a reliable source of data for machine learning models. Regardless, data continues to be expensive to collect and maintain.
Another challenge presented by Liu et al. is the lack of interest in and/or understanding of machine learning algorithms among forestry professionals. An understanding of machine learning models and algorithms typically requires a strong background in mathematics and computer science (source: Z. Liu, et al. (2018). Applications of machine learning methods in forest ecology: recent progress and future challenges. Canadian Science Publishing, Environmental Reviews).
I’d like to add that the complex interactions within forest ecosystems can also be a significant deterrent to applying machine learning models within the forest industry. Our industry has historically relied on research cooperatives and academia to produce advanced mathematical models and analytic tools, and not all companies have access to this information. A cooperative effort is needed between private, state, and academic entities to further develop and refine machine learning methods for forest uses and to make them available to all. Only then can we learn what works and what doesn’t, and better equip foresters to utilize machine learning models on an operational scale.