Timber inventory (i.e., timber cruise) data can be stored in various ways. I find that it’s best to store this type of data in a relational database; a geospatial database is another suitable choice once the data is cleaned and ready for analysis. Processing bottlenecks can occur when timber cruising contractors collect and send cruise data via spreadsheets, and other common file formats can cause data isolation issues. Storing the data in a database provides a single repository that is easily maintained over time.
Databases offer data integrity, secure and concurrent access, and robust queries. Moving data from loose files into a database provides optimized storage and fast queries, which matters most when storing a large volume of data. Excel has a limit of 1,048,576 rows per worksheet; in contrast, a PostgreSQL table is limited only by disk space, up to a maximum of 32 TB.
Advanced users can also run scripts to clean and organize the data before copying it to the database, a process known as ETL (Extract, Transform, Load). If the data was collected with reputable forest inventory software, it should already be in a clean, tidy format; in some cases, the transform step can be skipped altogether. The solution explained below assumes these steps are already completed.
This solution copies data stored in CSV format to a PostgreSQL database using psql’s copy command. It is important that the data is in a structured format with the same column headings across all files.
The copy command is specific to PostgreSQL, but most modern relational databases offer similar functionality (e.g., BULK INSERT in SQL Server and LOAD DATA INFILE in MySQL).
The copy command used here is psql’s client-side \copy command. Note that it is not the same as PostgreSQL’s server-side COPY command, which reads files from the database server itself. The psql utility is a command-line interface for working with PostgreSQL databases. I prefer working in the terminal, where repetitive tasks are more efficient than in a GUI program like pgAdmin.
The first step is to create a new database. Then, create a table with the same schema (columns and data types) as the CSV file(s). The PostgreSQL documentation includes many examples of creating databases and tables.
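As an illustration, the database and table might be created like this. The names and columns here are hypothetical; match them to the header row in your own CSV files.

```sql
-- Hypothetical schema: adjust names, columns, and types to your CSV files.
CREATE DATABASE timber_cruise;

CREATE TABLE cruise_data (
    plot_id  integer,
    tree_id  integer,
    species  varchar(10),
    dbh      numeric(5,1),  -- diameter at breast height
    height   numeric(5,1)   -- total height
);
```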
Next, log in through the psql utility at a command prompt. This could be Terminal on macOS or the command prompt on Windows. Then type the following command.
psql -U postgres -d MyDatabaseName
This command logs in to the database MyDatabaseName as the postgres user; substitute your own username and database name here. Depending on the security setup of the database, you may also be prompted for a password.
Once logged in, enter the copy command at the psql prompt. Make sure to pass in the full directory path and the filename of the file to import.
\copy tablename FROM '/path/to/your/file.csv' WITH DELIMITER ',' CSV HEADER
The “tablename” parameter corresponds to the table in your database. If the copy command succeeds, you should see a summary of the number of rows imported. Just “rinse and repeat” for each file you want to import.
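To double-check an import, a quick count at the psql prompt confirms the row total (again, “tablename” is a placeholder for your table):

```sql
SELECT count(*) FROM tablename;
```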
This is a simple solution for long-term storage of timber cruise data, and it works for all types of structured files. A faster, more flexible solution would be a custom Python script that iterates through the files stored on disk, extracts the data, and writes it to the database in each pass. Maybe I will include an example of that in a later post.
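As a rough preview of that idea, here is a minimal sketch in Python using only the standard library. It walks a folder of CSV files, checks that each file’s header matches the expected columns, and builds the corresponding \copy command for each file. The table name, column names, and directory are hypothetical; the resulting commands could be fed to psql, or the copy could be done directly with a database driver such as psycopg2.

```python
import csv
from pathlib import Path

# Hypothetical column list: match this to your own CSV header row.
EXPECTED_HEADER = ["plot_id", "tree_id", "species", "dbh", "height"]

def build_copy_commands(csv_dir, table="cruise_data"):
    """Validate each CSV's header, then build a psql \\copy command for it."""
    commands = []
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with open(path, newline="") as f:
            header = next(csv.reader(f))  # first row = column headings
        if header != EXPECTED_HEADER:
            raise ValueError(f"{path.name}: unexpected columns {header}")
        commands.append(
            f"\\copy {table} FROM '{path}' WITH DELIMITER ',' CSV HEADER"
        )
    return commands
```

Keeping the header check in the loop means one mislabeled file stops the batch before anything is written to the database.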
Categories: Data Analytics Forest Inventory Record Storage