Geosoft Databases

What you will learn

  1. Basic understanding of a Geosoft databases.
  2. How to create a new database from data stored in a CSV file.
  3. How to call a utility function from the geosoft.gxapi module, including working with symbol locks.
  4. How to read, modify and save new data to a database.
  5. How to apply an expression to channels of a database.

What is a Geosoft database?

A Geosoft database stores 3D spatial data in a form that allows for both very-large volume data storage and very efficient processing.  Geosoft databases are stored on a file system in a file that has extension .gdb.

The Geosoft database was first designed in 1992 and has evolved to become a de facto standard for airborne and ground geophysical data, geochemical data, and drillhole data of all kinds.  All Geosoft programs and third-party applications that use the Geosoft API to read databases are able to work with databases created by any version of Geosoft since 1992, though database features and capabilities that are newer than the reading program will not be available to older programs. This cross-version stability of data has been key to the emergence of Geosoft databases as an industry standard, especially for airborne geophysical surveys.

Fundamentally, Geosoft databases store located data which has either an (x, y) location for 2D data, or an (x, y, z) location for 3D data such that each location may have any number of fields of associated information.  Information fields are referred to as 'channels' in the context of a Geosoft Database, and channels may contain numeric data, strings or arrays of numeric data to store things like time-series.  For example an airborne magnetic survey may have (x, y, elevation, altimeter, mag) data, and a ground geochemical survey may contain (x, y, elevation, Cu, Au, Hg).

Data in a database is grouped into the concept of 'lines' of data such that each 'line' contains the data from a single airborne survey line, a single ground survey line, a marine ship track, a set of data from a single drill hole, or from any other grouping construct that is suitable for the stored data.  Geosoft's desktop application, Oasis montaj, displays a single line of data in a spreadsheet-like form to the user, allowing a user to change lines and change which channels are displayed and the order channels are displayed. Simple databases might place all data into a single line.  All lines in a Geosoft database share the channel definitions, of course with data different for each line. 

As you work through the exercises in this tutorial you will be introduced to other concepts of Geosoft databases.

Create a new database from a CSV-format data file

A very common form for exchanging located data is in a simple ASCII Comma-Separated-Values (CSV) file. In this exercise we will create a new Geosoft database and import an example CSV file.

The CSV file mag_data.csv contains located data as follows:

mag_data.csv
# (X, Y) are on NAD83, UTM Zone 15N
X,Y,Z,mag
330000,6629200,211.055369153511,5535
330050,6629200,210.654999327171,5500
330100,6629200,210.286555185181,5476
330150,6629200,209.952679145607,5479
330200,6629200,209.655742280775,5496
330250,6629200,209.397817201372,5544
etc...

In this example, the first line is simply a comment line that identified the spatial coordinate system for the located data. The second line identified the names of the data columns, and data follows starting in line 3. Note that columns of information after the first line are comma-delimited, and each line has 4 columns of information.

Let's write a Python script that will import this data into a new geosoft database.  Our initial expectation is to import all the data into a single line, and establish the coordinate system of the spatial channels:

import_csv.py
import numpy as np
import geosoft.gxpy as gxpy
import geosoft.gxpy.gdb as gxdb

#create context
gxc = gxpy.gx.GXpy()

# Open csv-format data file and skip the first line, which is a comment line
f = open('mag_data.csv', 'r')
f.readline()

# the second line contains the channel/field names, from which we create a list of channel names
channel_names = f.readline().strip().split(',')

#the rest of the file contains data, which we load into a numpy float array
data = np.loadtxt(f, delimiter=',')

#create a new database from list of channels and numpy data. All data is stored in a single line.
gdb = gxdb.Geosoft_gdb.new('mag_data', overwrite=True)
line_name = gxdb.create_line_name(0, gxdb.LINE_TYPE_RANDOM, 0)
gdb.write_line(line_name, data, channel_names)

# set the coordinate system to 'NAD 83 / UTM zone 15N'
gdb.coordinate_system = 'NAD83 / UTM zone 15N'

# set the mag data units to 'nT'
gxdb.Channel(gdb, 'mag').unit_of_measure = 'nT'

exit()
line 1numpy is the standard library used to work with numerical multi-dimensioned data in Python, and Geosoft modules are designed to most easily work with numpy data.
line 3the geosoft.gxpy.gdb module works with Geosoft database.
line 9open the csv file and skip over the first line
line 13the second line contains the names of the data in each column. We will use these names as the channel names in the Geosoft database. Here we read the line, strrip the new-line character and create a list of names by splitting the line using the ',' character as the delimiter.
line 16the numpy module has a versatile method named loadtxt that can be used to import simple ASCII text files. This function returns all the data stored in a 2D numpy array, which will be dimensioned (12670, 4), which means there are 1267 rows of data with 4 data items per row.
line 19here we create a new empty database named 'mag_data', which will result in a file named 'mag_data.gdb'.
line 20all the data will be stored in a single line, and we will accept the default line name generated by the create_line_name function, which by default will be 'L0'.
line 21the write_line method will create channels from the channel_names list and store data from each column of numpy array into each channel respectively.
line 24in Geosoft it is important to identify the coordinate system of all spatial data as soon as data is imported. Once identified, Geosoft applications will automatically convert spatial locations to the coordinate systems of a required context.
line 27

Best practice it to identify data units of measurement in the database.

Opening this database from a Geosoft Desktop application will show the 'L0' line:

Split the data into lines

If we plot the data as a line on a map we see that the data in Line 0 (L0E) would be better organized as separate survey lines.  In this case the lines are horizontal, 200 m apart with sampled every 50 m along each line:

Lets modify the script by adding some processing that will use a Geosoft method to split a line into separate lines based on a change in line direction. To split a line into sections we will use a processing function from the GXAPI, which provides the base-level access to the full Geosoft API. The GXAPI function interfaces are not "pythonic", and are more verbose and demand more care when called.

We will use the  geosoft.gxapi.GXDU.split_line_xy2 function (GXDU contains many Database Utility functions):

From geosoft.gxapi.GXDU

static split_line_xy2((GXDB)arg1(int)arg2(int)arg3(int)arg4(int)arg5(float)arg6(float)arg7(int)arg8(int_ref)arg9(int)arg10(int)arg11) 

Break up a line based on tolerance of lateral and horizontal distance, with options for the output line names.
Parameters:
  • arg1 (geosoft.gxapi.GXDB) – Database
  • arg2 (int) – Line to be broken up [DB_LOCK_READONLY]
  • arg3 (int) – Channel X [DB_LOCK_READWRITE]
  • arg4 (int) – Channel Y [DB_LOCK_READWRITE]
  • arg5 (int) – Line direction, 0-any, 1-X, 2-Y.
  • arg6 (float) – Lateral tolerance, DUMMY for the default (10% of the separation between the first two points.
  • arg7 (float) – Downline Tolerance, DUMMY for none
  • arg8 (int) – DU_SPLITLINE constants
  • arg9 (geosoft.gxapi.int_ref) – First line in the sequence, for DU_SPLITLINE_SEQUENTIAL. On return, the next line in the sequence.
  • arg10 (int) – Increment in the line number sequence, for DU_SPLITLINE_SEQUENTIAL
  • arg11 (int) – Reset starting fiducials to zero (0: No, 1: Yes)

Note

The geosoft.gxapi functions often require locks on database symbols (handles to lines, channels and other database objects), which is the case for arguments 2, 3 and 4 of split_line_xy2. Because Geosoft databases allow for concurrent access by multiple applications, symbols must be locked and unlocked as they are used.  The geosoft.gxpy pythonic api will do this for you, but the geosoft.gxapi requires that you directly manage locks. Terminating a script will remove locks from all symbols.


The geosoft.gxpy.gdb module has a Line class and a Channel class that makes is easier to work with lines and channels from a database. Here is the script to import the data and spit the imported line into multiple lines:

import_csv_split.py
import numpy as np
import geosoft.gxpy as gxpy
import geosoft.gxpy.gdb as gxdb
import geosoft.gxapi as gxapi

#create context
gxc = gxpy.gx.GXpy()

# Open csv-format data file and skip the first line, which is a comment line
f = open('mag_data.csv', 'r')
f.readline()

# the second line contains the channel/field names, from which we create a list of channel names
channel_names = f.readline().strip().split(',')

#the rest of the file contains data, which we load into a numpy float array
data = np.loadtxt(f, delimiter=',')

#create a new database from list of channels and numpy data. All data is stored in a single line.
gdb = gxdb.Geosoft_gdb.new('mag_data_split', overwrite=True)
line_name = gxdb.create_line_name()
gdb.write_line(line_name, data, channel_names)

# set the coordinate system to 'NAD 83 / UTM zone 15N'
gdb.coordinate_system = 'NAD83 / UTM zone 15N'

# set the mag data units to 'nT'
gxdb.Channel(gdb, 'mag').unit_of_measure = 'nT'

print(list(gdb.list_lines()))     # ['L0']
print(list(gdb.list_channels()))  # ['mag', 'X', 'Y', 'Z']
print(gdb.xyz_channels)           # ('X', 'Y', 'Z')

# split the line into sections knowing lines are E-W, and separated by 200 m.
# see https://geosoftinc.github.io/gxpy/9.2/python/GXDU.html?highlight=split_line_xy2#geosoft.gxapi.GXDU.split_line_xy2

split_line_number_start = gxapi.int_ref()
split_line_number_start.value = 1

# create instances to the lines and channels needed by the split_line_xy2 function
line = gxdb.Line(gdb, 'L0')
x_channel = gxdb.Channel(gdb, 'X')
y_channel = gxdb.Channel(gdb, 'Y')

# lock items as required
line.lock = gxdb.SYMBOL_LOCK_READ
x_channel.lock = gxdb.SYMBOL_LOCK_WRITE
y_channel.lock = gxdb.SYMBOL_LOCK_WRITE

# split the original line into segments, based on a lateral distance tolerance of 100 m.
gxapi.GXDU.split_line_xy2(
    gdb.gxdb,
    line.symbol,
    gxdb.Channel(gdb, 'X').symbol,
    gxdb.Channel(gdb, 'Y').symbol,
    1,
    100.0,
    gxapi.rDUMMY,
    gxapi.DU_SPLITLINE_SEQUENTIAL,
    split_line_number_start,
    1,
    1)

#delete the original line as it is no longer needed
gdb.delete_line('L0')

# print a list of the new lines
print(list(gdb.list_lines()))   # ['L1', 'L2', 'L3', 'L4', 'L5', 'L6', ...

exit()
line 4Includethe geosoft.gxapi module, which will be used at line 49
line 37argument 9 of the split_line_xy2 function (see line 58) requires a geosoft.gxapi.int_ref instance. This is because this argument is passed by reference such that the returned value will be the next line number in the sequence of split lines. Passing by reference is not part of Python so we use an instance of the int_ref class, which has a property value that will hold the value to pass, and will be changed by called function. The gxapi module has predefined classes int_ref (64-bit integers), float_ref (64-bit floating point) and str_ref (unicode strings) specifically for this purpose.
lines 41-48Here we use the Line and Channel classes to create instances to the required line and channels, and we use the instances to set the required locks.
line 51Call the GXDU function to split the line.
line 65Delete the original line as it is no longer required.
line 70When a script terminates any locks on the database are cleared. If you were to continue to use the symbols in the same script it is good practice to call gdb.unlock_all() to unlock all symbols and then re-lock symbols as required.

After running this script, plotting the database as a line maps produces the following map:

Read, modify and save data

Using numpy

Geosoft databases and data models are designed around the concept of data vector arrays, which support very high performance data processing. Each channel in line holds a single 1-dimensional array of data. Python's standard library for working with data arrays is numpy, and this is a standard part of any scientifically-oriented Python environment.  The Geosoft_gdb class provides for reading and writing data directly to/from numpy arrays as well as Geosoft VV vector arrays.  This example shows a simple use of numpy and the next section shows the same exercise using a Geosoft VV.

If we open the mag_data_split.gdb database we see that the mag data has values around 5000 nT.

The following script will subtract 5000 from the data in the 'mag' channel and save it to a new channel named 'mag_base':

import geosoft.gxpy as gxpy
import geosoft.gxpy.gdb as gxdb

gxc = gxpy.gx.GXpy()

# open the database, best practice is to use a 'with ...' construct
with gxdb.Geosoft_gdb.open('mag_data_split') as gdb:

    # make a new channel for the output, duplicate properties of 'mag' channel
    new_mag_channel = gxdb.Channel.new(gdb, 'mag_base', dup='mag', replace=True)

    # work through each line
    for line in gdb.list_lines():

        print ('processing line {}'.format(line))

        # read data from the line.
        # The read_channel method returns the data as a numpy array, and the fiducial
        mag_data, fid = gdb.read_channel(line, 'mag')

		# use simple numpy math to subtract 5000, then save to the new_mag_channel
        mag_data = mag_data - 5000
        gdb.write_channel(line, new_mag_channel, mag_data, fid)

exit()
line 10In this example we will write the processed data to a new channel named 'mag_base', which will duplicate the properties of the 'mag' channel, such as inheriting the type, units and display properties.
line 13The standard pattern for processing data in a database is to work process each line in a loop. Individual lines can hold huge amounts of data, and an entire database can be unreasonably large. By processing data in line segments we are able to handle extremely large data sets.
line 17Here we read a single channel of data from a line and return it in a numpy array. All read functions that return numpy arrays also return the fiducial of the data. The fiducial references the start point and separation between data rows relative to the reference sequence of the data, and reading to numpy arrays will always ensure the data is sampled to the full range of the data at the smallest data increment based on the data requested. Always pass this fiducial back when writing data, in this case at line 23
line 22numpy is the standard array processing library for scientific Python applications, and as such it provides for very high performance operations on numpy data. You should always use optimal numpy operations to process data, in this case simply subtracting 5000 from the data.

Open the modified database and display the 'mag_base' channel:

Using Geosoft VV

This example does the same thing as the previous example, but uses the Geosoft vector processing functions from the Geosoft VV library. Geosoft VV functions will deliver even higher performance than numpy, and while you may be limited by the capabilities of the Geosoft API, there are many functions that perform geoscience-specific operations on data (see gxapi.GXVVU). Note that data in a Geosoft VV can also be exposed as a numpy array to provide numpy access when needed.

modify_channel_data_vv.py
import geosoft.gxpy as gxpy
import geosoft.gxpy.gdb as gxdb
import geosoft.gxapi as gxapi

gxc = gxpy.gx.GXpy()

# open the database, best practice is to use a 'with ...' construct
with gxdb.Geosoft_gdb.open('mag_data_split') as gdb:

    # make a new channel for the output, duplicate properties of 'mag' channel
    new_mag_channel = gxdb.Channel.new(gdb, 'mag_base', dup='mag', replace=True)

    # work through each line
    for line in gdb.list_lines():

        print ('processing line {}'.format(line))

        # read data from the line.
        # The read_channel method returns the data in a geosoft VV
        mag_data = gdb.read_channel_vv(line, 'mag')

        # use Geosoft GXVVU.translate function to subtract 5000.
        gxapi.GXVVU.translate(mag_data.gxvv, -5000, 1)
        gdb.write_channel_vv(line, new_mag_channel, mag_data)

exit()
line 11returns the data in a geosoft.gxpy.VV instance, in which the fiducial is a property.
line 23calling into the gxapi to translate the data.

Apply an expression to a database

In this exercise we calculate the distance along each line from the starting point of each line. The Pythagorean expression is simple and can be implemented in a single line of numpy code. This example reads multiple channels from the database and uses numpy slicing to work with individual channel columns from a 2D numpy array:

import numpy as np
import geosoft.gxpy as gxpy
import geosoft.gxpy.gdb as gxdb

gxc = gxpy.gx.GXpy()

# open the database, best practice is to use a 'with ...' construct
with gxdb.Geosoft_gdb.open('mag_data_split') as gdb:

    # make a distance channel
    dist_channel = gxdb.Channel.new(gdb, 'distance', dup='x', replace=True)

    # work through each line
    for line in gdb.list_lines():

        print ('processing line {}'.format(line))

        # read data from the line, returns in a 2D numpy array.
        xy_data, channels_read, fid = gdb.read_line(line, ('x', 'y'))

        # get the first point (x0, y0)        
        x0 = xy_data[0, 0]
        y0 = xy_data[0, 1]
        
        # use numpy array math to calculate distance in a 1D array dist_array
        dist_array = np.sqrt((xy_data[:, 0] - x0)**2 + (xy_data[:, 1] - y0)**2)
        
        # save the distance
        gdb.write_channel(line, dist_channel, dist_array, fid)

exit()
line 19Here we are reading multiple channels from a single line. If we were to omit the list of channels, here expressed as tuple ('x', 'y'), all channels in the line would be returned in a 2D numpy array. But in this case we will get a 2D numpy array that contains (x, y) on each row. The returned channels_read is a list of the channels read.
line 26

This expression uses  numpy vector operations and is therefore very fast. You could also iterate through the array, but this is much slower:

calculate distance element-by-element
dist_array = np.zeros(len(xy_data))
for i in range(len(dist_array)):
    dx = xy_data[i, 0] - x0
    dy = xy_data[i, 1] - y0
    dist_array[i] = math.sqrt(dx **2 + dy **2)