8. Data Maintenance

Maintenance activities are triggered by Geoscape receiving updated address data from data contributors according to an agreed delivery schedule. At present, this schedule defines a quarterly update process.

During the maintenance phase, contributed addresses are analysed and compared to existing records in G-NAF. This analysis and comparison give rise to new records being inserted and existing records being updated or retired.

The following diagram of the G-NAF Maintenance Process provides a high-level view of the G-NAF system including G-NAF maintenance pre-processing, the use of reference data files, G-NAF maintenance software and G-NAF outputs.

strict digraph {
graph [fontname = "Chakra Petch,Verdana,serif"]
node [fontname = "Chakra Petch,Verdana,serif"]
size="4,5"
edge [fontname="Chakra Petch,Verdana,serif" color="blue"]
A [label="Reference\n Data Files" shape="parallelogram" style="filled" fillcolor="#FFCF06"]
B [label= "Contributor\nData Files" shape="parallelogram" style="filled" fillcolor="#FFCF06"]
C [label= "G-NAF \nMaintenance \nPre-Processing" shape="square"]
D [label= "G-NAF \nMaintenance \nSoftware" shape="square"]
E [label=  "PSMA \nGeocoded National Address File \n(G-NAF) \nDatabase" shape="cylinder" style="filled" fillcolor="#482CFF" fontcolor="white"]
F [label= "Data Extraction" shape="square"]
G [label= "Reports" shape="square" ]
H [label= "G-NAF \n Data" shape="parallelogram"]
I [label= "Log Files" shape="parallelogram"]
J [label= "Rule Generation" shape="square"]
A -> C
B -> C
C -> D
D -> E
E -> F
F -> G
F -> H
D -> I
I -> J
J -> C

{rank=same; A B}

{rank=same; C}

{rank=same; I D}

}

8.1. Pre-processing

The G-NAF maintenance pre-process takes input files from the Geoscape reference datasets and from contributors, and prepares them before they are passed to the G-NAF maintenance software. Pre-processing comprises the following activities:

  • Mapping from the contributor model to G-NAF model (with parsing as necessary)

  • Application of rules that make corrections to misspellings, abbreviations and erroneous characters

  • Application of updates to suburb data and road names propagating the changes through all affected parts of the data.
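As a rough illustration of the second activity, the sketch below applies a small table of correction rules to a contributed street string. The rule contents and the function name `apply_rules` are hypothetical; the actual G-NAF rule set is not described in this document.

```python
import re

# Hypothetical correction rules as (pattern, replacement) pairs.
# These are illustrative only and are not the rules used in G-NAF production.
RULES = [
    (re.compile(r"[^A-Z0-9 '\-]"), ""),   # drop erroneous characters
    (re.compile(r"\bST\b$"), "STREET"),   # expand a trailing abbreviation
    (re.compile(r"\bRD\b$"), "ROAD"),
    (re.compile(r"\s+"), " "),            # collapse repeated whitespace
]


def apply_rules(raw: str) -> str:
    """Upper-case the input, then apply each rule in order."""
    text = raw.upper().strip()
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Rule order matters in a scheme like this: erroneous characters are removed first so that a trailing abbreviation such as "RD." can still be recognised.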

Data structure of an address

For an address to be included in G-NAF, it must be a “complete” entry. A complete entry meets all of the following conditions:

  • Must include a matched locality

  • Must include a street name

  • Must contain either a valid number_first or a lot number.
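The completeness conditions above can be sketched as a simple predicate. This is a minimal illustration only: the key `locality_pid` is an assumed stand-in for "a matched locality", and any non-empty value is treated as valid because the validity rules for number_first are not defined here.

```python
from typing import Mapping, Optional


def is_complete(address: Mapping[str, Optional[object]]) -> bool:
    """Return True when the address meets the three completeness conditions."""

    def has(field: str) -> bool:
        # A field counts as present when it is neither missing nor blank.
        value = address.get(field)
        return value is not None and str(value).strip() != ""

    return (
        has("locality_pid")          # assumed marker for a matched locality
        and has("street_name")
        and (has("number_first") or has("lot_number"))
    )
```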

Reference datasets

G-NAF relies on other Geoscape datasets. The diagram below shows that relationship and the order of the production cycle for Geoscape dataset releases. Geoscape’s Administrative Boundaries and Roads datasets need to be completed before G-NAF production can commence.

G-NAF's Reference Datasets Timeline

8.2. G-NAF maintenance software

The G-NAF maintenance software receives data from the pre-processing phase. All the contributed addresses from each jurisdiction are cleansed, compared and merged into the normalised G-NAF maintenance model.

8.2.1. Processing

The core maintenance processing consists of the following:

  • Address scrubbing

  • State-Locality validation and geocoding

  • Street validation

  • Street geocoding

  • Address geocoding

  • Merging (merge criteria and confidence levels)

A further series of processing steps then follows:

  • Post merge processing (including validation processes)

  • Primary / Secondary maintenance

  • Alias / Principal maintenance

  • Geocode maintenance

  • Update address attributes (update attributes not in core processing)

  • Update address links (i.e. contributor mapping, mesh blocks, default geocode)

  • Verify G-NAF data (i.e. conformance with a data model)

  • Data export to integrated maintenance database.

8.2.2. Geocoding

Multiple geocodes and multiple types of geocodes can be stored for each address. While this capability exists in the G-NAF model, only some addresses have multiple geocodes at this stage.

8.2.2.1. Geocode level type

Every address within G-NAF must have a locality level geocode; it may also have a street level geocode and a parcel level geocode. The table GEOCODE_LEVEL_TYPE_AUT indicates which of these geocode level types are associated with an address, in accordance with the table below:

| Geocode_Level_Type | Description |
|---|---|
| 0 | No Geocode |
| 1 | Parcel Level Geocode Only (No Locality or Street Level Geocode) |
| 2 | Street Level Geocode Only (No Locality or Parcel Level Geocode) |
| 3 | Street and Parcel Level Geocodes (No Locality Geocode) |
| 4 | Locality Level Geocode Only (No Street or Parcel Level Geocode) |
| 5 | Locality and Parcel Level Geocodes (No Street Level Geocode) |
| 6 | Locality and Street Level Geocodes (No Parcel Level Geocode) |
| 7 | Locality, Street and Parcel Level Geocodes |

Note

LEVEL_GEOCODED_CODE field within the ADDRESS_DETAIL table refers to the CODE field within the GEOCODE_LEVEL_TYPE_AUT.
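The values in the table above behave like a three-bit flag (parcel = 1, street = 2, locality = 4). This reading is an observation about the listed values rather than a documented property of G-NAF, and the function name `decode_level` is hypothetical. Under that assumption, a code can be decoded as follows:

```python
from typing import List

# Assumed bit values, inferred from the Geocode_Level_Type table.
PARCEL, STREET, LOCALITY = 1, 2, 4


def decode_level(code: int) -> List[str]:
    """List the geocode levels implied by a Geocode_Level_Type value."""
    if not 0 <= code <= 7:
        raise ValueError(f"Geocode_Level_Type must be 0-7, got {code}")
    levels = []
    if code & LOCALITY:
        levels.append("locality")
    if code & STREET:
        levels.append("street")
    if code & PARCEL:
        levels.append("parcel")
    return levels
```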

8.2.2.2. Geocode reliability

Reliability of a geocode refers to the geocode precision and is linked to how the geocode was generated. Every geocode in G-NAF has a reliability level. The levels and their descriptions are stored in the table GEOCODE_RELIABILITY_AUT. These descriptions together with examples are given in the table below.

Geocode reliability

| Reliability Level | Description | Example |
|---|---|---|
| 1 | Geocode resolution recorded to appropriate surveying standard. | Address level geocode was manually geocoded with a GPS. |
| 2 | Geocode resolution sufficient to place geocode within address site boundary or access point close to address site boundary. | Address level geocode was calculated as the geometric centre within the associated cadastral parcel; geocode for access point identified for a rural property; calculated geocode based on centre setback from road within cadastral parcel; geocode for approximate centre of building. |
| 3 | Geocode resolution sufficient to place geocode near (or possibly within) address site boundary. | Address level geocode was automatically calculated by determining where on the road the address was likely to appear, based on other bounding geocoded addresses. |
| 4 | Geocode resolution sufficient to associate address site with a unique road feature. | Street level geocode automatically calculated by using the road centreline reference data. |
| 5 | Geocode resolution sufficient to associate address site with a unique locality or neighbourhood. | Locality level geocode automatically calculated to the geometric centre within the gazetted locality for this address. |
| 6 | Geocode resolution sufficient to associate address site with a unique region. | Locality level geocode derived from topographic feature. |

Note

The RELIABILITY_CODE field within the ADDRESS_SITE_GEOCODE table refers to the CODE field within the GEOCODE_RELIABILITY_AUT table.

Every geocode has a reliability level. These levels are stored with the geocodes in the following tables:

  • LOCALITY_POINT

  • STREET_LOCALITY_POINT

  • ADDRESS_SITE_GEOCODE

8.2.2.3. Geocode type

Provision has also been made for G-NAF to cater for multiple types of geocodes for an address. Where geocode types are nominated by the jurisdiction, these are reflected in the geocode type field. Where a geocode type is not provided, a default value is used that reflects the majority of addresses. Nationally, the PROPERTY CENTROID (PC) geocode type is the most uniform. While the data model and respective geocode types have been listed, in the vast majority of cases there are no current national data sources identified to populate the additional codes. The full list of allowed geocode types is included in the Data Dictionary in Appendix C (i.e. the GEOCODE_TYPE_AUT table).

8.2.2.4. Geocode priority

A priority order has been developed and applied during G-NAF production to provide a single geocode for all G-NAF addresses. The priority order developed places emphasis on identifying locations associated with emergency management access, buildings on a site and other locations which are associated with the land management process. This order has been developed to assist users in general and will not be suitable for all user business needs. The priority order applied is included in the relevant table in Appendix C. The priority order has been applied in the ADDRESS_DEFAULT_GEOCODE table.
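Selecting a single geocode by priority can be sketched as below. This is an illustration under stated assumptions: the actual priority order is the one given in Appendix C and is supplied here as a parameter, and the `geocode_type` key and the function name `pick_default` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def pick_default(
    geocodes: Sequence[Dict[str, object]],
    priority: Sequence[str],
) -> Optional[Dict[str, object]]:
    """Return the geocode whose type appears earliest in the priority order."""
    rank = {code: position for position, code in enumerate(priority)}
    # Geocode types missing from the priority list are ignored.
    candidates = [g for g in geocodes if g["geocode_type"] in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda g: rank[g["geocode_type"]])
```

A caller would pass the Appendix C order as `priority`, most preferred type first. Users whose business needs differ from the default order could pass their own list instead.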

8.2.3. Confidence levels

Every address and geocode can be related to a supplied dataset, which in turn can be related to the contributor who provided it. This feature is essential to being able to supply the information back to the address contributors. However, the address custodian identifier is not available in G-NAF. Instead, address level metadata is available indicating how many source datasets provided each address. Address Usage is reflected in the Confidence field included in the ADDRESS_DETAIL table and is expressed as follows:

\[n - 1 = C \tag{1}\]

(n = number of datasets providing the address, C = confidence level)
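Equation (1) translates directly into code. The function name `confidence_level` is hypothetical and the sketch assumes only that n is a non-negative count of datasets.

```python
def confidence_level(n_datasets: int) -> int:
    """Apply Equation (1): C = n - 1, where n is the number of datasets
    providing the address."""
    if n_datasets < 0:
        raise ValueError("the number of datasets cannot be negative")
    return n_datasets - 1
```

An address supplied by no dataset therefore receives -1, which is the value used for retired addresses.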

Given G-NAF has been built with three contributor datasets, the Address Usage (Confidence Level) possibilities are as follows:

Confidence levels

| Confidence Level | Description |
|---|---|
| 2 | This reflects that all three contributors have supplied an identical address. |
| 1 | This reflects that a match has been achieved between only two contributors. |
| 0 | This reflects that a single contributor holds this address and no match has been achieved with either of the other two contributors. |
| -1 | This reflects that none of the contributors hold this address in their address dataset anymore. |

Where an address is no longer provided by any contributor, the address will be retired. Geoscape will also retire an address that contributors still supply when, following a review, the address is considered to be no longer in use in the community but has yet to be retired from contributor databases. The retirement is reflected in a confidence level value of -1. Up until the August 2018 release of G-NAF, all retired addresses were retained in G-NAF for four releases, after which they were archived and no longer retained in the product. The introduction of the ADDRESS_FEATURE table in August 2018, which tracks changes to addresses, requires all retired addresses to be retained to show change over time.

8.2.4. Merge criteria

Addresses which share similar characteristics from the different contributors are merged into a single record. These shared characteristics are known as the merge criteria. The fields comprising the G-NAF merge criteria are:

  • STATE_ABBREVIATION

  • LOCALITY_NAME

  • PRIMARY_POSTCODE

  • STREET_NAME

  • STREET_TYPE

  • STREET_SUFFIX

  • NUMBER_FIRST_PREFIX

  • NUMBER_FIRST

  • NUMBER_FIRST_SUFFIX

  • NUMBER_LAST_PREFIX

  • NUMBER_LAST

  • NUMBER_LAST_SUFFIX

  • FLAT_NUMBER_PREFIX

  • FLAT_NUMBER

  • FLAT_NUMBER_SUFFIX

  • LEVEL_NUMBER

Note

Exception for addresses without a number_first
When a contributed address is supplied without a number_first, consideration is given as to whether the address contains a lot_number. An address without a number_first but with a lot_number will be added to G-NAF.

A G-NAF ID or address_detail_pid relates to a unique combination of these merge criteria fields. This address_detail_pid will persist with the address while it remains in the dataset. Where values in fields which are not included in the merge criteria (from the ADDRESS_DETAIL table) change in consecutive product releases, the address_detail_pid will not change. However, the associated date_last_modified field will.
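The idea of a merge key can be sketched as follows. This is a simplified illustration only: it assumes records have already been scrubbed into dictionaries keyed by lower-case field names, and it treats a match as exact equality on every merge criteria field. The names `merge_key` and `merge_confidence` are hypothetical, and the production merge process is not described at this level of detail.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# The merge criteria fields listed above, in lower case.
MERGE_FIELDS = (
    "state_abbreviation", "locality_name", "primary_postcode",
    "street_name", "street_type", "street_suffix",
    "number_first_prefix", "number_first", "number_first_suffix",
    "number_last_prefix", "number_last", "number_last_suffix",
    "flat_number_prefix", "flat_number", "flat_number_suffix",
    "level_number",
)


def merge_key(record: dict) -> tuple:
    """Build the tuple of merge criteria values; absent fields become None."""
    return tuple(record.get(name) for name in MERGE_FIELDS)


def merge_confidence(
    records: Iterable[Tuple[str, dict]],
) -> Dict[tuple, int]:
    """Group (contributor, record) pairs by merge key and return the
    confidence level (number of distinct contributors minus one) per key."""
    contributors_by_key = defaultdict(set)
    for contributor, record in records:
        contributors_by_key[merge_key(record)].add(contributor)
    return {key: len(names) - 1 for key, names in contributors_by_key.items()}
```

Fields outside the merge criteria, such as a building name, do not affect the key, which mirrors the statement that changes to those fields leave the address_detail_pid unchanged.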

8.2.5. Merge criteria changes

When any element of the merge criteria changes, the new record is treated as a new address and inserted into G-NAF as such.

Example

This example shows Unit 3 21 Smith Street Burwood (address_detail_pid = GAVIC411711441) being changed to Unit 3 21 Brown Street Burwood by a contributor. The street name change will mean it is no longer possible to match the new incoming record to an existing G-NAF record, so a new G-NAF record (address_detail_pid = GAVIC998999843) is created.

As the existing address (i.e. GAVIC411711441) is now only supported by two contributors, its confidence level is reduced to 1. The new incoming address, only supported by one contributor, will get a confidence of 0.

digraph structs{
    table1 [shape=plain label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
    <TR>
    <TD BGCOLOR="black" COLSPAN="2"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">Existing G-NAF Record Example</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">GNAF_ID</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">GAVIC411711441</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">FLAT_TYPE</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">UNIT</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">FLAT_NUMBER</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">3</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">BUILDING_NAME</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">PONDEROSA</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">NUMBER_FIRST</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">21</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET_NAME</FONT></TD>
    <TD PORT="f1"><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">SMITH</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET_TYPE</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">LOCALITY_NAME</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">BURWOOD</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">CONFIDENCE</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">1</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">DATE_CREATED</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">29/04/2014</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">DATE_RETIRED</FONT></TD>
    <TD></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">DATE_LAST_MODIFIED</FONT></TD>
    <TD></TD>
    </TR>
    </TABLE>>];

    table2 [shape=plain label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
    <TR>
    <TD BGCOLOR="black" COLSPAN="3"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">Updated G-NAF Record Example</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">GNAF_ID</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">GAVIC411711441</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">GAVIC998999843</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">FLAT_TYPE</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">UNIT</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">UNIT</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">FLAT_NUMBER</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">3</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">3</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">BUILDING_NAME</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">PONDEROSA</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">PONDEROSA</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">NUMBER_FIRST</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">21</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">21</FONT></TD>
    </TR>
    <TR>
    <TD PORT="f2" BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET_NAME</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">SMITH</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">BROWN</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET_TYPE</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">STREET</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">LOCALITY_NAME</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">BURWOOD</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">BURWOOD</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">CONFIDENCE</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">1</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">0</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">DATE_CREATED</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">29/04/2014</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">14/06/2014</FONT></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">DATE_RETIRED</FONT></TD>
    <TD><FONT FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">14/06/2014</FONT></TD>
    <TD></TD>
    </TR>
    <TR>
    <TD BGCOLOR="black"><FONT COLOR="white" FACE="Chakra Petch,Verdana,serif" POINT-SIZE="8">DATE_LAST_MODIFIED</FONT></TD>
    <TD></TD>
    <TD></TD>
    </TR>
    </TABLE>>];

    table1:f1 -> table2:f2

    {rank=same; table1 table2};
}

8.2.6. Address duplication

As multiple contributors supply data nominally covering the same area, there is a possibility that there are duplicate addresses which represent the same addressable location. The above example simplistically demonstrates how this could occur. Geoscape has developed a sophisticated series of production processes in an effort to counter these issues. The majority of this duplication has occurred as a result of the following:

  • The use of both ranged and non-ranged addresses for the same site (e.g. 22-28 Sydney Street vs 22 Sydney Street).

  • The use of a flat number as opposed to a number_first suffix for the same site (e.g. 2/27 Melbourne Street vs 27B Melbourne Street).

  • Where one contributor supplies a level number as part of an address string and another contributor does not supply the level number for the same site. This tends to occur on properties where “hotel style addressing” is used (e.g. Level 3, 302/50 Adelaide Street vs 302/50 Adelaide Street).

Where circumstances of this nature have been identified during processing, alias principal relationships have been established to prevent the duplication of addresses.

8.2.6.1. Alias Management

The usability of G-NAF is greatly enhanced by the inclusion of alias information that captures addresses in popular use irrespective of official status. Geoscape recognises that G-NAF has a role to play in progressing usage of official gazetted addresses. However, it is also acknowledged that the issue cannot be forced and in some cases, it will take generational change to see alias or incorrect addresses taken out of everyday usage. It is also considered that the benefits of the inclusion of aliases outweigh the costs, particularly in the application of G-NAF by emergency services. There are three levels of aliases in the G-NAF schema:

  • Alias Address - where an individual address is also known by another name

  • Alias Street/Locality Address - where a street/locality pair does not exist in the reference data and is the synonym or incorrect spelling of a street/locality pair that does exist.

  • Alias Locality Address - where a locality does not exist in the reference data and is the synonym or incorrect spelling of a locality that does exist

8.2.6.2. Alias address

Alias addresses (ADDRESS_ALIAS) are addresses, other than the principal address, that refer to the same physical location as another address record.

Alias address

An address level alias refers to the same address site which is identified by different address elements. The relationship between addresses at a specific site is modelled through a principal and alias attribute and join table.

8.2.6.3. Alias street/locality

Alias street/locality (STREET_LOCALITY_ALIAS) is used to determine addresses that refer to the same physical location as another address record, where the street/locality is different. Where it is identified that the street/locality in an address from a contributor was incorrect (e.g. spelling error), a rule (see below) is created to manipulate the data during the scrubbing process.

8.2.6.4. Alias locality

Alias localities (LOCALITY_ALIAS) are used to determine those addresses that refer to the same physical location as another address record, but where the locality is different. For example, the locality “CITY” will exist in the LOCALITY table and an entry for “CANBERRA CITY” will exist in the LOCALITY_ALIAS table.

8.2.6.5. Using alias datasets

When using G-NAF to validate an address, the steps are:

  1. Is there a principal address for this address?

  2. Is there an alias address for this address?

  3. Is there an alias locality for the locality of the address?

    This can be determined by checking the locality name of the address against the LOCALITY_NAME field in the LOCALITY_ALIAS table; the locality_pid is then used to determine the correct locality_name from the LOCALITY table. The next step would be to retry steps 1 & 2 with the new locality_name.

  4. Is there an alias street/locality for the address?

    This can be determined by checking the street name of the address against the street_name, street_type, street_suffix fields in the STREET_LOCALITY_ALIAS table; the street_pid is then used to determine the correct street_name from the STREET table. The next step would be to retry steps 1, 2 & 3 with the new street name.
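The validation steps above can be sketched with in-memory lookups. Everything in this example is a simplified assumption: the `AliasIndex` class and its dictionaries are hypothetical stand-ins for the G-NAF tables, an address is reduced to a (number, street, locality) tuple, and the locality alias is resolved once before the street alias rather than re-running step 3 after step 4.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Simplified address: (number_first, street name with type, locality name).
Address = Tuple[str, str, str]


@dataclass
class AliasIndex:
    """Hypothetical in-memory stand-ins for the G-NAF address and alias tables."""
    principals: Set[Address] = field(default_factory=set)
    # alias address -> principal address
    address_aliases: Dict[Address, Address] = field(default_factory=dict)
    # alias locality name -> locality name
    locality_aliases: Dict[str, str] = field(default_factory=dict)
    # (alias street, locality name) -> street name
    street_aliases: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def direct_match(self, address: Address) -> Optional[Address]:
        """Steps 1 and 2: return the principal address, if one is found."""
        if address in self.principals:
            return address
        return self.address_aliases.get(address)

    def validate(self, address: Address) -> Optional[Address]:
        """Run steps 1 to 4 and return the principal address, or None."""
        number, street, locality = address
        match = self.direct_match(address)
        if match is not None:
            return match
        # Step 3: replace an alias locality, then retry steps 1 and 2.
        if locality in self.locality_aliases:
            locality = self.locality_aliases[locality]
            match = self.direct_match((number, street, locality))
            if match is not None:
                return match
        # Step 4: replace an alias street/locality, then retry steps 1 and 2.
        official_street = self.street_aliases.get((street, locality))
        if official_street is not None:
            return self.direct_match((number, official_street, locality))
        return None
```

Returning the principal address in every successful case gives callers a single canonical record to work with, regardless of which alias level produced the match.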

8.3. Maintenance scope

Data for existing objects with changed geometry and attributes as well as data for new objects within the release period are included in the release.