================
Data Maintenance
================
Maintenance activities are triggered by Geoscape receiving updated address data from data contributors according to an agreed delivery schedule. At present, this schedule defines a quarterly update process.
During the maintenance phase, contributed addresses are analysed and compared to existing records in G-NAF. This analysis and comparison give rise to new records being inserted and existing records being updated or retired.
The following diagram of the G-NAF Maintenance Process provides a high-level view of the G-NAF system including G-NAF maintenance pre-processing, the use of reference data files, G-NAF maintenance software and G-NAF outputs.
.. graphviz::
strict digraph {
graph [fontname = "Chakra Petch,Verdana,serif"]
node [fontname = "Chakra Petch,Verdana,serif"]
size="4,5"
edge [fontname="Chakra Petch,Verdana,serif" color="blue"]
A [label="Reference\n Data Files" shape="parallelogram" style="filled" fillcolor="#FFCF06"]
B [label= "Contributor\nData Files" shape="parallelogram" style="filled" fillcolor="#FFCF06"]
C [label= "G-NAF \nMaintenance \nPre-Processing" shape="square"]
D [label= "G-NAF \nMaintenance \nSoftware" shape="square"]
E [label= "PSMA \nGeocoded National Address File \n(G-NAF) \nDatabase" shape="cylinder" style="filled" fillcolor="#482CFF" COLOR="white"]
F [label= "Data Extraction" shape="square"]
G [label= "Reports" shape="square" ]
H [label= "G-NAF \n Data" shape="parallelogram"]
I [label= "Log Files" shape="parallelogram"]
J [label= "Rule Generation" shape="square"]
A -> C
B -> C
C -> D
D -> E
E -> F
F -> G
F -> H
D -> I
I -> J
J -> C
rank=same{A B}
rank=same{ C }
rank=same{I D}
}
Pre-processing
--------------
The G-NAF maintenance pre-process takes the input files from the Geoscape reference
datasets and from data contributors and prepares them for the G-NAF maintenance software.
Pre-processing covers the following activities:
* Mapping from the contributor model to the G-NAF model (with parsing as necessary)
* Application of rules that correct misspellings, abbreviations and erroneous characters
* Application of updates to suburb data and road names, propagating the changes through all affected parts of the data
| **Data structure of an address**
For an address to be included in G-NAF, it must be a “complete” entry. A complete entry meets all of the following conditions:
* Must include a matched locality
* Must include a street name
* Must contain either a valid number_first or a lot number.
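The completeness rules above can be sketched as a simple predicate. This is a minimal illustration only: the field names (``locality_pid``, ``street_name``, ``number_first``, ``lot_number``) and the reading of "valid number_first" as a positive integer are assumptions made for the example, not the checks used in G-NAF production.

```python
from typing import Optional, TypedDict


class ContributedAddress(TypedDict, total=False):
    # Hypothetical field names chosen for this sketch.
    locality_pid: Optional[str]   # set when the locality matched the reference data
    street_name: Optional[str]
    number_first: Optional[int]
    lot_number: Optional[str]


def is_complete(address: ContributedAddress) -> bool:
    """Return True when the address meets all three completeness rules."""
    has_locality = bool(address.get("locality_pid"))
    has_street = bool((address.get("street_name") or "").strip())
    number_first = address.get("number_first")
    # Assumption: a "valid" number_first is any positive integer.
    has_number = number_first is not None and number_first > 0
    has_lot = bool((address.get("lot_number") or "").strip())
    return has_locality and has_street and (has_number or has_lot)
```

An address with a matched locality and street name but neither a number_first nor a lot number would be rejected by this check.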
| **Reference datasets**
G-NAF relies on other Geoscape datasets. The diagram below shows that relationship and the
order of the production cycle for the release of Geoscape datasets. Geoscape’s
Administrative Boundaries and Roads datasets must be completed before G-NAF production
can commence.
.. image:: data_maintenance/reference_datasets_timeline.png
:width: 450
:alt: G-NAF's Reference Datasets Timeline
:align: center
G-NAF maintenance software
--------------------------
The G-NAF maintenance software receives data from the pre-processing phase. All the
contributed addresses from each jurisdiction are cleansed, compared and merged into the
normalised G-NAF maintenance model.
Processing
++++++++++
The core maintenance processing consists of the following:
* Address scrubbing
* State-Locality validation and geocoding
* Street validation
* Street geocoding
* Address geocoding
* Merging (merge criteria and confidence levels)
Further processing is then carried out in the following steps:
* Post merge processing (including validation processes)
* Primary / Secondary maintenance
* Alias / Principal maintenance
* Geocode maintenance
* Update address attributes (update attributes not in core processing)
* Update address links (i.e. contributor mapping, mesh blocks, default geocode)
* Verify G-NAF data (i.e. conformance with a data model)
* Data export to integrated maintenance database.
Geocoding
+++++++++
Multiple geocodes and multiple types of geocodes can be stored for each address. While
this capability exists in the G-NAF model, only some addresses currently have more than
one geocode.
Geocode level type
******************
Every address within G-NAF must have a locality level geocode; it may also have a street
level geocode and a parcel level geocode. The table GEOCODE_LEVEL_TYPE_AUT lists the
possible combinations of geocode level types that can be associated with an address, as
shown in the table below:
.. table::
:align: center
=================== ===============================================================
Geocode_Level_Type Description
=================== ===============================================================
0 No Geocode
------------------- ---------------------------------------------------------------
1 Parcel Level Geocode Only (No Locality or Street Level Geocode)
------------------- ---------------------------------------------------------------
2 Street Level Geocode Only (No Locality or Parcel Level Geocode)
------------------- ---------------------------------------------------------------
3 Street and Parcel Level Geocodes (No Locality Geocode)
------------------- ---------------------------------------------------------------
4 Locality Level Geocode Only (No Street or Parcel Level Geocode)
------------------- ---------------------------------------------------------------
5 Locality and Parcel level Geocodes (No Street Level Geocode)
------------------- ---------------------------------------------------------------
6 Locality and Street Level Geocodes (No Parcel Level Geocodes)
------------------- ---------------------------------------------------------------
7 Locality, Street and Parcel Level Geocodes
=================== ===============================================================
.. note::
The LEVEL_GEOCODED_CODE field within the ADDRESS_DETAIL table refers to the CODE field within the GEOCODE_LEVEL_TYPE_AUT table.
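The eight codes in the table line up with a three-bit flag set (parcel = 1, street = 2, locality = 4). The sketch below decodes and encodes a level code on that basis. This is an observation about the table values rather than a documented encoding, and the function and class names are invented for the example.

```python
from typing import NamedTuple

# Bit weights inferred from the table above (an observation, not a documented rule).
PARCEL, STREET, LOCALITY = 1, 2, 4


class GeocodeLevels(NamedTuple):
    locality: bool
    street: bool
    parcel: bool


def decode_geocode_level(code: int) -> GeocodeLevels:
    """Split a geocode level type code (0-7) into its three level flags."""
    if not 0 <= code <= 7:
        raise ValueError(f"geocode level type must be between 0 and 7, got {code}")
    return GeocodeLevels(
        locality=bool(code & LOCALITY),
        street=bool(code & STREET),
        parcel=bool(code & PARCEL),
    )


def encode_geocode_level(locality: bool, street: bool, parcel: bool) -> int:
    """Combine the three level flags back into a single code."""
    return LOCALITY * locality + STREET * street + PARCEL * parcel
```

For example, code 5 decodes to a locality and a parcel level geocode with no street level geocode, matching the table row for 5.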
Geocode reliability
********************
Reliability of a geocode refers to the geocode precision and is linked to how the geocode
was generated. Every geocode in G-NAF has a reliability level. The levels and their
descriptions are stored in the table GEOCODE_RELIABILITY_AUT. These descriptions
together with examples are given in the table below.
.. table:: Geocode reliability
   :align: center
   :widths: 62,218,218

   ================== ============================================ =========================================================
   Reliability Level  Description                                  Example
   ================== ============================================ =========================================================
   1                  Geocode resolution recorded to appropriate   Address level geocode was manually geocoded with a
                      surveying standard.                          GPS.
   ------------------ -------------------------------------------- ---------------------------------------------------------
   2                  Geocode resolution sufficient to place       Address level geocode was calculated as the geometric
                      geocode within address site boundary or      centre within the associated cadastral parcel.
                      access point close to address site boundary. Geocode for access point identified for a rural property.
                                                                   Calculated geocode based on centre setback from road
                                                                   within cadastral parcel.
                                                                   Geocode for approximate centre of building.
   ------------------ -------------------------------------------- ---------------------------------------------------------
   3                  Geocode resolution sufficient to place       Address level geocode was automatically calculated by
                      geocode near (or possibly within) address    determining where on the road the address was likely to
                      site boundary.                               appear, based on other bounding geocoded addresses.
   ------------------ -------------------------------------------- ---------------------------------------------------------
   4                  Geocode resolution sufficient to associate   Street level geocode automatically calculated by using
                      address site with a unique road feature.     the road centreline reference data.
   ------------------ -------------------------------------------- ---------------------------------------------------------
   5                  Geocode resolution sufficient to associate   Locality level geocode automatically calculated to the
                      address site with a unique locality or       geometric centre within the gazetted locality for this
                      neighbourhood.                               address.
   ------------------ -------------------------------------------- ---------------------------------------------------------
   6                  Geocode resolution sufficient to associate   Locality level geocode derived from topographic feature.
                      address site with a unique region.
   ================== ============================================ =========================================================

.. note::
The RELIABILITY_CODE field within the ADDRESS_SITE_GEOCODE table refers to the
CODE field within the GEOCODE_RELIABILITY_AUT table.
Reliability levels are stored with the geocodes in the following tables:
* LOCALITY_POINT
* STREET_LOCALITY_POINT
* ADDRESS_SITE_GEOCODE
Geocode type
************
Provision has also been made for G-NAF to cater for multiple types of geocodes for an
address. Where geocode types are nominated by the jurisdiction, these are reflected in the
geocode type field. Where a geocode type is not provided, a default value is used that
reflects the majority of addresses. Nationally, the PROPERTY CENTROID (PC) geocode type
is the most uniform. While the data model and respective geocode types have been listed,
in the vast majority of cases, there are no current national data sources identified to
populate the additional codes. The full list of allowed geocode types is included in the Data
Dictionary in Appendix C (i.e. the GEOCODE_TYPE_AUT table).
Geocode priority
****************
A priority order has been developed and applied during G-NAF production to provide a
single geocode for all G-NAF addresses. The priority order developed places emphasis on
identifying locations associated with emergency management access, buildings on a site
and other locations which are associated with the land management process. This order
has been developed to assist users in general and will not be suitable for all user business
needs. The priority order applied is included in the relevant table in Appendix C. The
priority order has been applied in the ADDRESS_DEFAULT_GEOCODE table.
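Selecting a single default geocode from a priority order can be sketched as follows. The ranks and the ``EXAMPLE_A`` / ``EXAMPLE_B`` codes below are placeholders invented for illustration; only ``PC`` appears in this document, and the actual priority order is the one published in Appendix C.

```python
from typing import Mapping, Optional, Sequence

# Placeholder ranks for illustration only (lower rank wins).
# These are NOT the Appendix C values; EXAMPLE_A and EXAMPLE_B are made-up codes.
EXAMPLE_PRIORITY: Mapping[str, int] = {
    "EXAMPLE_A": 1,
    "PC": 2,
    "EXAMPLE_B": 3,
}


def pick_default_geocode(
    geocode_types: Sequence[str],
    priority: Mapping[str, int] = EXAMPLE_PRIORITY,
) -> Optional[str]:
    """Return the geocode type with the best (lowest) rank, or None if none is ranked."""
    ranked = [g for g in geocode_types if g in priority]
    if not ranked:
        return None
    return min(ranked, key=lambda g: priority[g])
```

An address holding several geocode types would get the one with the lowest rank as its default, which is the role the ADDRESS_DEFAULT_GEOCODE table plays in the product.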
Confidence levels
+++++++++++++++++
Every address and geocode can be related to a supplied dataset, which in turn can be
related to the contributor who provided it. This feature is essential to being able to supply
the information back to the address contributors. However, the address custodian identifier
is not available in G-NAF. Instead, address level metadata is available indicating how many
source datasets provided each address.
Address Usage is reflected in the Confidence field included in the ADDRESS_DETAIL table
and is expressed as follows:
.. math:: n-1 = C
:name: Confidence level equation
(n = number of datasets providing the address, C = confidence level)
Given G-NAF has been built with three contributor datasets, the Address Usage (Confidence Level) possibilities are as follows:
.. table:: Confidence levels
:align: center
====================== ========================================================================================
Confidence Level Description
====================== ========================================================================================
2 This reflects that all three contributors have supplied an identical address.
---------------------- ----------------------------------------------------------------------------------------
1 This reflects that a match has been achieved between only two contributors.
---------------------- ----------------------------------------------------------------------------------------
0 This reflects that a single contributor holds this address and no match has been
achieved with either of the other two contributors.
---------------------- ----------------------------------------------------------------------------------------
-1 This reflects that none of the contributors hold this address in their address dataset
anymore.
====================== ========================================================================================
Where an address is no longer provided by any contributor, the address will be retired. An address
that is still provided by contributors will also be retired by Geoscape when, following a review, the
address is considered to be no longer in use in the community but has yet to be retired from
contributor databases. A retired address has a confidence level value of -1. Up until the August 2018
release of G-NAF, all retired addresses were retained in G-NAF for four releases, after which
they were archived and no longer retained in the product. Since the introduction of the
ADDRESS_FEATURE table in August 2018, which tracks changes to addresses,
all retired addresses are retained to show change over time.
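The confidence calculation and the retirement rule can be expressed in a few lines. This is a sketch only; the function name and the ``retired_by_review`` flag are assumptions introduced for the example.

```python
def confidence_level(n_datasets: int, retired_by_review: bool = False) -> int:
    """Return C = n - 1, where n is the number of datasets providing the address.

    An address retired following a Geoscape review is given -1 even if
    contributors still supply it (modelled here by a hypothetical flag).
    """
    if n_datasets < 0:
        raise ValueError("n_datasets cannot be negative")
    if retired_by_review:
        return -1
    return n_datasets - 1
```

With three contributor datasets, the function returns 2, 1, 0 and -1 for addresses supplied by three, two, one and no contributors respectively.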
Merge criteria
++++++++++++++
Addresses which share similar characteristics from the different contributors are merged
into a single record. These shared characteristics are known as the merge criteria.
The fields comprising the G-NAF merge criteria are:
* STATE_ABBREVIATION
* LOCALITY_NAME
* PRIMARY_POSTCODE
* STREET_NAME
* STREET_TYPE
* STREET_SUFFIX
* NUMBER_FIRST_PREFIX
* NUMBER_FIRST
* NUMBER_FIRST_SUFFIX
* NUMBER_LAST_PREFIX
* NUMBER_LAST
* NUMBER_LAST_SUFFIX
* FLAT_NUMBER_PREFIX
* FLAT_NUMBER
* FLAT_NUMBER_SUFFIX
* LEVEL_NUMBER
.. note::
**Exception for Addresses without a number_first**
When a contributed address is supplied without a number_first, consideration is given as to whether
the address contains a lot_number. An address without a number_first but with a lot_number will be
added to G-NAF.
A G-NAF ID or address_detail_pid relates to a unique combination of these merge criteria
fields. This address_detail_pid will persist with the address while it remains in the dataset.
Where values in fields which are not included in the merge criteria (from the
ADDRESS_DETAIL table) change in consecutive product releases, the address_detail_pid
will not change. However, the associated date_last_modified field will.
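One way to picture the merge criteria is as a composite key: records that produce the same key are candidates for a single G-NAF record. The sketch below assumes exact matching after trimming and upper-casing, which is a simplification made for illustration; the production merge process also applies scrubbing and validation and is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional, Tuple

MERGE_FIELDS = (
    "STATE_ABBREVIATION", "LOCALITY_NAME", "PRIMARY_POSTCODE",
    "STREET_NAME", "STREET_TYPE", "STREET_SUFFIX",
    "NUMBER_FIRST_PREFIX", "NUMBER_FIRST", "NUMBER_FIRST_SUFFIX",
    "NUMBER_LAST_PREFIX", "NUMBER_LAST", "NUMBER_LAST_SUFFIX",
    "FLAT_NUMBER_PREFIX", "FLAT_NUMBER", "FLAT_NUMBER_SUFFIX",
    "LEVEL_NUMBER",
)

MergeKey = Tuple[Optional[str], ...]


def merge_key(record: Mapping[str, object]) -> MergeKey:
    """Normalise the merge-criteria fields of a record into a hashable key."""
    key: List[Optional[str]] = []
    for field in MERGE_FIELDS:
        value = record.get(field)
        text = "" if value is None else str(value).strip().upper()
        key.append(text or None)  # missing and empty values compare equal
    return tuple(key)


def group_by_merge_key(
    records: Iterable[Mapping[str, object]],
) -> Dict[MergeKey, List[Mapping[str, object]]]:
    """Group contributed records that share the same merge criteria."""
    groups: Dict[MergeKey, List[Mapping[str, object]]] = defaultdict(list)
    for record in records:
        groups[merge_key(record)].append(record)
    return dict(groups)
```

Fields outside ``MERGE_FIELDS`` do not affect the key, which mirrors the behaviour described above: changing them leaves the address_detail_pid unchanged.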
Merge criteria changes
++++++++++++++++++++++
When any element of the merge criteria changes, the new record is treated as a new
address and inserted into G-NAF as such.
**Example**
This example shows Unit 3 21 Smith Street Burwood (``address_detail_pid = 'GAVIC411711441'``) being changed to Unit 3 21 Brown Street Burwood by a contributor. The
street name change will mean it is no longer possible to match the new incoming record to
an existing G-NAF record, so a new G-NAF record (``address_detail_pid = 'GAVIC998999843'``)
is created.
As the existing address (i.e. ``'GAVIC411711441'``) is now only supported by two contributors,
its confidence level is reduced to 1. The new incoming address, only supported by one
contributor, will get a confidence of 0.
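.. graphviz::

   digraph structs {
       table1 [shape=plain label=<
           <table border="0" cellborder="1" cellspacing="0" cellpadding="4">
           <tr><td colspan="2" port="f1"><b>Existing G-NAF Record Example</b></td></tr>
           <tr><td>GNAF_ID</td><td>GAVIC411711441</td></tr>
           <tr><td>FLAT_TYPE</td><td>UNIT</td></tr>
           <tr><td>FLAT_NUMBER</td><td>3</td></tr>
           <tr><td>BUILDING_NAME</td><td>PONDEROSA</td></tr>
           <tr><td>NUMBER_FIRST</td><td>21</td></tr>
           <tr><td>STREET_NAME</td><td>BROWN</td></tr>
           <tr><td>STREET_TYPE</td><td>STREET</td></tr>
           <tr><td>LOCALITY_NAME</td><td>BURWOOD</td></tr>
           <tr><td>CONFIDENCE</td><td>1</td></tr>
           <tr><td>DATE_CREATED</td><td>29/04/2014</td></tr>
           <tr><td>DATE_RETIRED</td><td> </td></tr>
           <tr><td>DATE_LAST_MODIFIED</td><td> </td></tr>
           </table>
       >];
       table2 [shape=plain label=<
           <table border="0" cellborder="1" cellspacing="0" cellpadding="4">
           <tr><td colspan="3" port="f2"><b>Updated G-NAF Record Example</b></td></tr>
           <tr><td>GNAF_ID</td><td>GAVIC411711441</td><td>GAVIC411711441</td></tr>
           <tr><td>FLAT_TYPE</td><td>UNIT</td><td>UNIT</td></tr>
           <tr><td>FLAT_NUMBER</td><td>3</td><td>3</td></tr>
           <tr><td>BUILDING_NAME</td><td>PONDEROSA</td><td>PONDEROSA</td></tr>
           <tr><td>NUMBER_FIRST</td><td>21</td><td>21</td></tr>
           <tr><td>STREET_NAME</td><td>SMITH</td><td>BROWN</td></tr>
           <tr><td>STREET_TYPE</td><td>STREET</td><td>STREET</td></tr>
           <tr><td>LOCALITY_NAME</td><td>BURWOOD</td><td>BURWOOD</td></tr>
           <tr><td>CONFIDENCE</td><td>1</td><td>1</td></tr>
           <tr><td>DATE_CREATED</td><td>29/04/2014</td><td>14/06/2014</td></tr>
           <tr><td>DATE_RETIRED</td><td>14/06/2014</td><td> </td></tr>
           <tr><td>DATE_LAST_MODIFIED</td><td> </td><td> </td></tr>
           </table>
       >];
       table1:f1 -> table2:f2
       rank=same {table1 table2};
   }
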
Address duplication
+++++++++++++++++++
As multiple contributors supply data nominally covering the same area, there is a
possibility that there are duplicate addresses which represent the same addressable
location. The above example simplistically demonstrates how this could occur. Geoscape
has developed a sophisticated series of production processes in an effort to counter these
issues. The majority of this duplication has occurred as a result of the following:

* The use of both ranged and non-ranged addresses for the same site (e.g. 22-28 Sydney Street vs 22 Sydney Street).
* The use of a flat number as opposed to a number_first suffix for the same site (e.g. 2/27 Melbourne Street vs 27B Melbourne Street).
* One contributor supplying a level number as part of an address string while another contributor does not supply the level number for the same site. This tends to occur on properties where “hotel style addressing” is used (e.g. Level 3, 302/50 Adelaide Street vs 302/50 Adelaide Street).

Where circumstances of this nature have been identified during processing, alias principal
relationships have been established to prevent the duplication of addresses.
Alias Management
****************
The usability of G-NAF is greatly enhanced by the inclusion of alias information that captures addresses in popular use irrespective of official status. Geoscape recognises that G-NAF has a role to play in progressing usage of official gazetted addresses. However, it is also acknowledged that the issue cannot be forced and, in some cases, it will take generational change to see alias or incorrect addresses taken out of everyday usage.
It is also considered that the benefits of including aliases outweigh the costs, particularly in the application of G-NAF by emergency services. There are three levels of aliases in the G-NAF schema:
* Alias Address - where an individual address is also known by another name
* Alias Street/Locality Address - where a street/locality pair does not exist in the reference data and is the synonym or incorrect spelling of a street/locality pair that does exist
* Alias Locality Address - where a locality does not exist in the reference data and is the synonym or incorrect spelling of a locality that does exist
Alias address
*************
Alias addresses (ADDRESS_ALIAS) are addresses, other than the principal address, that refer to the same physical location as another address record.
.. image:: data_maintenance/alias_address.png
:width: 450
:alt: Alias address
:align: center
An address level alias refers to the same address site which is identified by different address elements.
The relationship between addresses at a specific site is modelled through a principal and alias attribute and join table.
Alias street/locality
*********************
Alias street/locality (STREET_LOCALITY_ALIAS) is used to determine addresses that refer to the same physical location as another address record, where the street/locality is different. Where it is identified that the street/locality in an address from a contributor was incorrect (e.g. spelling error), a rule (see below) is created to manipulate the data during the scrubbing process.
Alias locality
**************
Alias localities (LOCALITY_ALIAS) are used to determine those addresses that refer to the same physical location as another address record, but where the locality is different.
For example, the locality “CITY” will exist in the LOCALITY table and an entry for “CANBERRA CITY” will exist in the LOCALITY_ALIAS table.
Using alias datasets
********************
When using G-NAF to validate an address, work through the following questions:

#. Is there a principal address for this address?
#. Is there an alias address for this address?
#. Is there an alias locality for the locality of the address?
   This can be determined by checking the locality name of the address against the LOCALITY_NAME field in the LOCALITY_ALIAS table; the locality_pid is then used to determine the correct locality_name from the LOCALITY table. The next step would be to retry steps 1 and 2 with the new locality_name.
#. Is there an alias street/locality for the address?
   This can be determined by checking the street name of the address against the street_name, street_type, street_suffix fields in the STREET_LOCALITY_ALIAS table; the street_pid is then used to determine the correct street_name from the STREET table. The next step would be to retry steps 1, 2 and 3 with the new street name.

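The lookup order above can be sketched with in-memory dictionaries standing in for the alias tables. The ``Query`` and ``AliasIndex`` types, their fields and the simplified street key (name plus locality, without type or suffix) are assumptions made for this example; a real implementation would query the G-NAF tables and join on the pid fields.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Query:
    number_first: str
    street_name: str
    locality_name: str


@dataclass
class AliasIndex:
    """Hypothetical in-memory stand-ins for the principal and alias tables."""
    principals: Set[Query] = field(default_factory=set)
    address_aliases: Dict[Query, Query] = field(default_factory=dict)    # alias -> principal
    locality_aliases: Dict[str, str] = field(default_factory=dict)       # alias name -> official name
    street_aliases: Dict[Tuple[str, str], str] = field(default_factory=dict)  # (street, locality) -> official street


def _direct_match(index: AliasIndex, query: Query) -> Optional[Query]:
    if query in index.principals:            # step 1: principal address
        return query
    return index.address_aliases.get(query)  # step 2: alias address


def resolve(index: AliasIndex, query: Query) -> Optional[Query]:
    """Return the principal address for a query, or None if it cannot be validated."""
    hit = _direct_match(index, query)
    if hit is not None:
        return hit
    # Step 3: swap in the official locality name and retry steps 1 and 2.
    official_locality = index.locality_aliases.get(query.locality_name)
    if official_locality is not None:
        query = replace(query, locality_name=official_locality)
        hit = _direct_match(index, query)
        if hit is not None:
            return hit
    # Step 4: swap in the official street name and retry.
    official_street = index.street_aliases.get((query.street_name, query.locality_name))
    if official_street is not None:
        query = replace(query, street_name=official_street)
        return _direct_match(index, query)
    return None
```

Using the locality example from this section, a query for an address in “CANBERRA CITY” would be retried against “CITY” once the locality alias is found.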
Processing links to other Geoscape Data
+++++++++++++++++++++++++++++++++++++++
Administrative Boundaries
*************************
There are three layers within the Administrative Boundaries product that have linkages to
G-NAF:
* Suburbs/Localities
* Mesh Blocks 2011 (ABS Boundaries 2011 theme).
* Mesh Blocks 2016 (ABS Boundaries 2016 theme).
Suburbs/Localities is a reference dataset for G-NAF and is the source for
identifying the official locality name for an address, where available. The
suburbs/localities geometry also plays an important part in the allocation of
the locality and street-locality geocodes generated for G-NAF.
Roads
*****
Geoscape Roads is a reference dataset that is used for the processing of G-NAF. The roads
data is a fundamental part of an address and is used as the source for the allocation of
road names in the STREET_LOCALITY table. The roads geometry is also used in the
allocation of the street-locality level geocodes.
Legal Parcel Identifier
***********************
The ADDRESS_DETAIL table contains a field called LEGAL_PARCEL_ID. Where possible, this
field is populated with the cadastral information captured from the address supplied by the
jurisdiction. This is done at the time that the address data is supplied by the jurisdiction,
so that the field more accurately represents the cadastral information used for an address
by the jurisdiction. Addresses from other contributors will also be allocated the same
cadastral information where the geocode is at the same location. The LEGAL_PARCEL_ID
field is populated with the cadastral information using the same concatenations (where
applicable) as adopted for the PARCEL_ID used in the Cadastre product, as shown in the
table below.
.. table:: Cadastre Parcel ID Constructors
:align: center
============= ================================================== ====================
State Concatenation Examples
============= ================================================== ====================
ACT DISTRICT_SHORT/DIVISION_SHORT/SECTION/BLOCK CANB/BRAD/18/41
------------- -------------------------------------------------- --------------------
| DISTRICT_SHORT/DIVISION_SHORT/SECTION/BLOCK/UNIT BELC/BRUC/78/17/2
------------- -------------------------------------------------- --------------------
| DISTRICT_SHORT/DIVISION_SHORT// CANB/CITY//
------------- -------------------------------------------------- --------------------
NSW If SECTIONNUM has no value
then
------------- -------------------------------------------------- --------------------
| LOTNUMBER/PLANNUMBER 13/31993
else
------------- -------------------------------------------------- --------------------
| If SECTIONNUM has a value 11/C/3625
then
------------- -------------------------------------------------- --------------------
| LOT_NUMBER/SECTIONNUM/PLANNUMBER 3/23/2163
------------- -------------------------------------------------- --------------------
| PLAN_LABEL 4994-1497
------------- -------------------------------------------------- --------------------
NT [#]_ PAR_LOC/PAR_PAR/PAR_LTO 550/3252/
------------- -------------------------------------------------- --------------------
| 055/C/60001
------------- -------------------------------------------------- --------------------
QLD LOT/PLAN 66/RP139841
------------- -------------------------------------------------- --------------------
SA PLAN_T/PLAN/PARCEL_T/PARCEL D/10001/A/14
------------- -------------------------------------------------- --------------------
TAS PLAN/LOT 158882/1
------------- -------------------------------------------------- --------------------
VIC [#]_ PARCEL.SPI 1\TP201500
------------- -------------------------------------------------- --------------------
| CM\PS405814
------------- -------------------------------------------------- --------------------
| PC370718
------------- -------------------------------------------------- --------------------
WA PI_PARCEL/LOT_NUMBER S030337/1
------------- -------------------------------------------------- --------------------
| P003008/74
------------- -------------------------------------------------- --------------------
Jervis (OT) DISTRICT_SHORT/BLOCK JERV/927
------------- -------------------------------------------------- --------------------
Cocos (OT) Same as WA.
------------- -------------------------------------------------- --------------------
Norfolk (OT) LOT/PORTION/SECTION 66/41a27/16
============= ================================================== ====================
.. [#] Leading 0 will be trimmed from PAR_PAR
.. [#] The VIC SPI uses a \ concatenator (opposite to other jurisdictions)
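A few of the concatenations in the table can be sketched as small helper functions. The function names and the treatment of inputs as plain strings are assumptions for this example, and the NSW helper assumes the first branch in the table applies when SECTIONNUM is empty.

```python
from typing import Optional


def nsw_parcel_id(lot_number: str, plan_number: str, section_num: Optional[str] = None) -> str:
    """LOTNUMBER/PLANNUMBER, or LOT_NUMBER/SECTIONNUM/PLANNUMBER when a section is present."""
    if section_num:
        return f"{lot_number}/{section_num}/{plan_number}"
    return f"{lot_number}/{plan_number}"


def qld_parcel_id(lot: str, plan: str) -> str:
    """LOT/PLAN."""
    return f"{lot}/{plan}"


def tas_parcel_id(plan: str, lot: str) -> str:
    """PLAN/LOT (plan first, unlike QLD)."""
    return f"{plan}/{lot}"


def nt_parcel_id(par_loc: str, par_par: str, par_lto: str) -> str:
    """PAR_LOC/PAR_PAR/PAR_LTO, with leading zeros trimmed from PAR_PAR only."""
    return f"{par_loc}/{par_par.lstrip('0')}/{par_lto}"
```

For instance, ``nsw_parcel_id("3", "2163", "23")`` gives ``3/23/2163`` and ``tas_parcel_id("158882", "1")`` gives ``158882/1``, matching the examples in the table.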
Jurisdiction Property Identifier
********************************
The ADDRESS_DETAIL table includes a field called GNAF_PROPERTY_PID that holds the
property identifier provided by the jurisdiction for the property associated with the address.
This identifier is the same as the CONTRIBUTOR_ID in the Property product, as shown in
the table below.
.. table:: Property Contributor ID Constructors
:align: center
===== ===============================================
State Concatenation
===== ===============================================
ACT TITLE + “/” + UNIT
----- -----------------------------------------------
NSW PROPID
----- -----------------------------------------------
NT VOLUME_TYP + “/” + VOLUME_NO + “/” + FOLIO_NO
----- -----------------------------------------------
QLD PROPERTY_ID
----- -----------------------------------------------
SA ASSNO_TENSEQNO
----- -----------------------------------------------
TAS PID
----- -----------------------------------------------
VIC PFI
----- -----------------------------------------------
WA VPU_VE_NUMBER
===== ===============================================
Maintenance scope
-----------------
Data for existing objects with changed geometry and attributes as well as data for new
objects within the release period are included in the release.