8. Data Maintenance¶
Maintenance activities are triggered by Geoscape receiving updated address data from data contributors according to an agreed delivery schedule. At present, this schedule defines a quarterly update process.
During the maintenance phase, contributed addresses are analysed and compared to existing records in G-NAF. This analysis and comparison give rise to new records being inserted and existing records being updated or retired.
The following diagram of the G-NAF Maintenance Process provides a high-level view of the G-NAF system including G-NAF maintenance pre-processing, the use of reference data files, G-NAF maintenance software and G-NAF outputs.
8.1. Pre-processing¶
The G-NAF maintenance pre-process takes the input files from the Geoscape reference datasets and contributor data and performs processing prior to data being processed by the G-NAF maintenance software. Pre-processing is used to describe the following activities:
Mapping from the contributor model to G-NAF model (with parsing as necessary)
Application of rules that make corrections to misspellings, abbreviations and erroneous characters
Application of updates to suburb data and road names propagating the changes through all affected parts of the data.
For an address to be included in G-NAF, it must be a “complete” entry. A complete entry:
Must include a matched locality
Must include a street name
Must contain either a valid number_first or a lot number.
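The completeness rule above can be expressed as a small predicate. This is a minimal sketch only, assuming an address is held as a dictionary. The key names `locality_pid`, `street_name`, `number_first` and `lot_number`, and the reading of a “valid” number_first as a positive integer, are illustrative assumptions and not the actual pre-processing implementation.

```python
from typing import Any, Mapping


def is_complete(address: Mapping[str, Any]) -> bool:
    """Return True when the address meets the three completeness conditions."""
    # Condition 1: a matched locality (assumed to be signalled by a locality_pid).
    has_locality = bool(address.get("locality_pid"))
    # Condition 2: a non-empty street name.
    has_street = bool((address.get("street_name") or "").strip())
    # Condition 3: a valid number_first (assumed: positive integer) or a lot number.
    number_first = address.get("number_first")
    has_number = isinstance(number_first, int) and number_first > 0
    has_lot = bool((address.get("lot_number") or "").strip())
    return has_locality and has_street and (has_number or has_lot)
```

An entry with a matched locality and street name but neither a number_first nor a lot number would be rejected by this check.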
G-NAF is a dataset which is reliant on other Geoscape datasets. Below is a diagram which displays that relationship and order of production cycle for the release of Geoscape datasets. Geoscape’s Administrative Boundaries and Roads datasets need to be completed before G-NAF production can commence.
8.2. G-NAF maintenance software¶
The G-NAF maintenance software receives data from the pre-processing phase. All the contributed addresses from each jurisdiction are cleansed, compared and merged into the normalised G-NAF maintenance model.
8.2.1. Processing¶
The core maintenance processing consists of the following:
Address scrubbing
State-Locality validation and geocoding
Street validation
Street geocoding
Address geocoding
Merging (merge criteria and confidence levels)
Further processing then covers the following steps:
Post merge processing (including validation processes)
Primary / Secondary maintenance
Alias / Principal maintenance
Geocode maintenance
Update address attributes (update attributes not in core processing)
Update address links (i.e. contributor mapping, mesh blocks, default geocode)
Verify G-NAF data (i.e. conformance with a data model)
Data export to integrated maintenance database.
8.2.2. Geocoding¶
Multiple geocodes and multiple types of geocodes can be stored for each address. While this capability exists in the G-NAF model, only some addresses have multiple geocodes at this stage.
8.2.2.1. Geocode level type¶
Every address within G-NAF must have a locality level geocode; it may also have a street level geocode and a parcel level geocode. The table GEOCODE_LEVEL_TYPE_AUT indicates which of these geocode level types are associated with an address in accordance with the table below:
Geocode_Level_Type | Description |
---|---|
0 | No Geocode |
1 | Parcel Level Geocode Only (No Locality or Street Level Geocode) |
2 | Street Level Geocode Only (No Locality or Parcel Level Geocode) |
3 | Street and Parcel Level Geocodes (No Locality Geocode) |
4 | Locality Level Geocode Only (No Street or Parcel Level Geocode) |
5 | Locality and Parcel Level Geocodes (No Street Level Geocode) |
6 | Locality and Street Level Geocodes (No Parcel Level Geocode) |
7 | Locality, Street and Parcel Level Geocodes |
Note
LEVEL_GEOCODED_CODE field within the ADDRESS_DETAIL table refers to the CODE field within the GEOCODE_LEVEL_TYPE_AUT.
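The codes in the table above behave like a three-bit mask: locality contributes 4, street contributes 2 and parcel contributes 1. The sketch below decodes a code on that basis. The bit weights are inferred from the table rather than stated in the source, and the function name is hypothetical.

```python
# Bit weights inferred from the geocode level type table (an assumption).
LOCALITY, STREET, PARCEL = 4, 2, 1


def decode_geocode_level(code: int) -> dict:
    """Report which geocode levels a LEVEL_GEOCODED_CODE value implies."""
    if not 0 <= code <= 7:
        raise ValueError(f"geocode level type must be 0-7, got {code}")
    return {
        "locality": bool(code & LOCALITY),
        "street": bool(code & STREET),
        "parcel": bool(code & PARCEL),
    }
```

For example, code 5 decodes to locality and parcel level geocodes with no street level geocode, matching the table.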
8.2.2.2. Geocode reliability¶
Reliability of a geocode refers to the geocode precision and is linked to how the geocode was generated. Every geocode in G-NAF has a reliability level. The levels and their descriptions are stored in the table GEOCODE_RELIABILITY_AUT. These descriptions together with examples are given in the table below.
Reliability Level | Description | Example |
---|---|---|
1 | Geocode resolution recorded to appropriate surveying standard. | Address level geocode was manually geocoded with a GPS. |
2 | Geocode resolution sufficient to place geocode within address site boundary or access point close to address site boundary. | Address level geocode was calculated as the geometric centre within the associated cadastral parcel; geocode for access point identified for a rural property; calculated geocode based on centre setback from road within cadastral parcel; geocode for approximate centre of building. |
3 | Geocode resolution sufficient to place geocode near (or possibly within) address site boundary. | Address level geocode was automatically calculated by determining where on the road the address was likely to appear, based on other bounding geocoded addresses. |
4 | Geocode resolution sufficient to associate address site with a unique road feature. | Street level geocode automatically calculated by using the road centreline reference data. |
5 | Geocode resolution sufficient to associate address site with a unique locality or neighbourhood. | Locality level geocode automatically calculated to the geometric centre within the gazetted locality for this address. |
6 | Geocode resolution sufficient to associate address site with a unique region. | Locality level geocode derived from topographic feature. |
Note
RELIABILITY_CODE field within the ADDRESS_SITE_GEOCODE table refers to the CODE field within the GEOCODE_RELIABILITY_AUT.
These levels are stored with the geocodes in the following tables:
LOCALITY_POINT
STREET_LOCALITY_POINT
ADDRESS_SITE_GEOCODE
8.2.2.3. Geocode type¶
Provision has also been made for G-NAF to cater for multiple types of geocodes for an address. Where geocode types are nominated by the jurisdiction, these are reflected in the geocode type field. Where a geocode type is not provided, a default value is used that reflects the majority of addresses. Nationally, the PROPERTY CENTROID (PC) geocode type is the most uniform. While the data model and respective geocode types have been listed, in the vast majority of cases there are no current national data sources identified to populate the additional codes. The full list of allowed geocode types is included in the Data Dictionary in Appendix C (see the GEOCODE_TYPE_AUT table).
8.2.2.4. Geocode priority¶
A priority order has been developed and applied during G-NAF production to provide a single geocode for all G-NAF addresses. The priority order developed places emphasis on identifying locations associated with emergency management access, buildings on a site and other locations which are associated with the land management process. This order has been developed to assist users in general and will not be suitable for all user business needs. The priority order applied is included in the relevant table in Appendix C. The priority order has been applied in the ADDRESS_DEFAULT_GEOCODE table.
8.2.3. Confidence levels¶
Every address and geocode can be related to a supplied dataset, which in turn can be related to the contributor who provided it. This feature is essential to being able to supply the information back to the address contributors. However, the address custodian identifier is not available in G-NAF. Instead, address level metadata is available indicating how many source datasets provided each address. Address Usage is reflected in the Confidence field included in the ADDRESS_DETAIL table and is expressed as follows:
C = n - 1 (n = number of datasets providing the address, C = confidence level)
Given G-NAF has been built with three contributor datasets, the Address Usage (Confidence Level) possibilities are as follows:
Confidence Level | Description |
---|---|
2 | This reflects that all three contributors have supplied an identical address. |
1 | This reflects that a match has been achieved between only two contributors. |
0 | This reflects that a single contributor holds this address and no match has been achieved with either of the other two contributors. |
-1 | This reflects that none of the contributors hold this address in their address dataset anymore. |
Where an address is no longer provided by any contributor, the address will be retired. Geoscape will also retire an address that contributors still supply when, following a review, the address is considered to be no longer in use in the community but has yet to be retired from contributor databases. Retirement is reflected in a confidence level of -1. Until the August 2018 release of G-NAF, retired addresses were retained in G-NAF for four releases, after which they were archived and removed from the product. Since the introduction of the ADDRESS_FEATURE table in August 2018, which tracks changes to addresses, all retired addresses are retained to show change over time.
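The confidence table implies a simple relationship between the number of supplying datasets and the confidence level: the level is one less than the number of datasets. A minimal sketch of that relationship follows; the function names and the default of three contributors are illustrative assumptions based on the description above.

```python
def confidence_level(n_datasets: int, total_contributors: int = 3) -> int:
    """Confidence implied by the table: number of supplying datasets minus one."""
    if not 0 <= n_datasets <= total_contributors:
        raise ValueError("n_datasets must be between 0 and total_contributors")
    return n_datasets - 1


def is_retired(confidence: int) -> bool:
    """A confidence of -1 marks an address that no contributor supplies any more."""
    return confidence == -1
```

Users who only want addresses currently supported by at least one contributor could filter out rows where the confidence is -1.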
8.2.4. Merge criteria¶
Addresses which share similar characteristics from the different contributors are merged into a single record. These shared characteristics are known as the merge criteria. The fields comprising the G-NAF merge criteria are:
STATE_ABBREVIATION
LOCALITY_NAME
PRIMARY_POSTCODE
STREET_NAME
STREET_TYPE
STREET_SUFFIX
NUMBER_FIRST_PREFIX
NUMBER_FIRST
NUMBER_FIRST_SUFFIX
NUMBER_LAST_PREFIX
NUMBER_LAST
NUMBER_LAST_SUFFIX
FLAT_NUMBER_PREFIX
FLAT_NUMBER
FLAT_NUMBER_SUFFIX
LEVEL_NUMBER
Note
A G-NAF ID or address_detail_pid relates to a unique combination of these merge criteria fields. This address_detail_pid will persist with the address while it remains in the dataset. Where values in fields which are not included in the merge criteria (from the ADDRESS_DETAIL table) change in consecutive product releases, the address_detail_pid will not change. However, the associated date_last_modified field will.
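One way to picture the merge criteria is as a composite key built from the sixteen fields listed above. The sketch below is illustrative only: the record is assumed to be a dictionary keyed by field name, and the normalisation (trimming and upper-casing, with empty strings treated as missing) is an assumption, not the documented behaviour of the maintenance software.

```python
from typing import Mapping, Optional, Tuple

MERGE_CRITERIA = (
    "STATE_ABBREVIATION", "LOCALITY_NAME", "PRIMARY_POSTCODE",
    "STREET_NAME", "STREET_TYPE", "STREET_SUFFIX",
    "NUMBER_FIRST_PREFIX", "NUMBER_FIRST", "NUMBER_FIRST_SUFFIX",
    "NUMBER_LAST_PREFIX", "NUMBER_LAST", "NUMBER_LAST_SUFFIX",
    "FLAT_NUMBER_PREFIX", "FLAT_NUMBER", "FLAT_NUMBER_SUFFIX",
    "LEVEL_NUMBER",
)


def merge_key(record: Mapping[str, object]) -> Tuple[Optional[str], ...]:
    """Build a comparable key from the merge criteria fields of one record."""
    def norm(value: object) -> Optional[str]:
        if value is None:
            return None
        text = str(value).strip().upper()  # assumed normalisation
        return text or None

    return tuple(norm(record.get(field)) for field in MERGE_CRITERIA)
```

Two records that differ only in a field outside the merge criteria produce the same key, which mirrors why the address_detail_pid persists when such fields change.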
8.2.5. Merge criteria changes¶
When any element of the merge criteria changes, the new record is treated as a new address and inserted into G-NAF as such.
Example
This example shows Unit 3 21 Smith Street Burwood (address_detail_pid = GAVIC411711441) being changed to Unit 3 21 Brown Street Burwood by a contributor. The street name change will mean it is no longer possible to match the new incoming record to an existing G-NAF record, so a new G-NAF record (address_detail_pid = GAVIC998999843) is created.
As the existing address (i.e. GAVIC411711441) is now only supported by two contributors, its confidence level is reduced to 1. The new incoming address, only supported by one contributor, will get a confidence of 0.
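The example above can be reproduced with a short calculation. In this sketch the contributor names are hypothetical and the full address string stands in for the merge criteria fields; it simply counts distinct contributors per address and subtracts one, as implied by the confidence level table.

```python
from collections import defaultdict

# (contributor, address) pairs after one contributor changes the street name.
# Contributor names are hypothetical placeholders.
supplied = [
    ("contributor_a", "UNIT 3 21 SMITH STREET BURWOOD"),
    ("contributor_b", "UNIT 3 21 SMITH STREET BURWOOD"),
    ("contributor_c", "UNIT 3 21 BROWN STREET BURWOOD"),
]


def confidence_by_address(pairs):
    """Map each address to (number of distinct contributors - 1)."""
    contributors = defaultdict(set)
    for contributor, address in pairs:
        contributors[address].add(contributor)
    return {address: len(names) - 1 for address, names in contributors.items()}


levels = confidence_by_address(supplied)
```

Here `levels` holds 1 for the Smith Street address and 0 for the Brown Street address, in line with the narrative.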
8.2.6. Address duplication¶
As multiple contributors supply data nominally covering the same area, there is a possibility that there are duplicate addresses which represent the same addressable location. The above example simplistically demonstrates how this could occur. Geoscape has developed a sophisticated series of production processes in an effort to counter these issues. The majority of this duplication has occurred as a result of the following:
The use of both ranged and non-ranged addresses for the same site (e.g. 22-28 Sydney Street vs 22 Sydney Street).
The use of a flat number as opposed to a number_first suffix for the same site (e.g. 2/27 Melbourne Street vs 27B Melbourne Street).
Where one contributor supplies a level number as part of an address string and another contributor does not supply the level number for the same site. This tends to occur on properties where “hotel style addressing” is used (e.g. Level 3, 302/50 Adelaide Street vs 302/50 Adelaide Street).
Where circumstances of this nature have been identified during processing, alias principal relationships have been established to prevent the duplication of addresses.
8.2.6.1. Alias Management¶
The usability of G-NAF is greatly enhanced by the inclusion of alias information that captures addresses in popular use irrespective of official status. Geoscape recognises that G-NAF has a role to play in progressing usage of official gazetted addresses. However, it is also acknowledged that the issue cannot be forced and, in some cases, it will take generational change to see alias or incorrect addresses taken out of everyday usage. It is also considered that the benefits of the inclusion of aliases outweigh the costs, particularly in the application of G-NAF by emergency services. There are three levels of aliases in the G-NAF schema:
Alias Address - where an individual address is also known by another name
Alias Street/Locality Address - where a street/locality pair does not exist in the reference data and is the synonym or incorrect spelling of a street/locality pair that does exist.
Alias Locality Address - where a locality does not exist in the reference data and is the synonym or incorrect spelling of a locality that does exist
8.2.6.2. Alias address¶
Alias addresses (ADDRESS_ALIAS) are addresses, other than the principal address, that refer to the same physical location as another address record.
An address level alias refers to the same address site which is identified by different address elements. The relationship between addresses at a specific site is modelled through a principal and alias attribute and join table.
8.2.6.3. Alias street/locality¶
Alias street/locality (STREET_LOCALITY_ALIAS) is used to determine addresses that refer to the same physical location as another address record, where the street/locality is different. Where it is identified that the street/locality in an address from a contributor was incorrect (e.g. spelling error), a rule (see below) is created to manipulate the data during the scrubbing process.
8.2.6.4. Alias locality¶
Alias localities (LOCALITY_ALIAS) are used to determine those addresses that refer to the same physical location as another address record, but where the locality is different. For example, the locality “CITY” will exist in the LOCALITY table and an entry for “CANBERRA CITY” will exist in the LOCALITY_ALIAS table.
8.2.6.5. Using alias datasets¶
When using G-NAF to validate an address, the steps are:
1. Is there a principal address for this address?
2. Is there an alias address for this address?
3. Is there an alias locality for the locality of the address?
This can be determined by checking the locality name of the address against the LOCALITY_NAME field in the LOCALITY_ALIAS table; the locality_pid is then used to determine the correct locality_name from the LOCALITY table. The next step would be to retry steps 1 & 2 with the new locality_name.
4. Is there an alias street/locality for the address?
This can be determined by checking the street name of the address against the street_name, street_type, street_suffix fields in the STREET_LOCALITY_ALIAS table; the street_pid is then used to determine the correct street_name from the STREET table. The next step would be to retry steps 1, 2 & 3 with the new street name.
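The validation steps can be sketched as a lookup with fallbacks. This is a simplified illustration, not a reference implementation: the G-NAF tables are replaced by in-memory sets and dictionaries, an address is reduced to a (number, street, locality) tuple, and all names are hypothetical. After the street alias is resolved, the sketch retries only the principal and alias checks because the locality has already been resolved.

```python
from typing import Dict, Optional, Set, Tuple

Address = Tuple[str, str, str]  # (number, street, locality): a simplification


def validate(
    address: Address,
    principals: Set[Address],
    aliases: Set[Address],
    locality_aliases: Dict[str, str],
    street_aliases: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Return 'principal', 'alias' or None following the steps above."""
    number, street, locality = address

    def direct(candidate: Address) -> Optional[str]:
        # Steps 1 and 2: principal address first, then alias address.
        if candidate in principals:
            return "principal"
        if candidate in aliases:
            return "alias"
        return None

    result = direct((number, street, locality))
    if result:
        return result
    # Step 3: swap an alias locality for its official locality, then retry.
    locality = locality_aliases.get(locality, locality)
    result = direct((number, street, locality))
    if result:
        return result
    # Step 4: swap an alias street/locality for the official street, then retry.
    street = street_aliases.get((street, locality), street)
    return direct((number, street, locality))
```

With “CANBERRA CITY” mapped to “CITY” in `locality_aliases`, an address supplied with the alias locality resolves to the principal record held under “CITY”.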
8.2.7. Processing links to other Geoscape Data¶
8.2.7.1. Administrative Boundaries¶
There are three layers within the Administrative Boundaries product that have linkages to G-NAF:
Suburbs/Localities
Mesh Blocks 2011 (ABS Boundaries 2011 theme).
Mesh Blocks 2016 (ABS Boundaries 2016 theme).
Suburbs/Localities is a reference dataset for G-NAF and is the source for identifying the official locality name for an address, where available. The suburbs/localities geometry is also an important part in the allocation of geocodes for locality and street-locality geocodes generated for G-NAF.
8.2.7.2. Roads¶
Geoscape Roads is a reference dataset that is used for the processing of G-NAF. The roads data is a fundamental part of an address and is used as the source for the allocation of road names in the STREET_LOCALITY table. The roads geometry is also used in the allocation of the street-locality level geocodes.
8.2.7.3. Legal Parcel Identifier¶
The ADDRESS_DETAIL table contains a field called LEGAL_PARCEL_ID. This field is populated, where possible, with the cadastral information captured from the address supplied by the jurisdiction. This is done at the time that the address data is supplied by the jurisdiction and more accurately represents the cadastral information used for an address by the jurisdiction. Addresses from other contributors will also be allocated the same cadastral information where the geocode is at the same location. The LEGAL_PARCEL_ID field is populated using the same concatenations (where applicable) as adopted for the PARCEL_ID used in the Cadastre product, as shown in the table below.
State | Concatenation | Examples |
---|---|---|
ACT | DISTRICT_SHORT/DIVISION_SHORT/SECTION/BLOCK | CANB/BRAD/18/41 |
ACT | DISTRICT_SHORT/DIVISION_SHORT/SECTION/BLOCK/UNIT | BELC/BRUC/78/17/2 |
ACT | DISTRICT_SHORT/DIVISION_SHORT// | CANB/CITY// |
NSW | If SECTIONNUM is <NULL> then LOTNUMBER/PLANNUMBER | 13/31993 |
NSW | If SECTIONNUM has a value then LOT_NUMBER/SECTIONNUM/PLANNUMBER | 11/C/3625; 3/23/2163 |
NSW | PLAN_LABEL | 4994-1497 |
NT [1] | PAR_LOC/PAR_PAR/PAR_LTO | 550/3252/; 055/C/60001 |
QLD | LOT/PLAN | 66/RP139841 |
SA | PLAN_T/PLAN/PARCEL_T/PARCEL | D/10001/A/14 |
TAS | PLAN/LOT | 158882/1 |
VIC [2] | PARCEL.SPI | 1TP201500; CMPS405814; PC370718 |
WA | PI_PARCEL/LOT_NUMBER | S030337/1; P003008/74 |
Jervis (OT) | DISTRICT_SHORT/BLOCK | JERV/927 |
Cocos (OT) | Same as WA. | |
Norfolk (OT) | LOT/PORTION/SECTION | 66/41a27/16 |
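A few of the concatenation rules in the table can be sketched as small helper functions. The function names and argument names below are hypothetical, and only the NSW lot-based, QLD, SA and TAS rules are covered; the NSW helper assumes an empty or missing section is equivalent to <NULL>.

```python
from typing import Optional


def nsw_legal_parcel_id(lot: str, plan: str, section: Optional[str] = None) -> str:
    """NSW: LOT/PLAN when the section is null, otherwise LOT/SECTION/PLAN."""
    if section is None or section == "":
        return f"{lot}/{plan}"
    return f"{lot}/{section}/{plan}"


def qld_legal_parcel_id(lot: str, plan: str) -> str:
    """QLD: LOT/PLAN."""
    return f"{lot}/{plan}"


def sa_legal_parcel_id(plan_t: str, plan: str, parcel_t: str, parcel: str) -> str:
    """SA: PLAN_T/PLAN/PARCEL_T/PARCEL."""
    return f"{plan_t}/{plan}/{parcel_t}/{parcel}"


def tas_legal_parcel_id(plan: str, lot: str) -> str:
    """TAS: PLAN/LOT (note the plan comes first, unlike QLD)."""
    return f"{plan}/{lot}"
```

Fed with the component values from the table's examples, these helpers return the same strings shown in the Examples column.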
8.2.7.4. Jurisdiction Property Identifier¶
The ADDRESS_DETAIL table includes a field called GNAF_PROPERTY_PID that includes the property identifier provided by the jurisdiction for the property associated with the address. This identifier is the same as the CONTRIBUTOR_ID in the Property product as shown in the table below.
State | Concatenation |
---|---|
ACT | TITLE + “/” + UNIT |
NSW | PROPID |
NT | VOLUME_TYP + “/” + VOLUME_NO + “/” + FOLIO_NO |
QLD | PROPERTY_ID |
SA | ASSNO_TENSEQNO |
TAS | PID |
VIC | PFI |
WA | VPU_VE_NUMBER |
8.3. Maintenance scope¶
Data for existing objects with changed geometry and attributes as well as data for new objects within the release period are included in the release.