Wrong number format in origin files #3

Closed
opened 2025-12-28 18:11:52 +00:00 by sami · 2 comments
Owner

Originally created by @ZhangTao1596 on GitHub (May 21, 2021).

Originally assigned to: @End-rey on GitHub.

As we know, we expect latitude and longitude values to have specific integer and decimal parts.
While porting NeoFS to C# and writing unit tests, I found wrongly formatted values in the original downloaded files.

  • File: 2020-2 UNLOCODE CodeListPart1.csv, wrong longitude format
    10247: ,"CA","BHH","Blockhouse","Blockhouse","NS","--3-----","RL","1707",,"4427N 064250W",
  • File: 2020-2 UNLOCODE CodeListPart1.csv, wrong latitude format
    11243: ,"CA","JSS","Jerseyville","Jerseyville","ON","--3-----","RL","1707",,"43120N 08006W",
  • File: 2020-2 UNLOCODE CodeListPart3.csv, no longitude symbol E or W
    7030: ,"SA","SAL","Salwá","Salwa","04","--3-----","RL","1707",,"2444N 05045",
  • File: 2020-2 SubdivisionCodes.csv, Line: 218, 219, unexpected wrap

Maybe we should correct these files and store them in this repo.
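
To find other rows like the ones listed above, a small checker can help. This is a minimal sketch, not the NeoFS parser: it assumes the coordinate layout `DDMM[N|S] DDDMM[E|W]` and that the coordinates sit in column index 10, as in the rows quoted above. The names `check_coordinates` and `find_bad_rows` are hypothetical.

```python
import csv
import io
import re

# Assumed layout: 2-digit degrees + 2-digit minutes + N/S, a space,
# then 3-digit degrees + 2-digit minutes + E/W.
COORD_RE = re.compile(r"(\d{2})([0-5]\d)([NS]) (\d{3})([0-5]\d)([EW])")


def check_coordinates(value: str) -> bool:
    """Return True if value matches the assumed layout and is in range."""
    m = COORD_RE.fullmatch(value)
    if m is None:
        return False
    lat_deg, lon_deg = int(m.group(1)), int(m.group(4))
    return lat_deg <= 90 and lon_deg <= 180


def find_bad_rows(csv_text: str, coord_column: int = 10):
    """Yield (row_number, coordinates) for rows with malformed coordinates.

    Rows with an empty coordinates field are not reported.
    """
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) <= coord_column:
            continue  # too short to hold a coordinates field
        value = row[coord_column]
        if value and not check_coordinates(value):
            yield row_no, value
```

Note that the row number counts parsed CSV records, so it can differ from the physical line number when a quoted field wraps across lines.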

sami 2025-12-28 18:11:52 +00:00
Author
Owner

@alexvanin commented on GitHub (May 21, 2021):

Unfortunately, these files will always contain some inconsistencies. Maintaining several large database files with fixes is hard, and there may be license issues. Instead, we can maintain a short list of "overridden" UN/LOCODE records that are applied to the database after parsing.

$ ./neofs-cli util locode generate \
  ...
  --override override.csv \
  --out locode_db

If this option is okay, we will add support for overridden values to the LOCODE generator in the CLI, as in the example above.

For now, I see that these records with invalid coordinates are simply ignored in the v0.1.0 database:

$ ./neofs-cli util locode info --db neofs-dev-env/vendor/locode_db --locode "CA BHH"
Error: record not found
$ ./neofs-cli util locode info --db neofs-dev-env/vendor/locode_db --locode "CA JSS"
Error: record not found
$ ./neofs-cli util locode info --db neofs-dev-env/vendor/locode_db --locode "SA SAL"
Error: record not found

I think this is okay for now. But before the N3 release (maybe for RC3) we will recompile it with newer database files and a list of overridden values, and publish it as v0.2.0.

Thoughts? @cthulhu-rider @realloc

Author
Owner

@alexvanin commented on GitHub (Jun 9, 2021):

The NeoFS CLI LOCODE generator has an --in flag for providing database files, and it accepts any number of them. In case of record collisions, the data from the later file is used. Therefore, we can pass override.csv as the last --in argument to achieve our goal.

$ cat override.csv 
,"SA","SAL","Salwá","Salwa","04","--3-----","RL","1707",,"2444N 05045E",

$ neofs-cli util locode generate \
  --airports airports.dat \
  --continents continents.geojson \
  --countries countries.dat \
  --subdiv 2020-2\ SubdivisionCodes.csv \
  --in 2020-2\ UNLOCODE\ CodeListPart1.csv \
  --in 2020-2\ UNLOCODE\ CodeListPart2.csv \
  --in 2020-2\ UNLOCODE\ CodeListPart3.csv \
  --in override.csv \
  --out locode_db

$ neofs-cli util locode info --db locode_db --locode "SA SAL"
Country: Saudi Arabia
Location: Salwa
Continent: Asia
Subdivision: [04] Ash Sharqiyah
Coordinates: 24.44, 50.45

I propose creating a separate PR that adds an override.csv file to this repository. There we can discuss the content of override.csv.
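
The collision rule described above (a later file wins) can be illustrated with a short sketch. This is not the neofs-cli implementation; it only assumes that a record is identified by its country code and location code, taken from columns 1 and 2 as in the rows quoted in this issue. The function name `merge_records` is hypothetical.

```python
import csv
import io


def merge_records(*csv_texts: str) -> dict:
    """Merge UN/LOCODE CSV tables; rows from later inputs replace earlier ones.

    Records are keyed by (country code, location code), assumed to be
    columns 1 and 2 of each row.
    """
    merged = {}
    for text in csv_texts:
        for row in csv.reader(io.StringIO(text)):
            if len(row) < 3 or not row[2]:
                continue  # skip short rows and rows without a location code
            merged[(row[1], row[2])] = row  # a later row overwrites an earlier one
    return merged


# The original row has no E/W symbol; the override row supplies it.
original = ',"SA","SAL","Salwa","Salwa","04","--3-----","RL","1707",,"2444N 05045",\n'
override = ',"SA","SAL","Salwá","Salwa","04","--3-----","RL","1707",,"2444N 05045E",\n'

db = merge_records(original, override)
print(db[("SA", "SAL")][10])
```

Because the override is passed last, its row replaces the original one, which mirrors placing override.csv as the final --in argument.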

Reference
nspcc-dev/locode-db#3