Common Customizations

A major goal of PgOSM Flex is support a wide range of use cases for using OpenStreetMap data in PostGIS. This chapter explores a few ways PgOSM Flex can be customized.

Selecting region and subregion

The most used customization is the region and subregion selection. The examples throughout this project's documentation use the --region=north-america/us and --subregion=district-of-columbia because it is a small region that downloads and imports quickly.

docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america/us \
    --subregion=district-of-columbia

By default PgOSM Flex will attempt to download the necessary data files from Geofabrik's download server. Navigate the Region/Sub-region structure on Geofabrik to determine exactly what --region and --subregion options to choose. This can be a bit confusing as larger subregions can contain smaller subregions. Feel free to start a discussion if you need help figuring this part out!

If you want to load the entire United States subregion, instead of the District of Columbia subregion, the docker exec command is changed to the following.

docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america \
    --subregion=us

For top-level regions, such as North America, leave off the --subregion option.

docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=north-america

Specific input file

The automatic Geofabrik download can be overridden by providing PgOSM Flex with the path to a valid .osm.pbf file using --input-file. This option overrides the default file handling, archiving, and MD5 checksum validation. With --input-file you can use a custom osm.pbf you created, or use it to simply remove the need for an internet connection from the instance running the processing.

Note: The --region option is always required, the --subregion option can be used with --input-file to put the information in the subregion column of osm.pgosm_flex.

Small area / custom extract

Some of the smallest subregions provided by Geofabrik are quite large compared to the area of interest for a project. The osmium tool makes it quick and easy to extract a bounding box. The following example extracts an area roughly around Denver, Colorado. It takes about 3 seconds to extract the 3.2 MB denver.osm.pbf output from the 239 MB input.

osmium extract --bbox=-105.0193,39.7663,-104.9687,39.7323 \
    -o denver.osm.pbf \
    colorado-2023-04-18.osm.pbf

The PgOSM Flex procesing time for the smaller Denver region takes less than 20 seconds on a typical laptop, versus 11 minutes for all of Colorado.

docker exec -it \
    pgosm python3 docker/pgosm_flex.py \
    --ram=8 \
    --region=custom \
    --subregion=denver \
    --input-file=denver.osm.pbf \
    --layerset=everything

Customize load to PostGIS

There are a few ways to customize exactly how data is loaded to PostGIS / Postgres.

SRID

PgOSM Flex defaults to SRID 3857 matching the default osm2pgsql behavior. This can be customized using --srid 4326 or any other SRID supported by osm2pgsql and PostGIS.

Schema name

Experimental, added for v0.10.1.

The --schema-name option allows customizing the schema name from the default of osm. The schema name option allows loading multiple source files into multiple regions.

While schema name can be customized when using --replication, it cannot be used to load multiple regions with replication.

Language

The --language option enables defining a preferred language for OpenStreetMap names. If --language=en is defined, PgOSM Flex's helper.get_name() function will use name:en if it exists. The usage and effect of this option is shown in this comment.

Using -e PGOSM_LANGUAGE=kn for U.S. West results in most state labels picking up the Kannada language option. The states without a name:kn default to the standard name selection logic.

Data only

The --data-only option skips creating optional data structures in the target database. This includes the helper tables in the pgosm schema and the QGIS layer style table.

Skip nested place polygons

The nested place polygon calculation (explained in this post) adds minimal overhead to smaller regions, e.g. Colorado with a 225 MB PBF input file. Larger regions, such as North America (12 GB PBF), are impacted more severely as a difference in processing time. Calculating nested place polygons for Colorado adds less than 30 seconds on an 8 minute process, taking about 5% longer. A larger region, such as North America, can take 33% longer adding more than an hour and a half to the total processing time. See the performance section for more details.

Use --skip-nested to bypass the calculation of nested admin polygons.

Use --pg-dump to export data

The --pg-dump option was added in 0.7.0. Prior versions defaulted to using pg_dump and provided a --skip-dump option to override. The default now is to only use pg_dump when requested. See #266 for more.

A .sql file can be created using pg_dump as part of the processing for easy loading into one or more external Postgres databases. Add --pg-dump to the docker exec command to enable this feature.

The following example creates an empty myosm database to load the processed and dumped OpenStreetMap data.

psql -d postgres -c "CREATE DATABASE myosm;"
psql -d myosm -c "CREATE EXTENSION postgis;"

psql -d myosm \
    -f ~/pgosm-data/pgosm-flex-north-america-us-district-of-columbia-default-2023-01-21.sql

The above assumes a database user with superuser permissions is used. See the Postgres Permissions section for a more granular approach to permissions.

Use --help

The PgOSM Docker image can provide command line help. The Python script that controls PgOSM Flex's behavior is built using the click module, providing built-in --help. Use docker exec to show the full help.

docker exec -it pgosm python3 docker/pgosm_flex.py --help

The first portion of the --help output is shown here.

Usage: pgosm_flex.py [OPTIONS]

  Run PgOSM Flex within Docker to automate osm2pgsql flex processing.

Options:
  --ram FLOAT               Amount of RAM in GB available on the machine
                            running the Docker container. This is used to
                            determine the appropriate osm2pgsql command via
                            osm2pgsql-tuner recommendation engine.  [required]
  --region TEXT             Region name matching the filename for data sourced
                            from Geofabrik. e.g. north-america/us. Optional
                            when --input-file is specified, otherwise
                            required.
  --subregion TEXT          Sub-region name matching the filename for data
                            sourced from Geofabrik. e.g. district-of-columbia
  --data-only               When set, skips running Sqitch and importing QGIS
                            Styles.