Earth is flat. Locally. And globally (in a way)

About different approaches to storing coordinates

No, it’s not about any conspiracy theory about Earth being flat, but just how it is made, that we see flat maps of our globe. The challenge is that if the Earth is not flat, paper and screens are. How should we accurately store coordinates?" Through the years, multiple systems were designed, but today, we use one to write locations: latitude and longitude, and another to keep the data in our databases.

For the article, I will use OSM data and software like PostgreSQL (with Postgis extension) Google Maps, and QGIS.

So how are the maps flat?

While representing maps on a sphere, it looks pretty accurate, because it is (simplifying some stuff), but… how do we represent stuff on flat surfaces, like paper on screen?

Let’s take a look at Europe on Google Maps in 3D:

Earth looks round, now focus on only one country:

It Looks flat enough, especially if we zoom in. But the problem is, that if we rotate the Earth a bit, the shape of the same country will look completely different:

Then we turn off the 3D view and we see something like this:

Greenland looks huge! But going back to 3D:

Oh.. What happened?

In short: (on the 2D view) what’s rendered on the equator, is 1:1 to the reality. The furthest from the equator, the more distorted things are, in short: widened and elongated. While 1km looks like 1km on the equator, the same amount of pixels at 45 degrees would be 1,41km (1/cos(45°)) in reality. Directly on poles 1 point would take the whole width of the map, that’s why this is not even rendered.

When we were looking directly at Poland, we projected it on the screen.

Coordinate Reference System (Spatial Reference System)

A Coordinate Reference System (CRS), also known as a Spatial Reference System (SRS), specifies how spatial data is represented, including the meaning of the numbers, the origin of the coordinate system, and other relevant details. We categorize CRS into two primary types of projections:

  1. Geographic: This type uses latitude and longitude coordinates (e.g., EPSG:4326), expressed in angular measurements.

  2. Projected: In this type, coordinates are represented on a flat surface (e.g., EPSG:3857, EPSG:2180, among many others, I’ll explain later), where one unit corresponds to one meter (or foot).

Each Coordinate Reference System (CRS) is assigned a unique code and a Spatial Reference Identifier (SRID) number, which is used in PostGIS to identify the CRS.

The EPSG (European Petroleum Survey Group) is the organization that defines these codes and SRIDs, providing a standardized reference for various spatial reference systems.

There are also other providers for CRS, like ESRI (a GIS solutions provider), IAU (for planets and moons), or… Australia’s GDA (it’s a complicated topic, this continent moves over 6cm and rotates a bit yearly!)

Latitude and Longitude (EPSG:4326, WGS84)

Those two are the simplest answers: we put a pin somewhere on the Earth, marking it as a center, and then using degrees, we can point to any point on the Globe. In short. The center is somewhere near Africa in the ocean. Why there? Because:

  • Latitude: It’s on the equator, the furthest point (line) from the poles (the north and south, not those in Europe)

  • Longitude: it’s the point of Greenwich (a district in London)

This system works pretty well because it’s universal and aware that Earth isn’t flat. Of course, points on the ground also have height, but… let’s omit that.

Also remember, that the latitude is the y coordinate (higher/lower latitude means further from the equator), and the longitude is x (higher/lower means further from Greenwich).

While it’s accurate, it’s slow for any calculations.

When rendered on a flat screen, it looks like this:

The further from the equator, the more “flattened” vertically everything is.

For the advantages of this system we can count the fact, that it’s a rather small number before the dot (-180 to 180 for longitude, -90 to 90 for latitude), easy to remember and verify if it’s “the right area” (if you work at 18° longitude and 50° latitude and suddenly see there 20° longitude and 35° latitude, you know something isn’t right), and the amount of digits after the dot makes it more precise with each digit, you can stop any time. However, 4 digits after the dot are sufficient to point at a building.

You can also find systems like Google’s Plus Code, but they are just wrappers for lat/long.

WGS 84 / Pseudo-Mercator: EPSG:3857

Multiple names for the same thing. In short, it maps lan/lot to Euclidean coordinates. That way 17.34N 52.1231W will be 1960439.216 5802316.95. The number looks weird, but it makes sense and simplifies everything. Really. On the equator (lat=0), 1 unit will be equal to 1 meter in reality. That’s why a pseudo-mercator is awesome! But again, the further from the equator, the less accurate the calculations are. How much? Mentioned equation earlier will be useful here: 1 unit = 1/cos(lat) meters.

It looks like this when rendered on a flat screen:

This CRS compensates for the flattening making the shape look much more realistic, however, the sizes are messed up. Things far from the equator look much bigger than they should be. You can read about it by looking for the “Mercator effect”.

Anyway, the problem is that the further from the equator, the unit represents a longer distance.

Coordinates in this system are expressed in "meters," resulting in numbers that often span millions. While these large values are just numbers for software, they can be challenging for humans to read, remember, or manually write due to the sheer number of digits.

Local projections

If you are working on a region of the size of a country, or even smaller, consider using a projection for that local region. For example, for Poland, there is a projection called EPSG:2180 (screen from QGIS):

For even smaller areas, there is EPSG:2177, which covers only a stripe:

With those projections, you can assume that 1 unit equals 1 meter in the real world. No further calculations are required, and there are no errors we need to be aware of.

With QGIS, you can browse different CRSs and see the maps with fragments of the world they cover. You can also use a website like https://epsg.io/2180, for information without using QGIS.

However, you must keep your geometry inside the projection’s bound! Outside it will lose its accuracy and will be useless.

While I’ll focus on EPSG:2180, there are plenty of others: EPSG:4839 for Germany, EPSG:2062 for Spain, or universal like UTM Zones that don’t care about political boundaries. I won’t dive into how they work, because it’s not my goal in this article.

For example, ED50 / UTM Zone 34N (a.k.a. EPSG:23034): it is not correlated to any country, just Europe, 18°E to 24°E

Web mercator EPSG:3857 vs EPSG:4326 vs EPSG:2180

Most (if not every) web map uses EPSG:3857 (Google calls it EPSG:900913, but it’s the same as EPSG:3857, but they are.. google. It’s a leet code anyway), because as I mentioned, it’s a pretty accurate representation that locally looks good.

Let’s see how Poland looks in each CRS:

EPSG:4326:

EPSG:3857:

And local EPSG:2180, actual flat projection:

The EPSG:3857 is close enough to the actual shape of the local geometry. This image shows the differences: blue is EPSG:2180, red is EPSG:3857.

While EPSG:2180 represents the country’s actual shape (it should be the most accurate), EPSG:3857 is close enough. So why bother with local projections?

Calculations performance and precision

If we want to calculate the distance between points in lon/lat (EPSG:4326), we need to use a complicated equation:

$$\mathrm{distance} = \arccos(\sin(\mathrm{lat}_1) \cdot \sin(\mathrm{lat}_2) + \cos(\mathrm{lat}_1) \cdot \cos(\mathrm{lat}_2) \cdot \cos(\mathrm{lon}_2 - \mathrm{lon}_1))) \cdot 6371$$

Sure, doable, and pretty accurate, but… slow. 6 trigonometry functions. For Euclidean geometry, there is a much simpler equation:

$$\mathrm{distance} = \sqrt{\left(x_2 - x_1\right)^2 + \left(y_2 - y_1\right)^2}$$

The difference for CPU calculations is huge: the first one takes (for a modern CPU, don’t take this too seriously, it’s not as simple): 545 cycles (561 with having to calculate radians from degrees), while the second is just 95 cycles. Over 5 times faster.

That’s why geographical coordinates lat/long are not worth using for calculations. Not to mention many other geometrical methods would require a completely different approach, like calculating area…

Testing

I’ve downloaded OSM data for Poland, especially roads (highway=*) and fuel stations (amenity=fuel). Then, for roads, I added 2 additional columns (to the default one in EPSG:3857) with geometries in EPSG:4326 and EPSG:2180. I will use the distance (<->) operator.


--EPSG:3857
explain analyze select count(*)
from import.osm_road
where geometry <-> st_setsrid(st_makepoint(1849106, 6522473), 3857) < 10000;

-- EPSG:2180
explain analyze select count(*)
from import.osm_road
where geometry_2180 <-> st_transform(st_setsrid(st_makepoint(1849106, 6522473), 3857), 2180) < 10000;

-- EPSG:4326 (geometry)
explain analyze select count(*)
from import.osm_road
where geometry_4326 <-> st_transform(st_setsrid(st_makepoint(1849106, 6522473), 3857), 4326) < 10000;


-- EPSG:4326 (geography)
explain analyze select count(*)
from import.osm_road
where geometry_4326::geography <-> st_transform(st_setsrid(st_makepoint(1849106, 6522473), 3857), 4326)::geography < 10000;

Results:

CRScolumn typeResultTime
EPSG:2180geometry8177~640ms
EPSG:3857geometry4679~630ms
EPSG:4326geometry5650317~750ms
EPSG:4326geometry (cast to geography)8187~5050ms

(this is not a benchmark, time measurements aren’t exact and are rather to show the presence of differences)

Here is what’s going on:

  • geometry-based calculations don’t care about which projection they use: the time differences there are most likely results of the number of rows

  • Geometry calculations for EPSG:4326 make no sense at all, it has to be cast into geography type

    • or better: just use the type of column Geography if you want to use EPSG:4326
  • Geographical calculations for EPSG:4326 are very slow (as we predicted earlier)

  • Calculations for EPSG:3857 result in a weirdly low number, compared to EPSG:2180 and EPSG:4326

  • Even the results for EPSG:2180 and EPSG:4326 are slightly different

Can we make it better? Of course, we can. Running explain analyze shows that it doesn't use Indexes

Finalize Aggregate  (cost=1943571.58..1943571.59 rows=1 width=8) (actual time=598.505..606.474 rows=1 loops=1)
  ->  Gather  (cost=1943571.37..1943571.58 rows=2 width=8) (actual time=598.394..606.466 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=1942571.37..1942571.38 rows=1 width=8) (actual time=583.419..583.420 rows=1 loops=3)
              ->  Parallel Seq Scan on osm_road  (cost=0.00..1940609.45 rows=784766 width=0) (actual time=64.368..583.331 rows=2726 loops=3)
                    Filter: ((geometry_2180 <-> '0101000020840800008939DBF1682A144160A49A99EA991141'::geometry) < '10000'::double precision)
                    Rows Removed by Filter: 1880713
Planning Time: 0.071 ms
JIT:
  Functions: 14
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 0.819 ms (Deform 0.340 ms), Inlining 78.209 ms, Optimization 70.354 ms, Emission 44.259 ms, Total 193.642 ms"
Execution Time: 606.802 ms

Postgres can’t use indexes, because we force the query to calculate the distance, and then filter out the rows with geometry further from the point than the given distance.

Use ST_DWithin for getting rows within a radius

The function ST_DWithin exists just for that: it returns true if geometry A and geometry B are apart from each other at most given the radius. It utilizes k-tree indexes and makes it much faster. Let’s try again to find the roads near a point, this time using ST_DWithin:

ProjectiontypeResultTime
EPSG:2180geometry8177~30ms
EPSG:3857geometry4679~30ms
EPSG:4326geography8175~60ms

As we can see, the performance boost is huge. Over 20 times faster for projections, 10 times faster for geography.

Finalize Aggregate  (cost=65154.73..65154.74 rows=1 width=8) (actual time=29.152..34.180 rows=1 loops=1)
  ->  Gather  (cost=65154.52..65154.73 rows=2 width=8) (actual time=29.070..34.177 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=64154.52..64154.53 rows=1 width=8) (actual time=15.315..15.316 rows=1 loops=3)
              ->  Parallel Bitmap Heap Scan on osm_road  (cost=211.18..64153.93 rows=235 width=0) (actual time=9.665..15.231 rows=2726 loops=3)
"                    Filter: st_dwithin(geometry_2180, '0101000020840800008939DBF1682A144160A49A99EA991141'::geometry, '10000'::double precision)"
                    Rows Removed by Filter: 559
                    Heap Blocks: exact=708
                    ->  Bitmap Index Scan on osm_road_geometry_2180_index  (cost=0.00..211.04 rows=7267 width=0) (actual time=23.086..23.087 rows=9855 loops=1)
"                          Index Cond: (geometry_2180 && st_expand('0101000020840800008939DBF1682A144160A49A99EA991141'::geometry, '10000'::double precision))"
Planning Time: 3.780 ms
Execution Time: 34.869 ms

All results are still the same… except EPSG:4326. While explain analyze says it used the index:

Aggregate  (cost=9358.34..9358.35 rows=1 width=8) (actual time=117.243..117.243 rows=1 loops=1)
  ->  Index Scan using osm_road_geometry_4326_index on osm_road  (cost=0.54..9356.93 rows=565 width=0) (actual time=8.245..116.983 rows=8175 loops=1)
"        Index Cond: (geometry_4326 && _st_expand('0101000020E6100000739C4DD95D9C3040951BEE8F0F384940'::geography, '10000'::double precision))"
"        Filter: st_dwithin(geometry_4326, '0101000020E6100000739C4DD95D9C3040951BEE8F0F384940'::geography, '10000'::double precision, true)"
        Rows Removed by Filter: 4271
Planning Time: 0.231 ms
Execution Time: 57.598 ms

The result is different for some reason. It’s most likely because the indexing method used by ST_DWithin doesn’t calculate the distance using the same method:

select id, geometry_4326 <-> point as distance
from import.osm_road
where
    ST_DWithin(geometry_4326, point, 11000)
    and not ST_DWithin(geometry_4326, point), 10000)
order by distance asc;

This is because the k-tree used as part of the index stored in degrees does not reflect a realistic distance calculation, cutting off many leaves with positions that should fall into the search range

Which result is correct?

I would say that the one performed on EPSG:2180 is the best, because it is not subject to any distortion, does not require any corrections, and does not require any twisted formulas to calculate distances. Even the fix applied to EPSG:3857 is not perfect, because it would need to be calculated separately for each pair we calculate distance for. What corrections?? I will explain later.

But it all depends on how precise your calculations need to be. If you are fine with “items not further than about 10km”, it’s ok to stay with EPSG:3857 (it’s still faster than EPSG:4326). Using local projections is required if you need really precise calculations.

Use EPSG:3857 and correct the radius: latitude-based scaling

Do you remember when I mentioned the distortion growing with the distance from the equator? We can calculate it and represent it with a single graph:

It means for latitude 60° the result distance will be twice that much than you expect. 1000 units would be 2km. If you want to calculate the actual distance, you need to multiply it by 1 / cos(lat).

The point I was testing had a latitude equal to 50.44°N. if we want objects in a radius of 10km, we need to multiply that value by 1 / cos(50.44°), which is about 1.57, so set the radius to 15700 units instead of 10000. With that correction, the query returns 8179 entries now. This is now much closer to the correct (as we assume) number 8177.

Look at this monster with that applied correction (assuming point is st_setsrid(st_makepoint(1849106.5790939943, 6522473.732475682), 3857), just a Geometry point in EPGS:3857):

from import.osm_road where ST_DWithin(geometry, point, 10000 * 1.57);

And this is as fast, as querying without the correction. It’s simple multiplication after all. For short distances, this correction should be enough. For long, again, enough if you don’t need to be precise. Let’s fix the result table:

ProjectiontypeResultTime
EPSG:2180geometry8177~30ms
EPSG:3857 (fixed)geometry8179~30ms
EPSG:4326geography8175~60ms

There are still 3 different answers here, let’s fix it.

Take care of the edge cases differently

If you want to get exact calculations, for edge cases (close to the radius) you can use a different projection than the one you have geometry in.

Calculating distance in EPSG:4326 is very expensive, so limit it just to the border cases:

  • if the distance is bigger than 10000 × 1 / cos(lat) * 1.1, filter out

  • if the distance is less than10000 × 1 / cos(lat) * 0.9, accept

  • calculate distance using the geographical method

For the third point, we could use one of:

  • ST_DWithin(ST_Transform(geography, 2180), ST_Transform(poiont, 2180), distance)

  • ST_Transform(geometry, 4326)::geography <-> st_transform(point, 4326)::geography < distance

Which will look like:

select count(*)
from import.osm_road
where ST_DWithin(geometry, point, 10000 * 1.1 * 1.57)
  and (
    ST_DWithin(geometry, point, 10000 * 0.9 * 1.57)
    or ST_DWithin(ST_Transform(geography, 2180), ST_Transform(poiont, 2180), distance)
  );

Performance impact is negligible (at least in my case), and the result is the same as if we would have geometry in EPSG:2180 projection in the first case. With transforming to EPSG:4326 execution time rose from 30 to circa 40ms.

But why are there differences in how many rows were returned? How can mere projection impact the precision of distance calculation?

Calculating distances, deep dive

Let’s figure out what’s happening here, and why there are any differences.

EPSG:2180 and EPSG:3857 use a simple equation with the root of the sum of two squares, while EPSG:4326 (lat/lon) uses more complex computations with trigonometry functions. The second one will provide different answers due to the characteristics of the calculation method and its inaccuracy. Even the calculated length will be different for each setting. Let’s take a point (within EPSG:2180 bounds) and calculate the distance to a point 10km to the north, south, east, and west (basically, just add 10000 to coordinates):

select st_setsrid(st_makepoint(530300, 538800), 2180) <-> st_setsrid(st_makepoint(530300, 548800), 2180) as epsg2180,
       st_transform(st_setsrid(st_makepoint(530300, 538800), 2180), 4326)::geography <->
       st_transform(st_setsrid(st_makepoint(530300, 548800), 2180), 4326)::geography   as epsg4326,
       st_transform(st_setsrid(st_makepoint(530300, 538800), 2180), 3857) <->
       st_transform(st_setsrid(st_makepoint(530300, 548800), 2180), 3857)   as epsg3857,
       (st_transform(st_setsrid(st_makepoint(530300, 538800), 2180), 3857) <->
       st_transform(st_setsrid(st_makepoint(530300, 548800), 2180), 3857)) / 1.650726903100751   as epsg3857_fix
;

Adding other directions, we’ll receive the results:

EPSG:2180EPSG:4326 geographyEPSG:3857 with fixEPSG:3857 with fix
10km to the north100009999.09916541.31010020.622
10km to the south100009999.25216507.50210000.141
10km to the west100009974.54016483.7569985.756
10km to the east100009974.46716483.4239985.554

If we assume the calculations in EPSG:2180 are the most precise (which should be true), we see how different other calculations are. And notice how wrong is <-> for EPSG:3857 without applying the fix for the given latitude!

Visualization

To visualize how different calculations are in different projections, I prepared an example:

This is a result of the script:

insert into testing (name, geometry)
values ('EPSG:3857 50km',ST_Transform(st_buffer(st_transform(st_setsrid(st_makepoint(530300, 538800), 2180), 3857), 50000, 32), 2180)),
       ('EPSG:3857 50km+fix',ST_Transform(st_buffer(st_transform(st_setsrid(st_makepoint(530300, 538800), 2180), 3857), 50000 * 1.650726903100751, 32), 2180)),
       ('EPSG:4326 50km',ST_Transform(st_buffer(st_transform(st_setsrid(st_makepoint(530300, 538800), 2180), 4326)::geography, 50000, 32)::geometry, 2180)),
       ('EPSG:2180 50km', st_buffer(st_setsrid(st_makepoint(530300, 538800), 2180), 50000, 32))
;

It’s ST_Buffer called on the same point, but in different projections.

The blue one in the middle is of course EPSG:3857 without fix. The other three are looking pretty close to each other, but…

The east side looks like this (The west side looks the same, just mirrored):

While North looks like this:

South:

Blue one is EPSG:2180, Green is EPSG:4326, and Red is EPSG:3857 with the fix. While green and blue are close to each other (I’ll explain later), the red one is moved south. If we change the radius to 250km:

The differences are even more noticeable. Notice that because the latitude changes in the vertical axis, the red circle is close to the green one horizontally, but is Also, notice the note on ST_Buffer:

For geography this is a thin wrapper around the geometry implementation. It determines a planar spatial reference system that best fits the bounding box of the geography object (trying UTM, Lambert Azimuthal Equal Area (LAEA) North/South pole, and finally Mercator ). The buffer is computed in the planar space, and then transformed back to WGS84. This may not produce the desired behavior if the input object is much larger than a UTM zone or crosses the dateline

Which means PostGIS does ST_Buffer by finding a local CRS (like mentioned earlier UTM Zone) that covers that area and performing the calculations there. That is why those two circles are so similar (but not exact). It doesn’t make the calculations with lat/long at all. However, creating the radius manually in Python gives a similar result, the lines are apart from each other (EPSG:4326 and EPSG:2180) by circa 10 to 150 meters (for 50km), which explains the difference in distances.

These rendered radiuses show how different the results for filtering items using ST_Distance or distance operator <-> might be and explain the difference between results for EPSG:2180 and EPSG:3857 with the fix.

Conclusions

Latitude and longitude are nice for people to read and write, but terrible for any calculations. Store your geometries in web mercator a.k.a. pseudo-mercator a.k.a. EPSG:3857. Then apply corrections for your calculations and enjoy fast algorithms that work on flat surfaces.

If your maps are small enough, use local projections, that are even more accurate, and you won’t have to apply any fixes or remember that Earth isn’t flat. For you, locally, it is flat (enough to not care about it).

If you ever need latitude and longitude, you can always transform any CRS to EPSG:4326 by ST_Transform(geometry, 4326), so there is no need to store that separately.

EPSG:4326 (WSG84, Latitude/longitude)

Pros:

  • Easy to read by human

  • Widely used, default in many map libraries and software

  • universal for the whole world

Cons:

  • Slow performance when calculating

  • Imprecise calculations

  • Hard to perform calculations

When to use:

  • When you need some points to show on the map

  • No precise calculations are needed

EPSG:3857 (Web mercator / pseudo mercator)

Pros:

  • Widely used

  • Efficient

  • Fast calculations

Cons:

  • Imprecise calculations as latitude increases (or decreases below 0)

  • Requires correction for latitude

When to use?

  • When you have worldwide spatial data to store

  • Some calculations are needed, but they don’t need to be exact

Local projections (e.g., EPSG:2180)

Pros:

  • Best precision

  • Easy and fast geometry calculations (we are working on a flat surface)

Cons:

  • Limited to a specific region

  • If the map is going to be rendered within EPSG:3857, it requires transforming anyway

When to use:

  • When precision is a priority

  • When there are a lot of calculations done

  • Use only when all geometries fit into the projection’s bound

TLDR;

  • Lat/long looks good for humans to read and write but is slow for calculations

  • There is no reason to use EPSG:4326

    • But if you need to (for some reason), keep it in Geography, not Geometry column

    • You can always transform EPSG:3857 to lat/lon, it’s cheap

  • If you work in a small area, find the best-fitting local projection

    • Otherwise, use EPSG:3857

    • Eventually, keep EPSG:4326 in a separate column if you need latitude and longitude.

  • If you use EPSG:4326, use Geography type, not Geometry

  • Avoid doing any precise calculations on EPSG:4326.

    • They are much slower

    • Make sure to use the column type Geography, casting may result in ignoring indexes

    • Consider transforming to a local projection, or at least EPSG:3857

  • When calculating distances or areas, remember that you shouldn’t rely on EPGS:3857 without applying the fix

    • Even with the fix, the bigger the distance or the area, the bigger the error. But is it important?
  • Utilize indices, check if they are used in your queries, and try to figure out why they aren’t

    • for example: while filtering use ST_DWithin instead of distance operator.