As connectivity becomes increasingly valuable to everyone, access to communication is becoming as important as access to other kinds of basic infrastructure. As a result, there is increasing urgency to make affordable communication infrastructure accessible to all citizens. Yet mobile network subscriber growth in Africa is slowing, as is revenue growth for mobile network operators. This slowdown is linked to the fact that a significant percentage of newer users come from lower-income brackets and live in regions that present challenges to operators, ranging from sparser population distribution to a lack of effective power infrastructure. This is compounded by a growing urban-rural digital divide in access. If something is to be done about this, an accurate and up-to-date understanding of who has access and who doesn’t is essential. This post outlines some work I have done for FHI360 and USAID on a methodology for mapping the unserved.
My goal is twofold: 1) to calculate the number of people not covered by mobile service in a given country; and 2) to identify communities that could be served if they meet a threshold of population density within a given radius of coverage. The radius of coverage is a variable that is intended to be determined by the technology proposed for coverage. Coverage is affected by a number of factors, including tower height, power output, frequencies in use, and antenna type. A single tower's coverage radius typically ranges between two and ten kilometres. A community is identified by having a certain population density within the given coverage radius. The population threshold is also a variable, determined by the operator's business model, i.e. the CAPEX and OPEX that set the minimum population that must be covered for a sustainable business. For instance, an operator putting up solar-powered, low-cost base stations may be able to sustainably serve a lower population density in a given area than a traditional mobile network. These two variables are meant to be adapted to specific operator solutions.
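The two variables can be combined in a quick back-of-the-envelope check. The sketch below assumes an idealised circular cell, so the area covered is πr², and converts a population threshold into the minimum average density that would meet it. The 1,000-person threshold is a hypothetical figure for illustration only, not a number from this work.

```python
import math


def coverage_area_km2(radius_km: float) -> float:
    """Area of an idealised circular coverage cell (pi * r^2)."""
    return math.pi * radius_km ** 2


def min_density_per_km2(threshold_population: float, radius_km: float) -> float:
    """Average density needed for a cell of radius_km to reach threshold_population."""
    return threshold_population / coverage_area_km2(radius_km)


# Hypothetical threshold of 1,000 people, at both ends of the 2-10 km range.
for radius in (2.0, 10.0):
    print(f"r = {radius:.0f} km: area = {coverage_area_km2(radius):.1f} km2, "
          f"minimum density = {min_density_per_km2(1000, radius):.1f} people/km2")
```

Because area grows with the square of the radius, a fivefold increase in radius lowers the required average density twenty-fivefold, which is why the choice of technology matters so much to which communities qualify.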
The two key resources required are a map of current mobile network signal coverage and a map of population distribution. In the example used below, mobile network coverage data from the GSMA, an industry association for mobile network operators, are used. Having established the extent of network coverage, it is then necessary to establish the following: how many people are unserved/underserved; where those unserved people live; and, specifically where the densest points of population exist in those unserved areas. There are a number of global data sources that provide GIS-based population density and distribution maps based on national census data. Each dataset has its own strengths and weaknesses.
- JRC’s Global Human Settlement Layer population
- WorldPop – University of Southampton
- LandScan – Oak Ridge National Laboratory
- CIESIN’s Gridded Population of the World (GPW)
- CIESIN / Facebook High Resolution Settlement Layer (HRSL) map
The newest and most significant of the above sources is the CIESIN High Resolution Settlement Layer (HRSL) map, produced in collaboration with Facebook. This new population map represents a substantial increase in the resolution of population distribution, made possible by Facebook’s vast computing power and its use of machine learning algorithms to detect human settlements more accurately. Combined with national census data, this offers an unprecedented level of accuracy in mapping where people live, which in turn allows for better predictions of where to locate towers for mobile coverage.
Unfortunately, the map is currently available for only eight countries: Burkina Faso, Ghana, Haiti, Ivory Coast, Madagascar, Malawi, South Africa, and Sri Lanka. This is an increase from the initial release of four countries but still limits the application of the map. In the Liberian example used in this document, WorldPop population data has been used instead. Liberia was chosen as an example because there are large areas of the country without any mobile coverage, which makes the methodology easier to illustrate.
Mobile Coverage Mapping
Using GIS data supplied by the GSMA, a 2G coverage map in the form of a shapefile is overlaid on the population map. The GSMA makes this map available and, in turn, receives coverage maps from its member organisations. The accuracy of coverage maps supplied by mobile network operators requires further validation. In the map below, the tower coverage radius appears to be 15 kilometres, which is generous for most mobile towers. Eight to ten kilometres is generally considered more realistic, although many factors influence coverage, including tower height, transmitter power, and terrain features. As such, the map probably overstates access. While towers can often reach mobile phones over extended distances, the critical limiting factor is the ability of the phones to return a signal to the towers.
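The effect of the generous radius can be put into numbers. Treating each tower's footprint as a simple circle (an assumption; real coverage is shaped by terrain and antennas), the sketch compares the area implied by a 15 km radius with the more realistic 8 and 10 km figures.

```python
import math


def cell_area_km2(radius_km: float) -> float:
    """Idealised circular footprint of a single tower."""
    return math.pi * radius_km ** 2


mapped_area = cell_area_km2(15)  # radius suggested by the GSMA map
for realistic_radius in (8, 10):
    ratio = mapped_area / cell_area_km2(realistic_radius)
    print(f"15 km vs {realistic_radius} km: {ratio:.2f}x the area")
```

For an isolated tower, the mapped footprint is therefore roughly 2.25 to 3.5 times the area it would realistically serve, which is why the unserved population derived from this map is best read as a lower bound.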
For the purpose of this work, the GSMA coverage data is used in the absence of more accurate datasets on the assumption that a) this would establish a minimum value for populations that lack coverage; and b) that this methodology could be substantially improved if access to tower data, including location, technology, height, orientation, and power output, were made available.
Once the coverage map is overlaid on the population it is immediately possible to visually identify populations that are not currently covered by a mobile signal. The challenge now is to calculate the number of people currently unserved. This can be achieved by first calculating a vector (shapefile) layer on the map that corresponds to the inverse of the mobile coverage map.
This can be calculated in QGIS through the following steps:
- Add a shapefile for the administrative boundary of the country in question. These can be downloaded from the Global Administrative Areas database (GADM) that has been developed by Robert Hijmans, in collaboration with colleagues at the University of California, Berkeley Museum of Vertebrate Zoology (Julian Kapoor and John Wieczorek), the International Rice Research Institute (Nel Garcia, Aileen Maunahan, Arnel Rala) and the University of California, Davis (Alex Mandel), and with contributions of many others.
- Once the administrative boundary has been loaded in QGIS, you can run the Symmetrical Difference function on the administrative boundary and the GSM coverage map to calculate the inverse map. The Symmetrical Difference SAGA tool from the QGIS Processing Toolbox was used to achieve this. This calculated map should then be saved as an independent shapefile for future reference.
- Once the inverse 2G coverage map is available, it is then possible to run the Zonal Statistics function on the combination of the inverse coverage map and the population map. Zonal Statistics calculates statistics, including the sum, of the population raster cells that lie under each polygon of the no-coverage shapefile, and adds the results to that shapefile's attribute table. The attributes can then be exported to an Excel spreadsheet using the XY Tools plug-in for QGIS. The resulting data provides a high-level picture of the number of people not currently covered by a mobile signal.
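The two overlay steps above can be sketched on a toy grid. This is an illustration only, not the QGIS or SAGA implementation: polygons are stood in for by sets of hypothetical (row, col) cells, the coverage block deliberately spills over the border, and the population raster is a made-up dictionary that, like a country-level raster, has no values outside the country. It also shows how the overlay tools differ: Symmetrical Difference keeps coverage that lies outside the boundary, while a plain Difference (boundary minus coverage) does not, although the population total comes out the same when the raster covers only the country.

```python
# Toy stand-in for the two vector layers: each shape is a set of (row, col) grid cells.
country = {(r, c) for r in range(4) for c in range(4)}      # 4 x 4 country
coverage = {(r, c) for r in range(2) for c in range(2, 6)}  # spills past the border

# Step 1: the inverse coverage map.
no_coverage = country - coverage   # Difference: boundary minus coverage
symmetric = country ^ coverage     # Symmetrical Difference
extra_cells = symmetric - country  # covered cells lying outside the country

# Step 2: zonal sum of a hypothetical population raster (people per cell).
population = {cell: 10 for cell in country}  # raster has values only inside the country
population[(3, 3)] = 250                     # one denser settlement


def zonal_sum(raster, zone):
    """Sum raster values for cells in the zone; cells without a value count as nodata (0)."""
    return sum(raster.get(cell, 0) for cell in zone)


print(len(no_coverage), len(extra_cells))
print(zonal_sum(population, no_coverage), zonal_sum(population, symmetric))
```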
This completes the first level of GIS analysis which gives a sense of whether the country appears to have a sufficiently large unserved population to warrant further investigation.
Identifying Population Centers
In order to estimate the market viability of the unserved population more accurately, it is necessary to identify concentrations of population in the unserved areas, as these are the most likely points for putting up base stations. There is more than one way to address this problem. In this case, the r.neighbors algorithm from GRASS GIS is used within QGIS to calculate points of population density. A population raster stores a population value for each pixel in the map, and each pixel covers an area on the ground that depends on the resolution of the map. The CIESIN / Facebook HRSL raster has a resolution of about 30 meters (each pixel covers roughly 30 m × 30 m), whereas the WorldPop population map has a resolution of about 100 meters (roughly 100 m × 100 m per pixel). The r.neighbors algorithm examines the pixels surrounding any given pixel and performs a chosen function on their values. In this case the surrounding pixel values are summed and the sum is placed in the source location, creating a new raster map. The resulting raster makes it easier to see the areas of highest population density compared with surrounding areas. In the map to the left, it is possible to see how the points of population density have become more visible. The size of the neighbourhood can be varied from as small as a 3×3 grid up to whatever size best brings out the population density highlights. In the case of Liberia, a 15×15 grid has been chosen.
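The neighbourhood sum that r.neighbors performs can be sketched in a few lines. This is an illustration rather than the GRASS implementation: it assumes an odd window size and simply skips cells beyond the edge of the grid (treating them as zero), which may differ from how GRASS handles edges.

```python
def neighbourhood_sum(grid, size=3):
    """Replace each cell with the sum of the size x size window centred on it."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    # Cells outside the grid are skipped (treated as zero).
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc]
            out[r][c] = total
    return out


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(neighbourhood_sum(ones, size=3))
```

On the 100 m WorldPop grid, the 15×15 window used for Liberia (`size=15`) sums roughly 1.5 km × 1.5 km of ground around each cell.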
Having run the r.neighbors algorithm, it is then possible to filter out population densities that do not meet a given threshold. To do this, the Raster Calculator (Raster | Raster Calculator) is used in QGIS to set a threshold that the r.neighbors map must meet. The calculation within the Raster Calculator takes the form:
"RasterMap" * ("RasterMap" > threshold)
where RasterMap is the name of the raster calculated with the r.neighbors function and threshold is the number chosen as the minimum population threshold. There is no hard and fast number to use for a threshold; the value will depend on the r.neighbors results. In the case of Liberia, a threshold of 150 was chosen. The result of running the Raster Calculator is a new raster that is zero everywhere except in the regions that meet the threshold value. The next step is to create a vector shapefile identifying those regions. In the map to the right, the areas in black represent the new threshold raster.
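The Raster Calculator expression translates directly into code. The sketch below applies `"RasterMap" * ("RasterMap" > threshold)` cell by cell to a small made-up grid, using the threshold of 150 chosen for Liberia; in Python, as in the Raster Calculator, the comparison behaves as 1 or 0 when multiplied.

```python
def apply_threshold(grid, threshold):
    """Equivalent of "RasterMap" * ("RasterMap" > threshold) for a list-of-lists grid."""
    # (value > threshold) is True/False, which multiplies as 1/0.
    return [[value * (value > threshold) for value in row] for row in grid]


# Hypothetical r.neighbors output values.
neighbour_sums = [[90, 151, 400],
                  [150, 0, 220]]
print(apply_threshold(neighbour_sums, 150))
```

The comparison is strict, so a cell exactly at the threshold (150 here) is set to zero.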
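With this resulting map, we can use a QGIS or GDAL function to “polygonize” the raster map into vector format. The polygonize function draws boundaries around contiguous groups of pixels that share the same value and creates a new vector layer from them. Because the threshold raster keeps the original summed values, polygonizing it directly can produce many small polygons of differing values; converting it first to a simple 1/0 mask (for example "RasterMap" > threshold) yields one polygon per contiguous qualifying area, and the polygons with a value of 0, which represent the background, can be discarded. In the map below, the areas in blue represent the new shapefile layer of regions meeting the population threshold set in the previous calculations.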
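Polygonizing essentially groups touching pixels into regions and then traces their outlines. The sketch below covers only the grouping half, as a connected-component labelling of non-zero cells on a 1/0 mask; it assumes 4-connectivity (edge neighbours only), which is also gdal_polygonize's default, and does not trace the polygon outlines themselves.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected groups of non-zero cells; returns (labels, region_count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1  # start a new region and flood-fill it
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not labels[nr][nc]):
                            labels[nr][nc] = count
                            queue.append((nr, nc))
    return labels, count


mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
labels, count = label_regions(mask)
print(count)
```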
The raster map is converted into vector format in order to identify the centers of these high-population areas. This is done through the Polygon Centroids function in QGIS (Vector | Geometry Tools | Polygon Centroids) or through the development of custom programs; the gdal_polygonize.py command-line tool covers the preceding polygonize step rather than the centroid calculation. The centroid of a polygon is its geometric center: the point on which the polygon would balance if it were cut from a rigid sheet of uniform thickness. It is used here to approximate the point of maximum population density within each polygon on the map. Inevitably this is an approximation, since the centroid ignores how population is distributed inside the polygon (and, for strongly concave shapes, can even fall outside it), but it allows us to calculate an epicenter within each identified coverage area.
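The centroid itself follows from the standard shoelace-based formula for a simple (non-self-intersecting) polygon. This sketch assumes the vertices are given in order, in projected coordinates, without repeating the first vertex at the end.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon given as an ordered list of (x, y) vertices."""
    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    # Cx = sum / (6 * A), and 6 * A == 3 * twice_area.
    return cx / (3 * twice_area), cy / (3 * twice_area)


l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(polygon_centroid(l_shape))
```

For this L-shaped example the centroid lands at (5/6, 5/6), pulled toward the larger horizontal arm: it reflects the shape of the polygon, not the population inside it.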
In the map to the right, the calculated centroids can be seen. The result of this calculation is yet another shapefile. Its points, the centroids of the polygons that met the population density threshold, serve as possible locations for towers providing new coverage. The first step is to keep only the centroid points that fall outside existing coverage areas. This can be done with the Clip function (Vector | Geoprocessing Tools | Clip) in QGIS, using the centroid points as the Input Layer and the 2G no-coverage map as the Clip Layer. This produces the subset of centroids that fall outside the 2G coverage area. The map below shows the centroids that fall into the no-coverage areas.
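Clipping a point layer by a polygon layer amounts to a point-in-polygon test for each point. The sketch below uses the common ray-casting (even-odd) test against a single hypothetical no-coverage polygon; it assumes projected coordinates and does not treat points lying exactly on the boundary specially, which real GIS tools handle more carefully.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count edge crossings of a ray running in +x from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the ray's horizontal line
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def clip_points(points, polygon):
    """Keep only the points that fall inside the polygon."""
    return [p for p in points if point_in_polygon(p[0], p[1], polygon)]


no_coverage = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical unserved area
centroids = [(5, 5), (15, 5), (2, 9)]
print(clip_points(centroids, no_coverage))
```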
Having established these points as possible locations for base stations, we can calculate a buffer zone around each point to simulate a coverage area. The radius for each base station can be chosen based on the technology expected to be used in the area. Operators now have a range of base station technologies offering different coverage options based on power output, frequency, tower height, and antenna type. In this case, we have chosen a radius of 4.5 kilometers.
In order to calculate the buffers, we first need to reproject the centroid points into a coordinate reference system that supports calculations in meters. The default QGIS coordinate reference system, WGS 84 (EPSG:4326), is a geographic latitude/longitude system whose units are degrees, not meters. By selecting the centroid layer in QGIS, you can use Save As and, before saving, select a projection appropriate to the region you are working in; for Liberia, a UTM zone 29N projection such as EPSG:32629 is one option.
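For regions that fit within a single UTM zone, choosing a metric projection can be automated. The sketch below applies the standard UTM zone formula (6° zones numbered eastward from 180°W) and maps the zone to its WGS 84 / UTM EPSG code (326xx in the northern hemisphere, 327xx in the southern); it ignores the special zones around Norway and Svalbard, and the Liberian coordinates used are approximate.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat), in degrees."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(max(zone, 1), 60)  # guard the lon == 180 edge case
    return (32600 if lat >= 0 else 32700) + zone


# Approximate central point of Liberia.
print(utm_epsg(-9.4, 6.4))
```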
Once the centroid layer has been reprojected, you can then run the buffer function in QGIS (Vector | Geoprocessing Tools | Fixed Distance Buffer). When selecting the Fixed Distance Buffer, tick the Dissolve Result checkbox to have overlapping buffers merged into a single polygon. There are trade-offs in choosing to dissolve the buffers, though, as doing so creates large coverage areas that cannot be served by a single tower. The benefit is that it identifies regions where multiple communities or larger communities may be served. There is probably an improvement that could be made here.
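Which buffers end up merged when Dissolve Result is ticked can be predicted from the distances between sites: two circular buffers of equal radius r overlap when their centres are less than 2r apart. The sketch below groups hypothetical site coordinates (in meters) with a small union-find; it only models the dissolve outcome for equal circles and is not how QGIS computes it.

```python
import math


def dissolve_groups(centres, radius_m):
    """Group site indices whose equal-radius circular buffers overlap, directly or via chains."""
    parent = list(range(len(centres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            if math.dist(centres[i], centres[j]) < 2 * radius_m:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centres)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Hypothetical projected site coordinates in meters, 4.5 km buffers.
sites = [(0, 0), (8000, 0), (15000, 0), (40000, 0)]
print(dissolve_groups(sites, 4500))
```

Sites 0 and 2 are 15 km apart yet end up in the same dissolved polygon because both overlap site 1: exactly the kind of large merged area that a single tower cannot cover.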
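The Distance option in the Fixed Distance Buffer is expressed in the units of the layer's coordinate reference system. Finding a radius of 4.5 kilometers was done by trial and error: a value of .045 produced approximately that result. This suggests the buffer was calculated in degrees rather than meters, i.e. with the layer still in a geographic coordinate system: one degree of latitude is roughly 111 kilometers, so 0.045° is roughly 5 kilometers. With the layer saved in a projection measured in meters, the distance could instead be entered directly as 4500. In the map to the right, the calculated buffers can be seen.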
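The arithmetic behind the 0.045 value can be checked quickly. The sketch uses the approximation of about 111.32 km per degree of latitude and scales longitude by the cosine of the latitude; both are spherical approximations, and 6.5°N is an assumed latitude for central Liberia.

```python
import math

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def lat_degrees_to_km(degrees: float) -> float:
    return degrees * KM_PER_DEGREE


def lon_degrees_to_km(degrees: float, latitude: float) -> float:
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(lat).
    return degrees * KM_PER_DEGREE * math.cos(math.radians(latitude))


def km_to_lat_degrees(km: float) -> float:
    return km / KM_PER_DEGREE


print(f"{lat_degrees_to_km(0.045):.2f} km north-south")
print(f"{lon_degrees_to_km(0.045, 6.5):.2f} km east-west at 6.5 N")
print(f"{km_to_lat_degrees(4.5):.4f} degrees for 4.5 km")
```

A buffer distance of 0.045 in degrees therefore gives a radius of roughly 5 km near Liberia, while 4.5 km would correspond to about 0.0404 degrees; in a metric projection no conversion is needed.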
Once the buffers have been calculated, we need to exclude the areas where the buffers overlap existing coverage areas. Depending on how accurate you feel the mobile coverage maps are, you may or may not choose to do this. These are rough estimates at best, as they do not model actual radio coverage but use simple circles as approximations. To remove the overlapping regions, we can Clip the buffers with the map of unserved areas.
Once that is done, we run Zonal Statistics (Raster | Zonal Statistics) again to calculate the population that would be covered by these possible new areas of coverage. Zonal Statistics calculates population numbers for each polygon and, as before, adds them as attributes of the buffered polygons themselves.
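The buffer clip and the second Zonal Statistics run can be approximated together on a grid. The sketch below is a hypothetical illustration: it assumes a population grid with its origin at (0, 0) in the same metric projection as the site, counts a cell when its centre lies within the radius, and skips cells flagged as already covered, mimicking the clip against the unserved map.

```python
import math


def buffer_population(population, covered, cell_size_m, site_xy, radius_m):
    """Sum population in cells whose centres lie within radius_m of the site and are not covered."""
    sx, sy = site_xy
    total = 0.0
    for r, row in enumerate(population):
        for c, value in enumerate(row):
            if covered[r][c]:
                continue  # already served: excluded, like clipping to the unserved map
            cx = (c + 0.5) * cell_size_m  # cell centre, x
            cy = (r + 0.5) * cell_size_m  # cell centre, y (grows with row index here)
            if math.hypot(cx - sx, cy - sy) <= radius_m:
                total += value
    return total


# Hypothetical 3 x 3 grid of 100 m cells, 10 people each; one cell already covered.
population = [[10] * 3 for _ in range(3)]
covered = [[False, True, False],
           [False, False, False],
           [False, False, False]]
print(buffer_population(population, covered, 100, (150, 150), 100))
```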
It is then possible to color the buffers according to the estimated population coverage of each buffer zone. The color gradation can be set in the layer's Properties, available by right-clicking on the buffer layer in QGIS. In this case, a graduated scale has been used with equal-count (quantile) classes. The resulting map, shown below, indicates possible sites for new coverage based on a 4.5 km coverage radius. This map could be further refined by setting a lower population bound for the buffers.
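Equal-count classification puts roughly the same number of buffers in each color class. The sketch below is a simple version of the idea applied to made-up buffer populations; QGIS's own quantile method may place the breaks slightly differently.

```python
def equal_count_breaks(values, classes):
    """Upper bound of each class so that each class holds roughly len(values) / classes items."""
    if len(values) < classes:
        raise ValueError("need at least as many values as classes")
    ordered = sorted(values)
    n = len(ordered)
    breaks = [ordered[(k * n) // classes - 1] for k in range(1, classes)]
    breaks.append(ordered[-1])
    return breaks


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


# Hypothetical population estimates for six buffers.
buffer_populations = [3000, 120, 800, 5200, 450, 1500]
breaks = equal_count_breaks(buffer_populations, 3)
print(breaks, [classify(v, breaks) for v in buffer_populations])
```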
It is important to point out that this map requires further refinement of both the mobile signal coverage sources and the population map sources before it can be relied on as more than a tool for opening a conversation about coverage. The results require further interrogation and validation. For instance, the Liberian population map appears to indicate a significant population in the north-west that is not covered by a mobile signal. This is unusual, as mobile operators usually serve high-density population areas. This could be an error in the GSM coverage maps or in the population map; a glance at Google Maps suggests that the error is in the population map. More work is needed to understand the reliability of this methodology based on current data sources.
Essential to the meaningful use of this methodology is an accurate map of existing mobile coverage. Toward that end, access to tower location data would be the next logical step in validating coverage. Knowledge of tower location along with frequencies in use, tower height, and power output would allow for the creation of a detailed coverage map using tools for calculating RF signal propagation and loss based on terrain analysis. The same tool could be applied as an alternative to the buffers calculated around the polygon centroids, providing a more accurate estimate of opportunities for new coverage.
As a newcomer to GIS systems, I want to express my appreciation to a number of people who have provided me with guidance along the way. In particular, I would like to thank:
- Steve Esselaar, Principal, Research ICT Solutions
- Gilles Morain, Chief Technical Officer, Masae Analytics
- Rick Pelletier, Faculty Service Officer (Spatial information systems, GIS, remote sensing), University of Alberta
- Greg Yetman, Associate Director, Geospatial Applications Division, Center for International Earth Science Information Network (CIESIN)
Any errors in the above work are likely a failure on my part to fully appreciate the guidance I was being given. GIS StackExchange deserves a shout-out as well as an invaluable resource.