Air transportation network datasets, with city identifiers and coordinates

Many measures quantifying the importance of the world's airports, including total number of passengers, number of flights, and amount of cargo, have been compiled and publicized (20). We study here the OAG MAX database, which comprises flight schedule data of >800 of the world's airlines for the period November 1, 2000, to October 31, 2001. This database is compiled by OAG Worldwide (Downers Grove, IL) and includes all scheduled flights and scheduled charter flights, both for large aircraft (air carriers) and small aircraft (air taxis).

We focused our analysis on a network of cities, not of airports; for example, Newark Liberty International Airport, John F. Kennedy International Airport, and LaGuardia Airport are all assigned to New York City. We further restricted our analysis to passenger flights operating in the time period November 1, 2000, to November 7, 2000. Even though these data are >4 years old, the resulting worldwide airport network is virtually indistinguishable from the network one would obtain using data collected today. The reason is that air traffic patterns are strongly correlated with (i) socioeconomic factors, such as population density and economic development; and (ii) geopolitical factors, such as the distribution of the continents over the surface of the Earth and the locations of borders between states (21). Clearly, the time scales associated with changes in these factors are much longer than the lag in the data we analyzed here.

During the period considered, there were 531,574 unique nonstop passenger flights, or flight segments, operating between 3,883 distinct cities. We identified 27,051 distinct city pairs having nonstop connections. The fact that the database is highly redundant, that is, that most connections between pairs of cities are represented by more than one flight, adds reliability to our analysis. Specifically, the fact that unscheduled flights are not considered does not mean, in general, that the corresponding link between a certain pair of cities is missing in the network, because analogous scheduled flights may still operate between them. Similarly, even if some airlines have canceled their flights between a pair of cities since November 2000, it is highly unlikely that all of them have.
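The aggregation described above can be illustrated with a minimal sketch. It assumes each flight segment is an (origin airport, destination airport) pair and uses a hypothetical lookup table, `AIRPORT_TO_CITY`, to assign airports to cities. The record layout, the names, and the choice to skip segments whose endpoints fall in the same city are assumptions made for illustration, not the OAG MAX schema or the procedure actually used.

```python
from collections import Counter

# Hypothetical airport-to-city lookup (illustrative, not the OAG schema).
AIRPORT_TO_CITY = {
    "EWR": "New York",
    "JFK": "New York",
    "LGA": "New York",
    "ORD": "Chicago",
    "BOS": "Boston",
}


def count_city_pairs(segments):
    """Count nonstop flights per directed (origin city, destination city) pair.

    `segments` is an iterable of (origin_airport, destination_airport) codes.
    Segments whose endpoints map to the same city are skipped (an assumption).
    """
    counts = Counter()
    for origin, destination in segments:
        a = AIRPORT_TO_CITY[origin]
        b = AIRPORT_TO_CITY[destination]
        if a == b:
            continue
        counts[(a, b)] += 1
    return counts


# Toy flight segments, made up for illustration.
segments = [("JFK", "ORD"), ("ORD", "LGA"), ("EWR", "ORD"), ("BOS", "JFK")]
pair_counts = count_city_pairs(segments)
```

In this toy example, two different airports contribute flights to the same New York to Chicago link, which is the kind of redundancy that keeps a city pair connected even if one individual flight is missing from the data.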

We created the corresponding adjacency matrix for this network, which turns out to be almost symmetrical. The very minor asymmetry stems from the fact that a small number of flights follow a “circular” pattern, i.e., a flight might go from A to B to C and then back to A. To simplify the analysis, we symmetrized the adjacency matrix.
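As a rough sketch of the symmetrization step, the function below treats two cities as linked if a flight exists in either direction. The text does not state which rule was used, so this "either direction" rule and the function name `symmetrize` are assumptions.

```python
def symmetrize(adjacency):
    """Return a symmetric copy of a square 0/1 adjacency matrix.

    Cities i and j are linked in the result if a flight exists
    from i to j or from j to i (assumed rule).
    """
    n = len(adjacency)
    return [
        [1 if adjacency[i][j] or adjacency[j][i] else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy "circular" route A -> B -> C -> A, with cities indexed 0, 1, 2.
directed = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
undirected = symmetrize(directed)
```

For the circular route, every pair of cities ends up connected in both directions, so the asymmetry introduced by the one-way legs disappears.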

Further, we built regional networks for different geographic regions. Specifically, we generated six networks: one each for Africa, Asia and Middle East, Europe, Latin America, North America, and Oceania.
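One plausible way to extract a regional network is to keep only the links whose two endpoint cities both belong to the region. The sketch below assumes this rule, along with a hypothetical `CITY_REGION` mapping and toy edges; the text does not specify how cities were assigned to regions or how links crossing regions were handled.

```python
def regional_network(edges, city_region, region):
    """Return the links whose two endpoint cities both lie in `region`."""
    return {
        (a, b)
        for a, b in edges
        if city_region.get(a) == region and city_region.get(b) == region
    }


# Hypothetical city-to-region assignment and links, for illustration only.
CITY_REGION = {
    "Paris": "Europe",
    "Madrid": "Europe",
    "Tokyo": "Asia and Middle East",
    "Seoul": "Asia and Middle East",
}
EDGES = {("Madrid", "Paris"), ("Paris", "Tokyo"), ("Seoul", "Tokyo")}

europe = regional_network(EDGES, CITY_REGION, "Europe")
asia = regional_network(EDGES, CITY_REGION, "Asia and Middle East")
```

Under this rule the Paris to Tokyo link appears in neither regional network, because its endpoints lie in different regions.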

If you use this dataset in a publication, please cite the following articles:
  • Air transportation networks: Air transportation networks with city information and coordinates