Are the attributes documented?
- This topic has 8 replies, 1 voice, and was last updated 12 years, 4 months ago by Nathaniel.
May 26, 2010 at 3:49 am #3388
Erik Westra (Participant)
Hello,
First off, I’d like to thank the developers for this fantastic resource — I can definitely see myself making use of your geospatial data in the near future.
I see that the 10m-populated-places shapefile includes a large number of attributes (93 in total!). While many of these are obvious, some are rather confusing — and I can’t find any documentation on what these various attributes mean.
In particular, I was hoping that the fields labelled:
MAX_BBXMAX
MAX_BBXMIN
MAX_BBYMAX
MAX_BBYMIN
and the fields labelled:
MIN_BBXMAX
MIN_BBXMIN
MIN_BBYMAX
MIN_BBYMIN
might in some way represent a bounding box around the populated place. They definitely seem to contain lat/long values that correspond in some way to the geography of the populated place they represent — but I can’t for the life of me figure out (a) why there are two separate bounding boxes, and (b) how the two boxes are related. Sometimes the boxes intersect, while at other times they don’t intersect at all.
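To make the "do the two boxes intersect?" question concrete, here is a minimal sketch in Python. It assumes each record is available as a plain mapping from field name to number, and that the eight fields hold min/max longitude (X) and latitude (Y) as guessed above. The helper names and the sample values are hypothetical, not taken from the dataset.

```python
from typing import Mapping, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox_from_record(rec: Mapping[str, float], prefix: str) -> BBox:
    """Build a box from fields such as MAX_BBXMIN; prefix is 'MAX' or 'MIN'."""
    return (
        rec[f"{prefix}_BBXMIN"],
        rec[f"{prefix}_BBYMIN"],
        rec[f"{prefix}_BBXMAX"],
        rec[f"{prefix}_BBYMAX"],
    )


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Made-up values for illustration only (not a real record).
sample = {
    "MAX_BBXMIN": -1.0, "MAX_BBXMAX": 1.0, "MAX_BBYMIN": -1.0, "MAX_BBYMAX": 1.0,
    "MIN_BBXMIN": -0.5, "MIN_BBXMAX": 0.5, "MIN_BBYMIN": -0.5, "MIN_BBYMAX": 0.5,
}
print(boxes_intersect(bbox_from_record(sample, "MAX"),
                      bbox_from_record(sample, "MIN")))
```

Running this over every record would list the places whose two boxes do not intersect at all.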
I was thinking that if I could get the bounding box for each populated place, I could then check which polygons in the 10m-urban-area shapefile fall inside each bounding box, and thus match up these (unnamed) polygons with named populated places. If I could assign a city name to each urban area polygon, I could then use it in my project, but alas at present there doesn’t seem to be any way of doing this.
Does anyone know what these various attributes mean — or alternatively if there is some other way I could assign a place name to the urban area polygons?
Thanks in advance,
– Erik.
May 26, 2010 at 3:09 pm #4028
Nathaniel (Keymaster)
@Erik: Glad you like the project. Your guess is correct: BB stands for bounding box. The boxes are derived from the LandScan polygons that approximate urban areas, not from incorporated boundaries. This distinction is important, as many large center cities have a large ring of secondary towns and suburbs around them that aren’t technically part of the core’s jurisdiction.
The Natural Earth bounding boxes incorporate those areas, as do the population numbers. The MAX version shows the largest extent of that city in the dataset, which is at the first scale the town appears at (starting back from 300m, then several steps to 10m). As the fidelity narrows down by adding in more regional centers, even in metro areas that already have a primary central city, the bounding box narrows down as well. The MIN version is the 10m scale with all the Natural Earth populated places counted. Think of these as the intersection of the LandScan population urban area (separate download on the 10m populated places page) with the Thiessen polygons between the points.
Matching up the urban areas to populated places would be awesome!
An approach using Thiessen polygons against the urban areas, followed by a spatial join between the cut urban areas and the populated places, could work well. Not all urban areas will be named in the 10m coverage, as there are many more urban areas than towns in it. (The forthcoming populated places 1.2 version will address this to a point.) You might try clipping the Thiessen results by 50 miles for 110m first-appearance cities, down to 5 miles for 10m first-appearance cities.
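One way to read the clipping suggestion is as a distance test from the place point. The sketch below rests on that assumption: it uses a great-circle (haversine) distance in miles and only the two radii quoted above. Radii for any intermediate scale are not given in the post, so they are deliberately left out. The function names and the lookup table are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Only the two endpoints mentioned in the post; other scales are not specified.
CLIP_RADIUS_MILES = {"110m": 50.0, "10m": 5.0}


def haversine_miles(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in miles between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_clip(place, sample, first_scale: str) -> bool:
    """True if sample (lon, lat) lies within the clip radius of place (lon, lat)."""
    radius = CLIP_RADIUS_MILES[first_scale]  # KeyError for scales not listed above
    return haversine_miles(place[0], place[1], sample[0], sample[1]) <= radius


# Half a degree of latitude is roughly 35 miles.
print(within_clip((0.0, 0.0), (0.0, 0.5), "110m"))
print(within_clip((0.0, 0.0), (0.0, 0.5), "10m"))
```

Parts of a Thiessen cell that fail this test would be dropped from the city's share of the urban area.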
June 2, 2010 at 11:22 pm #4029
Erik Westra (Participant)
Hi Nathaniel,
Thanks for the pointers — this was all most helpful, and I’ve learned a lot (probably more than I ever wanted to know) about Thiessen polygons.
I abandoned the bounding box approach, as the values seemed to be rather strange (a number of populated places have minimum and maximum bounding boxes that don’t overlap at all). Instead, I went for a simpler approach by just looking at the place’s lat/long and seeing which urban area it intersected with (if any).
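The simpler point-in-area check can be sketched with a standard ray-casting test. This is a rough illustration, not the code used for the shapefile described in this thread: it assumes each urban area is reduced to a single outer ring of (lon, lat) vertices, ignores holes and multi-part polygons, and uses a hypothetical dictionary keyed by feature ID.

```python
def point_in_ring(x: float, y: float, ring) -> bool:
    """Ray-casting test: is (x, y) inside the closed ring of (x, y) vertices?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does the edge (j -> i) straddle the horizontal line through y?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def find_urban_area(place, urban_areas):
    """Return the ID of the first urban area containing place, or None."""
    x, y = place
    for area_id, ring in urban_areas.items():
        if point_in_ring(x, y, ring):
            return area_id
    return None


# Hypothetical toy data: one square "urban area" with feature ID 1.
areas = {1: [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]}
print(find_urban_area((2.0, 2.0), areas))  # inside the square
print(find_urban_area((5.0, 2.0), areas))  # outside every area
```

Places for which this returns None have no matching urban area.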
I’ve been working over the past several days on writing a Thiessen polygon implementation which could be used to split these urban areas according to the populated places within each area. This took some serious work, mainly thanks to the complexity of the Thiessen algorithm and the quirks of the data I’m working with, but it is finally working. This means I now have a shapefile constructed out of the urban areas in the 10m-urban-areas shapefile and the populated places in the 10m-populated-places shapefile. My “urban-places” shapefile includes:
1. All the populated places which lie in exactly one urban area. In this case, the entire urban area is marked as belonging to that place.
2. All the urban areas which contain more than one populated place (as identified by the place’s lat/long coordinate), split using Thiessen polygons. It took me several days to implement this, but it’s finally working. By the way, I did some further processing on the resulting split polygons, reassigning any “dangling” polygons back to the main polygon they touched. Without this, the Thiessen split would break urban areas up where they are obviously supposed to be contiguous.
3. I also included all the “orphan” urban areas which don’t map to any populated place at all.
For #1 and #2, I’ve included the place name, the lat/long, the geonames ID, the original feature ID from the 10m-urban-area shapefile, and the feature ID of the associated place in the 10m-populated-places shapefile. For the orphan areas, only the urban area feature ID is included.
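The idea behind the Thiessen split in #2 is that every location belongs to whichever populated place is nearest. The sketch below shows only that nearest-seed rule applied to sample points; it is not the implementation described in this post and does not build polygons or reassign “dangling” pieces. The coordinates are rough, rounded values for illustration, and plain planar distance in degrees is assumed.

```python
def nearest_place(x: float, y: float, places) -> str:
    """Return the name of the place whose (lon, lat) is closest to (x, y)."""
    return min(
        places,
        key=lambda name: (places[name][0] - x) ** 2 + (places[name][1] - y) ** 2,
    )


def split_by_nearest(samples, places):
    """Group sample points by nearest place (the defining rule of Thiessen cells)."""
    groups = {name: [] for name in places}
    for x, y in samples:
        groups[nearest_place(x, y, places)].append((x, y))
    return groups


# Approximate (lon, lat) values, rounded; for illustration only.
places = {
    "San Francisco": (-122.42, 37.77),
    "Oakland": (-122.27, 37.80),
    "San Jose": (-121.89, 37.34),
}
samples = [(-122.40, 37.78), (-122.25, 37.81), (-121.90, 37.30)]
for name, points in split_by_nearest(samples, places).items():
    print(name, points)
```

Applied to the vertices or grid cells of one urban area polygon, this grouping approximates the split that true Thiessen polygons would give.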
This approach seems to work really well. I’m impressed at how well the Thiessen splitting logic works at sub-dividing the urban areas. Of course, it’s just an estimate of the actual legal boundary between the populated places, but it looks to be a very good estimate indeed.
I wish I could attach images to postings on these forums. I’ve got a map here showing the San Francisco Bay Area, with the larger urban area (feature ID 1145 in the 10m_urban_area shapefile, in case you’re interested) cleanly broken into San Francisco, San Mateo, San Jose, Oakland and Berkeley. The results certainly look correct, and I think this split version of the urban areas could be quite a useful resource.
The one thing I feel could be improved is that there are a lot of urban areas which don’t appear in the 10m-populated-places shapefile. These appear as “orphan” urban areas in my generated shapefile. Unfortunately, it looks as though a lot of these orphan areas are quite significant urban areas, and they currently lack a name. I’m tempted to look for an alternative source of place names and associated lat/long values (maybe geonames.org’s “allcountries” file?) and use it instead of the 10m-populated-places shapefile. That should reduce the number of orphaned urban areas immensely, though I’m not sure how accurate the results would be, and I don’t want to use a source which might cause the resulting urban areas to be inaccurately identified.
Anyway, I don’t know if you’d be interested in my urban-places shapefile, but of course you’re more than welcome to have a copy if you want to add it to your site. Any further tips or suggestions would also be most welcome.
Thanks,
– Erik.
June 3, 2010 at 12:31 am #4030
Nathaniel (Keymaster)
@Erik: Thanks for the fascinating update! Please email me the image previews at nathaniel@naturalearthdata.com and I’ll post them here. Yes, we can host your modified, attributed 10m-urban-areas as part of Natural Earth. Email me to figure out a delivery mechanism. As for the “missing” populated places, many of those will come with the 1.3 update of the populated places file I’m working on now. In that update, the larger unassociated polygons will gain points, but some of the smaller ones will remain unnamed, as they are not appropriate to show on a 10m scale map. I’m still working out that threshold. I’ve gone through the GeoNames.org all-countries 2.3m feature count file and done much of the work to match the towns with their urban areas. I’m curious about the bad bounding box overlap and will look into that.
December 21, 2011 at 2:08 am #4031
JustinTBrown (Participant)
Big thanks for this resource! I’m still wondering if there is documentation for the various attributes? I happen to be exploring the 10m-populated-places shapefile as well.
Best,
Justin
EDIT: Apologies, I could have searched the forums better – https://www.naturalearthdata.com/forums/topic.php?id=144
December 21, 2011 at 3:46 am #4032
JustinTBrown (Participant)
So sorry. After reading through the links, I still wasn’t able to find anything that explains the various attributes/abbreviations. For example, there are a number of columns for population data, but I’m unable to tell exactly what they’re for.
Thanks again!
Justin
December 21, 2011 at 4:00 am #4033
Nathaniel (Keymaster)
The pop_min and pop_max columns are most useful. The others depend on your use. Please describe your case more fully.
July 8, 2012 at 3:04 pm #4034
anneb (Participant)
Populated places 110m, 50m, and 10m contain the attributes ‘ls_name’, ‘name_ascii’ and ‘gn_ascii’
for instance:
name: Düsseldorf
name_ascii: Dusseldorf (shouldn’t this be ‘Duesseldorf’?)
ls_name: DÌùsseldorf
gn_ascii: Dusseldorf (shouldn’t this be ‘Duesseldorf’?)
Is there a description for these various attribute versions?
July 8, 2012 at 7:18 pm #4035
Nathaniel (Keymaster)
name: according to Natural Earth.
name_ascii: literally the name without accent marks. In the case of Duesseldorf, it should be spelled differently, too, but except for some names in Scandinavia that hasn’t been done yet. See below.
ls_name: this is for matching up to the LandScan population estimate contours found elsewhere on the site. It’s a linkage ID, not anything to label with.
gn_ascii: straight from GeoNames.org. If it’s been fixed there since the merge about 20 months ago, the fix is not reflected here yet.
If there is a mistake on any of these, please file a Correction Request: https://www.naturalearthdata.com/corrections/index.php?a=add
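The difference between “literally without accent marks” and the German-style spelling asked about above can be shown with Python’s standard unicodedata module. This is only an illustration of the two conventions, not how the Natural Earth or GeoNames fields were actually produced; the function names and the small replacement table are assumptions.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Drop combining marks after decomposition, e.g. 'ü' becomes 'u'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical table for the German convention (umlaut -> vowel + 'e').
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})


def transliterate_german(name: str) -> str:
    """Apply the German replacements first, then strip any remaining accents."""
    composed = unicodedata.normalize("NFC", name)
    return strip_accents(composed.translate(GERMAN_MAP))


print(strip_accents("Düsseldorf"))
print(transliterate_german("Düsseldorf"))
```

The first call gives the accent-stripped form seen in name_ascii and gn_ascii, and the second gives the ‘Duesseldorf’ form. Which one is right depends on the language of the name, so a single rule cannot be applied to the whole file.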