Erik Westra
Forum Replies Created
-
AuthorPosts
-
Erik Westra (Participant)
_
Hi Nathaniel,
_
Thanks for the pointers — this was all most helpful, and I’ve learned a lot (probably more than I ever wanted to know) about Thiessen polygons.
_
I abandoned the bounding box approach, as the values seemed to be rather strange (a number of populated places have minimum and maximum bounding boxes that don’t overlap at all). Instead, I went for a simpler approach by just looking at the place’s lat/long and seeing which urban area it intersected with (if any).
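_
A minimal sketch of this kind of lookup, for illustration only and not the code actually used: it tests a place's (longitude, latitude) against each urban area outline with a standard ray-casting point-in-polygon check. The names `point_in_ring` and `find_urban_area`, the simplified outlines (a single exterior ring, no holes) and the toy coordinates are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Ring = List[Point]            # exterior ring only; holes are ignored (assumption)


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting: count how many edges a ray going in +x from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_urban_area(pt: Point, areas: Dict[int, Ring]) -> Optional[int]:
    """Return the ID of the first urban area containing pt, or None."""
    for area_id, ring in areas.items():
        if point_in_ring(pt, ring):
            return area_id
    return None


# Toy data: two square "urban areas" keyed by hypothetical feature IDs.
areas = {
    1: [(0, 0), (4, 0), (4, 4), (0, 4)],
    2: [(10, 10), (12, 10), (12, 12), (10, 12)],
}
print(find_urban_area((1, 1), areas))    # 1
print(find_urban_area((11, 11), areas))  # 2
print(find_urban_area((5, 5), areas))    # None
```

With real data a geometry library would normally handle holes, multi-part polygons and points lying exactly on a boundary; the sketch leaves those cases out.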
_
I’ve spent the past several days writing a Thiessen polygon implementation that can split these urban areas according to the populated places within each area. This took some serious work, mainly thanks to the complexity of the Thiessen algorithm and the quirks of the data I’m working with, but it is finally working. This means I now have a shapefile constructed out of the urban areas in the 10m-urban-areas shapefile and the populated places in the 10m-populated-places shapefile. My “urban-places” shapefile includes:
_
1. All the urban areas which contain exactly one populated place. In this case, the entire urban area is marked as belonging to that place.
_
2. All the urban areas which contain more than one populated place (as identified by the place’s lat/long coordinate), split using Thiessen polygons. I did some further processing on the resulting split polygons, reassigning any “dangling” polygons back to the main polygon they touched. Without this, the Thiessen split would break urban areas up where they are obviously supposed to be contiguous.
_
3. All the “orphan” urban areas which don’t map to any populated place at all.
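_
The three cases above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation described in this post: `classify_areas` and `nearest_place` are hypothetical names, the IDs and coordinates are toy values, and distance is measured in plain planar lon/lat units. The second function shows only the rule that defines a Thiessen polygon (every location belongs to its closest place); it does not build the polygons or reassign dangling pieces.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)


def classify_areas(
    area_ids: List[int],
    place_to_area: Dict[str, Optional[int]],
) -> Tuple[Dict[int, str], Dict[int, List[str]], List[int]]:
    """Group urban areas into single-place, multi-place and orphan areas."""
    places_by_area: Dict[int, List[str]] = defaultdict(list)
    for place, area_id in place_to_area.items():
        if area_id is not None:          # place falls inside some urban area
            places_by_area[area_id].append(place)
    single = {a: p[0] for a, p in places_by_area.items() if len(p) == 1}
    multi = {a: sorted(p) for a, p in places_by_area.items() if len(p) > 1}
    orphans = [a for a in area_ids if a not in places_by_area]
    return single, multi, orphans


def nearest_place(pt: Point, seeds: Dict[str, Point]) -> str:
    """Thiessen rule: pt lies in the cell of whichever seed is closest to it."""
    return min(seeds, key=lambda name: math.dist(pt, seeds[name]))


# Toy input: which urban area (if any) each place was found in.
place_to_area = {"A": 10, "B": 20, "C": 20, "D": None}
single, multi, orphans = classify_areas([10, 20, 30], place_to_area)
print(single)    # {10: 'A'}
print(multi)     # {20: ['B', 'C']}
print(orphans)   # [30]

# Area 20 holds two places, so each location in it goes to the nearer one.
seeds = {"B": (0.0, 0.0), "C": (10.0, 0.0)}
print(nearest_place((1.0, 0.0), seeds))  # B
print(nearest_place((6.0, 0.0), seeds))  # C
```

Clipping the resulting cells to the urban area outline, and merging disconnected pieces back into a neighbouring cell, would be separate steps on top of this.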
_
For #1 and #2, I’ve included the place name, the lat/long, the geonames ID, the original feature ID from the 10m-urban-area shapefile, and the feature ID of the associated place in the 10m-populated-places shapefile. For the orphan areas, only the urban area feature ID is included.
_
This approach seems to work really well, and I’m impressed at how well the Thiessen splitting logic sub-divides the urban areas. Of course, it’s just an estimate of the actual legal boundary between the populated places, but it looks to be a very good estimate indeed.
_
I wish I could attach images to postings on these forums. I’ve got a map here showing the San Francisco Bay Area, with the larger urban area (feature ID 1145 in the 10m_urban_area shapefile, in case you’re interested) cleanly broken into San Francisco, San Mateo, San Jose, Oakland and Berkeley. The results certainly look correct, and I think this split version of the urban areas could be quite a useful resource.
_
The one thing I feel could be improved is that a lot of urban areas have no matching place in the 10m-populated-places shapefile. These appear as “orphan” urban areas in my generated shapefile. Unfortunately, it looks as though a lot of these orphan areas are quite significant urban areas, and they currently lack a name. I’m tempted to look for an alternative source of place names and associated lat/long values (maybe geonames.org’s “allcountries” file?) and use that instead of the 10m-populated-places shapefile. That should reduce the number of orphaned urban areas immensely, though I’m not sure how accurate the results would be, and I don’t want to use a source which might cause the resulting urban areas to be inaccurately identified.
_
Anyway, I don’t know if you’d be interested in my urban-places shapefile, but of course you’re more than welcome to have a copy if you want to add it to your site. Any further tips or suggestions would also be most welcome.
_
Thanks,
_
– Erik.