Saturday, October 10, 2020

Special Topics in GIS, Module 3.1

The topic of this lab assignment was scale and spatial data aggregation. Scale affects the level of detail present in both vector and raster data. The lab demonstrated that the number and size/length of features in vector data are influenced by scale, which can in turn affect any calculations performed on a given dataset. Raster resolution also impacts analysis; the example used in the lab assignment was the calculation of slope from DEMs of varying resolutions. Lower resolutions had a smoothing effect on the slope calculations, resulting in progressively lower average slope values.
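
As a rough illustration of that smoothing effect (not part of the lab itself), here is a short NumPy sketch that computes slope from a synthetic DEM and then from block-averaged, coarser versions of it; the terrain, cell sizes, and slope formula are all stand-ins:

    import numpy as np

    def slope_degrees(dem, cellsize):
        """Slope in degrees from the elevation gradients."""
        dz_dy, dz_dx = np.gradient(dem, cellsize)
        return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

    def coarsen(dem, factor):
        """Resample to a lower resolution by block-averaging."""
        rows, cols = dem.shape
        rows -= rows % factor
        cols -= cols % factor
        blocks = dem[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
        return blocks.mean(axis=(1, 3))

    # Synthetic rugged terrain on a 10 m grid
    rng = np.random.default_rng(0)
    x, y = np.meshgrid(np.arange(200), np.arange(200))
    dem = 50 * np.sin(x / 15) * np.cos(y / 20) + rng.normal(0, 5, (200, 200))

    for factor, cellsize in [(1, 10), (3, 30), (9, 90)]:
        coarse = dem if factor == 1 else coarsen(dem, factor)
        print(f"{cellsize:>2} m cells: mean slope = {slope_degrees(coarse, cellsize).mean():.1f} deg")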

The second part of the lab dealt with the Modifiable Areal Unit Problem (MAUP), the phenomenon in which different approaches to aggregating the same data (e.g., ZIP codes vs. census tracts) produce different statistical results, and with gerrymandering, the practice of drawing legislative boundaries in a way that intentionally favors one group over another. There are several ways to measure gerrymandering, including evaluating the compactness of a district, its contiguity, and/or the demographic makeup of its constituents.

The lab assignment involved calculating a compactness statistic (the Polsby-Popper score), which showed North Carolina's Congressional District 12 to be the least compact in the contiguous US (shown in the screenshot below).
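
The score itself is simple: 4π times the district's area divided by the square of its perimeter, so a circle scores 1 and long, contorted shapes score near 0. A minimal sketch with shapely, using toy polygons rather than the actual district geometries:

    import math
    from shapely.geometry import Polygon

    def polsby_popper(polygon):
        """Polsby-Popper compactness: 4*pi*area / perimeter**2."""
        return 4 * math.pi * polygon.area / polygon.length ** 2

    square = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
    thin_l = Polygon([(0, 0), (100, 0), (100, 1), (1, 1), (1, 50), (0, 50)])

    print(f"square:  {polsby_popper(square):.3f}")   # ~0.785
    print(f"thin L:  {polsby_popper(thin_l):.3f}")   # ~0.02, far less compact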


Wednesday, October 7, 2020

Special Topics in GIS, Module 2.2

This week's lab assignment was about interpolation. In the first portion of the lab, we worked with DEMs, and in the second portion we used several different interpolation methods to create surface rasters depicting water quality in Tampa Bay. The methods used were Thiessen, inverse distance weighting (IDW), and spline. Spline additionally has two possible techniques, regularized and tension.

Thiessen interpolation first requires Thiessen polygons, which are geometrically calculated "neighborhoods" where each polygon contains one input point and any location within the polygon is closer to that input point than to any other in the dataset. To create a raster from this, each cell is assigned the same value as the input point in its neighborhood. IDW calculates each cell value from the values of a specified number of the nearest input points, giving more weight to points that are closer to the cell being calculated. In the spline methods, smooth curves are drawn to connect the input points and cell values are based on the cell's position on the curve. Tension spline is more constrained by the input values than regularized spline, though both methods preserve the input values at their individual locations.
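
As a rough illustration of the IDW idea (NumPy only, not the ArcGIS tool, which adds search radii, barriers, and other options), with made-up sample points:

    import numpy as np

    def idw(sample_xy, sample_values, query_xy, power=2, k=12):
        """Each query location gets a weighted average of the k nearest sample
        values, with weights of 1 / distance**power."""
        sample_xy = np.asarray(sample_xy, dtype=float)
        sample_values = np.asarray(sample_values, dtype=float)
        out = np.empty(len(query_xy))
        for i, q in enumerate(np.asarray(query_xy, dtype=float)):
            d = np.hypot(*(sample_xy - q).T)
            if d.min() == 0:                      # query falls exactly on a sample point
                out[i] = sample_values[d.argmin()]
                continue
            nearest = np.argsort(d)[:k]
            w = 1.0 / d[nearest] ** power
            out[i] = np.sum(w * sample_values[nearest]) / w.sum()
        return out

    # Hypothetical water-quality samples and two cell centers to estimate
    samples = [(0, 0), (10, 0), (0, 10), (10, 10)]
    values = [1.0, 2.0, 3.0, 4.0]
    print(idw(samples, values, [(5, 5), (1, 1)]))   # ~2.5, then a value pulled toward 1.0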

Here is the output of the tension spline interpolation for the Tampa Bay data (with the original sample points classified similarly to the surface raster):



Thursday, September 24, 2020

Special Topics in GIS, Module 2.1

In this week's lab assignment, we explored TIN (triangulated irregular network) datasets containing terrain data. Unlike typical raster DEMs, TINs consist of a patchwork of triangles using the input sample points as vertices. Thus, they retain the exact input values at the vertices, which is not always the case with interpolated raster DEMs, and because the triangles can vary in size, they easily accommodate an adaptive sampling strategy in variable terrain. Although TINs are more complex than raster DEMs, they can also be better for some applications because of that added precision. However, their angularity means they do not always capture detailed features accurately, as illustrated by the screenshots below of a TIN from Bear Lake, CA. The outline of the lake was not clear in the original TIN, and had to be incorporated by using a shapefile of the lake to create a hard edge in the TIN surface at the correct elevation.
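
As a side note before the screenshots, the triangulation idea itself can be sketched outside of ArcGIS with SciPy's Delaunay tools; the points and elevations below are made up, and real TINs add features like the hard breakline used for the lake shoreline, which this sketch does not handle:

    import numpy as np
    from scipy.spatial import Delaunay
    from scipy.interpolate import LinearNDInterpolator

    rng = np.random.default_rng(1)
    points = rng.uniform(0, 100, size=(30, 2))                 # hypothetical survey points
    elev = 500 + 0.5 * points[:, 0] + rng.normal(0, 2, 30)     # hypothetical elevations

    tin = Delaunay(points)                          # triangles with the samples as vertices
    surface = LinearNDInterpolator(tin, elev)       # linear facets over each triangle

    print("triangles:", len(tin.simplices))
    print("vertex values preserved exactly:", np.allclose(surface(points), elev))   # True
    centroid = points[tin.simplices[0]].mean(axis=0)
    print("estimate inside the first triangle:", float(surface(*centroid)))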






Sunday, September 13, 2020

Special Topics in GIS, Module 1.3

In this week's lab, we compared two street network datasets to determine which was more complete (based on the total length of road segments), both overall and for each grid square within the study area. One of the datasets contained TIGER road data and the other contained street centerlines maintained by the county. On the basis of overall length, the TIGER data was more complete.

To compare completeness by grid square, I first split the street data along the grid and then merged the resulting smaller feature classes back into one large feature set in order to have a single layer containing all the roads segmented by grid square. I then attached the grid information to each of the road datasets using a Spatial Join, which allowed me to calculate the sum of the road segment lengths for each grid square. That data could then be brought into Excel to calculate the difference in length between the two datasets for each grid square.  
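
The lab work was all done with ArcGIS tools, but the same summarization can be sketched in a few lines of geopandas; the file names and the GRID_ID field below are placeholders:

    import geopandas as gpd

    grid = gpd.read_file("grid.shp")                               # polygons with a GRID_ID field
    tiger = gpd.read_file("tiger_roads.shp").to_crs(grid.crs)
    county = gpd.read_file("county_centerlines.shp").to_crs(grid.crs)

    def length_by_cell(roads, grid, name):
        # Clip the road segments to each grid cell, then total segment length per cell
        pieces = gpd.overlay(roads, grid[["GRID_ID", "geometry"]], how="intersection")
        pieces["length"] = pieces.geometry.length
        return pieces.groupby("GRID_ID")["length"].sum().rename(name)

    summary = length_by_cell(tiger, grid, "tiger_len").to_frame().join(
        length_by_cell(county, grid, "county_len"), how="outer").fillna(0)
    summary["difference"] = summary["tiger_len"] - summary["county_len"]
    print(summary.sort_values("difference").head())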



Monday, September 7, 2020

Special Topics in GIS, Module 1.2

This week's lab assignment was to assess the accuracy of two different sets of street data against reference points taken from high-resolution orthoimagery. The first step was to establish test points according to the National Standard for Spatial Data Accuracy (NSSDA): at least 20 points at well-defined locations (in this case, street intersections), with at least 20% of the points located in each quadrant of the study area and a distance between points of at least 10% of the diagonal length of the study area.

Screenshot of test point distribution, with street data:

XY coordinates were obtained at each test point for each of the datasets to be assessed, as well as for the actual location of the intersection based on the orthoimagery. Then the error statistics for each point, the RMSE, and the NSSDA accuracy statistic were calculated for each dataset.
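
The math behind those statistics is compact: the horizontal RMSE is the square root of the mean of the squared x and y errors, and the NSSDA statistic multiplies that RMSE by 1.7308 to reach the 95% confidence level (assuming the x and y errors are roughly equal). A small sketch with placeholder coordinates:

    import math

    def nssda_horizontal(test_xy, ref_xy):
        """test_xy / ref_xy: (x, y) pairs in the same projected units (feet here)."""
        sq_errors = [(xt - xr) ** 2 + (yt - yr) ** 2
                     for (xt, yt), (xr, yr) in zip(test_xy, ref_xy)]
        rmse = math.sqrt(sum(sq_errors) / len(sq_errors))
        return rmse, 1.7308 * rmse          # (RMSE, NSSDA accuracy at 95% confidence)

    # Placeholder test points vs. their orthoimagery reference locations
    tested = [(1000.0, 2000.0), (1500.0, 2500.0), (1800.0, 2100.0)]
    reference = [(1004.0, 2003.0), (1497.0, 2506.0), (1795.0, 2099.0)]
    rmse, nssda = nssda_horizontal(tested, reference)
    print(f"RMSE = {rmse:.2f} ft, NSSDA horizontal accuracy = {nssda:.2f} ft")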

Result for the first dataset, from the City of Albuquerque: Tested 23.94516 feet horizontal accuracy at 95% confidence level.

Result for the second dataset, from StreetMap USA: Tested 184.40877 feet horizontal accuracy at 95% confidence level.

In other words, 95% of the data is expected to fall within 23.94516 feet or 184.40877 feet, respectively, of its true location.

Monday, August 31, 2020

Special Topics in GIS, Module 1.1

This week's lab was about accuracy and precision of data and related error metrics, using a set of GPS points as a case study. Accuracy measures how close a feature in the data is to the real-life location of that feature and is assessed by finding the difference between the data being evaluated and a reference dataset known to be more accurate. Precision refers to the consistency of repeated measurements and can be measured by taking their mean and comparing the individual points to that mean. The map below illustrates the GPS data from the lab along with an average of all the points and buffers encompassing 50%, 68%, and 95% of the data points relative to the mean. The distances for these intervals are 3.1, 4.5, and 14.8 meters, respectively. The accuracy of the dataset was later evaluated by comparing the average point to a reference point; the difference was 3.2 meters.
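
The calculations behind those figures are straightforward; here is a sketch with simulated coordinates rather than the lab's actual GPS fixes:

    import numpy as np

    rng = np.random.default_rng(2)
    gps = rng.normal(loc=[500000.0, 3500000.0], scale=5.0, size=(50, 2))   # simulated GPS fixes (m)
    reference = np.array([500003.0, 3500001.0])                            # simulated reference point

    # Precision: distances of the individual fixes from their mean position
    mean_point = gps.mean(axis=0)
    dist_from_mean = np.hypot(*(gps - mean_point).T)
    for pct in (50, 68, 95):
        print(f"{pct}% of fixes fall within {np.percentile(dist_from_mean, pct):.1f} m of the mean")

    # Accuracy: distance from the mean position to the higher-accuracy reference point
    print(f"horizontal accuracy: {np.hypot(*(mean_point - reference)):.1f} m")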


Wednesday, August 8, 2018

GIS for Archaeology, Final Project

The final project assignment for GIS Applications for Archaeology dealt with cost analysis. The ultimate outcome for both parts of the project was a least cost path, which is essentially a route from point A to point B that GIS has calculated as being easier to travel than any other possible route, based on the data provided and the parameters set by the user. In this case, we used slope (i.e., it's generally easier to cross flat terrain than to climb up a steep hill) and land cover (i.e., forest vs. agricultural fields vs. the ocean) to determine how hard each pixel would be to cross in the real world--the "cost" of traveling that way. The slope and land cover data are classified according to difficulty of passage and then combined into a weighted overlay raster, which serves as the basis for the cost distance and cost path calculations that ultimately produce ArcGIS's idea of the optimal route between two points. (For a route with multiple points, you have to run each leg as a separate analysis.) Because this is a somewhat complex, multi-step process, we used Model Builder to set up the analysis rather than running the tools one at a time.
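
The chain of tools could look roughly like the arcpy sketch below (the lab built it in Model Builder instead); the paths, reclassification breaks, land cover codes, and the 60/40 weighting are placeholders rather than the project's actual parameters:

    import arcpy
    from arcpy.sa import CostDistance, CostPath, Reclassify, RemapRange, RemapValue, Slope

    arcpy.CheckOutExtension("Spatial")
    arcpy.env.workspace = r"C:\data\costpath.gdb"      # placeholder workspace

    # 1. Cost factors: slope from the DEM and land cover, each reclassified to a 1-9 difficulty scale
    slope = Slope("dem", "DEGREE")
    slope_cost = Reclassify(slope, "VALUE", RemapRange([[0, 5, 1], [5, 15, 4], [15, 90, 9]]))
    landcover_cost = Reclassify("landcover", "VALUE", RemapValue([[41, 6], [82, 2], [11, 9]]))

    # 2. Combine the factors into a single weighted cost surface
    cost_surface = 0.6 * slope_cost + 0.4 * landcover_cost

    # 3. Cost distance and backlink from the origin, then the least cost path to the destination
    cost_dist = CostDistance("site_a", cost_surface, out_backlink_raster="backlink")
    least_cost_path = CostPath("site_b", cost_dist, "backlink", "BEST_SINGLE")
    least_cost_path.save("lcp_a_to_b")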

For the first part of the project, in which we learned how to perform the analysis, we examined routes between three prehistoric archaeological sites in Panama: 


For the second part, we had a choice of subjects but were on our own to acquire the necessary data and set up the model. I chose to try to reconstruct the route of the Camino de Mulas, a historic mule trail in Costa Rica and Panama. I had to run the model one or two steps at a time rather than all at once because the processing times were extremely long, probably because of the size of the raster files I needed to cover the whole study area. In the end, it looks like some of the trail may have been where the Pan-American Highway is now, but other sections of it may have escaped destruction. That's making a number of assumptions about how "right" the least cost path is, though. It would take a lot more research to figure out if this route actually makes sense (especially since the analysis was performed with modern land cover data that ignored rivers), let alone to determine whether it's the right one. Still, it's a starting point and an interesting way of looking at history.