search on training.esri.com for the Esri certification
Knowledge of spatial relationships such as distance (e.g., horizontal and vertical), direction, and topology (e.g., adjacency, connectivity, and overlap) that are particularly relevant to geospatial data analysis
this just has everything in great detail: https://www.cise.ufl.edu/~mschneid/Service/Tutorials/TutorialSDT.pdf
Directional relations can be further divided into external and internal directional relations. An internal directional relation specifies where an object is located inside the reference object, while an external directional relation specifies where the object is located outside the reference object.
Examples for internal directional relations: left; on the back; athwart, abaft
Examples for external directional relations: on the right of; behind; in front of, abeam, astern
Distance relations specify how far the object is from the reference object.
Examples are: at; nearby; in the vicinity; far away
Euclidean distance is calculated as:
D = sqrt[(x1 - x2)^2 + (y1 - y2)^2]
Where (x1,y1) is the coordinate for point A, (x2,y2) is the coordinate for point B, and D is the straight-line distance between points A and B.
Manhattan distance is calculated as:
D = abs(x1 - x2) + abs(y1 - y2)
Where (x1, y1) is the coordinate for point A, (x2, y2) is the coordinate for point B, and D is the vertical plus horizontal difference between points A and B. It is the distance you must travel if you are restricted to north/south and east/west travel only. This method is generally more appropriate than Euclidean distance when travel is restricted to a street network in cases where actual street network travel costs are not available.
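The two distance formulas above can be sketched in a few lines of Python. This is only an illustration; the function names and the sample points are made up for the example and assume planar (projected) coordinates in the same units.

```python
import math


def euclidean_distance(x1, y1, x2, y2):
    """Straight-line distance between point A (x1, y1) and point B (x2, y2)."""
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)


def manhattan_distance(x1, y1, x2, y2):
    """Horizontal plus vertical difference between points A and B."""
    return abs(x1 - x2) + abs(y1 - y2)


# Hypothetical sample points: A at the origin, B three units east and four north.
a = (0.0, 0.0)
b = (3.0, 4.0)
print(euclidean_distance(*a, *b))  # 5.0
print(manhattan_distance(*a, *b))  # 7.0
```

The Manhattan value is never smaller than the Euclidean value, which fits the idea that travel restricted to north/south and east/west moves is at least as long as the straight line.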
Knowledge of standard spatial data models, including the nature of vector, raster, and object-oriented models, in the context of spatial data used in the workplace
Spatial models (in some places called GIS models) describe basic properties and processes for a set of spatial features (Bolstad 13 – you have heard about this)
The aim is to study spatial objects or phenomena in the real world
According to Bolstad
Cartographic models: temporally static, combined spatial datasets, operations and functions for problem-solving
Spatio-temporal models: dynamics in space and time, time-driven processes
Network models: modeling of resources (flow, accumulation) as limited to networks
Data models: Entities and fields as conceptual models
Static modeling: taking inputs to transform them into outputs using sets of tools and functions
Dynamic modeling: iterative, sets of initial conditions, apply transformations to obtain a series of predictions at time intervals
Based on purpose:
descriptive – passive, description of the study area
prescriptive – active, imposing best solution
Based on methodology:
stochastic – based on statistical probabilities
deterministic – based on known functional linkages and interactions
Based on logic:
inductive – general models based on individual data
deductive – from general to specific using known factors and relationships
The aim of spatial modeling is to derive a meaningful representation of events, occurrences or processes by making use of the power of spatial analysis
Vector data are composed of points, lines, polygons
Points represent discrete locations on the ground
Lines represent linear features, such as rivers, roads and transmission cables
Arcs are composed of nodes and vertices. Arcs begin and end at nodes, and may have 0 or more vertices between the nodes. The vertices define the shape of the arc along its length. Arcs which connect to each other will share a common node.
Polygons form bounded areas. In the point and line datasets shown above, the land masses, islands, and water features are represented as polygons. Polygons are formed by bounding arcs, which keep track of the location of each polygon
ASCII Coordinate Data files may also be used in ArcGIS. Point layers can be created from files containing single records for individual points.
Raster datasets are composed of rectangular arrays of regularly spaced square grid cells. Each cell has a value, representing a property or attribute of interest. While any type of geographic data can be stored in raster format, raster datasets are especially suited to the representation of continuous, rather than discrete, data. Some examples of continuous data are:
oil depth across an open-water oil spill
reflectance in a certain band in the electromagnetic spectrum
landform aspect (compass bearing of steepest downward descent)
salinity of a water body
Pixel or cell? All raster datasets are stored in similar formats. You will want to know the difference between a pixel and a cell, even though they are functionally equivalent. A pixel (short for PICture ELement) represents the smallest resolvable "piece" of a scanned image, whereas a cell represents a user-defined area representing a phenomenon. A pixel is always a cell, but a cell is not always a pixel.
There are many types of raster data you may be familiar with:
grids (ArcGIS & ArcInfo specific)
graphical images (TIFF, JPEG, BMP, GIF, etc.)
USGS DEM (Digital Elevation Model)
remotely-sensed images (Landsat, SPOT, AVIRIS, AVHRR, Imagine IMG, digital orthophotos)
A geodatabase is an object-oriented spatial model
Geodatabase data model
– Uses a relational database to store geographic data. A relational database is a type of database in which the data is organized across several tables. Tables are associated with each other through common fields. Data items can be recombined from different files
– A container for storing spatial and attribute data and the relationships that exist among them
– Features and their associated attributes can be structured to work together as an integrated system using rules, relationships, and topological associations
Primary (basic) components
– feature classes,
– feature datasets,
– nonspatial tables.
complex components building on the basic components:
– relationship classes,
– geometric networks
A feature class is a collection of geographic features that share the same geometry type and attributes. Types include point, line, polygon, and annotation feature classes.
Feature classes may exist independently in a geodatabase as stand-alone feature classes or you can group them into feature datasets
A feature dataset is composed of feature classes that have been grouped together so they can participate in topological relationships with each other. All the feature classes in a feature dataset must share the same spatial reference (or coordinate system)
Edits you make to one feature class may result in edits being made automatically to some or all of the other feature classes in the feature dataset
Feature class tables and nonspatial attribute tables.
Both types of tables are created and managed in ArcCatalog and edited in ArcMap. Both display in the traditional row-and-column format. The difference is that feature class tables have one or more columns that store feature geometry.
Nonspatial tables contain only attribute data (no feature geometry) and display in ArcCatalog with a table icon. They can exist in a geodatabase as stand-alone tables, or they can be related to other tables or feature classes
In a geodatabase, you can model each of these real-world networks with a geometric network. Starting with simple point and line feature classes, you use ArcCatalog to create a geometric network that will enable you to answer questions such as: Which streams will be affected by a proposed dam? Which areas will be affected by a water main repair? What is the quickest route between two points in the network?
Relationship Classes – In a geodatabase, relationship classes provide a way to model real-world relationships that exist between objects such as parcels and buildings or streams and water sample data. By using relationship classes, you can make your GIS database more accurately reflect the real world and facilitate data maintenance.
1-1 relationship – each object of the origin table/feature class can be related to zero or one object of the destination table/feature class
1-Many relationship – each object in the origin table/feature class can be related to multiple objects in the destination table/feature class
Many-Many relationship – multiple objects of the origin table/feature class can be related to multiple objects of the destination table/feature class
Understanding of the conceptual foundations on which geographic information systems (GIS) are based, including the problem of representing change over time and the imprecision and uncertainty that characterizes all geographic information
Like, what? of course it’s difficult to spatially represent change of time. and yes, the world is constantly changing and the world is in a globe and GIS is 2D – skipped!
Knowledge of earth geometry and its approximations, including geoids, ellipsoids, and spheres
The geoid is the shape that the surface of the oceans would take under the influence of Earth's gravitation and rotation alone, in the absence of other influences such as winds and tides.
In geodesy, a reference ellipsoid is a mathematically defined surface that approximates the geoid, the truer figure of the Earth, or other planetary body. Because of their relative simplicity, reference ellipsoids are used as a preferred surface on which geodetic network computations are performed and point coordinates such as latitude, longitude, and elevation are defined.
In geometric geodesy, two standard problems exist:
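First (direct) geodetic problem
Given a point and the azimuth and distance from that point to a second point, determine the coordinates of that second point.
Second (inverse) geodetic problem
Given two points, determine the azimuth and length of the line (straight line, arc or geodesic) that connects them.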
In the case of plane geometry (valid for small areas on the Earth's surface) the solutions to both problems reduce to simple trigonometry. On the sphere, the solution is significantly more complex, e.g., in the inverse problem the azimuths will differ between the two end points of the connecting great circle arc, i.e. the geodesic.
On the ellipsoid of revolution, geodesics may be written in terms of elliptic integrals, which are usually evaluated in terms of a series expansion; for example, see Vincenty's formulae.
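A rough sketch of the inverse problem on a sphere (not on the ellipsoid, so this is not Vincenty's method) may help show why the azimuths differ at the two ends. It uses the haversine formula for the distance and the standard great-circle bearing formula for the azimuth. The mean radius of 6371 km and all function names are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def initial_azimuth(lat1, lon1, lat2, lon2):
    """Azimuth (degrees clockwise from north) at point 1, heading to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def final_azimuth(lat1, lon1, lat2, lon2):
    """Azimuth at point 2 when arriving from point 1 along the great circle."""
    return (initial_azimuth(lat2, lon2, lat1, lon1) + 180.0) % 360.0


def great_circle_distance(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Haversine distance along the sphere, in the units of `radius`."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


# Two hypothetical points on the 50°N parallel, 90° of longitude apart.
print(great_circle_distance(50.0, 0.0, 50.0, 90.0))
print(initial_azimuth(50.0, 0.0, 50.0, 90.0))
print(final_azimuth(50.0, 0.0, 50.0, 90.0))
```

For two points on the equator both azimuths are 90°, but for the sample points at 50°N the departure azimuth is well north of east and the arrival azimuth well south of east, because the great circle bulges toward the pole.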
The geoid surface is irregular, unlike the reference ellipsoid which is a mathematical idealized representation of the physical Earth, but considerably smoother than Earth's physical surface
Two main reference surfaces have been established to approximate the shape of the Earth. One reference surface is called the Geoid, the other reference surface is the ellipsoid.
The deviation between the Geoid and an ellipsoid is called the geoid separation (N) or geoid undulation
The Geoid is used to describe heights. In order to establish the Geoid as reference for heights, the ocean’s water level is registered at coastal places over several years using tide gauges (mareographs). Averaging the registrations largely eliminates variations of the sea level with time. The resulting water level represents an approximation to the Geoid and is called the mean sea level
The height determined with respect to a tide-gauge station is known as the orthometric height
Obviously, there are several realizations of local mean sea levels (also called local vertical datums) in the world
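The link between the geoid separation N and orthometric height can be sketched with the usual approximation H = h - N, where h is the height above the ellipsoid (for example from GPS). That relation and the symbol h are not stated in these notes, so treat this as an assumed illustration; the function name and sample values are made up.

```python
def orthometric_height(ellipsoidal_height_m, geoid_separation_m):
    """Approximate height above the geoid: H = h - N.

    ellipsoidal_height_m: height h above the reference ellipsoid
    geoid_separation_m:   geoid separation N (geoid above ellipsoid is positive)
    """
    return ellipsoidal_height_m - geoid_separation_m


# Hypothetical values: the geoid lies 30 m below the ellipsoid at this spot.
print(orthometric_height(100.0, -30.0))  # 130.0
```

Where the geoid is below the ellipsoid (negative N), the orthometric height comes out larger than the ellipsoidal height.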
The most convenient geometric reference is the oblate ellipsoid (figure below). It provides a relatively simple figure which fits the Geoid to a first order approximation, though for small scale mapping purposes a sphere may be used. An ellipsoid is formed when an ellipse is rotated about its minor axis. This ellipse which defines an ellipsoid or spheroid is called a meridian ellipse (notice that ellipsoid and spheroid are used here as equivalent and interchangeable words).
The Sphere – As can be seen from the dimensions of the Earth ellipsoid, the semi-major axis a and the semi-minor axis b differ only by a bit more than 21 kilometres (figure below). A better impression of the Earth's dimensions may be achieved if we refer to a more "human scale". Considering a sphere of approximately 6 metres in diameter, the ellipsoid is derived by compressing the sphere at each pole by only 1 cm. This compression is rather small compared to the dimension of the semi-major axis a.
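The "21 kilometres" and "1 cm" figures can be checked with a short calculation. The sketch below assumes the WGS84 ellipsoid parameters (semi-major axis 6378137 m, inverse flattening 298.257223563), which are not given in these notes; the variable and function names are made up for the example.

```python
# Assumed WGS84 ellipsoid parameters (not taken from the notes above).
WGS84_A_M = 6378137.0          # semi-major axis a, in metres
WGS84_INV_F = 298.257223563    # inverse flattening 1/f


def semi_minor_axis(a, inverse_flattening):
    """Semi-minor axis b = a * (1 - f), with f = 1 / inverse_flattening."""
    return a * (1.0 - 1.0 / inverse_flattening)


b = semi_minor_axis(WGS84_A_M, WGS84_INV_F)

# Difference between the two axes, in kilometres.
axis_difference_km = (WGS84_A_M - b) / 1000.0

# Same flattening applied to a sphere 6 m in diameter (radius 3 m):
# how far each pole moves inward, in centimetres.
pole_compression_cm = 3.0 / WGS84_INV_F * 100.0

print(round(axis_difference_km, 1))
print(round(pole_compression_cm, 2))
```

Under these assumed parameters the axis difference comes out a little over 21 km and the scaled compression very close to 1 cm, matching the description.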
The most important global (or geocentric) spatial reference system for the GIS community is the International Terrestrial Reference System (ITRS). It is a three-dimensional coordinate system with a well-defined origin (the centre of mass of the Earth) and three orthogonal coordinate axes (X,Y,Z). The Z-axis points towards a mean Earth north pole. The X-axis is oriented towards a mean Greenwich meridian and is orthogonal to the Z-axis. The Y-axis completes the right-handed reference coordinate system (figure (a) below).
Global horizontal datums, such as the ITRF2000 or WGS84, are also called geocentric datums because they are geocentrically positioned with respect to the centre of mass of the Earth
Knowledge of georeferencing systems, including coordinate systems, spatial projections, and horizontal and vertical datums
‘To georeference’ is the act of assigning locations to atoms of information
Georeferencing is essential in GIS, since all information must be linked to the Earth’s surface
The method of georeferencing must be:
Unique, linking information to exactly one location
Shared, so different users understand the meaning of a georeference
Persistent through time, so today’s georeferences are still meaningful tomorrow
Georeferences can be metric (define location using measures of distance from fixed places), based on ordering (street addresses in most parts of the world order houses along streets), or nominal (place names do not involve ordering or measuring)
A spatial reference system (SRS) or coordinate reference system (CRS) is a coordinate-based local, regional or global system used to locate geographical entities. A spatial reference system defines a specific map projection, as well as transformations between different spatial reference systems. Spatial reference systems are defined by the OGC's Simple feature access using well-known text, and support has been implemented by several standards-based geographic information systems. Spatial reference systems can be referred to using a SRID integer, including EPSG codes defined by the International Association of Oil and Gas Producers.
What kinds of things can be distorted with different map projections?
Distance, direction, shape, area
Mercator Projection – Developed by Flemish cartographer Gerardus Mercator in 1569. Preserves shape and direction. Used widely for navigation charts because direction is preserved.
Three main types of map projections
Cylindrical, conic, azimuthal (planar)
A DATUM is a model of the Earth as a spheroid
A curved surface (e.g., portions of the earth) gets distorted when represented on a flat map. Map projections – transforming coordinates from a curved Earth to a flat map
A geodetic datum is a set of control points whose geometric relationships are known, either through measurement or calculation
Datums have two components:
– The reference ellipsoid
– A set of survey points
Both the shape of the spheroid and its position relative to the earth are important
Cylindrical equal-area projections – straight meridians and parallels – meridians are equally spaced and the parallels are unequally spaced – area is true, shape and scale get distorted near the upper and lower regions of the map
Transverse Mercator projections – projecting the sphere onto a cylinder tangent to a meridian (line of longitude)
UTM – Universal Transverse Mercator – a global coordinate system – UTM zones are 6 degrees of longitude wide, so many study areas fit within a single zone – suited to broad study areas
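Because UTM zones are 6 degrees wide, the zone for a given longitude can be found with simple arithmetic. The sketch below assumes the standard numbering (zone 1 starts at 180°W and numbers increase eastward to zone 60) and ignores the special-case zones around Norway and Svalbard; the function names are made up for the example.

```python
def utm_zone(longitude_deg):
    """Return the UTM zone number (1-60) for a longitude in decimal degrees.

    Assumes standard 6-degree zones starting at 180°W; the irregular zones
    near Norway and Svalbard are not handled.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must be between -180 and 180 degrees")
    zone = int((longitude_deg + 180.0) // 6) + 1
    return min(zone, 60)  # longitude of exactly 180° belongs to zone 60


def central_meridian(zone):
    """Longitude of the central meridian of a UTM zone, in degrees."""
    return zone * 6 - 183


# Hypothetical study area centred on 105°W.
zone = utm_zone(-105.0)
print(zone, central_meridian(zone))  # 13 -105
```

A study area that straddles a zone boundary does not fit a single zone, which is one reason regional projects sometimes pick a different projection.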
Mercator – shapes are true, but area gets distorted (conformal)
Azimuthal Equidistant – planar (tangent) – used for air route distances – distances measured from the center are true – distortion of other properties increases away from the center point
Conic projections – generated by projecting a spherical surface onto a cone – distorts scale and distance except along standard parallels – areas are proportional and directions are true in limited areas – used in countries with a larger east-west than north-south extent
-area and shape are distorted away from standard parallels – directions are true in limited areas
The State Plane Coordinate System (SPCS) divides the 50 states into zones, with one or more zones per state, and uses a unique set of projection parameters for each zone
Uses either a Transverse Mercator or Lambert’s conformal conic projection
Suggestions from Theobald (1999: 42) on Selecting a Projection
– If you are making a fairly detailed map, for example of a city, or requirements for accuracy are minimal, then you may not have to worry so much about which projection to use.
– If you are making a map at a regional, continental or global scale, OR are interested in precise shape, area or distance measurements, then you should choose the projection carefully.
– For many study areas there are already standard projections, such as State Plane for county or city governments or UTM for state governments.
– Three factors to consider related to accuracy: latitude of area, extent and theme
– Low-latitude areas (near equator) use a cylindrical projection; mid-latitude areas use a conical projection
– Polar regions use an azimuthal planar projection
– Broad in East-West (e.g., the US) use a conical projection
– Broad in North-South (e.g., Africa) use a transverse-case cylindrical projection
– If you are doing an analysis that compares different values in different locations, typically an equal-area projection will be used
The horizontal datum is the model used to measure positions on the Earth. A specific point on the Earth can have substantially different coordinates, depending on the datum used to make the measurement. The WGS 84 datum, which is almost identical to the NAD83 datum used in North America and the ETRS89 datum used in Europe, is a common standard datum.
ED50, the European Datum
A vertical datum is used as a reference point for elevations of surfaces and features on the Earth including terrain, bathymetry, water levels, and man-made structures.
Vertical datums are either: tidal, based on sea levels; gravimetric, based on a geoid; or geodetic, based on the same ellipsoid models of the Earth used for computing horizontal datums. – could be sea level – so mean sea level
An example of a gravity-based geodetic datum is NAVD88, used in North America, which is referenced to a point in Quebec, Canada. Ellipsoid-based datums such as WGS84, GRS80 or NAD83 use a theoretical surface that may differ significantly from the geoid.
Cartography and Visualization
Knowledge of contour mapping
A contour line (also isoline, isopleth, or isarithm) of a function of two variables is a curve along which the function has a constant value. It is a cross-section of the three-dimensional graph of the function f(x, y) parallel to the x, y plane. In cartography, a contour line (often just called a "contour") joins points of equal elevation (height) above a given level, such as mean sea level. A contour map is a map illustrated with contour lines, for example a topographic map, which thus shows valleys and hills, and the steepness of slopes. The contour interval of a contour map is the difference in elevation between successive contour lines.
iso – equal – each line joins points of equal value
isoline and isarithm – covers all types of contour lines
isogon – contour line for a variable which measures direction
isocline – a line joining points with equal slope
Equidistants – isodistances – equal distance from a given point, line, polyline
isopleths – contour lines that depict a variable which cannot be measured at a point, but which instead must be calculated from data collected over an area (population density) – can be done using interpolation
isobar – line of equal or constant pressure
isallobars – lines joining points of equal pressure change during a specific time interval
isopycnal – constant density
isotherm – line that connects points on a map that have the same temperature
isogeotherm – line of equal temperature beneath the Earth's surface
isocheim – line of equal mean winter temperature
isothere – line of equal mean summer temperature
isohel – line of equal or constant solar radiation
isohyet – line joining points of equal precipitation
isohume – line of constant relative humidity
isodrosotherm – line of equal or constant dew point
Knowledge of basic physical geography (e.g., types of boundaries, continents, landforms, and topography)
Physical geography (also known as geosystems or physiography) is one of the two major sub-fields of geography. Physical geography is that branch of natural science which deals with the study of processes and patterns in the natural environment like the atmosphere, hydrosphere, biosphere, and geosphere, as opposed to the cultural or built environment, the domain of human geography.
from intro to geography text book:
Basic Physical Geography
two types of forces produce variations on the surface of the earth called landforms
forces that push, move and raise the earth's surface
forces that scour, wash and wear down the surface
tectonic – generated from within the earth
tectonic forces – 2 types
diastrophic – great pressure acting on the plates that deforms them by folding, twisting, warping, breaking or compressing rock
volcanism – force that transports heated material to or toward the surface of the earth
diastrophism – geologists can trace the history of the development of a region
broad warping – changing weight of a large region, movement of continents may bow an entire continent
warping or bending effect and a ridge or series of parallel folds may develop
faulting – fault is a break or fracture in rock along which movement has taken place
escarpment – steep slope
rift valley – separation away from fault causes sinking of land
seismic waves – vibrations which cause earth movement
earthquake, volcanic eruption or underwater landslide occurs below an ocean, jolts water above, causes a tsunami
three kinds of gradation processes
breakdown and decomposition of rocks and minerals at or near the earth's surface from water, air and temperature called weathering – both mechanical and chemical processes
mechanical weathering – physical disintegration of earth materials at or near the surface – large rocks broken into smaller pieces
three important types of mechanical weathering – frost action, development of salt crystals, root action
chemical weathering – rocks decompose rather than disintegrate
oxidation, hydrolysis, and carbonation – depends on availability of water, less chemical weathering in dry places
downslope movement of material due to gravity is mass movement – e.g., avalanches and landslides
talus – accumulation of rock particles at the base of hills and mountains
erosional agents and deposition
wind, water, glaciers – carve existing landforms into new shapes
landform regions – large section of the earth's surface where a great deal of homogeneity occurs among the types of landforms that characterize it
Types of Boundaries
lithosphere broken into 12 large and many small, rigid plates
theory of plate tectonics – plates slide or drift very slowly over the heavy semimolten asthenosphere
divergent plate boundaries – boundaries where plates move away from each other
transform boundaries – one plate slides horizontally past another plate
convergent boundaries – two plates move toward each other
Continents are understood to be large, continuous, discrete masses of land, ideally separated by expanses of water.
From the perspective of geology or physical geography, continent may be extended beyond the confines of continuous dry land to include the shallow, submerged adjacent area (the continental shelf) and the islands on the shelf (continental islands), as they are structurally part of the continent.
A landform is a natural feature of the Earth's surface. Landforms together make up a given terrain, and their arrangement on the landscape or the study of same is known as topography. Typical landforms include hills, mountains, plateaus, canyons, valleys, as well as shoreline features such as bays, peninsulas, and seas, including submerged features such as mid-ocean ridges, volcanoes, and the great ocean basins.
Topography is a field of geoscience and planetary science comprising the study of surface shape and features of the Earth and other observable astronomical objects including planets, moons, and asteroids. It is also the description of such surface shapes and features (especially their depiction in maps). The topography of an area could also mean the surface shape and features themselves.
Techniques of topography
Remote sensing is a general term for geodata collection at a distance from the subject area.
Aerial and satellite imagery
Radar and sonar
Forms of topographic data
Raw survey data
Topographic survey information is historically based upon the notes of surveyors.
Remote sensing data
Digital elevation modeling
Understanding of how data collection methods influence map design and representation
What is the distinction between primary and secondary data sources?
Primary Data – One way to characterize data in geography concerns whether they were collected specifically for the purpose of a researcher’s particular study
An example would be a geographer who interviews people about their attitudes toward bioengineered agriculture
Secondary Data – If, instead, the data have been collected for another purpose, usually by someone other than the researcher
An example of that would be a geographer who uses Landsat imagery to study landslides on the California coast. The imagery was not collected by that researcher, and it was not collected primarily so he or she could study landslides
What are the five major types of data collection in geography?
Physical measurements
consist of data collected by recording physical properties of the earth or its inhabitants. Physical properties include size and number, temperature, chemical makeup, moisture content, texture and hardness, the reflectance and transmissivity of electromagnetic energy (including optical light), air speed and pressure, and more
use of aerial and satellite remote sensing as ways to efficiently record large amounts of physical measurement data.
Observation of behavior (Chapter 5)
is the overt and potentially observable actions or activities of individuals or groups of people
It is not their thoughts, feelings, or motivations, although very often behavioral observations provide the data that allow geographers to study thoughts, feelings, and motivations scientifically
Archives (Chapter 5)
A third type of data collection practiced by geographers is the use of existing records that others have collected primarily for non-research purposes, at least not the geographer’s research
Explicit reports (Chapter 6)
beliefs people express about things—about themselves or other people, about places or events, about activities or objects
Actually, explicit reports are also observations of behavior; answering a question on a survey is behaving, for instance. But we distinguish reports as distinct types of data collection because they always involve explicit recognition by people that researchers are studying them, and because research participants’ explicit beliefs and choices determine the data collected with explicit reports
Computational modeling (Chapter 7)
we defined models as simplified representations of portions of reality
We noted that models can be realized in conceptual, physical, graphical, or computational form
What are some of the ways geographers and others have made a distinction between quantitative and qualitative methods, and how do they relate to scientific and humanistic approaches in geography?
Quantitative data consist of numerical values, measured on at least an ordinal level but more likely a metric level.
quantitative methods are those that impose a relatively great amount of prior structure on collected data. That is, such methods involve a prior choice of constructs to study, a prior choice of variables with which to measure those constructs, and prior numerical categories with which to express the measured values of those variables
Qualitative data are nonnumerical, or, as in nominal data, numerical values that have no quantitative meaning
They consist of words (in natural language), drawings, photographs, and so on
Qualitative methods, in contrast, involve less prior structure on data collection. Data collection that is very clearly qualitative might start with little more than a topic area or a broad research question. The constructs, variables, and especially the measurement values for the variables are determined as observations are made or even afterward
Influence Map Design and Representation
All maps are abstractions of reality. Maps can subtly or blatantly manipulate the message they impart, or contain intentionally false information
ignorance – as in the Middle Ages, with mythical beasts drawn in unknown areas
propaganda – Nazi Germany
maps in Soviet Russia were distorted for military protection
Knowledge of graphic representation techniques, including thematic mapping, multivariate displays, and web mapping
A thematic map is a type of map especially designed to show a particular theme connected with a specific geographic area. These maps "can portray physical, social, political, cultural, economic, sociological, agricultural, or any other aspects of a city, state, region, nation, or continent"
Isarithmic or Isopleth
-A dasymetric map is an alternative to a choropleth map. As with a choropleth map, data are collected by enumeration units. But instead of mapping the data so that the region appears uniform, ancillary information is used to model internal distribution of the phenomenon. For example, population density will be much lower in forested area than urbanized area, so in a common operation, land cover data (forest, water, grassland, urbanization) may be used to model the distribution of population reported by census enumeration unit such as a tract or county
Choropleth maps – These are maps, where areas are shaded according to a prearranged key, each shading or colour type representing a range of values
Disadvantages of Choropleth Maps
Although choropleths give a good visual impression of change over space there are certain disadvantages to using them:
They give a false impression of abrupt change at the boundaries of shaded units.
Choropleths are often not suitable for showing total values. Proportional symbols overlays (included on the choropleth map above) are one solution to this problem.
It can be difficult to distinguish between different shades.
Variations within map units are hidden, and for this reason smaller units are better than large ones.
Isopleth maps differ from choropleth maps in that the data is not grouped to a pre-defined unit like a city district. These maps can take two forms:
Lines of equal value are drawn such that all values on one side are higher than the "isoline" value and all values on the other side are lower, or
Ranges of similar value are filled with similar colours or patterns.
This type of map is ideal for showing gradual change over space and avoids the abrupt changes which boundary lines produce on choropleth maps. Temperature, for example, is a phenomenon that should be mapped using isoplething, since temperature exists at every point (is continuous), yet does not change abruptly at any point (like population density may do as you cross into another census zone). Relief maps should always be in isopleth form for this reason.
Proportional Symbol Maps
As the name implies, symbols (most often circles) are drawn proportional in size to the size of the variable (e.g. employment change) being represented. Proportional symbol maps are not dependent on the size of the area associated with the variable. In other words, on a proportional symbol map of Europe, tiny Liechtenstein would have the same visual importance as Spain if their unemployment values were the same. This would not be the case with a choropleth map.
An example of proportional circles is shown on the Czech Republic Voting Register map (above).
Scaling proportional symbols. Much research has gone into the optimal scaling for proportional symbols. As a general rule, make sure that the area, rather than a linear dimension like radius or length of a side, is the scaled parameter. For example, if there are four times as many gentrified businesses in El Raval Site 1 as in Site 3, the area of the symbol should be four times greater for Site 1. If the symbol choice is a circle, the radius of the Site 1 symbol should thus be only twice as great (since area scales with the square of the radius).
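A minimal sketch of the area-scaling rule above, assuming circles and a reference symbol whose value and radius you choose yourself. The function name and the numbers (10 and 40 businesses, a 5-unit reference radius) are made up for illustration.

```python
import math

def scaled_radius(value, ref_value, ref_radius):
    """Radius that makes the circle AREA proportional to value.

    Area grows with radius squared, so the radius grows with the
    square root of the value ratio.
    """
    if value < 0 or ref_value <= 0:
        raise ValueError("value must be >= 0 and ref_value must be > 0")
    return ref_radius * math.sqrt(value / ref_value)

# Hypothetical counts: Site 3 has 10 businesses, Site 1 has 40 (four times as many).
r_site3 = scaled_radius(10, ref_value=10, ref_radius=5.0)
r_site1 = scaled_radius(40, ref_value=10, ref_radius=5.0)

print(r_site1 / r_site3)         # radius ratio: 2.0
print((r_site1 / r_site3) ** 2)  # area ratio: 4.0
```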
Dot maps – Used to show the distribution of phenomena where values and location are known. Dot maps create a visual impression of density by placing a dot or some other symbol in the approximate location of the variable being mapped. Dot maps should be used only for raw data, not for prearranged data or percentages. Appropriate themes for dot maps include the distribution of dairy farms, and population distribution in a region.
Their limitations include the difficulty of counting large numbers of dots in order to get a precise value and the need to have a large amount of initial information before drawing the map.
Dot map parameters. When constructing a dot map, two parameters must be considered: the graphical size of each dot and the value associated with each dot. For example, you might stipulate that each dot be 2 pixels in diameter, and each represent 100 persons. In general, many small dots, each representing relatively few instances of the attribute, are more effective than a few large dots, but are more tedious to construct.
Multivariate displays – putting several variables on one map, and how to do it (I don’t fully get this one yet)
Web mapping is the process of using maps delivered by geographical information systems (GIS). Since a web map on the World Wide Web is both served and consumed, web mapping is more than just web cartography, it is both a service activity and consumer activity. Web GIS emphasizes geodata processing aspects more involved with design aspects such as data acquisition and server software architecture such as data storage and algorithms, than it does the end-user reports themselves. The terms web GIS and web mapping remain somewhat synonymous. Web GIS uses web maps, and end users who are web mapping are gaining analytical capabilities. The term location-based services refers to web mapping consumer goods and services. Web mapping usually involves a web browser or other user agent capable of client-server interactions.
Web mapping is still being developed; challenges and innovations around quality, usability, social benefits, and legal constraints drive its evolution.
Knowledge of principles of map design, including symbolization, color use, and typography, for a variety of print and digital formats
one or more map images, including inset maps
a legend or key
a visual or narrative explanation of the map scale
supporting media, such as photographs, diagrams, and text
a north arrow or other depiction of orientation
metadata, explaining such information as the currency of the information, sources used, projection, copyrights, and authorship
reference maps – general information about the location of features
thematic maps – show the distribution of a specific topic
All maps are representations
Features are generalized because they can’t be shown at their true size
symbols – represent things on a map
map accuracy – difficult to assess, all maps show a selective view of reality – rather than ask is the map accurate, ask is the map appropriate for my purposes
Map scale – 1:100 – one inch represents 100 inches in the real world
Representing scale – scale bar
large scale – show more detail than a small scale – 1:10000 is larger than 1:25000000
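A quick sketch of the scale arithmetic in the notes above. A scale of 1:N is treated as the fraction 1/N, so a smaller denominator means a larger scale. The function names are my own.

```python
def ground_distance(map_distance, scale_denominator):
    """Real-world distance, in the same unit as map_distance, for a 1:N scale."""
    return map_distance * scale_denominator

def is_larger_scale(denom_a, denom_b):
    """True if 1:denom_a is a larger scale (more detail) than 1:denom_b."""
    return 1 / denom_a > 1 / denom_b

print(ground_distance(1, 100))              # 1 inch on a 1:100 map -> 100 inches
print(is_larger_scale(10_000, 25_000_000))  # True: 1:10000 is the larger scale
```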
Generalization – intended to remove unnecessary detail – maps cannot show everything
select which features to show and omit
Symbolization – assigning symbols to represent features
geographic dimensions – what geographic features will be on the map
measurement level – how data is measured – qualitative vs quantitative
nominal data – differ in type and can’t be ranked (tree species, land uses)
ordinal data – can be ranked but only have relative values (low, medium, high) – can rank them, but can’t measure the size of the difference between ranks
interval/ratio data – have numerical values with measurable differences between them (elevation, population)
Data Processing – know how the data was manipulated, statistics reported as raw values or standardized by some measure
Visual variables – size, shape, orientation, pattern, hue, value
size and value for quantitative, shape, pattern, hue for qualitative
typography is the design of text, point size, line length, typefaces
Text is a crucial part of any quality map. Text simultaneously serves several purposes:
It identifies unique features (e.g., "The United Kingdom")
It places features within broader categories (e.g., "park")
It locates features within a general geographic context (e.g., this vegetation stand is within "Zion National Park")
It explains the characteristics and meaning of features on the map (e.g., "high economic potential zone")
It prescribes and proscribes action (e.g., "camping not permitted here")
It can add to the aesthetic beauty of a map
It can give a map a particular aesthetic feel (e.g., using a typeface that looks modern or historical)
Understanding of how the selection of data classification and/or symbolization techniques affects the message of the thematic map
Classification – grouping objects with similar values into classes that share a symbol
Up to seven classes – most people can distinguish – try to stick with 5
Classes should be exhaustive (describe all possible values) and should not overlap (no value can fall into two classes)
Ways to split classes
Equal range – equal distance between class breaks
Quantiles – equal number of observations in each class
Standard deviation – class breaks based on distance of standard deviation from the mean
Natural breaks – class breaks conform to gaps in data distribution
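To see how the choice of method changes the message, here is a rough sketch of three of the methods above using only the Python standard library. The data are made up, the function names are my own, and the standard deviation layout (breaks at mean minus one SD, mean, mean plus one SD) is just one common choice, not the only one. Natural breaks is left out because it needs an optimization routine.

```python
import bisect
import statistics
from collections import Counter

def equal_interval_breaks(values, k):
    """k classes of equal width between min and max; returns the k-1 inner breaks."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]

def quantile_breaks(values, k):
    """k classes with roughly equal counts; returns the k-1 inner breaks."""
    return statistics.quantiles(values, n=k, method="inclusive")

def std_dev_breaks(values):
    """Breaks at mean - 1 SD, mean, mean + 1 SD (assumed layout)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean - sd, mean, mean + sd]

def classify(value, breaks):
    """Class index for a value. Every value lands in exactly one class,
    so the classes are exhaustive and do not overlap."""
    return bisect.bisect_right(breaks, value)

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data with one outlier

ei = equal_interval_breaks(values, 3)
qt = quantile_breaks(values, 3)
ei_counts = Counter(classify(v, ei) for v in values)
qt_counts = Counter(classify(v, qt) for v in values)

print(ei, dict(ei_counts))  # breaks [34.0, 67.0]: nine values in class 0, one in class 2
print(qt, dict(qt_counts))  # breaks [4.0, 7.0]: classes of 3, 3 and 4 values
print(std_dev_breaks(values))
```

With equal ranges the outlier pushes nearly everything into one class, while quantiles spread the same data evenly across classes. Same data, very different map.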
GIS Design Aspects and Data Modeling
Knowledge of data exchange procedures
Three data models– the conceptual model, the data structure model, and the transfer model
Conceptual Model – This model describes the spatial objects, as well as the logical and topological relationships between the spatial objects and the captured spatial entities. This general model is object oriented and is also based on existing topological and graph models for spatial data.
Data Structure Model – This model is used to express the spatial objects of the conceptual model in terms of transfer data structures. The data structures used in this transfer standard are based on the traditional relational and network models. Data structures viewed as spatial data structures are both the traditional vector and raster models
Transfer Model – This model is used to express the logical constructs of the transfer form in terms of implementation-media constructs. The implementation constructs are made operational by an implementation method. The implementation method selects one or more media and defines the constructs pertaining to those media.
The transfer model is defined in terms of its constructs and logical relationships. It deals with three types of transfer constructs: (1) logical constructs solely pertaining to this standard, (2) constructs relating to the implementation method, and (3) constructs solely pertaining to the transfer media.
Data Dictionary/Definition and Data Dictionary/Domain should be included
a) The specific set of attributes in an attribute module
b) The relationship between these attributes and an entity
c) The authorities of the attributes and (or) entity
d) The format, measurement unit, and maximum length of an attribute
e) Whether an attribute is a part of a primary or foreign relational key.
Schema model should be included
–File based approach: geographic data is encoded in a structured file format, for batch transfer or download
–Application programming interface (API) approach: geographic data is accessed and exchanged as needed between software systems on the same workstation, often interactively with the user
–Web services approach: geographic data is accessed and exchanged over networks and the Internet between software components, using http and other web based protocols
Knowledge of security restrictions on data (e.g., user permissions and access rights)
-ArcGIS – the user who creates tables, feature classes, etc. owns those datasets
User Access – database must verify the user accounts that connect to it – dba has to add users to database
authentication – database checks the list of users to make sure a user is allowed to make a connection
2 types of authentication
Operating system (OS) authentication – a user logs in to the computer and the credentials for authentication are supplied to the database by the OS of the user’s computer
only need to log in once
Database authentication – users log in to the server and then must separately log in to the database
Groups (roles, types or authorities) – grant privileges to users based on their common functions
Public role – any right granted to public is granted to everyone with a db connection
-this would be similar to the connect role
Tips for groups:
Create separate groups for system and object privileges
Choose a naming convention that reflects each type of group for easy reference
Grant privileges directly to the gdb administrator and grant privileges via groups for all other users
Avoid mixing roles with directly granted privileges for non-administrator accounts
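A toy model of the group idea above, not how any particular DBMS stores permissions. All role names, user names, and privileges here are hypothetical. It shows that a non-administrator's rights come only from their groups, plus whatever is granted to public.

```python
# Hypothetical roles and the privileges granted to each.
ROLE_PRIVILEGES = {
    "public": {"connect"},          # granted to everyone with a db connection
    "gis_viewers": {"select"},
    "gis_editors": {"select", "insert", "update", "delete"},
}

# Hypothetical users and the groups they belong to.
USER_ROLES = {
    "alice": {"gis_editors"},
    "bob": {"gis_viewers"},
}

def effective_privileges(user):
    """Union of the public privileges and the privileges of the user's groups."""
    privileges = set(ROLE_PRIVILEGES["public"])
    for role in USER_ROLES.get(user, set()):
        privileges |= ROLE_PRIVILEGES[role]
    return privileges

def can(user, privilege):
    return privilege in effective_privileges(user)

print(can("alice", "update"))  # True: comes from gis_editors
print(can("bob", "update"))    # False: viewers only have select
print(can("bob", "connect"))   # True: comes from public
```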
Knowledge of database administration
geodatabase admin responsible for gdb system tables, triggers, views, and procedures
DEFAULT gdb version
Default schema names – applies to gdb admin as well as nonadmin users who create data
basic tasks –
backup and recovery databases
periodically testing a backup and recovery plan
verifying that backups are being done as scheduled
Three basic security tasks
Authentication – setting up user accounts to control logins to the database
Authorization – setting permissions on various parts of the database
Auditing – tracking who did what with the database
can be based on auditing laws as well
Storage and capacity planning
how much disk storage is required and monitoring disk space
Watch growth trends
Performance monitoring and tuning
Monitor database server on a regular basis to identify bottlenecks
Capacity of the server hardware and OS configuration can limit performance
How the database is physically laid out on the disk drives
Types of indexing
How queries are written against the db can change how fast the results are returned
DBA needs to understand which monitoring tools are available
DBA needs to quickly ascertain the problem to correct it
Other important tasks
High availability – the database needs to be available all the time
Very Large Databases – Data stored in db has changed
Data extraction, transformation, and loading (ETL) – data must be cleansed before loading
Knowledge of systems architecture and design
Removing data duplication
Improving the currency and accuracy of information used in decision-making
Increasing the reliability of systems
Decentralize data maintenance
Improve GIS system availability and stability
Improve utilization of systems resources
3 tiers –
presentation tier
application logic tier
data tier
The system architecture design process aligns identified business requirements (user needs) derived from business strategy, goals, and drivers (business processes) with identified business information systems infrastructure technology (network and platform) recommendations.
System design starts with identifying business needs. This includes identifying user locations and required information products, identifying required data resources, and developing appropriate software applications to do the work
System architecture design translates business needs to identified IT requirements.
Hardware requirements are generated based on peak software processing loads.
Network connectivity requirements are generated based on peak data flow.
Capacity Planning tools are provided to automate the design analysis.
Capacity Planning Tools make the process of aligning Business workflows with selected IT resources agile and iterative in nature, rapidly identifying system performance impacts in response to changing business and technology architecture patterns.
identifying business needs, defining project requirements, and reducing implementation risk
How many users can I support with my existing hardware?
What hardware do I need to purchase?
How many servers (cores) do I need?
What are the software licensing requirements?
What workflow loads should I use for my existing applications?
What are my current workflow service times?
What is the capacity of my current system?
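A back-of-the-envelope sketch for the "how many cores" and "how many users" questions above. This is the general utilization law (busy cores = throughput × service time), not Esri's actual Capacity Planning Tool. The 80% target utilization and the workload numbers are assumptions for illustration.

```python
import math

def required_cores(peak_tph, service_time_sec, target_utilization=0.8):
    """Cores needed to serve a peak load (transactions per hour) while
    keeping average core utilization at or below the target."""
    busy_cores = peak_tph / 3600 * service_time_sec  # cores kept busy on average
    return math.ceil(busy_cores / target_utilization)

def max_throughput_tph(cores, service_time_sec, target_utilization=0.8):
    """Peak transactions per hour that existing hardware can carry."""
    return cores * target_utilization * 3600 / service_time_sec

# Hypothetical workflow: 36,000 map requests per hour at 0.5 s of processing each.
print(required_cores(36_000, 0.5))  # 7
print(max_throughput_tph(8, 0.5))   # roughly 46,000 per hour on an 8-core server
```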
4 architecture domains – together they cover overall enterprise business needs
The Business Architecture defines the business strategy, governance, organization, and key business processes.
The Information Systems Architecture includes a review of the Data and Application architecture.
> The Data Architecture describes the structure of an organization’s logical and physical data assets and data management resources.
> The Application Architecture provides a blueprint for the individual applications to be deployed, their interactions, and their relationships to the core business processes of the organization.
The Technology Architecture describes the logical software and hardware capabilities that are required to support the deployment of business, data, and application services. This includes IT infrastructure, middleware, networks, communications, processing, standards, etc.
User information product needs establish a foundation for completing the design.
User location and peak business loads establish a foundation for system architecture design.
Infrastructure requirements must be identified to quantify deployment costs.
Network communication capacity is an important consideration for GIS deployments.
Hardware and software procurement requirements must be identified.
Software development and data acquisition needs must be identified.
Best Practice: Business decisions for project funding and procurement authorization are often required for project effort to proceed beyond this phase.
System procurement authorization, based on the design budget and deployment timeline.
Data acquisition and database design efforts begin.
Procurement authorization for application design and development.
Prototype testing plans completed and scheduled to validate product delivery within design performance targets.
Initial deployment and operational testing.
Final system delivery, user training, and workflow migration complete.
System maintenance operations.
Capacity Planning Tools (CPT) – developed as a framework to promote successful GIS system design and implementation
System architecture design process
The enterprise GIS system design process aligns identified business requirements (user needs) derived from business strategy, goals, and drivers (business processes) with recommended business information systems infrastructure technology (network and platform).
User needs assessment (results of a GIS user needs assessment provides inputs for the system architecture design analysis)
Workflow loads analysis (translate user needs to project workflows with baseline traffic and processing transaction loads based on estimated workflow complexity)
Technical architecture strategy (identify user locations, network connectivity, and data center server locations)
User requirements analysis (translate peak user workflow loads to peak throughput transaction loads)
Network suitability analysis (translate peak site/network throughput loads to peak site/network traffic and compare with available network bandwidth)
Platform architecture selection (Identify data center platform tier configuration and identify platform selection for each tier)
Software configuration (Identify platform assignment for each workflow software component peak transaction processing load)
Enterprise design solution (combine all peak workflow software component processing loads on the assigned platform tier, translate baseline processing load to selected platform processing load, and generate number of nodes required for each platform tier with estimate of capacity utilization)
Understanding of the enterprise environment
Enterprise GIS environments include a broad spectrum of technology integration. Most environments today include a variety of hardware vendor technologies including database servers, storage area networks, Windows Terminal Servers, Web servers, map servers, and desktop clients, all connected by a broad range of local area networks, wide area networks, and Internet communications. All these technologies must function together properly to support a balanced computing environment.
Centralized computing solutions with a single database environment are the easiest environments to implement and support. Distributed computer systems with multiple distributed database environments can be very complex and difficult to deploy and support. Many organizations are consolidating their data resources and application processing environments to reduce implementation risk and improve administrative support for enterprise business environments
GIS software deployment patterns are optimized to support your business needs:
Planning and analysis
Knowledge of schemas and domains and how they interact
schema
See Also: data dictionary
[computing] The structure or design of a database or database object, such as a table, view, index, stored procedure, or trigger. In a relational database, the schema defines the tables, the fields in each table, the relationships between fields and tables, and the grouping of objects within the database. Schemas are generally documented in a data dictionary. A database schema provides a logical classification of database objects.
[computing] A set of rules, stored in a file, that describe the structure of an XML document. The number, type, and order of elements allowed in an XML document are described in the schema. An XML parser can compare XML documents against the schema. An XML document that uses open and close tags properly is said to be well formed; if it also follows the rules of its designated schema, it is said to be valid.
data dictionary
See Also : metadata
[data management] A catalog or table containing information about the datasets stored in a database. In a GIS, a data dictionary might contain the full names of attributes, meanings of codes, scale of source data, accuracy of locations, and map projections used.
metadata
[data transfer] Information that describes the content, quality, condition, origin, and other characteristics of data or other pieces of information. Metadata for spatial data may describe and document its subject matter; how, when, where, and by whom the data was collected; availability and distribution information; its projection, scale, resolution, and accuracy; and its reliability with regard to some standard. Metadata consists of properties and documentation. Properties are derived from the data source (for example, the coordinate system and projection of the data), while documentation is entered by a person (for example, keywords used to describe the data).
domain [data transfer] The range of valid values for a particular metadata element.
[data structures] In a geodatabase, a mechanism for enforcing data integrity. Attribute domains define what values are allowed in a field in a feature class or nonspatial attribute table. If the features or nonspatial objects have been grouped into subtypes, different attribute domains can be assigned to each of the subtypes.
coded value domain
[ESRI software] A type of attribute domain that defines a set of permissible values for an attribute in a geodatabase. A coded value domain consists of a code and its equivalent value. For example, for a road feature class, the numbers 1, 2, and 3 might correspond to three types of road surface: gravel, asphalt, and concrete. Codes are stored in a geodatabase, and corresponding values appear in an attribute table.
range domain
[data structures] A type of attribute domain that defines the range of permissible values for a numeric attribute. For example, the permissible range of values for a pipe diameter could be between 1 and 32 inches.
spatial domain
[standards] For a spatial dataset in ArcGIS 9.1 and previous versions, the defined precision and allowable range for x- and y-coordinates and for m- and z-values, if present.
[ESRI software] In ArcGIS Survey Analyst, a constraint that sets the minimum and maximum values for the geometry attributes. The extents of this domain define the precision at which geometry attributes (x, y, z, m, id) can be stored as integers. There is a finite number of integers available in the system, so the x,y spatial domain is analogous to a square grid that always contains the same number of rows and columns.
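To make the coded value and range definitions above concrete, here is a small sketch that checks attribute values against both kinds of domain. It reuses the road surface and pipe diameter examples from the definitions; the function names are my own, and a real geodatabase enforces this internally rather than through code like this.

```python
# Coded value domain: stored code -> description shown in the attribute table.
ROAD_SURFACE = {1: "gravel", 2: "asphalt", 3: "concrete"}

# Range domain: (minimum, maximum) allowed value, here pipe diameter in inches.
PIPE_DIAMETER = (1, 32)

def valid_coded_value(code, domain):
    """A code is valid only if it is one of the domain's permitted codes."""
    return code in domain

def valid_range_value(value, bounds):
    """A numeric value is valid only if it falls inside the range (inclusive)."""
    low, high = bounds
    return low <= value <= high

print(valid_coded_value(2, ROAD_SURFACE), ROAD_SURFACE[2])  # True asphalt
print(valid_coded_value(7, ROAD_SURFACE))                   # False
print(valid_range_value(12, PIPE_DIAMETER))                 # True
print(valid_range_value(48, PIPE_DIAMETER))                 # False
```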
Knowledge of digital file management
File creation, edit, management
back up data
Used to keep track and organize files
Hierarchical file system – one that uses directories to organize files into a tree structure
OS has file management system, but can purchase more sophisticated FMS – backup procedures and stricter file protection
Individual files – shapefiles, file gdbs, tables/spreadsheets, CAD, rasters
Databases – direct connection to relational database management systems and big data databases
Geodatabases – stores GIS in a central location for easy access
Cloud – store it in the cloud!
Edit data – allows single-user or multiuser editing
Take control of big data – visualize multiple different types
Integrate your enterprise – data stored in big business systems to extend their analytical capabilities
Data Rules and Relationships – define relationships between datasets and set rules (domains and subtypes)
Manage metadata – describes content, quality, origin, and other characteristics of data – data about data – FGDC, ISO, INSPIRE, and Dublin Core
Secures data – flexibility and control over how GIS platform is deployed, maintained, secured, and used
version creation – child version (new version) created from a parent version (existing version) – identical to parent when first created, but will diverge as changes are made to each version – each dataset in database appears only once but behind the scenes, data is in delta tables (“A” (add) and “D” delete tables) – each version has an owner, description, parent version, associated database state, and level of user access
user access -Private (only owner can view and edit), protected (only owner can edit but all can view), public (anyone can view and edit)
version workflows – simplest is concurrent editors editing DEFAULT version, create a separate version for each editor, another is to create a QA version to QAQC edits from users
states – version references a specific database state – a state is a unit of change that occurs in the database – every edit operation performed in the gdb creates a new db state – edit operation is any set of tasks (additions, deletions, modifications) on features and rows – State ID values apply to any and all changes made in the gdb
DEFAULT version – owned by ArcSDE admin – always exists – root version and ancestor to all other versions – published version of the geodatabase – current
Version management – versions can be created or deleted – edits are isolated in that version until admin merges changes with another version.
Schema changes affect all other versions (adding a new field)
reconcile – edits from an ancestor version (target version) are brought into the version being edited in an edit session (edit version) – ancestor version is any version in direct ancestry of the version being edited
-reconcile – brings all edited features and rows from the target version into the edit version – any conflicts are detected so they can be reviewed and resolved
Conflicts – when a feature was edited in both the edit version and the target version
Post – second step when merging edits between two versions – post process synchronizes the current edit version with the target version – all edits made in the edit version are saved into the target version – both versions are now identical
compress – actively edited enterprise geodatabase accumulates states and rows in delta tables and develops a complex state tree – negatively affects performance – compressing never removes data that a version still references; it only cleans up states and delta table rows that are no longer needed
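A heavily simplified toy model of the versioning notes above, to show how a version can look like a full copy while only storing its differences. It assumes each version keeps an "A" (adds) and a "D" (deletes) set of its own and ignores state IDs and lineage entirely, so it is not how the geodatabase actually implements this. The class and function names are made up.

```python
# Base table: object id -> row. Never edited directly in this toy model.
BASE = {1: "parcel A", 2: "parcel B", 3: "parcel C"}

class Version:
    def __init__(self, name):
        self.name = name
        self.adds = {}        # "A" table: inserted or updated rows
        self.deletes = set()  # "D" table: ids deleted or superseded by an update

    def insert(self, oid, row):
        self.adds[oid] = row

    def update(self, oid, row):
        # An update is recorded as a delete of the old row plus an add of the new one.
        self.deletes.add(oid)
        self.adds[oid] = row

    def delete(self, oid):
        self.deletes.add(oid)
        self.adds.pop(oid, None)

    def view(self, base):
        """What this version's table looks like: base minus D, plus A."""
        rows = {oid: row for oid, row in base.items() if oid not in self.deletes}
        rows.update(self.adds)
        return rows

def conflicts(edit_version, target_version):
    """Ids edited in BOTH versions, which is what reconcile flags as conflicts."""
    def touched(version):
        return set(version.adds) | version.deletes
    return touched(edit_version) & touched(target_version)

editor = Version("editor1")
qa = Version("QA")
editor.update(2, "parcel B (split)")
qa.delete(2)
qa.insert(4, "parcel D")

print(editor.view(BASE))      # parcel 2 shows the editor's change
print(BASE[2])                # base row is untouched: parcel B
print(conflicts(editor, qa))  # {2}: edited in both versions
```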
Knowledge of database design
database design – process of producing a detailed data model of a database
design process – conceptual schema, logical data model, physical database design
determine the relationships between different data elements
Superimpose a logical structure upon the data on the basis of these relationships
Determine the data to be stored – SME – part of requirements analysis
conceptual schema – determine where the relationships and dependencies are within the data – data could be changed in the background
Logical Data Model – once relationships and dependencies are determined – arrange the data into a logical structure that can be mapped into the storage objects supported by the database management system – each table may be a logical object or a relationship joining one or more instances of logical objects
Physical database design – physical configuration of the database on the storage media – includes detailed specification of data elements, data types, indexing options, and other parameters residing in the DBMS data dictionary – detailed design that includes modules & db hardware & software specs
Old school esri 11 steps to gdb design – not the best method
Identify the information products that you will create and manage with your GIS.
Identify the key data themes based on your information requirements.
Specify the scale ranges and the spatial representations of each data theme at each scale.
Decompose each representation into one or more geographic datasets.
Define the tabular database structure and behavior for descriptive attributes.
Define the spatial behavior, spatial relationships, and integrity rules for your datasets.
Propose a geodatabase design.
Design editing workflows and map display properties.
Assign responsibilities for building and maintaining each data layer.
Build a working prototype. Review and refine your design.
Document your geodatabase design.
Knowledge of database general structure (e.g., tables and data)
schema objects (oracle) – tables
-tables – collection of related data held in structured format within a database, contains fields and rows
-views – result set of a stored query on the data, which the database users can query just as they would in a persistent database collection object
–view is not part of the physical schema – virtual table computed or collated dynamically from data in the database when access to that view is requested
Views can represent a subset of the data contained in a table. Consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while denied access to the rest of the base table.
Views can join and simplify multiple tables into a single virtual table.
Views can hide the complexity of data. For example, a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
Views take very little space to store; the database contains only the definition of a view, not a copy of all the data that it presents.
Depending on the SQL engine used, views can provide extra security.
sequences – database object that generates an ordered series of unique numbers – typically used to create primary key values
synonyms – an alias or alternate name for a table, view, sequence or other schema object – easier for users to access database objects
indexes – data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure – quickly locate data without having to search every row in a db table every time a database table is accessed – indexes can be on one or more columns of a database table
database links – schema object in one database that lets you access objects in another database
snapshots – state of a system at a particular point in time – can refer to an actual copy of the state of a system
procedures – subroutine available to applications that access a relational database system – stored in the database data dictionary – typical uses – (data validation, access control mechanisms)
functions – aka subroutine – In computer programming, a subroutine is a sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Subprograms may be defined within programs, or separately in libraries that can be used by multiple programs.
non-schema objects – users, roles, contexts, directory objects
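A small runnable sketch of tables, views, and indexes from the list above, using Python's built-in sqlite3 module with an in-memory database instead of Oracle. The parcels table and its contents are made up. SQLite has no user permissions, so this only shows that a view is a stored query exposing a subset of rows and columns, not the security side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parcels (
        id INTEGER PRIMARY KEY,
        owner TEXT,
        land_use TEXT,
        assessed_value REAL
    );
    INSERT INTO parcels VALUES
        (1, 'Smith', 'residential', 250000),
        (2, 'Jones', 'commercial', 900000),
        (3, 'Lee', 'residential', 310000);

    -- View: only the definition is stored, not a copy of the data.
    -- It hides the owner and assessed_value columns.
    CREATE VIEW residential_parcels AS
        SELECT id, land_use FROM parcels WHERE land_use = 'residential';

    -- Index: speeds up lookups on land_use at the cost of extra storage and writes.
    CREATE INDEX idx_parcels_land_use ON parcels (land_use);
""")

# The view is queried just like a table.
rows = conn.execute(
    "SELECT id, land_use FROM residential_parcels ORDER BY id"
).fetchall()
view_columns = [
    col[0] for col in conn.execute("SELECT * FROM residential_parcels").description
]

print(rows)          # [(1, 'residential'), (3, 'residential')]
print(view_columns)  # ['id', 'land_use']
```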
Knowledge of geospatial data structure (e.g., topology rules)
Vector – Points, Lines, Polygons (enclosed area) – feature layer is linked to an attribute table
Raster – world is represented by an array of gridded cells
– can store values that represent categories (vegetation type) – basic grid attribute table has a value (code or some real number representing information about the grid cell) and count field (how many grid cells have that same value)
-can also store continuous values (elevation)
-aerial photo is a raster but contains no data – they are the background – not considered “data” structure
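The value and count fields of a basic grid attribute table, as described above, can be sketched like this. The grid is a made-up 3x3 categorical raster where each code would stand for a vegetation type.

```python
from collections import Counter

# Hypothetical categorical raster: each cell stores a vegetation code.
grid = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 3, 2],
]

def value_attribute_table(raster):
    """List of (value, count) pairs: how many cells hold each value."""
    counts = Counter(cell for row in raster for cell in row)
    return sorted(counts.items())

print(value_attribute_table(grid))  # [(1, 4), (2, 3), (3, 2)]
```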
TIN – Triangulated irregular network – represents surfaces – created from contours
-Advantages – small areas with high precision elevation data – can use multiple data inputs – more efficient storage than DEM or contour lines
-Disadvantages – accurate TINs require very accurate source data – cost to create high precision elevation is very high and data files are large – TIN production and use is very computer intensive – Raster DEM data are more available
Tabular Information – attribute Table
-attribute field types – numeric, text, date, blob
Topology – features need to be connected using specific rules
-Network topology – features can be connected in a network – lines and junctions are specified and connected so that water can be traced along a flow
-Planar topology – specifies topological rules for features (parcel boundaries cannot overlap each other, or streets have to intersect)
Topological relationships – do not change if you imagine a map being on a rubber sheet and you pull and stretch in different directions, the rules are still intact – parcels don’t overlap, streets still intersect
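A rough sketch of the network topology idea above: junctions connected by flow lines, with a trace that follows the water downstream. The junction names and connections are made up. The network stores only which junction connects to which, no coordinates, so stretching the map like a rubber sheet would not change the result.

```python
from collections import deque

# Hypothetical flow network: junction -> junctions directly downstream of it.
FLOW = {
    "J1": ["J2"],
    "J2": ["J3", "J4"],
    "J3": [],
    "J4": ["J5"],
    "J5": [],
}

def trace_downstream(network, start):
    """Breadth-first trace of every junction reachable downstream of start."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        junction = queue.popleft()
        order.append(junction)
        for nxt in network.get(junction, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

print(trace_downstream(FLOW, "J2"))  # ['J2', 'J3', 'J4', 'J5']
```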
Vector advantages – vectors just seemed more correcter
-Can represent point, line, and area features very accurately
-Far more efficient than raster data in terms of storage
-Preferred when topology is concerned
-Support interactive retrieval, which enables map generalization
Vector disadvantages – vectors are more complex
-Less intuitively understood
-Overlay of multiple vector maps is very computationally intensive
-Display and plotting of vectors can be expensive, especially when filling areas
Raster advantages – rasters are faster…
-Easy to understand
-Good to represent surfaces, i.e. continuous fields
-Easy to read and write – a grid maps directly onto a programming structure in computer memory called an array
-Easy to input and output – a natural for scanned or remotely sensed data – easy to draw on a screen or print as an image
-Analytical operations are easier, e.g., autocorrelation statistics, interpolation, filtering
Raster disadvantages – rasters are bigger
-Inefficient for storage – raster compression techniques might not be efficient when dealing with extremely variable data – using large cells to reduce data volume causes information loss
-Poor at representing points, lines and areas – points and lines in raster format have to move to a cell center – lines can become fat – areas may need separately coded edges – each cell can be owned by only one feature
-Good only at very localized topology, and weak otherwise
-Suffer from the mixed pixel problem
-Must often include redundant or missing data
Map data collection often tabulates data at significant points
-Land surface elevation survey – seeks “high information content” points on the landscape, such as mountain peaks, the bottoms of valleys and depressions, and saddle points and break points in slopes
-Assume that between triplets of points the land surface forms a plane
-Triplets of points forming irregular triangles are connected to form a network
Triangulated Irregular Networks (TIN) advantages –
-More accurate and use less space than grids
-Can be generated from point data faster than grids
-Can describe more complex surfaces than grids, including vertical drops and irregular boundaries
-Single points can be easily added, deleted, or moved
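The "land surface forms a plane between triplets of points" assumption can be tried out with a small sketch. This is only an illustration, not how any particular GIS package does it: the function name and the sample triangle are made up, and it interpolates an elevation inside a single TIN triangle using barycentric weights.

```python
def interpolate_tin_elevation(triangle, x, y):
    """Return the elevation at (x, y) on the plane through three (x, y, z) vertices.

    Returns None when (x, y) falls outside the triangle.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = triangle
    # Twice the signed area of the triangle; zero means the vertices are collinear.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    # Barycentric weights of the query point with respect to each vertex.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # the point is outside this triangle
    # Elevation is the weighted average of the three vertex elevations.
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical triangle: three surveyed points with elevations in meters.
sample_triangle = ((0, 0, 100), (10, 0, 200), (0, 10, 300))
print(interpolate_tin_elevation(sample_triangle, 5, 5))
```

A point on a vertex gets that vertex's elevation back, and a point outside the triangle returns None, so a full TIN lookup would first have to find which triangle contains the point.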
Understanding of desktop, server, enterprise, and hosted (e.g., cloud) applications available, including their benefits and shortcomings
Desktop – individual user on a computer, make maps, data analysis, data creation
Server – brings GIS into the hands of everyone in the organization; allows access to web GIS, control of GIS data on your own infrastructure, and control over how the GIS platform is deployed, maintained, secured, and used
Hosted (cloud) – ability to discover, use, make, and share maps from any device, anywhere, anytime; access other users’ maps and data; connect more people outside of the organization and share the latest maps, data, and ideas
enterprise? – is this not server?
An Enterprise GIS is a geographic information system that is integrated through an entire organization so that a large number of users can manage, share, and use spatial data and related information to address a variety of needs, including data creation, modification, visualization, analysis, and dissemination
the server is a method of achieving an enterprise GIS system – this is bullshit!
Working knowledge of GIS hardware and software capabilities (e.g., application servers, data servers, storage devices, and workstations)
basically says that hardware and software are of all types –
software runs on a wide range of hardware types from centralized computer servers to desktop computers used in stand-alone or networked configurations
Software may rely on DBMS
The hardware, software, and communication network(s), collectively referred to here as the system infrastructure for an EGIS, deliver to each end user the specific spatial capabilities and resources needed to support their business functions. A conceptual configuration for the EGIS system infrastructure can be established based on the characteristics of existing system infrastructure; required information products and spatial and nonspatial data resources; essential spatial analysis, display, and reporting functions; needed data management resources; and the anticipated number of end users within the departments.
Knowledge of data models, including vector, raster, grid, TIN, topological, hierarchical, network, and object-oriented
see above for vector, raster, grid (is grid the same as Raster), TIN, topological data, network (topology)
hierarchical database – A database that stores related information in a tree-like structure, where records can be traced to parent records, which in turn can be traced to a root record.
network dataset – [ESRI software] A collection of topologically connected network elements (edges, junctions, and turns) that are derived from network sources, typically used to represent a linear network, such as a road or subway system. Each network element is associated with a collection of network attributes. Network datasets are typically used to model undirected flow systems.
object-oriented database – [database structures] A data management structure that stores data as objects (instances of a class) instead of as rows and tables as in a relational database.
See Also : ESRI Grid
[cartography] In cartography, any network of parallel and perpendicular lines superimposed on a map and used for reference. These grids are usually referred to by the map projection or coordinate system they represent, such as universal transverse Mercator grid.
[data models] See raster.
[ESRI software] An ESRI data format for storing raster data that defines geographic space as an array of equally sized square cells arranged in rows and columns. Each cell stores a numeric value that represents a geographic attribute (such as elevation) for that unit of space. When the grid is drawn as a map, cells are assigned colors according to their numeric values. Each grid cell is referenced by its x,y coordinate location.
GIS Analytical Methods
Knowledge of overlay analysis
Vector Overlay Tools
Identity – Input features, split by overlay features
Intersect – Only features common to all input layers
Symmetrical Difference – Features in either the input layer or the overlay layer, but not both
Union – All input features
Update – Input feature geometry replaced by update layer
Raster Overlay Tools
Zonal Statistics – Summarizes values in a raster layer by zones (categories) in another layer—for example, calculate the mean elevation for each vegetation category
Combine – Assigns a value to each cell in the output layer based on unique combinations of values from several input layers
Single Output Map Algebra – Lets you combine multiple raster layers using an expression you enter—for example, you can add several ranked layers to create an overall ranking
Weighted Overlay – Automates the raster overlay process and lets you assign weights to each layer before adding (you can also specify equal influence to create an unweighted overlay)
Weighted Sum – Overlays several rasters multiplying each by their given weight and summing them together.
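To make the raster overlay idea concrete, here is a rough sketch of what Weighted Sum and Zonal Statistics (mean) do cell by cell. It uses plain Python lists of lists as stand-in rasters; the function names, layers, and weights are hypothetical and this is not the ArcGIS implementation.

```python
def weighted_sum(rasters, weights):
    """Multiply each raster by its weight and add the results cell by cell."""
    if len(rasters) != len(weights):
        raise ValueError("need exactly one weight per raster")
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [
        [sum(w * r[i][j] for r, w in zip(rasters, weights)) for j in range(cols)]
        for i in range(rows)
    ]


def zonal_mean(zone_raster, value_raster):
    """Return the mean of value_raster for each zone id found in zone_raster."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zone_raster, value_raster):
        for zone, value in zip(zone_row, value_row):
            totals[zone] = totals.get(zone, 0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical ranked layers (higher = more suitable) on the same 2 x 2 grid.
slope_rank = [[1, 2], [3, 4]]
landuse_rank = [[4, 3], [2, 1]]
print(weighted_sum([slope_rank, landuse_rank], [0.75, 0.25]))

# Hypothetical vegetation zones and elevations: mean elevation per zone.
vegetation = [[1, 1], [2, 2]]
elevation = [[10, 20], [30, 50]]
print(zonal_mean(vegetation, elevation))
```

Both functions assume the rasters are already registered to the same grid (same extent and cell size), which is the precondition for any raster overlay.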
[analysis/geoprocessing] A spatial operation in which two or more maps or layers registered to a common coordinate system are superimposed, either digitally or on a transparent material, for the purpose of showing the relationships between features that occupy the same geographic space.
[analysis/geoprocessing] In geoprocessing, the geometric intersection of multiple datasets to combine, erase, modify, or update features in a new output dataset.
[analysis/geoprocessing] The process of superimposing layers of geographic data that cover the same area to study the relationships between them.
The following lists the general steps to perform overlay analysis:
Define the problem.
Clear definition of each component and how they interact
Break the problem into submodels.
certain attributes can be in multiple sub models
Determine significant layers.
layers and attributes that affect each submodel need to be identified
Some of these layers may need to be created
Reclassify or transform the data within a layer.
Ratio—The ratio scale has a reference point, usually zero, and the numbers within the scale are comparable. For example, elevation values are ratio numbers, and an elevation of 50 meters is half as high as 100 meters.
Interval—The values in an interval scale are relative to one another; however, there is not a common reference point. For example, a pH scale is of type interval, where the higher the value is above the neutral value of 7, the more alkaline it is, and the lower the value is below 7, the more acidic it is. However, the values are not fully comparable. For example, a pH of 2 is not twice as acidic as a pH of 4.
Ordinal—An ordinal scale establishes order such as who came in first, second, and third in a race. Order is established, but the assigned order values cannot be directly compared. For example, the person who came in first was not necessarily twice as fast as the person who came in second.
Nominal—There is no relationship between the assigned values in the nominal scale. For example, land-use values, which are nominal values, cannot be compared to one another. A land use of 8 is probably not twice as much as a land use of 4.
Weight the input layers.
factors can be weighted based on their importance
Add or combine the layers.
establish the relationship of all the input factors together to identify the desirable locations
Analyze the results.
do the potential results answer the question?
Fuzzy logic overlay analysis –
Fuzzy Membership – The Fuzzy Membership tool reclassifies or transforms the input data to a 0 to 1 scale based on the possibility of being a member of a specified set. 0 is assigned to those locations that are definitely not a member of the specified set, 1 is assigned to those values that are definitely a member of the specified set, and the entire range of possibilities between 0 and 1 are assigned to some level of possible membership (the larger the number, the greater the possibility).
The Fuzzy Gaussian function transforms the original values into a normal distribution. The midpoint of the normal distribution defines the ideal definition for the set, assigned a 1, with the remaining input values decreasing in membership as they move away from the midpoint in both the positive and negative directions. The input values decrease in membership from the midpoint until they reach a point where the values move too far from the ideal definition and are definitely not in the set and are therefore assigned zeros.
The Fuzzy Large transformation function is used when the larger input values are more likely to be a member of the set. The defined midpoint identifies the crossover point (assigned a membership of 0.5) with values greater than the midpoint having a higher possibility of being a member of the set and values below the midpoint having a decreasing membership. The spread parameter defines the shape and character of the transition zone.
The Fuzzy Linear transformation function applies a linear function between the user-specified minimum and maximum values. Anything below the minimum will be assigned a 0 (definitely not a member) and anything above the maximum a 1 (definitely a member). For example, a positively sloped linear transformation with a minimum of 30 and a maximum of 80 assigns a 0 to any value below 30 and a 1 to any value above 80.
The Fuzzy MS Large transformation function is similar to the Fuzzy Large function, except the definition of the function is based on a specified mean and standard deviation. Generally, the difference between the two functions is that the Fuzzy MS Large function can be more applicable if the very large values are more likely to be a member of the set.
The Fuzzy MS Small transformation function is similar to the Fuzzy Small function, except the definition of the function is based on a specified mean and standard deviation. Generally, the difference between the two functions is that the Fuzzy MS Small function can be more applicable if the very small values are more likely to be a member of the set.
The Fuzzy Near transformation function is most useful if membership is near a specific value. The function is defined by a midpoint defining the center of the set, identifying definite membership and therefore assigned a 1. As values move from the midpoint, in both the positive and negative directions, membership decreases until it reaches 0, defining no membership. The spread defines the width and character of the transition zone.
The Fuzzy Small transformation function is used when the smaller input values are more likely to be a member of the set. The defined midpoint identifies the crossover point (assigned a membership of 0.5) with values greater than the midpoint having a lower possibility of being a member of the set and values below the midpoint having a higher possibility of membership. The spread parameter defines the shape and character of the transition zone
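A sketch of the membership functions described above may help when comparing them. The formulas are the commonly cited forms and are an assumption here, not something confirmed against the ArcGIS source; the function and parameter names are made up. Note that in these forms Gaussian and Near only approach 0 far from the midpoint rather than reaching it exactly, and Large/Small assume positive input values.

```python
import math


def fuzzy_linear(x, minimum, maximum):
    """0 at or below minimum, 1 at or above maximum, straight line in between."""
    if x <= minimum:
        return 0.0
    if x >= maximum:
        return 1.0
    return (x - minimum) / (maximum - minimum)


def fuzzy_large(x, midpoint, spread):
    """Larger values are more likely members; membership is 0.5 at the midpoint."""
    return 1.0 / (1.0 + (x / midpoint) ** (-spread))


def fuzzy_small(x, midpoint, spread):
    """Smaller values are more likely members; membership is 0.5 at the midpoint."""
    return 1.0 / (1.0 + (x / midpoint) ** spread)


def fuzzy_gaussian(x, midpoint, spread):
    """Membership is 1 at the midpoint and falls off like a normal curve."""
    return math.exp(-spread * (x - midpoint) ** 2)


def fuzzy_near(x, midpoint, spread):
    """Membership is 1 at the midpoint and falls off on both sides."""
    return 1.0 / (1.0 + spread * (x - midpoint) ** 2)


# Values from the Fuzzy Linear note: minimum of 30 and maximum of 80.
for value in (20, 30, 55, 80, 90):
    print(value, fuzzy_linear(value, 30, 80))
```

A larger spread makes the transition zone steeper for Large and Small, and narrower for Gaussian and Near.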
The Fuzzy Overlay tool allows the analysis of the possibility of a phenomenon belonging to multiple sets in a multicriteria overlay analysis. Not only does Fuzzy Overlay determine what sets the phenomenon is possibly a member of, it also analyzes the relationships between the membership of the multiple sets.
The Fuzzy And overlay type will return the minimum value of the sets the cell location belongs to.
The Fuzzy Or overlay type will return the maximum value of the sets the cell location belongs to
The Fuzzy Product overlay type will, for each cell, multiply each of the fuzzy values for all the input criteria
The Fuzzy Gamma type is an algebraic product of Fuzzy Sum raised to the power of gamma and Fuzzy Product raised to the power of 1 minus gamma.
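The overlay types can be compared on a single cell with a short sketch. Assumptions: each input is already a membership value between 0 and 1, Fuzzy Sum (not described in these notes) is taken as 1 minus the product of (1 - value), and Gamma uses the commonly cited form sum^gamma x product^(1 - gamma). Function names are made up for illustration.

```python
import math  # math.prod needs Python 3.8 or later


def fuzzy_and(values):
    """Minimum membership across all criteria."""
    return min(values)


def fuzzy_or(values):
    """Maximum membership across all criteria."""
    return max(values)


def fuzzy_product(values):
    """Product of all memberships; never larger than the smallest input."""
    return math.prod(values)


def fuzzy_sum(values):
    """Assumed form: 1 - product of (1 - value); never smaller than the largest input."""
    return 1.0 - math.prod(1.0 - v for v in values)


def fuzzy_gamma(values, gamma):
    """Assumed form: fuzzy sum ** gamma times fuzzy product ** (1 - gamma)."""
    return fuzzy_sum(values) ** gamma * fuzzy_product(values) ** (1.0 - gamma)


# Hypothetical memberships of one cell in two sets (e.g., "near roads", "gentle slope").
cell = [0.5, 0.8]
print(fuzzy_and(cell), fuzzy_or(cell), fuzzy_product(cell), fuzzy_sum(cell))
print(fuzzy_gamma(cell, 0.9))
```

With gamma = 1 the result equals Fuzzy Sum and with gamma = 0 it equals Fuzzy Product, so gamma acts as a dial between the increasing and decreasing effects.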
Functional knowledge of planar geometry (e.g., points, lines, and polygons) required to convert real world examples into spatial concepts
In mathematics, a plane is a flat, two-dimensional surface. A plane is the two-dimensional analogue of a point (zero dimensions), a line (one dimension) and three-dimensional space. Planes can arise as subspaces of some higher-dimensional space, as with the walls of a room, or they may enjoy an independent existence in their own right, as in the setting of Euclidean geometry.
I think this question will ask “you have a river, what is the best geometry representation of a river”
Knowledge of algebra (e.g., deriving values from a basic formula)
just go nuts:
not 100% sure what they may ask
Knowledge of statistics (e.g., descriptives, summary statistics, and R-squared)
-Descriptive statistics – summarize a sample rather than use the data to learn about the population that the sample of data is thought to represent.
In statistics, the coefficient of determination, denoted R² or r² and pronounced R squared, is a number that indicates how well data fit a statistical model – sometimes simply a line or a curve. An R² of 1 indicates that the regression line perfectly fits the data, while an R² of 0 indicates that the line does not fit the data at all. This latter can be because the data is utterly non-linear, or because it is random.
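For practice deriving R squared by hand, here is a minimal sketch for a simple least-squares line (one explanatory variable). The function name and sample data are made up; it computes 1 minus the ratio of residual to total sum of squares.

```python
def r_squared(xs, ys):
    """Fit y = intercept + slope * x by least squares and return R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the least-squares line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares (unexplained) and total sum of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Points lying exactly on y = 2x + 1 versus points with no linear trend.
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9]))
print(r_squared([1, 2, 3, 4], [1, -1, -1, 1]))
```

The second data set is perfectly U-shaped, which is the "utterly non-linear" case: the best straight line is flat and explains none of the variation.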
Spatial Statistics – Conceptual Models –
Inverse distance (spatial autocorrelation) – all features influence all other features, but the closer something is, the more influence it has
Distance band – features outside a specified distance do not influence the features within the area
Zone of indifference – combines inverse distance and distance band
K Nearest Neighbors – a specified number of neighboring features are included in calculations
Polygon Contiguity – polygons that share an edge or node influence each other
Spatial weights – specified by user (ex. Travel times or distances)
Mean Center
-Average x and y-coordinates for all features
-Useful for comparing distributions of different features or over time
Central feature
-Feature having the shortest total distance to all other features
-Useful for finding the most accessible feature
Standard distance – the extent to which the distance between the mean center and the features vary from the average distance
Orientation – Linear directional mean identifies general mean direction of a set of lines
Standard deviational ellipse is useful for comparing distributions of features and comparing one type of feature at different times
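The mean center, central feature, and standard distance measures above are easy to compute for point features. A minimal sketch, assuming unweighted points in a projected (planar) coordinate system; the function names and sample points are hypothetical.

```python
import math  # math.dist needs Python 3.8 or later


def mean_center(points):
    """Average x and average y of all points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root of the mean squared distance from each point to the mean center."""
    cx, cy = mean_center(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))


def central_feature(points):
    """The point with the shortest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


corners = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mean_center(corners), standard_distance(corners))
print(central_feature([(0, 0), (1, 0), (5, 0)]))
```

Note that the mean center is a computed location that may not coincide with any feature, while the central feature is always one of the input features.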
Useful to: Better understand geographic phenomena (ex. Habitats)
Monitor conditions (ex. Level of clustering)
Compare different sets of features (ex. Patterns of different types of crimes)
Average Nearest Neighbor – Measures how similar the actual mean distance between locations is to the expected mean distance for a random distribution
Ripley’s K-function – GIS counts the number of neighboring features within a given distance to each feature based on location. The test compares the observed K value at each distance to expected K value for a random distribution.
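As a rough illustration of the Average Nearest Neighbor idea, the sketch below divides the observed mean nearest-neighbor distance by the expected mean distance for a random pattern, taken here as 0.5 / sqrt(n / area), which is the commonly used expectation and an assumption in this example. The function name and sample points are made up, and no significance test (z-score) is included.

```python
import math  # math.dist needs Python 3.8 or later


def average_nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the random expectation."""
    n = len(points)
    # Observed: average distance from each point to its closest other point.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance for n randomly placed points in the study area.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Same study area (4 square units), two hypothetical point patterns.
spread_out = [(0, 0), (1, 0), (0, 1), (1, 1)]
bunched_up = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1)]
print(average_nearest_neighbor_ratio(spread_out, 4.0))
print(average_nearest_neighbor_ratio(bunched_up, 4.0))
```

A ratio below 1 suggests clustering and a ratio above 1 suggests dispersion; the result is sensitive to the area value supplied.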
Global statistics – identify and measure the pattern of the entire study area; do not indicate where specific patterns occur.
Local Statistics – identify variation across the study area, focusing on individual features and their relationships to nearby features (i.e. specific areas of clustering)
Spatial Autocorrelation (Moran’s I)
-Measures whether the pattern of feature values is clustered, dispersed, or random.
-Calculates I values to test for statistically significant clustering
Anselin Local Moran’s I
-Measures the strength of patterns for each specific feature.
Positive I value:
-Feature is surrounded by features with similar values, either high or low.
-Feature is part of a cluster.
-Statistically significant clusters can consist of high values (HH) or low values (LL)
Negative I value:
-Feature is surrounded by features with dissimilar values.
-Feature is an outlier.
-Statistically significant outliers can be a feature with a high value surrounded by features with low values (HL) or a feature with a low value surrounded by features with high values (LH).
Getis-Ord General G
-Global statistic that indicates whether similar values (either high or low) are clustered.
-Works best when either high or low values are clustered (but not both).
-Value of G score indicates statistically significant relationships
Hot Spot Analysis (Getis-Ord Gi*)
-Local version of the G statistic that indicates hot (cluster of high values) or cold spots (clusters of low values)
-To be statistically significant, the hot spot or cold spot will be surrounded by features with similar values, but have significantly higher/lower values than its neighbors.
-G = high value = hot spots
-G = low value = cold spots
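The global Moran's I value itself can be worked out with a small sketch. It assumes the standard formula, I = (n / S0) x (sum of w_ij x dev_i x dev_j) / (sum of dev_i squared), where dev is the deviation from the mean and S0 is the sum of all weights. The function name and the four-feature example are hypothetical, and the significance test (z-score and p-value) is left out.

```python
def morans_i(values, weights):
    """Global Moran's I for values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: the sum of every entry in the spatial weights matrix.
    s0 = sum(sum(row) for row in weights)
    # Cross-products of deviations for every pair of neighboring features.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Four features in a row; each feature neighbors the one next to it (weight 1).
neighbors = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i([1, 2, 3, 4], neighbors))    # similar values sit side by side
print(morans_i([1, -1, 1, -1], neighbors))  # high and low values alternate
```

Positive results point toward clustering of similar values, negative results toward dispersion, and values near zero toward a random pattern.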
With Regression Analyses, you ask WHY something is happening.
Model, examine and explore spatial relationships
Used to analyze linear relationships among variables.
Linear relationships are positive or negative
Regression analyses attempt to demonstrate the degree to which one or more variables potentially promote positive or negative change in another variable.
Linear Regression Techniques
Ordinary Least Squares (OLS) is the best known technique and a good starting point for all spatial regression analyses.
-Global model = provides 1 equation to represent the entire dataset
Geographically Weighted Regression (GWR)
-Local model = fits a regression equation to every feature in the dataset
-Regional variation incorporated into the regression model
Knowledge of basic programming (e.g., scripting, object oriented, query, and extensible)
Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which are data structures that contain data, in the form of fields, often known as attributes; and code, in the form of procedures, often known as methods. A distinguishing feature of objects is that an object's procedures can access and often modify the data fields of the object with which they are associated (objects have a notion of "this" or "self"). In OO programming, computer programs are designed by making them out of objects that interact with one another. There is significant diversity in object-oriented programming, but most popular languages are class-based, meaning that objects are instances of classes, which typically also determines their type.
In software engineering, extensibility (not to be confused with forward compatibility) is a system design principle where the implementation takes future growth into consideration. It is a systemic measure of the ability to extend a system and the level of effort required to implement the extension. Extensions can be through the addition of new functionality or through modification of existing functionality. The central theme is to provide for change – typically enhancements – while minimizing impact to existing system functions.
Knowledge of raster/vector principles
[data models] A coordinate-based data model that represents geographic features as points, lines, and polygons. Each point feature is represented as a single coordinate pair, while line and polygon features are represented as ordered lists of vertices. Attributes are associated with each vector feature, as opposed to a raster data model, which associates attributes with grid cells.
[graphics (computing)] Any quantity that has both magnitude and direction.
vector data model
[data models] A representation of the world using points, lines, and polygons. Vector models are useful for storing data that has discrete boundaries, such as country borders, land parcels, and streets.
[data models] A spatial data model that defines space as an array of equally sized cells arranged in rows and columns, and composed of single or multiple bands. Each cell contains an attribute value and location coordinates. Unlike a vector structure, which stores coordinates explicitly, raster coordinates are contained in the ordering of the matrix. Groups of cells that share the same value represent the same type of geographic feature.
[ESRI software] In ArcGIS, an in-memory representation of a raster dataset. A raster may exist in memory as a subset of a raster dataset; it may have a different cell size than the raster dataset; or it may exist using a different transformation than the raster dataset.
raster data model
See Also: vector data model
[data models] A representation of the world as a surface divided into a regular grid of cells. Raster models are useful for storing data that varies continuously, as in an aerial photograph, a satellite image, a surface of chemical concentrations, or an elevation surface.
Knowledge of scales (e.g., visual, verbal, relative, absolute, physical, and display vs. data)
verbal scale – expresses in words a relationship between a map distance and ground distance
One inch represents 16 miles.
Visual scale – graphic scale or bar scale
representative scale – representative fraction or ratio scale – 1:24,000 – 1” = 24000”
(a) Convert verbal scale of "1" to 18 miles" to RF
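A worked version of the conversion exercise, as a quick sketch: put both sides of the verbal scale in the same unit (inches) and divide. The function names are made up; 63,360 inches per mile comes from 5,280 ft x 12 in.

```python
INCHES_PER_MILE = 5280 * 12  # 63,360 inches in one mile


def verbal_to_rf(map_inches, ground_miles):
    """Return the RF denominator for 'map_inches represents ground_miles'."""
    return ground_miles * INCHES_PER_MILE / map_inches


def map_to_ground_feet(map_inches, rf_denominator):
    """Ground distance in feet for a map distance measured in inches."""
    return map_inches * rf_denominator / 12


print(verbal_to_rf(1, 18))              # 1 inch to 18 miles -> 1:1,140,480
print(map_to_ground_feet(2.5, 24000))   # 2.5 inches on a 1:24,000 map -> 5,000 ft
```

The same approach works for any verbal scale as long as both distances are converted to one common unit before dividing.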
An absolute scale is a system of measurement that begins at a minimum, or zero point, and progresses in only one direction. An absolute scale differs from an arbitrary, or "relative," scale, which begins at some point selected by a person and can progress in both directions. An absolute scale begins at a natural minimum, leaving only one direction in which to progress. This natural minimum must be an intrinsic property of the measured dimension rather than a natural side-effect of its progression (i.e.: Water freezes and boils naturally at certain temperatures, but these are not natural minimums or maximums of temperature.)
display vs data?
abstract data – what we draw but isn’t there (political boundaries)
physical – land masses and bodies of water
abstracted from their true physical appearance and simplified in a way that allows me to see only what’s useful
Knowledge of units of measurement (e.g., conversion and angular vs. metric)
1 mi = 5280 ft
1 ft = .3048 m
1 mi = 1.6093 km
1 int nautical mile = 2025.4 yd = 6076.12 ft
90° in a right angle, 60 minutes of arc in one degree, 60 seconds of arc in a minute
Radians – a whole circle is 360° = 2π radians – circumference of a circle is 2π × radius
Bearings – angle less than 90° within a quadrant defined by the cardinal directions
Azimuth – angle between 0° and 360° measured clockwise from north
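A few of these conversions sketched in code, in case the exam asks for one by hand. The helper names are made up; the bearing is returned as a (N or S, angle, E or W) tuple measured from the north-south line, which is one common convention and an assumption here.

```python
import math

FEET_PER_MILE = 5280
METERS_PER_FOOT = 0.3048


def miles_to_km(miles):
    """Miles to kilometers via feet and meters."""
    return miles * FEET_PER_MILE * METERS_PER_FOOT / 1000


def dms_to_decimal(degrees, minutes, seconds):
    """Degrees, minutes, seconds of arc to decimal degrees."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


def azimuth_to_bearing(azimuth):
    """Azimuth (0-360, clockwise from north) to a quadrant bearing."""
    azimuth %= 360
    if azimuth <= 90:
        return ("N", azimuth, "E")
    if azimuth <= 180:
        return ("S", 180 - azimuth, "E")
    if azimuth <= 270:
        return ("S", azimuth - 180, "W")
    return ("N", 360 - azimuth, "W")


print(miles_to_km(1))
print(dms_to_decimal(30, 30, 0))
print(azimuth_to_bearing(135))
print(math.radians(180))  # 180 degrees is pi radians
```

Python's built-in math.radians and math.degrees cover the degree and radian conversions, so only the bearing logic needs custom code.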
not sure what angular vs metric is
Knowledge of selection queries (e.g., attribute, spatial, and location)
New Selection, Add to Selection, Remove from Selection, Subset Selection, Switch Selection, Clear Selection
Within a Distance
Contains – features contain an input polygon
Completely Contains – features must completely contain an input polygon
Contains Clementini – same as Contains, except that a feature is not selected if the input feature lies entirely on its boundary (no part properly inside or outside)
Within – input layer will be selected if they are within a selecting feature – selecting feature must be polygons
Completely Within – features in input must be completely within the selecting features (polygons)
Within Clementini – features in input must be completely within the selecting features and cannot be entirely on the boundary of the features
Are Identical To – features are identical to input layer
Boundary Touches – features in the input layer will be selected if they have a boundary that touches a selecting feature – must be lines or polygons – must be completely inside or outside the polygon
Share a Line Segment With – features in the input layer will be selected if they share a line segment
Crossed by the Outline of – input features will be selected if they are crossed by the outline of a selecting feature – must be lines or polygons
Have their Center In – features will be selected if their center falls within a selecting feature
Contained By – same as Within
difference between spatial and location?!?!
Knowledge of different data types (e.g., SHP, GDB, Coverage, DGN, TXT, and IMG) and formats (spatial, rendered, and tabular)
SHP – shapefile
.shp – shape format – feature geometry itself
.shx – shape index format – positional index of the feature geometry to allow seeking forwards and backwards quickly
.dbf – attribute information
.prj – projection format
.sbn & .sbx – spatial index
.shp.xml – geospatial metadata in XML format
GDB – geodatabase
.gdb – file geodatabase
.mdb – personal geodatabase based on microsoft access
coverage feature class
[ESRI software] In ArcInfo, a classification describing the format of geographic features and supporting data in a coverage. Feature classes include point, arc, node, route, route system, section, polygon, and region. One or more coverage features are used to model geographic features; for example, arcs and nodes can be used to model linear features, such as street centerlines. The tic, annotation, link, and boundary feature classes provide supporting data for coverage data management and viewing.
DGN – MicroStation design file (CAD format; AutoCAD uses DWG and DXF)
TXT – Text
IMG – ERDAS IMAGINE raster image format
LiDAR – remote sensing technology that measures distance by illuminating a target with a laser and analyzing the reflected light
Raster – .jpg, .tif, .gif
is this just knowing the extensions? and knowing what’s in it?
Knowledge of different field types
Short integer – between -32768 and 32767
Long integer – between -2147483648 and 2147483647
Float (single-precision floating-point numbers)
Double (double-precision floating-point numbers)
Text – could be a coded value – assign to an integer through a domain
BLOBs – data stored as a long sequence of binary numbers – ArcGIS stores annotation and dimensions as BLOBs – images, multimedia, bits of code
Object Identifiers – Unique IDs and FIDs
Global Identifiers – Global ID and GUID – data types store registry style strings consisting of 36 characters enclosed in curly brackets
Raster field types – raster can be stored within the geodatabase
Geometry – point, line, polygon, multipoint, multipatch
Knowledge of data relationships (e.g., one to one and many to many)
1-1 relationship – each object of the origin table/feature class can be related to zero or one object of the destination table/feature class
1-Many relationship – each object in the origin table/feature class can be related to the multiple objects in the destination table/feature class
Many-Many relationship – multiple objects of the origin table/feature class can be related to multiple objects of the destination table/feature class
Knowledge of data collection, transfer, and format conversion (e.g., export formats, properties, and settings)
Primary data sources are those collected in digital format specifically for use in a GIS project
Secondary sources are digital and analog datasets that were originally captured for another purpose and need to be converted into a suitable digital format for use in a GIS project.
Data collection Workflow:
Planning includes establishing user requirements, garnering resources, and developing a project plan.
Preparation involves obtaining data, redrafting poor-quality map sources, editing scanned map images, removing noise, setting up appropriate GIS hardware and software systems to accept data.
Digitizing and transfer are the stages where the majority of the effort will be expended. Editing and improvement covers many techniques designed to validate data, as well as correct errors and improve quality.
Evaluation is the process of identifying project successes and failures.
-3 types of Resolution – key physical characteristic of remote sensing systems
-Spatial Resolution – size of object that can be resolved and the most usual measure is the pixel size
-Spectral resolution – parts of the electromagnetic spectrum that are measured
-Temporal resolution – repeat cycle – frequency with which images are collected for the same area
Surveying – Ground surveying based on the principle that the 3-D location of any point can be determined by measuring angles and distances from other known points
Ground survey – time consuming and expensive activity
-Used to capture buildings, land, and property boundaries, manholes and other objects
GPS is another method
LiDAR – scanning laser rangefinder to produce accurate topographic surveys
-Scan hardcopy maps, film, paper maps, aerial photographs, images
-Map, aerial photographs and images are scanned prior to vectorization
Vector data capture – digitizing vector objects from maps and other geographic data sources
heads-up digitizing and vectorization – process of converting raster data into vector data
Digitize vector objects using a mouse or digitizing cursor
Measurement error – human errors during digitizing – overshoots, undershoots, invalid polygons, sliver polygons
-Rubbersheeting – assumes that spatial autocorrelation exists among errors
Photogrammetry – science and technology of making measurements from pictures, aerial photographs, and images
measurements are captured from overlapping pairs of photographs using stereo plotters
Orientation – process of creating a stereo model suitable for viewing and extracting 3D vector coordinates that describe geographic objects
Triangulation – used to assemble a collection of images into a single model so that accurate and consistent information can be obtained from large areas
Orthoimages – images corrected for variations in terrain using a DEM
COGO data entry – COGO – coordinate geometry – methodology for capturing and representing geographic data
COGO – uses survey-style bearings and distances to define each part of an object
COGO – very precise measurements and are often regarded as the only legally acceptable definition of land parcels
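To see how COGO turns bearings and distances into coordinates, here is a minimal traverse sketch. For simplicity it takes each course as an azimuth in degrees (clockwise from north) plus a distance, rather than a quadrant bearing; the function name and the sample parcel are hypothetical.

```python
import math


def traverse(start, courses):
    """Walk a list of (azimuth_degrees, distance) courses from a start point.

    Returns the list of (x, y) vertices, beginning with the start point.
    """
    x, y = start
    vertices = [(x, y)]
    for azimuth_deg, distance in courses:
        azimuth = math.radians(azimuth_deg)
        # Azimuth is measured clockwise from north, so sine gives the
        # easting change and cosine gives the northing change.
        x += distance * math.sin(azimuth)
        y += distance * math.cos(azimuth)
        vertices.append((x, y))
    return vertices


# A square parcel, 100 units on a side: north, east, south, then west.
parcel = traverse((0.0, 0.0), [(0, 100), (90, 100), (180, 100), (270, 100)])
for vertex in parcel:
    print(round(vertex[0], 6), round(vertex[1], 6))
```

The gap between the last vertex and the start point is the closure error, which should be close to zero for a well-measured closed parcel.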
Syntactic translation – converting specific digital symbols (letters and numbers) between systems
Semantic translation – converting the meaning inherent in geographic information
Attribute data – entered by direct data loggers, manual keyboard entry, optical character recognition, voice recognition
Data collection – expensive
Types of collection – data capture or data transfer
Two capture methods – primary (direct measurement) and secondary (indirect derivation)
GPS – 24 satellites – each orbits the earth twice a day (one revolution every 12 hours) – altitude of about 12,000 miles – started by the U.S. Department of Defense in the 1970s for military use
Space segment – NAVigation Satellite Timing and Ranging (NAVSTAR) constellation – GPS satellites which transmit signals on two phase modulated frequencies – transmit a navigation message that contains orbital data for computing the positions of all satellites
Ground segment – called the control segment – Master Control Station – near Colorado Springs Colorado – monitoring locations around the world – purpose of control segment is to monitor satellite transmissions continuously to predict the satellite ephemeris, to calibrate satellite clocks and update the navigation message periodically
User segment stands for the total GPS user community – user will typically observe and record the transmissions of several satellites and apply solution algorithms to obtain position, velocity, and time
Standard Positioning Service – signal broadcast for civilian use
Horizontal location – 3 satellites are required
Vertical position – min 4 satellites are required
Calculate distance by measuring the time interval between the transmission and reception of a satellite signal
Trilateration – used to determine position of the GPS receiver
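The two ideas above (distance from signal travel time, then position from several distances) can be sketched in a simplified flat, two-dimensional form. This is only a toy: real GPS works in three dimensions and also solves for receiver clock error. The function names and station coordinates are made up.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def signal_range(travel_time_s):
    """Distance in meters covered by a signal in the given travel time."""
    return SPEED_OF_LIGHT * travel_time_s


def trilaterate_2d(stations):
    """Locate a receiver from three (x, y, distance) observations."""
    (x1, y1, r1), (x2, y2, r2), (x3, y3, r3) = stations
    # Subtracting circle 1 from circles 2 and 3 leaves two linear equations
    # of the form a * x + b * y = c.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("stations are collinear; position is not unique")
    # Solve the 2 x 2 system with Cramer's rule.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


# Hypothetical transmitters at known positions, receiver actually at (3, 4).
observations = [(0, 0, 5.0), (10, 0, math.sqrt(65)), (0, 10, math.sqrt(45))]
print(trilaterate_2d(observations))
print(signal_range(0.07))  # roughly 21,000 km for a 0.07 s travel time
```

Trilateration uses distances only, which is what distinguishes it from triangulation, where angles are measured.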
-Accuracy dependent on type of GPS receiver, field techniques, post processing of data, error from various sources
3 Types of GPS receivers
-Recreational Grade – accuracy within 5 to 20 meters, no ability to post process data, can do real time correction using Wide Area Augmentation System (WAAS) – can be used to navigate to a specific area – compile uncorrected GPS data
-Mapping Grade – accuracy from sub meter to 5 meters – GPS receivers can log raw GPS data – enabling data to be post-processed using GPS software – higher level of precision – GPS receiver can communicate with a base station – store attributes of features, use a data dictionary and upload data from the GPS device to a PC
-Survey or High Accuracy Grade – instruments with associated software that can achieve one-centimeter relative accuracy – used by land surveyors for boundary, topographic, and geodetic surveys, photogrammetry, and other activities requiring high accuracy
-Multipath – errors caused by reflected GPS signals arriving at the GPS receiver – nearby structures and other reflective surfaces
-Atmosphere – GPS signals can experience delays when traveling through the atmosphere – Common atmospheric conditions can affect GPS signals such as tropospheric delays and ionospheric delays
-Distance from Base Station – differential correction will increase the quality of the data, accuracy is degraded slightly as the distance from the base station increases
-Selective Availability – intentional degradation of the GPS signals by the Department of Defense (DOD) to limit accuracy for non-U.S. military and government users – currently turned off (since May 2000), but the DOD could turn it back on
-Noise – distortion of the satellite signal before it reaches the GPS receiver and/or additional signal piggybacking onto the GPS satellite signal
Before collecting – Planning
-Satellite availability and known outages – be sure that satellites will be available – United States Coast Guard maintains a website that generates a digest of known forecasted GPS satellite outages – digest called Notice Advisory to NAVSTAR Users (NANU)
-PDOP – Position Dilution of Precision – collect data when satellite availability is optimal (four or more) and the satellites are in an appropriate configuration to produce an acceptable (lower) PDOP value – higher PDOP values are bad
-Local Obstructions of the Sky – be aware of local obstructions such as a canyon, forest canopy, etc.
-GPS data dictionary design – designed for specific project to make project efficient based on information being collected
Set before going to the field
PDOP values – set to 6 or less. Higher values produce less reliable data
Signal to Noise Ratio (SNR) mask – set the value of the SNR mask higher to help minimize noise error – use manufacturer recommendations
Elevation Mask – set it to 15 degrees – default angle to minimize the amount of atmosphere through which the satellite signal has to travel
Data Collection Rate (sync rate) – recommended to collect point data at a 1-second interval – collect polygon and line data at a 5-second interval – collect point data at the same data collection interval as the base station
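The PDOP, SNR, and elevation mask settings above act as filters on what the receiver logs. A minimal sketch of that logic is shown here, assuming a hypothetical Satellite record and a placeholder SNR threshold (the notes say to use the manufacturer's recommendation, so the number below is only an assumption). Real receivers apply these masks in firmware.

```python
from dataclasses import dataclass

MAX_PDOP = 6.0              # from the notes: PDOP of 6 or less
ELEVATION_MASK_DEG = 15.0   # from the notes: 15-degree elevation mask
MIN_SNR = 35.0              # hypothetical placeholder; use the manufacturer's value
MIN_SATELLITES = 4          # from the notes: four or more satellites


@dataclass
class Satellite:
    prn: int               # satellite identifier
    elevation_deg: float   # angle above the horizon, in degrees
    snr: float             # signal-to-noise ratio reported by the receiver


def usable_satellites(satellites):
    """Keep satellites that pass both the elevation mask and the SNR mask."""
    return [
        s for s in satellites
        if s.elevation_deg >= ELEVATION_MASK_DEG and s.snr >= MIN_SNR
    ]


def accept_position(pdop, satellites):
    """Return True when a logged position meets all of the field settings."""
    enough = len(usable_satellites(satellites)) >= MIN_SATELLITES
    return pdop <= MAX_PDOP and enough
```

A satellite at 10 degrees of elevation is dropped by the elevation mask even if its signal is strong, and a position with a PDOP of 7.5 is rejected even if enough satellites are usable.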
Datum – GPS receivers are designed to collect GPS positions relative to the WGS84 datum – can designate what datum to be used
Projection – Make sure projection is correct
Unit of Measure – be aware of the units of measure with each projection
UTM – is in meters
State Plane – is in US survey feet or meters
Latitude/Longitude – Degrees/Minutes/Seconds (DMS) 43° 5’ 20”
Latitude/Longitude – Decimal Degrees (DD) 43.088889°
Latitude/Longitude – Degrees and decimal minutes – 43° 5.33333’
UTM 18 – (4740283N, 434057E)
State Plane – US feet – (312608N, 313525E)
US National Grid – (18T WN 7125315437)
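The three latitude/longitude notations listed above describe the same angle. The arithmetic behind the conversions can be sketched as below; the function names are hypothetical, and the sign handling assumes the sign is carried on the degrees value (so an angle between 0 and -1 degrees would need separate handling).

```python
def dms_to_dd(degrees, minutes, seconds):
    """Degrees/minutes/seconds to decimal degrees."""
    sign = -1 if degrees < 0 else 1
    return sign * (abs(degrees) + minutes / 60 + seconds / 3600)


def dd_to_ddm(decimal_degrees):
    """Decimal degrees to (whole degrees, decimal minutes)."""
    sign = -1 if decimal_degrees < 0 else 1
    whole = int(abs(decimal_degrees))
    minutes = (abs(decimal_degrees) - whole) * 60
    return sign * whole, minutes
```

With the values from the notes, 43 degrees 5 minutes 20 seconds works out to 43 + 5/60 + 20/3600, which rounds to 43.088889, and the 20 seconds alone are 20/60 = 0.33333 of a minute, giving 5.33333 decimal minutes.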
QC – use high resolution orthophotos to see if there are gross errors
GPS Receiver Antenna – orient the GPS antenna skyward – users should not block the antenna with their hands, body, or head
Prohibit Editing the Data Dictionary
Data Download – Download data as soon as possible to minimize risk of losing the data
Post-Processing – post-process as soon as the data are downloaded; this provides:
-Rapid Identification of reference stations that are out of service
-Avoidance of encountering a condition where reference stations have been deleted
-Compliance with a standardized workflow procedure
Base station being used – recommended to only use NOAA/NGS base stations – advanced users can establish their own base station
Metadata – according to the Federal Geographic Data Committee (FGDC), metadata maintains the value of the data set over time, preserves the data description, allows users to search for and use existing geospatial data, and contributes to an NSDI clearinghouse
Spatial Data Transfer Standard (SDTS) – “a robust way of transferring earth-referenced spatial data between dissimilar computer systems with the potential for no information loss. It is a transfer standard that embraces the philosophy of self-contained transfers, i.e. spatial data, attribute, georeferencing, data quality report, data dictionary, and other supporting metadata all included in the transfer” (USGS, http://mcmcweb.er.usgs.gov/sdts/)
-Draft standard published in The American Cartographer (1988)
-Approved as FIPS (Federal Information Processing Standards) 173 in 1992
-Standard consists of several parts
The American National Standards Institute’s (ANSI) Spatial Data Transfer Standard (SDTS) is a mechanism for archiving and transferring of spatial data (including metadata) between dissimilar computer systems. The SDTS specifies exchange constructs, such as format, structure, and content, for spatially referenced vector and raster (including gridded) data. The SDTS includes a flexible conceptual model, specifications for a quality report, transfer module specifications, data dictionary specifications, and definitions of spatial features and attributes.
The U.S. Geological Survey (USGS) remains the designated maintenance authority for the base standard and SDTS Parts 4 (TVP) and 5 (RPE). Maintenance of other profiles will be conducted by the sponsoring organization(s)
[data transfer] The process of moving data from one system to another or from one point on a network to another.
GML – open, vendor-neutral eXtensible Markup Language (XML) encoding for transport and storage of geographic information
Format Conversion
Hardware Specific Formats
There are two types of formats, those that preserve and use the actual ground coordinates of the data and those that use alternative page coordinate description of the map. Page Coordinates are used when a map is being drafted for display in a computer mapping program or in the data display module. In the late 1970s, programs came out that were device independent.
The Hewlett-Packard Graphics Language (HPGL) is a page description language designed for use with plotters and printers. Each line of the file contains one move command, so a line segment connects two successive lines or points. It is unstructured and does not store or use topology.
PostScript is a page description language that is usually used to export or print a map rather than data. It supports graphics in both vector and raster formats. PostScript was developed by Adobe, and most printers are able to read it.
Drawing Exchange Format (DXF)
DXF is an external format for transferring files between computers or between software packages. It is produced by AutoCAD. It does not have topology, but offers good detail on drawings, line widths and styles, colors, and text. DXF is typically constructed in 64 layers. Each layer holds different features, allowing the user to separate features.
Omaha Public Power District uses this kind of software. It is a turn-key system with street and power line layers. The problem is that you cannot tell what street a power line is on or closest to, because the format lacks topology and spatial analysis.
Digital Line Graph (DLG)
DLGs are distributed by the government, and are available at 1:100,000 and 1:24,000 scales. Features are in separate files that most GIS packages will import, although extra data manipulation is often necessary. DLGs consist of line work with the contours removed, therefore elevation is not available.
Topologically Integrated Geographic Encoding and Referencing (TIGER)
TIGER format was first distributed by the U.S. Census Bureau in 1990. It includes block-level maps of every village, town, and city in the United States. It includes geocoded block faces with address ranges of street numbers. This means that the files include topology and can be used for address matching. The maps are a combination of DLG and GBF/DIME files. The Census Bureau's 1980 maps were used along with the USGS's DLG maps, thus combining urban and nonurban areas.
TIGER consists of an arc/node type arrangement with separate files for points (zero cells), lines (one cells), and areas (two cells) that are linked together by cross-indexing. Cross-indexing means some features can be encoded as landmarks that allow GIS layers to be tied together.
A shapefile is a vector data format for storing the location, shape, and attributes of geographic features. A shapefile is stored in a set of related files and contains one feature class.
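Because a shapefile is a set of related files rather than one file, a common problem is copying only part of the set. The sketch below checks for the three mandatory components (.shp for geometry, .shx for the index, .dbf for attributes); the function name is hypothetical and the check is by file name only, not by content.

```python
from pathlib import Path

# The three components every shapefile needs
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]
```

An empty list means all three files are present. Optional companions such as .prj (the coordinate system definition) are not checked here.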
Scalable Vector Graphics (SVG)
SVG is an image format built on the XML language. Any program that recognizes XML can display the SVG image. The "scalable" part of the term emphasizes that you can zoom in on an image and not lose resolution. SVG files also have the advantages of being smaller, and arriving faster, than conventional image files such as GIF, PDF, and JPEG.
A coverage is a data model for storing geographic features using ArcInfo software. A coverage stores a set of thematically associated data considered to be a unit. It usually represents a single layer, such as soils, streams, roads, or land use. In a coverage, features are stored as both primary features (points, arcs, polygons) and secondary features (tics, links, annotation). Feature attributes are described and stored independently in feature attribute tables. Coverages cannot be edited in ArcGIS.
Arc-Info Interchange File (.e00)
An ArcInfo interchange file, also known as an export file, is a file format used to enable a coverage, grid or TIN and an associated INFO table to be transferred between different machines. ArcInfo interchange files have a .e00 extension, which increments to .e01, .e02, and so on, if the interchange file is composed of several separate files.
A geodatabase is an object-oriented data model that represents geographic features and attributes as objects, along with the relationships between objects, but is hosted inside a relational database management system. A geodatabase can store objects such as feature classes, feature data sets, nonspatial tables, and relationship classes.
Standard Raster Format
Many of the formats are based on photographic formats. The file structure has a fixed-length header with a keyword or "magic number" that identifies the format. The header also gives the length of one record in bits and the number of rows and columns. Often the header has a color table as well, which specifies which colors to display.
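The "magic number" idea can be illustrated by reading the first few bytes of a file and comparing them with known signatures. The sketch below covers only the three raster formats discussed in these notes, and the function names are hypothetical.

```python
# Leading byte signatures ("magic numbers") of three common raster formats
MAGIC_NUMBERS = {
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"\xff\xd8\xff": "JPEG",
}


def identify_raster_format(header_bytes):
    """Return a format name if the bytes start with a known signature."""
    for magic, name in MAGIC_NUMBERS.items():
        if header_bytes.startswith(magic):
            return name
    return None


def identify_file(path):
    """Read the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_raster_format(handle.read(8))
```

A file extension can be wrong or missing, so software generally trusts the signature in the header rather than the file name.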
Tagged Image File Formats (TIFF)
This format is associated with scanners, which save scanned images as TIFF files. TIFF can use run-length and other image compression schemes. It is not limited to 256 colors like a GIF.
A TIFF header can also carry georeferencing, such as the latitude/longitude of the pixel edges (as in GeoTIFF).
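Run-length compression, mentioned above for TIFF, replaces repeated pixel values with (count, value) pairs. The sketch below is a generic illustration of the idea for one raster row. It is not the specific byte-level scheme that TIFF uses, and the function names are hypothetical.

```python
def run_length_encode(row):
    """Collapse consecutive equal values into (count, value) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, value])   # start a new run
    return [tuple(run) for run in runs]


def run_length_decode(runs):
    """Expand (count, value) pairs back into the original row."""
    row = []
    for count, value in runs:
        row.extend([value] * count)
    return row
```

The row 0, 0, 0, 5, 5, 0 becomes three runs: three 0s, two 5s, one 0. The savings are largest on rasters with large uniform areas, such as scanned maps or classified land cover.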
Graphics Interchange Format (GIF)
GIF is a file format for image files, commonly used on the Internet. It is well-suited for images with sharp edges and relatively few gradations of color.
Joint Photographic Experts Group (JPEG)
JPEG is a common picture format. It uses a variable-resolution compression system offering both partial and full resolution recovery.
Digital Elevation Models (DEMs) come in two types. The first is 30-meter elevation data from 1:24,000 seven-and-a-half-minute quadrangle maps. The second is 1:250,000 3-arc-second digital terrain data. DEMs are produced by the National Mapping Division of USGS.
Band Interleaved by Pixel (BIP), Band Interleaved by Line (BIL)
BIP and BIL are formats produced by remote sensing systems. The primary difference among them is the technique used to store brightness values captured simultaneously in each of several colors or spectral bands.
Landsat satellite imagery has been distributed in these remote sensing (RS) formats. In BIL, the pixel values of each band are stored one line (row) at a time, band after band; in BIP, the values from all bands are stored together for each pixel. Programs that use this kind of information include IDRISI, GRASS, and MapFactory. It is fairly easy to exchange information between these raster formats.
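The difference between BIP and BIL is only the order in which the same brightness values are written out. A small sketch, assuming each band is a list of rows and using hypothetical function names:

```python
def to_bip(bands):
    """Band Interleaved by Pixel: all band values for pixel 1, then pixel 2, ..."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [
        band[r][c]
        for r in range(rows)
        for c in range(cols)
        for band in bands
    ]


def to_bil(bands):
    """Band Interleaved by Line: row 1 of band 1, row 1 of band 2, ..., then row 2."""
    rows = len(bands[0])
    return [
        value
        for r in range(rows)
        for band in bands
        for value in band[r]
    ]
```

For two 2 x 2 bands with values 1, 2, 3, 4 and 10, 20, 30, 40, BIP writes 1, 10, 2, 20, 3, 30, 4, 40, while BIL writes 1, 2, 10, 20, 3, 4, 30, 40.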
Raster-to-Raster & Vector-to-Vector
There are many types of vector formats used in GIS, and even more raster formats. It is often necessary to change between file formats, even if they are both raster or both vector, to make data sets usable together. Many free and commercial translators and converters are available on the web. Some GIS programs also support this type of conversion; for example, the conversion tools available in ArcGIS can be used to switch between a number of formats.
Raster-to-Vector & Vector-to-Raster
Moving from vector to raster is not that difficult. The pixels covered by a line or polygon are simply given a value. The opposite is not true, though. The problem is that one line might be several pixels wide, so the line has to be skeletonized, which often leaves it very jagged. This is a time-consuming and complicated procedure. Sometimes conversion is impossible and the map has to be re-digitized. In other instances the translation is simply poor, and data is lost in the exchange.
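The vector-to-raster direction can be sketched by "burning" a line into a grid. The example below uses Bresenham's line algorithm, which is one common choice and not necessarily what any particular GIS package uses. The grid is a list of rows and coordinates are (column, row) cell indices, both assumptions made for illustration.

```python
def rasterize_line(grid, start, end, value):
    """Assign value to every grid cell along the line from start to end."""
    (c0, r0), (c1, r1) = start, end
    dc, dr = abs(c1 - c0), abs(r1 - r0)
    step_c = 1 if c0 < c1 else -1
    step_r = 1 if r0 < r1 else -1
    err = dc - dr
    while True:
        grid[r0][c0] = value
        if (c0, r0) == (c1, r1):
            break
        e2 = 2 * err
        if e2 > -dr:      # step along the column axis
            err -= dr
            c0 += step_c
        if e2 < dc:       # step along the row axis
            err += dc
            r0 += step_r
    return grid
```

A line from cell (0, 0) to cell (3, 1) marks the cells (0, 0), (1, 0), (2, 1), and (3, 1). The stair-step in the middle is the jaggedness that makes the reverse conversion, raster back to a smooth vector line, so much harder.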
The Federal Information Processing Standard 173, called the spatial data transfer standard (SDTS), was established for the exchange of data between different formats. It is extremely complicated because it has to produce a bibliography, a terminology, and a complete list of geographic and map features. It also has to address the issue of data accuracy.
Two major points can be made about the industry. The first is that none of the industry standards exchange topology with the data; they only transfer the graphic information. The second point is that with many different formats each package has to include a large number of format translators.
Open Geospatial Consortium (OGC), formerly the Open GIS Consortium
The Open Geospatial Consortium, Inc. (OGC) is a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location based services. Through member-driven consensus programs, OGC works with government, private industry, and academia to create open software application programming interfaces for geographic information systems (GIS) and other mainstream technologies.
GML, or Geography Markup Language, is an XML-based encoding standard for geographic information developed by the Open Geospatial Consortium (OGC). The objective is to let internet browsers display web-based mapping without additional components or viewers.
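To make the XML encoding concrete, the sketch below builds a single GML-style point with Python's standard XML library. It is illustrative only: the function name is hypothetical, the coordinates are the UTM 18 example from the notes above, the EPSG:32618 code (WGS 84 / UTM zone 18N) is an assumption about that example's coordinate system, and the output is not validated against the GML schema.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)


def gml_point(x, y, srs_name):
    """Return a gml:Point element, with one gml:pos child, as a string."""
    point = ET.Element(f"{{{GML_NS}}}Point", {"srsName": srs_name})
    pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
    pos.text = f"{x} {y}"
    return ET.tostring(point, encoding="unicode")


if __name__ == "__main__":
    # Easting and northing from the UTM 18 example in the notes
    print(gml_point(434057, 4740283, "EPSG:32618"))
```

Because the result is plain XML, any XML-aware tool can read the coordinates and the coordinate system name, which is the vendor-neutral property the standard is aiming for.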
Knowledge of data quality, including geometric accuracy, thematic accuracy, resolution, precision, and fitness for use
Knowledge of metadata and its standards (e.g., ISO and FGDC)
Content Standard for Digital Geospatial Metadata (CSDGM)
ISO 19115:2003 Geographic information – Metadata (corrigendum 1): The base ISO metadata standard for the description of geographic information and services. Expected to be replaced by ISO 19115-1:2014 – Geographic Information – Metadata – Part 1: Fundamentals.
ISO 19115–2:2009 Geographic information – Metadata – Part 2: Extensions for the description of imagery, gridded data and data collected using instruments, e.g. monitoring stations and measurement devices. These extensions also include improved descriptions of lineage and processing information.
North American Profile (NAP) of ISO 19115: A US and Canada joint profile of ISO 19115:2003 that extends some domains, increases conditionality for some elements, and specifies best practices for populating most elements.
ISO 19110:2005 Geographic information – Methodology for Feature Cataloging: An affiliate standard that supports the detailed description of feature types (roads, rivers, classes, rankings, measurements, etc.) in a manner similar to the CSDGM Entity/Attribute Section. The standard can be used in conjunction with ISO 19115 to document geospatial data set feature types or independently to document data models or other feature class representations.
ISO 19119:2005 Geographic information – Services – Amendment 1: Extensions of the service metadata model: An affiliate standard that supports the detailed description of digital geospatial services including geospatial data portals, web mapping applications, data models and online data processing services. The standard can be used in conjunction with ISO 19115 to document services associated with a specific data set/series or independently to document a service.
ISO 19139:2007 Geographic information – Metadata – XML schema implementation: An XML document that specifies the format and general content of an ISO 19115 metadata record.
Data Type – The type of geospatial resource you document will affect your standard selection.
CSDGM was developed for the documentation of GIS vector, raster and point data.
ISO 19115 was developed for the documentation of GIS vector and point data and geospatial data services such as web-mapping applications, data catalogs, and data modeling applications.
ISO 19115-2 fully includes ISO 19115 and adds elements to describe imagery and gridded data as well as data collected using instruments, e.g. monitoring stations and measurement devices.
Metadata represents the who, what, when, where, why, and how of the resource
-include core library catalog elements such as title, abstract, and publication data
-geographic extent and projection information
-database elements such as attribute label definitions and attribute domain values
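The three groups of elements above can be pictured as fields in a record. The sketch below checks a record for empty or missing elements. The field names are hypothetical simplifications for illustration and are not the element names defined by CSDGM or ISO 19115.

```python
# Hypothetical field names grouped as in the notes: catalog, geographic, database
REQUIRED_ELEMENTS = (
    "title",
    "abstract",
    "publication_data",
    "geographic_extent",
    "projection",
    "attribute_definitions",
)


def missing_metadata_elements(record):
    """Return the required elements that are absent or empty in a record."""
    return [name for name in REQUIRED_ELEMENTS if not record.get(name)]
```

A record with a title and abstract but no projection information would be flagged, which is the kind of gap that makes a data set hard to reuse later.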
Understanding of the difference between quality control and quality assurance in the context of a given geospatial project
Quality Control – used in developing systems to ensure products or services are designed and produced to meet or exceed customer requirements
Quality Assurance – refers to planned and systematic production processes that provide confidence in a product’s suitability for its intended purpose – set of activities intended to ensure that products satisfy customer requirements in a systematic, reliable fashion. QA cannot absolutely guarantee the production of quality products
Two principles of QA:
Fit for purpose – the product should be suitable for the intended purpose
Right first time – mistakes should be eliminated
Quality Assurance is process oriented and focuses on defect prevention, while quality control is product oriented and focuses on defect identification.
Quality Assurance – QA is a set of activities for ensuring quality in the processes by which products are developed
Quality Control – QC is a set of activities for ensuring quality in products – the activities focus on identifying defects in the actual products produced
Knowledge of data archiving and retrieval
-geodatabase archiving provides a mechanism for capturing, managing, and analyzing data change
-creates and maintains a separate feature class schema associated with the versioned geodatabase
-when enabled, maintains all changes saved or posted to the DEFAULT version in an associated archive class
-enables temporal analysis of geospatial resources over time
Knowledge of the differences among a join, a merge, a union, a clip, and an intersect
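Join – appends the attributes of one table to another through a common field, so related information can be kept in separate tables instead of being duplicated in the database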
Merge – Combines multiple input datasets of the same data type into a single, new output dataset. This tool can combine point, line, or polygon feature classes or tables.
Use the Append tool to combine input datasets with an existing dataset
Union – Computes a geometric union of the input features. All features are written to the output feature class, each carrying the attributes of the input features it overlaps.
Clip – Extracts input features that overlay the clip features.
Intersect – Computes a geometric intersection of the input features. Features or portions of features which overlap in all layers and/or feature classes will be written to the output feature class.
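The differences among these operations are easiest to see on a toy example. The sketch below is not how the ArcGIS tools are implemented: it assumes each feature is a pair of (attributes, set of grid cells), so that overlap can be computed with ordinary set operations, and all function names are hypothetical.

```python
def table_join(target_rows, join_rows, key):
    """Join: append fields from join_rows to target rows sharing the key."""
    lookup = {row[key]: row for row in join_rows}
    return [{**row, **lookup.get(row[key], {})} for row in target_rows]


def merge(*datasets):
    """Merge: stack features of several datasets into one new dataset."""
    return [feature for dataset in datasets for feature in dataset]


def clip(feature, clip_feature):
    """Clip: keep the input's cells inside the clip area, input attributes only."""
    attrs, cells = feature
    return attrs, cells & clip_feature[1]


def intersect(a, b):
    """Intersect: keep only the shared cells, with attributes from both."""
    return {**a[0], **b[0]}, a[1] & b[1]


def union(a, b):
    """Union: keep every cell, split into pieces by which inputs cover it."""
    (a_attrs, a_cells), (b_attrs, b_cells) = a, b
    pieces = [
        (a_attrs, a_cells - b_cells),
        ({**a_attrs, **b_attrs}, a_cells & b_cells),
        (b_attrs, b_cells - a_cells),
    ]
    return [(attrs, cells) for attrs, cells in pieces if cells]
```

With a soil feature covering cells (0, 0) and (1, 0) and a zoning feature covering (1, 0) and (2, 0), clip and intersect both return the single shared cell, but only intersect carries the zoning attribute, while union returns three pieces covering all three cells.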
Knowledge of basic Geomatics
Geomatics – the science and technology of gathering, analyzing, interpreting, distributing, and using geographic (or spatially referenced) information. Geomatics encompasses a broad range of disciplines: cartography, surveying, mapping, remote sensing, GIS, and GPS
Tools and techniques used in land surveying (Total Station, Level Machine, Theodolite, Plane Table, Chain etc.) , remote sensing, GIS, global navigation satellite systems (GPS, GLONASS, GALILEO, COMPASS), photogrammetry, and related forms of earth mapping
GLONASS – Russian
Upcoming Galileo positioning system
proposed COMPASS navigation system of China
IRNSS of India
Knowledge of basic field data collection
check out data collection guidelines at BC