Published in 2020
Crash analysis methods typically use annual average daily traffic as an exposure measure, which can be too aggregated to capture the safety effects of variations in traffic flow and operations that occur throughout the day. Flow characteristics such as variation in speed and level of congestion play a significant role in crash occurrence and are not currently accounted for in the American Association of State Highway and Transportation Officials’ Highway Safety Manual. This study developed a methodology for creating crash prediction models using traffic, geometric, and control information that is provided at sub-daily aggregation intervals. Data from 110 rural four-lane segments and 80 urban six-lane segments were used. The volume data came from detectors whose coverage ranged from continuous counts throughout the year to counts from only a couple of weeks every other year (short counts). Speed data came from both point sensors and probe data provided by INRIX.
The results showed that models that used data aggregated to an average hourly level reflected the variation in volume and speed throughout the day without compromising model quality. For urban segments, mean absolute deviation improved by 20% for total crashes and by 9% for injury crashes when models using average hourly volume, geometry, and flow variables were compared with the model based on annual average daily traffic. The corresponding improvements for rural segments were 11% and 9%. Average hourly speed, standard deviation of hourly speed, and differences between speed limit and average speed had statistically significant relationships with crash frequency. For all models, prediction accuracy improved across all validation measures of effectiveness when the speed components were added. The positive effect of flow variables held irrespective of the speed data source. For rural hourly models, mean absolute deviation improved by 52% when short counts were added, compared with models based only on continuous count stations. The corresponding improvement for urban segments was 58%. This means that using short count stations as a data source does not diminish the quality of the developed models. Thus, a combination of different volume data sources with good quality speed data can lessen the dependency on volume data quality without compromising performance. Further investigation revealed that, although accounting for spatial and temporal correlation improved model performance, it provided smaller benefits than the larger, more inclusive dataset obtained by including the short count data.
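The percentage improvements in mean absolute deviation reported above can be illustrated with a minimal sketch. It assumes the usual definition of mean absolute deviation in crash model validation (the average absolute difference between observed and predicted crash counts) and a simple relative-change formula; the abstract does not state the exact computation, and the function names and all numbers below are hypothetical, not study data.

```python
def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted crash counts."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def percent_improvement(mad_baseline, mad_candidate):
    """Relative reduction in MAD of a candidate model versus a baseline, in percent."""
    return 100.0 * (mad_baseline - mad_candidate) / mad_baseline


# Hypothetical validation data for four segments (illustration only).
observed_crashes = [2, 0, 1, 3]
aadt_model_pred = [1.0, 1.0, 1.0, 1.0]    # assumed baseline (AADT-based) predictions
hourly_model_pred = [1.5, 0.5, 1.0, 2.0]  # assumed hourly-model predictions

mad_aadt = mean_absolute_deviation(observed_crashes, aadt_model_pred)
mad_hourly = mean_absolute_deviation(observed_crashes, hourly_model_pred)
improvement = percent_improvement(mad_aadt, mad_hourly)
print(f"MAD (AADT): {mad_aadt:.2f}, MAD (hourly): {mad_hourly:.2f}, "
      f"improvement: {improvement:.0f}%")
```

A lower mean absolute deviation indicates predictions closer to observed crash counts, so a positive percentage here means the candidate model outperformed the baseline on the validation set.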
This study showed that it is possible to develop a broadly transferable crash prediction methodology using hourly level volume and flow data that are currently widely available to transportation agencies. These models have a broad spectrum of potential applications that involve assessing safety effects of events and countermeasures that create recurring and non-recurring short-term fluctuations in traffic characteristics.
Last updated: November 9, 2023