From Twitter to Traffic Predictor

Sean Qian and Weiran Yao have used information extracted from tweets to provide unparalleled accuracy for predicting morning traffic patterns. Qian, associate professor of civil and environmental engineering, and Yao, Qian’s Ph.D. student, published their results in Transportation Research.


The morning commute is one of the busiest traffic periods of the day, yet it has proven the hardest to predict. Most traffic prediction methods rely on a consistent flow of traffic data from the time leading up to the predicted period. Most people, however, spend the hours before their commute sleeping or going through their morning routines at home, which leaves a large gap in the data those methods need.

Qian and Yao’s method solves this problem by pulling data from tweets sent between the previous evening and the early morning. They first used Twitter’s application programming interface (API) to identify tweets within a given area (in this case, the city of Pittsburgh) with geotags indicating where they were sent from. They then used Twint, a web scraper, to pull other posts from users with geotagged tweets, creating a better picture of the times and general area in which each user was active. All data was anonymized and stripped of personally identifiable information before publication.

“We argue that tweets capture three types of useful information for explaining next-day morning traffic, which includes people’s sleep-wake status, local events and (planned) traffic incidents,” Qian and Yao wrote.

Further augmentation of this dataset allowed the researchers to extract additional information. Using language analysis, the team identified search terms that might indicate a traffic incident. These include not only accidents, but also planned closures and large events such as a concert, sports game or holiday celebration.

Simple personal tweets like “Had a blast at the Pirates game!” or “This fender bender ahead is going to make me late,” can actually provide crucial information, especially when tagged with a geotag or informed by other tweets from that user. Further data was also pulled from official accounts, such as news outlets and local government, which often tweet direct reports on accidents and planned closures.
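As a rough illustration of this kind of term matching, the sketch below flags tweets by category. It is not the researchers' code: the category names, the term lists and the `flag_tweet` helper are all hypothetical, since the article does not list the actual search terms or describe the language analysis in detail.

```python
import re

# Hypothetical term lists for illustration only; the real search terms
# used in the study are not given in the article.
INCIDENT_TERMS = {
    "accident": ["accident", "crash", "fender bender"],
    "closure": ["closure", "closed", "detour"],
    "event": ["game", "concert", "parade"],
}


def flag_tweet(text):
    """Return a sorted list of categories whose terms appear in the tweet text."""
    lowered = text.lower()
    hits = []
    for category, terms in INCIDENT_TERMS.items():
        # Match whole words or phrases so "game" does not match "gamer".
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms):
            hits.append(category)
    return sorted(hits)


# The two example tweets quoted above:
print(flag_tweet("Had a blast at the Pirates game!"))                    # ['event']
print(flag_tweet("This fender bender ahead is going to make me late,"))  # ['accident']
```

Plain keyword matching like this will miss paraphrases and pick up false positives, so it should be read only as the simplest possible stand-in for the language analysis the team describes.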

When combined, these methods provided a large dataset that indicated the geographic distribution and sleep/wake times of likely commuters, as well as both planned and accidental traffic incidents that might affect their commute. This bridged the information gap created by the overnight lull in traffic.
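To make the idea of overnight tweet data concrete, here is a minimal sketch that summarizes tweets from an overnight window into simple counts. Everything in it is an assumption for illustration: the `overnight_features` function, the 6 p.m. to 5 a.m. window and the input layout are hypothetical, and the published framework may use quite different features.

```python
from collections import Counter
from datetime import datetime


def overnight_features(tweets, start_hour=18, end_hour=5):
    """Summarize tweets sent between start_hour (evening) and end_hour (morning).

    tweets: list of (iso_timestamp, is_incident) pairs, where is_incident
    marks tweets already flagged as mentioning a traffic incident.
    The window bounds are assumed values, not taken from the study.
    """
    hourly = Counter()
    incidents = 0
    for stamp, is_incident in tweets:
        hour = datetime.fromisoformat(stamp).hour
        # The window wraps past midnight, so accept late-evening or early-morning hours.
        if hour >= start_hour or hour < end_hour:
            hourly[hour] += 1
            incidents += int(is_incident)
    return {"tweets_per_hour": dict(hourly), "incident_tweets": incidents}


sample = [
    ("2019-05-06T22:15:00", False),  # evening tweet, inside the window
    ("2019-05-07T04:30:00", True),   # early-morning incident tweet, inside the window
    ("2019-05-07T12:00:00", True),   # midday tweet, ignored
]
print(overnight_features(sample))
```

Hourly tweet counts are used here as a crude proxy for when people are awake, and the incident count as a proxy for disruptions. A prediction model would take summaries like these as inputs alongside whatever traffic data is available.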

With this information, Qian and Yao were able to provide traffic predictions for Pittsburgh’s morning commute period with previously unseen accuracy and have created a comprehensive framework for predicting morning traffic conditions in urban areas. This information also allows them to start making observations and predictions on a larger, day-to-day scale. That includes finding that Pittsburgh’s morning traffic was generally more congested on Tuesdays, Wednesdays and Thursdays, which could enable transportation agencies to better manage the morning commute. These kinds of observations — previously impossible, due to the inability to accurately predict morning conditions — may inform larger decisions in travel demand management, signal timing control and personal destination routing.

“This research leverages machine learning and big data to understand human behavior while preserving individual privacy,” said Qian. “It is super exciting to see this method leading to better predictions of morning commute traffic as late as 5 a.m., and I believe this can swiftly be deployed in many of our transportation management centers.”
