Artificial Intelligence (AI) is everywhere, and it is gradually reaching newsrooms and society as a whole. PHOTO/ MAXRESDEFAULT
By TECH CORRESPONDENT
To deal with the sheer volume of information and gain competitive advantage, the news industry has started to explore and invest in news automation.
In this area, Reuters News Agency is taking the lead with its news gathering innovation, dubbed Reuters Tracer: an automated news gathering system that uses large-scale social media data.
The advent of the internet and the subsequent information explosion have made it increasingly challenging for journalists to produce news accurately and swiftly.
For Reuters, the problem has been made more acute by the emergence of fake news as an important factor in distorting the perception of events.
Nevertheless, news agencies such as the Associated Press have moved ahead with automated news writing services.
So there is significant pressure on other news agencies to automate news production. And today, Reuters outlines how it has almost entirely automated the identification of breaking news stories.
Xiaomo Liu and pals at Reuters Research and Development and Alibaba say the new system performs well. Indeed, it has the potential to revolutionize the news business. But it also raises concerns about how such a system could be gamed by malicious actors.
The new system is called Reuters Tracer. It uses Twitter as a kind of global sensor that records news events as they are happening. The system then uses various kinds of data mining and machine learning to pick out the most relevant events, determine their topic, rank their priority, and write a headline and a summary. The news is then distributed around the company’s global news wire.
The first step in the process is to siphon the Twitter data stream. Tracer examines about 12 million tweets a day, 2 percent of the total. Half of these are sampled at random; the other half come from a list of Twitter accounts curated by Reuters’s human journalists. They include the accounts of other news organizations, significant companies, influential individuals, and so on.
The next stage is to determine when a news event has occurred. Tracer does this by assuming that an event has occurred if several people start talking about it at once. So it uses a clustering algorithm to find these conversations.
Of course, these clusters include spam, advertisements, ordinary chat, and so on. Only some of them refer to newsworthy events.
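The idea that an event exists once several people start talking about the same thing can be illustrated with a toy sketch. The article does not say which clustering algorithm Tracer uses, so the greedy word-overlap grouping below, its thresholds, and every function name are assumptions made purely for illustration.

```python
import re

def tokens(text):
    """Lowercase word set, ignoring short words (a crude stand-in for real NLP)."""
    return {w for w in re.findall(r"[a-z0-9#@']+", text.lower()) if len(w) > 3}

def jaccard(a, b):
    """Share of words that two token sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_tweets(tweets, similarity=0.3):
    """Greedy single pass: a tweet joins the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is {"seed": token set, "tweets": [texts]}
    for text in tweets:
        toks = tokens(text)
        for c in clusters:
            if jaccard(toks, c["seed"]) >= similarity:
                c["tweets"].append(text)
                break
        else:
            clusters.append({"seed": toks, "tweets": [text]})
    return clusters

def candidate_events(tweets, min_size=3):
    """Keep only clusters in which several tweets discuss the same thing."""
    return [c["tweets"] for c in cluster_tweets(tweets) if len(c["tweets"]) >= min_size]

sample = [
    "Huge explosion heard near central station downtown",
    "Explosion near central station, smoke everywhere downtown",
    "Just heard an explosion near central station downtown",
    "Great coffee this morning at my favourite cafe",
]
events = candidate_events(sample)  # one candidate event made of the first three tweets
```

A cluster found this way is only a candidate: it may still be spam or chatter, which is why a separate filtering step has to follow.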
So the next stage is to classify and prioritize the events. Tracer uses a number of algorithms to do this. The first identifies the topic of the conversation. It then compares this with a database of topics that the Reuters team has gathered from tweets produced by 31 official news accounts, such as @CNN, @BBCBreaking, and @nytimes as well as news aggregators like @BreakingNews.
At this stage, the algorithm also determines the location of the event using a database of cities and location-based keywords.
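A keyword-based location lookup of this kind might look roughly like the sketch below. The tiny gazetteer, the majority-vote rule, and the function name are hypothetical; the article says only that Tracer relies on a database of cities and location-based keywords.

```python
import re

# Hypothetical miniature gazetteer mapping keywords to places (an assumption,
# not the contents of the real Tracer database).
GAZETTEER = {
    "las vegas": "Las Vegas, US",
    "vegas strip": "Las Vegas, US",
    "london": "London, UK",
    "nairobi": "Nairobi, KE",
}

def guess_location(cluster_tweets):
    """Return the place mentioned by the most tweets in a cluster, or None."""
    counts = {}
    for text in cluster_tweets:
        lowered = text.lower()
        # Count each place at most once per tweet, even if several keywords match.
        places = {
            place
            for keyword, place in GAZETTEER.items()
            if re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
        }
        for place in places:
            counts[place] = counts.get(place, 0) + 1
    return max(counts, key=counts.get) if counts else None

location = guess_location([
    "Shots fired in Las Vegas",
    "Police on the Vegas Strip now",
    "Thoughts from London",
])
```

Here two of the three tweets point to Las Vegas, so that place wins the vote.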
Once a conversation or rumor is potentially identified as news, an important consideration is its veracity. To determine this, Tracer looks for the source by identifying the earliest tweet in the conversation that mentions the topic and any sites it points to. It then consults a database listing known producers of fake news, such as the National Report, or satirical news sites such as The Onion.
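The source check described above can be sketched as follows. The tweet structure, the domain names, and the labels are assumptions for illustration; the article names the National Report and The Onion as examples but does not describe how the database is organized.

```python
from urllib.parse import urlparse

# Hypothetical list of flagged sites (domains and labels are assumptions).
FLAGGED_DOMAINS = {
    "nationalreport.net": "fake",
    "theonion.com": "satire",
}

def earliest_tweet(cluster):
    """Treat the tweet with the smallest timestamp as the source of the conversation."""
    return min(cluster, key=lambda t: t["time"])

def source_flag(cluster):
    """Return 'fake' or 'satire' if the earliest tweet links to a flagged site, else None."""
    for url in earliest_tweet(cluster).get("urls", []):
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host in FLAGGED_DOMAINS:
            return FLAGGED_DOMAINS[host]
    return None

cluster = [
    {"time": 1700000300, "text": "Is this real?", "urls": []},
    {"time": 1700000000, "text": "Shocking story",
     "urls": ["https://www.theonion.com/some-story"]},
]
flag = source_flag(cluster)  # the earliest tweet links to a satire site
```

A lookup like this only catches known offenders; a rumor seeded from a new or unlisted site would pass through unflagged.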
Finally, the system writes a headline and summary and distributes the news throughout the Reuters organization.
During trials, the Reuters team say, the system has performed well. “Tracer is able to achieve competitive precision, recall, timeliness, and veracity on news detection and delivery,” they say.
And they have stats to back this up. The system processes 12 million tweets every day, rejecting almost 80 percent of them as noise. The rest fall into about 6,000 clusters that the system categorizes as different types of news events. That’s all done by 13 servers running 10 different algorithms.
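A quick back-of-the-envelope check puts these figures in proportion. The calculation below treats "almost 80 percent" as exactly 80 percent and assumes each surviving tweet lands in exactly one cluster, so the results are rough estimates rather than numbers reported by Reuters.

```python
# Figures quoted in the article; integer arithmetic avoids rounding surprises.
tweets_per_day = 12_000_000   # tweets Tracer examines daily
sample_percent = 2            # share of all tweets that this represents
noise_percent = 80            # "almost 80 percent" rejected as noise
clusters_per_day = 6_000      # clusters categorized as news events

total_tweets = tweets_per_day * 100 // sample_percent        # implied size of all of Twitter
kept_tweets = tweets_per_day * (100 - noise_percent) // 100  # tweets that survive filtering
avg_cluster_size = kept_tweets // clusters_per_day           # tweets per cluster, on average

print(total_tweets, kept_tweets, avg_cluster_size)  # prints 600000000 2400000 400
```

In other words, the quoted numbers imply roughly 600 million tweets a day overall, about 2.4 million of which survive filtering, for an average of around 400 tweets per cluster.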
By comparison, Reuters employs some 2,500 journalists around the world who together generate about 3,000 news alerts every day, using a variety of sources, including Twitter. Of these, around 250 are written up as news stories.
Reuters compared the stories that Tracer identifies with those that appear in the news feeds of organizations like the BBC and CNN. “The results indicate Tracer can cover about 70 percent of news stories with 2 percent of Twitter data,” say Liu and co.
And the system certainly works quickly. The team highlight the example of the Las Vegas shooting in October 2017, which left 58 people dead. A witness reported the incident at 1:22 a.m., which triggered a Tracer cluster. However, the cluster did not meet the system’s criteria for an event to be included in the news feed until 1:39 a.m. “Reuters reported the incident at 1:49 a.m.,” say Liu and co.
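The timeline is easier to judge as intervals. The small calculation below assumes all three timestamps fall on the same day and in the same time zone; the helper name is hypothetical.

```python
from datetime import datetime

def minutes_between(start, end, fmt="%H:%M"):
    """Whole minutes from start to end, both given as 24-hour clock strings."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)

first_tweet = "01:22"      # witness report that triggered a Tracer cluster
tracer_alert = "01:39"     # cluster met the criteria for the news feed
reuters_report = "01:49"   # Reuters reported the incident

detection_delay = minutes_between(first_tweet, tracer_alert)   # 17 minutes
head_start = minutes_between(tracer_alert, reuters_report)     # 10 minutes
```

So Tracer took 17 minutes to promote the cluster to an event, which was still 10 minutes before the Reuters report.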
That’s interesting work that raises a number of questions, especially about how easy the system is to manipulate. It’s not hard to imagine malicious actors designing Twitter feeds with the specific intent of fooling Tracer.
But whether this system will be easier to game than the current one, in which humans are regularly tricked, is hard to say.
Then there is the role of humans in the news business. The future of news is clearly one of increasing automation. How humans fit in is yet to be determined.
THIS STORY WAS FIRST PUBLISHED BY MIT TECHNOLOGY REVIEW