Disruption to Indicator Risk Scoring

Incident Report for Recorded Future

Postmortem

Issue:

On Monday, June 20th, 2022, a bug introduced during a platform update caused the Recorded Future risk scoring feature to calculate incorrect risk scores for many entities across all entity types. As a result, inflated suspicious and malicious risk scores on IOCs (indicators of compromise) produced false positives across all parts of the platform, including the Portal, Alerts, API, and our partner integrations.

Impact:

The inflated risk scores and resulting false positives were encountered in a variety of ways:

Risk scores displayed in Intelligence Cards, in both Portal and Mobile
Alerts pertaining to risk-score-driven Intelligence Goal Library use cases
Risk lists and indicator enrichment in third-party technology integrations

Although this incident affected all types of indicators in the platform, the total number of entities affected made up roughly 5% or less of each of our default risk lists for each entity type.

Root Cause:

As part of our product release during the morning of June 20th, there was an update to an external library used by our risk scoring code. The update in question included a breaking change that wasn’t highlighted in the library’s change log. As a result, risk rules were computed incorrectly.
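One common way to catch a silent behavior change in a dependency is a golden-score regression check: scores recorded under the known-good library version are compared with scores computed after the upgrade. The sketch below is illustrative only. The scoring functions, the "rule_hits" field, and the fixture values are hypothetical stand-ins, not Recorded Future's scoring code.

```python
from typing import Callable, Dict, List, Tuple

Evidence = Dict[str, int]


def find_score_regressions(
    score_fn: Callable[[Evidence], int],
    golden_cases: List[Tuple[Evidence, int]],
    tolerance: int = 0,
) -> List[Tuple[Evidence, int, int]]:
    """Return (evidence, expected, actual) for every case whose score drifted."""
    regressions = []
    for evidence, expected in golden_cases:
        actual = score_fn(evidence)
        if abs(actual - expected) > tolerance:
            regressions.append((evidence, expected, actual))
    return regressions


# Hypothetical stand-in for scoring under the known-good library version.
def baseline_score(evidence: Evidence) -> int:
    return min(99, 10 * evidence["rule_hits"])


# Hypothetical stand-in that simulates an undocumented breaking change
# in a dependency which inflates every score.
def score_after_upgrade(evidence: Evidence) -> int:
    return min(99, 10 * evidence["rule_hits"] + 60)


# Golden fixtures: inputs paired with the scores recorded before the upgrade.
GOLDEN_CASES = [
    ({"rule_hits": n}, baseline_score({"rule_hits": n})) for n in range(5)
]

if __name__ == "__main__":
    drifted = find_score_regressions(score_after_upgrade, GOLDEN_CASES)
    print(f"{len(drifted)} of {len(GOLDEN_CASES)} golden cases drifted")
```

Running a check like this against a candidate release would flag the upgrade before deployment, even when the library's change log does not mention the change.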

Remediation:

The Recorded Future data science and platform engineering teams kicked off several processes to recover from this incident. These included a manual correction effort to remediate individual entities, largely domains, that were reported as high-impact false positives, and an automated process to reevaluate risk rules for all impacted entities. Our initial focus was on reprocessing and updating the risk scores of entities that were inaccurately placed on our default risk lists because of the error.

Monday, June 20th - Issue first reported in the morning EDT. By 12pm EDT, it was determined to be a systemic issue related to risk scoring. Manual fixes of individual, incorrect scores began.

Tuesday, June 21st - Early morning EDT, new false positives were stopped. The root cause was confirmed. Shortly after, a fix was in place and systemic reprocessing of risk scores began.

Wednesday, June 22nd - Reprocessing of risk scores continued, with priority given to domains incorrectly scored 65 or higher (malicious/very malicious level) to reduce further impact on integrations.

Friday, June 24th - By 2pm EDT, all domain risk scores and most IP and URL risk scores were corrected. Risk scores related to affected hashes and other entities were still being reprocessed.

Monday, June 27th - By afternoon EDT, all IP and URL indicators with scores of 65 or higher had been fixed. This is in addition to domains with scores of 65 or higher, which were fixed previously.
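The prioritization described in the timeline can be sketched as a sort key for a reprocessing queue. This is a minimal illustration, not the actual reprocessing pipeline: the Entity class, the type ranking, and the function name are assumptions, and the ordering only approximates the sequence reported above (malicious-level scores first, domains ahead of IPs and URLs, then hashes and other entities).

```python
from dataclasses import dataclass
from typing import List

# From the report: scores of 65 or higher are malicious / very malicious.
MALICIOUS_THRESHOLD = 65

# Assumed ranking, loosely following the timeline; lower values run first.
TYPE_PRIORITY = {"domain": 0, "ip": 1, "url": 1, "hash": 2}
OTHER_TYPE_PRIORITY = 3


@dataclass
class Entity:
    name: str
    entity_type: str
    score: int  # the (possibly inflated) score currently in the platform


def reprocessing_order(entities: List[Entity]) -> List[Entity]:
    """Order entities so malicious-level scores are reprocessed first."""

    def key(entity: Entity):
        below_threshold = entity.score < MALICIOUS_THRESHOLD  # False sorts first
        type_rank = TYPE_PRIORITY.get(entity.entity_type, OTHER_TYPE_PRIORITY)
        return (below_threshold, type_rank, -entity.score)

    return sorted(entities, key=key)


if __name__ == "__main__":
    queue = reprocessing_order(
        [
            Entity("hash-a", "hash", 90),
            Entity("domain-a", "domain", 70),
            Entity("ip-a", "ip", 80),
            Entity("domain-b", "domain", 30),
        ]
    )
    print([entity.name for entity in queue])
```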

Path Forward:

We are taking several actions as a priority to prevent similar systemic risk scoring issues in the future and to mitigate their impact. These include steps to better avoid, identify, and verify such issues, and to revert and recover from them more efficiently.

Optimize A/B testing to include the impact of new libraries on the risk scoring framework
Lower the threshold for escalating data issues that would trigger reversion of a deployment
Add manual monitoring of risk score quality as a backup to existing automated alert monitoring
Add flexibility to the reprocessing of risk rules to speed up recovery of risk scores
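An escalation threshold of the kind listed above could, for example, compare the share of malicious-level scores before and after a deployment. The sketch below is hypothetical: the function names and the 1% limit are assumptions chosen for illustration, not values taken from Recorded Future's monitoring.

```python
from typing import Sequence

# From the report: scores of 65 or higher are malicious / very malicious.
MALICIOUS_THRESHOLD = 65


def malicious_fraction(scores: Sequence[int]) -> float:
    """Share of scores at or above the malicious threshold (0.0 if empty)."""
    if not scores:
        return 0.0
    flagged = sum(1 for score in scores if score >= MALICIOUS_THRESHOLD)
    return flagged / len(scores)


def should_revert(
    scores_before: Sequence[int],
    scores_after: Sequence[int],
    max_increase: float = 0.01,  # assumed limit, for illustration only
) -> bool:
    """Flag a deployment when the malicious share jumps past the limit."""
    increase = malicious_fraction(scores_after) - malicious_fraction(scores_before)
    return increase > max_increase


if __name__ == "__main__":
    before = [10, 20, 30, 70]
    after = [10, 70, 80, 70]
    print(should_revert(before, after))
```

A lower max_increase escalates sooner at the cost of more false alarms, which is the trade-off a lowered escalation threshold accepts.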

Posted Jun 30, 2022 - 15:01 EDT

Resolved

Dear Client,

We're happy to say this incident is fully resolved, with entity and platform risk scoring functioning at optimal levels.

A post mortem will be made available for clients that require it via https://status.recordedfuture.com/#past-incidents, and can also be obtained directly from your customer success consultant. Otherwise, please contact our support team at support@recordedfuture.com if you have any questions.

Regards,
Recorded Future Platform Operations

Posted Jun 30, 2022 - 14:58 EDT

Update

Dear Client,

We wanted to provide an update on the correction and recovery of risk scoring data affected by this service disruption. At this point in time, all domain indicators with false malicious level scoring have been corrected in the platform, and only a very small number of URLs and IPs with a false malicious risk level remain to be corrected. All affected hashes, as well as any remaining indicators with false malicious level scoring, should be fixed over the weekend. Any other related issues, including false suspicious level risk indicators, should be completely corrected by early next week.

We will continue to provide updates as they become available, and a post mortem will be provided for those clients that require it. Please contact our support team at support@recordedfuture.com if you have any questions in the interim.

Regards,
Recorded Future Platform Operations

Posted Jun 24, 2022 - 14:01 EDT

Monitoring

Dear Client,

We have seen incidents in which indicators in our platform reflect false positive elevations in risk scoring, possibly to malicious / very malicious levels. Although this affects all types of indicators in the platform, the total number of entities affected makes up roughly 5% or less of each of our default risk lists for each entity type.

At this point in time, we have fixed the defect that caused this behavior, and have already started applying mitigations or removing incorrect risk scores where applicable. We expect corrections to be in place within the next 24 hours for the majority of high-risk indicators, but we will continue to monitor for the rest of this week for any other issues, and fix all remaining discrepancies.

We will continue to provide updates as they become available. Please contact our support team at support@recordedfuture.com if you have any questions.

Regards,
Recorded Future Platform Operations

Posted Jun 21, 2022 - 13:54 EDT

This incident affected: Collection and Processing.