Device Defender
AMAZON WEB SERVICES (AWS) INTERNET OF THINGS (IoT)
Summary
My role
As the design lead I directed (but did not manage) a team of 2 designers plus an external user research agency, and was accountable for both the quality and velocity of the design team. I partnered with the Product Management Lead and Engineering Manager throughout the process to define and refine product requirements.
In addition to managing the relationship with the external agency I was a hands-on contributor to process and requirements definition, interaction design, and UI design.
Internet of Things (IoT) fleets are difficult to secure on an ongoing basis and are an attractive target for hackers. Amazon Web Services (AWS) Device Defender provided tools to identify security issues and deviations from best practices.
Design team
- Design lead (me)
- Sr UX designer
- Jr UX designer
- External agency for user research
TL;DR
WHO: The user
IoT Fleet Managers, Security Operations, and Security Architects responsible for a large fleet of IoT devices.
WHY: The problem
IoT fleets are difficult to secure on an ongoing basis and are an attractive target for hackers. Existing network security tools were insufficient for the specific threat profiles of internet-connected devices.
HOW: The process
Co-design with expert users across Amazon within a Scrum framework. This project was a pilot for a cross-business-unit UX quality control and approval process.
WHAT: The solution
The first two modules (Audit and Detect) of an IoT-native security solution (Device Defender) integrated into the AWS IoT platform UX.
Business problem
IoT fleets often consist of large numbers of devices that have diverse capabilities, are long-lived, and are geographically distributed. These characteristics make fleet setup complex and error-prone. Because devices are often constrained in computational power, memory, and storage, the use of encryption and other forms of security on the devices themselves is limited. Devices also often run software with known vulnerabilities. The combination of these factors makes IoT fleets an attractive target for hackers and makes it difficult to secure them on an ongoing basis.
About Device Defender
AWS IoT Device Defender enables customers to secure their IoT fleets on an ongoing basis by providing them the tools to identify security issues and respond to security breaches quickly before they cascade to other devices within their fleet.
Continuous auditing
Monitors device-related policies to ensure proper security settings are in place. Device Defender detects any drift from security best practices or access policies. Customers can run audits as needed or schedule them to run periodically.
Fast investigation and mitigation
Enables customers to investigate alerts by providing contextual information about the alert such as device information, diagnostic logs and historical alerts for the device.
Real-time detection and alerting
Detects changes in connection patterns, devices communicating with unauthorized or unrecognized endpoints, and changes in inbound and outbound device traffic patterns.
Device Defender integrates with the AWS IoT Connected Device Management (CDM) service, allowing customers to perform actions such as revoking permissions, rebooting a device, resetting it to factory defaults, or pushing security fixes.
Device Defender: Audit
Audit monitors device-related policies to ensure proper security settings are in place.
Best practices
Device Defender Audit runs a set of pre-defined rules — mapped to common IoT security best practices and vulnerability definitions — against a customer’s fleet.
Focus on noncompliant resources
Security professionals care most about active or emerging issues. Device Defender Audit identifies noncompliant resources and provides context for the scale of the issue.
Continuous auditing
Continuous auditing ensures that the current security posture of the device fleet is known, good, and trusted. Customers can run audits as needed or schedule them to run periodically.
Mitigation
Device Defender Audit identifies noncompliant resources and enables the user to investigate down to the individual resource. For each type of compliance issue, Audit provides a suggested mitigation action.
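The Audit flow described above (run pre-defined checks against the fleet, surface only noncompliant resources, and pair each issue type with a suggested mitigation) can be sketched roughly as follows. This is a minimal illustration, not the service's implementation: the check names, resource fields, and mitigation text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AuditCheck:
    """A pre-defined rule plus the mitigation suggested when it fails."""
    name: str
    is_compliant: Callable[[Dict], bool]
    mitigation: str


@dataclass(frozen=True)
class Finding:
    """One noncompliant resource for one check."""
    resource_id: str
    check_name: str
    mitigation: str


def run_audit(resources: List[Dict], checks: List[AuditCheck]) -> List[Finding]:
    """Return a finding for every (resource, check) pair that is noncompliant."""
    findings = []
    for resource in resources:
        for check in checks:
            if not check.is_compliant(resource):
                findings.append(Finding(resource["id"], check.name, check.mitigation))
    return findings


# Hypothetical checks and resource fields, for illustration only.
CHECKS = [
    AuditCheck(
        "certificate-not-expired",
        lambda r: not r.get("cert_expired", False),
        "Rotate the device certificate.",
    ),
    AuditCheck(
        "policy-not-overly-permissive",
        lambda r: not r.get("wildcard_policy", False),
        "Narrow the policy to the actions the device needs.",
    ),
]

fleet = [
    {"id": "thing-1", "cert_expired": True},
    {"id": "thing-2"},
    {"id": "thing-3", "cert_expired": True, "wildcard_policy": True},
]

findings = run_audit(fleet, CHECKS)
noncompliant_ids = sorted({f.resource_id for f in findings})
```

Compliant resources produce no findings at all, which mirrors the design decision to focus the operator on noncompliant resources rather than on the full fleet.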
Device Defender: Detect
Detect monitors the fleet and identifies abnormal behavior on devices, and provides device management tools to investigate and mitigate security issues.
Define security profiles
Security profiles contain a set of behaviors that describe how the device should be operating, e.g., “packets out less than 150 bytes in 5 minutes.”
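As a rough sketch of what a behavior encodes, the example below models one as a metric, a comparison operator, a threshold, and a duration. The class, field names, and operator strings are assumptions made for illustration and are not the service's schema.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator names mapped to Python comparisons.
_COMPARISONS = {
    "less-than": operator.lt,
    "less-than-equals": operator.le,
    "greater-than": operator.gt,
    "greater-than-equals": operator.ge,
}


@dataclass(frozen=True)
class Behavior:
    """Describes how a device is expected to operate for one metric."""
    name: str
    metric: str
    comparison: str
    threshold: float
    duration_seconds: int

    def is_satisfied_by(self, observed_value: float) -> bool:
        """True when the value observed over the duration is within the expected range."""
        return _COMPARISONS[self.comparison](observed_value, self.threshold)


# Expected behavior: fewer than 150 bytes out over a 5-minute window.
low_traffic = Behavior(
    name="low-outbound-traffic",
    metric="bytes_out",
    comparison="less-than",
    threshold=150,
    duration_seconds=5 * 60,
)

within_profile = low_traffic.is_satisfied_by(120)
in_violation = not low_traffic.is_satisfied_by(4096)
```

A security profile is then simply a named collection of such behaviors, each evaluated independently.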
Detect anomalies and publish alerts
Once Detect is configured to monitor a group of devices, it starts recording security-related attributes (e.g., connection attempts, bytes/sec/protocol, set of open ports). When a violation occurs, the system sends an alert via CloudWatch Events, SNS notifications, or third-party systems.
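The record-compare-alert loop can be sketched as below. This is an assumption-laden illustration: the `check_device` function, the metric names, and the limits are hypothetical, and `publish` is a stand-in for whatever channel delivers the alert (the text mentions CloudWatch Events, SNS, or third-party systems).

```python
from typing import Callable, Dict, List

Alert = Dict[str, object]


def check_device(
    device_id: str,
    observed: Dict[str, float],
    upper_limits: Dict[str, float],
    publish: Callable[[Alert], None],
) -> List[Alert]:
    """Compare observed metrics with their limits and publish one alert per violation."""
    alerts = []
    for metric, limit in upper_limits.items():
        value = observed.get(metric)
        if value is not None and value > limit:
            alert = {"device": device_id, "metric": metric, "value": value, "limit": limit}
            publish(alert)  # hypothetical delivery hook
            alerts.append(alert)
    return alerts


# Collect alerts in a list instead of sending them anywhere.
sent: List[Alert] = []
check_device(
    "humidifier-88",
    observed={"bytes_out": 4096, "open_ports": 2},
    upper_limits={"bytes_out": 150, "open_ports": 5},
    publish=sent.append,
)
```

Passing the publisher in as a callable keeps detection separate from delivery, so the same check could feed any of the alert channels.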
Assign profiles to groups of devices
Security profiles are attached to new or existing groups of devices. Devices in those groups will be monitored for compliance with the behaviors defined in the attached security profiles. Security profiles may also be attached to the account, where they will apply to all devices in the fleet.
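One way to picture the attachment model is a lookup that combines account-level profiles with the profiles attached to each of a device's groups. The function and the group and profile names below are hypothetical (only traffic-shanghai appears later in this case study).

```python
from typing import Dict, Iterable, List


def effective_profiles(
    device_groups: Iterable[str],
    group_profiles: Dict[str, List[str]],
    account_profiles: Iterable[str],
) -> List[str]:
    """Profiles that apply to a device: account-wide ones plus those on its groups."""
    profiles = set(account_profiles)
    for group in device_groups:
        profiles.update(group_profiles.get(group, ()))
    return sorted(profiles)


group_profiles = {
    "shanghai-humidifiers": ["traffic-shanghai"],
    "lobby-sensors": ["open-ports"],
}
account_profiles = ["fleet-baseline"]

grouped_device = effective_profiles(["shanghai-humidifiers"], group_profiles, account_profiles)
ungrouped_device = effective_profiles([], group_profiles, account_profiles)
```

A device that belongs to no group is still covered by any profile attached to the account.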
Investigate and mitigate device behavior
Alerts can contain device information, device statistics (e.g. last connection time, number of active connections, data transfer rate), and historical events for the device. Within the console, the operator can both view broad trends and drill down into the details of a specific device.
Personas
There are eight members of the IoT Platform Persona Family; three of them are relevant to Device Defender.
Fleet Manager
The Fleet Manager keeps the fleet of devices updated and operating smoothly.
Security Ops
Security Ops makes sure everything is functioning correctly from a security perspective and sounds the alarm when it’s not.
Security Architect
The Security Architect is responsible for maintaining the security of the systems. They must anticipate all of the moves and tactics that hackers will use to try to gain unauthorized access.
How personas interact
The Fleet Manager is the first line of defense. They monitor all aspects of the fleet and can escalate quickly to Security Ops if they see something concerning.
The relationship between Security Ops and the Security Architect is more complex. The Security Architect sets up the security parameters while Security Ops monitors devices and remediates issues. The Security Architect also serves as an expert consultant and escalation path for issues.
Walkthrough
The Device Defender product is immensely complex. In this case study I elide most of the Audit functionality and focus on only two of the six core tasks of the Detect module.
Glossary
Audit
This is the system flow for the Audit module. The product is conceptually simple: Security Ops creates an audit by selecting the security checks they want performed and then setting the schedule for the Audit to run.
Detect
There are six core user tasks within the Detect module.
Onboard & configure – Security engineer [primary task]
Task 1: Create a security profile
Monitor & Investigate – Security operations [primary task]
Task 2: Investigate violations
Update configuration – Security engineer [secondary tasks]
For brevity, I’ve included only the primary tasks for the Security Architect and Security Operations.
Task 1: Create a security profile (Security Architect)
Creating a security profile requires the following steps:
The Security Architect’s journey begins on the AWS IoT platform’s dashboard / home page.
They navigate to the Detect module. For this walkthrough this is the first use of Device Defender so the post-tour landing page prompts the user to create a security profile.
They give the profile a name and description, and begin defining their first behavior. They choose the metric (in this case, Bytes out), the operator, the value, and the duration.
This profile requires multiple behaviors, so they click the button to add another.
They can also define behaviors in JSON if they prefer.
Once they’re finished with behaviors, they click Next to configure an SNS alert by choosing a topic and an IAM role.
The next step is to attach the profile to either a specific Thing group or to all devices.
The creation of a Thing group has multiple steps (define group attributes, find relevant Things, add Things to the group), so we’ll elide that process and skip to where groups have been selected. The Security Architect confirms the security profile settings, and a new security profile is created and attached to the 4 selected Thing groups.
They navigate to the Security Profiles hub, from which they can create the remaining security profiles necessary to configure Detect for their needs.
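The walkthrough notes that behaviors can also be defined in JSON. The sketch below shows what parsing and validating such a document might look like; the key names are an assumption chosen to match the form fields in the walkthrough (metric, operator, value, duration) and should not be read as the service's actual schema.

```python
import json
from typing import Dict, List

# Hypothetical keys mirroring the form fields in the walkthrough.
REQUIRED_KEYS = ("name", "metric", "operator", "value", "durationSeconds")


def parse_behaviors(document: str) -> List[Dict]:
    """Parse a JSON array of behaviors and reject entries with missing keys."""
    behaviors = json.loads(document)
    if not isinstance(behaviors, list):
        raise ValueError("expected a JSON array of behaviors")
    for index, behavior in enumerate(behaviors):
        missing = [key for key in REQUIRED_KEYS if key not in behavior]
        if missing:
            raise ValueError(f"behavior {index} is missing: {', '.join(missing)}")
    return behaviors


DOCUMENT = """
[
  {
    "name": "low-outbound-traffic",
    "metric": "bytes_out",
    "operator": "less-than",
    "value": 150,
    "durationSeconds": 300
  }
]
"""

parsed = parse_behaviors(DOCUMENT)
```

Validating on entry gives the same guardrails as the form-based flow, where each field is required before the user can proceed.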
Task 2: Investigate violations (Security Operations)
The specific actions taken to investigate a violation vary widely, but generally there are three steps:
Security Operations’ journey begins on the AWS IoT platform’s dashboard / home page. They see that there are active alerts and expand the Alerts panel. (Alerts did not exist in the IoT platform prior to Device Defender; we added the icon and created the panel as a new interaction pattern in the IoT design system library.) They see that there are 18 violations against the security profile traffic-shanghai and click on the link.
This opens the details page for the security profile, displaying the “Now” tab Violations section. They see 17 Things are in alarm, and hover over the timestamp for the Thing humidifier-88 to see that it has been in alarm for 2 minutes and 45 seconds.
They hover over the behavior to remind themselves of the underlying rule that is being violated. They click on the “History” tab and see a summary of the Violation events for this security profile. They return to the “Now” tab and click on the Thing name.
The Thing details page loads in the same context (Violations > Now). Humidifier-88 is actually violating three behaviors on two different security profiles. They click on the “History” tab and see the prior pattern of violations for that single Thing. It’s important that Security Operations check both the violated profile and the individual violating Things, as root causes can be complex. They return to the Violations hub (Violations > History) and filter on the traffic-shanghai security profile.
In this Hub view they can interact with the data either through the filters or by clicking on the timeline visualization; the system keeps both views in sync.
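One common way to keep a table and a visualization in sync is to derive both from a single piece of shared view state. The sketch below illustrates that idea only; it is not a description of how the console was built, and the `ViewState` fields and violation records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ViewState:
    """Single source of truth shared by the filter bar, timeline, and table."""
    profile: Optional[str] = None        # set from the filter controls
    selected_hour: Optional[int] = None  # set by clicking a timeline bucket


def render(violations: List[Dict], state: ViewState) -> Tuple[List[Dict], Dict[int, int]]:
    """Derive the table rows and the timeline counts from the same state."""
    in_profile = [
        v for v in violations
        if state.profile is None or v["profile"] == state.profile
    ]
    # The timeline shows every hour for the filtered profile.
    timeline = dict(Counter(v["hour"] for v in in_profile))
    # The table narrows further to the clicked hour, if any.
    rows = [
        v for v in in_profile
        if state.selected_hour is None or v["hour"] == state.selected_hour
    ]
    return rows, timeline


violations = [
    {"thing": "humidifier-88", "profile": "traffic-shanghai", "hour": 9},
    {"thing": "humidifier-12", "profile": "traffic-shanghai", "hour": 10},
    {"thing": "humidifier-88", "profile": "open-ports", "hour": 9},
]

state = ViewState(profile="traffic-shanghai", selected_hour=9)
rows, timeline = render(violations, state)
```

Because neither view holds its own copy of the filters, a change made in one place is reflected in the other on the next render.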
Based on their analysis, Security Operations can take various actions at either the security profile level from the security profile detail page (e.g., upgrade firmware, take all devices offline) or at the individual Thing level (e.g., reboot device) from the Thing detail page.
Process
Discover
Device Defender was a project defined by AWS leadership based on a market opportunity. Foundational user research specific to the security space was conducted by an external agency before a design team was assigned to the project.
Research questions
Findings
Iterate
Co-design and evaluative research
While we (the AWS design team in partnership with the external agency) conducted cycles of co-design and evaluative user research for both Audit and Detect, for brevity I’ve included only some of the findings for the Detect module.
The research questions were simple: Does this service meet your needs? Does it work the way you expect?
Themes
Applying research findings
We tracked findings and how user feedback changed our solution. We layered these insights and impacts into our persona task matrix.
We also acknowledged larger issues uncovered in user feedback and identified how they will be addressed in future.
Finding: Every participant successfully completed all tasks, and reported minimal friction. The core mental models (security profile, behavior, profile attached to a group, violation event) were well understood.
Impact: Changes to the UX based on research findings were straightforward refinements; most of the feedback will be used to define and prioritize future enhancements of the tool.
Finding: Participants requested additional flexibility in defining behaviors, including conditional operators (AND/OR) and custom metrics and values.
Impact: Custom metrics were already in the backlog. Currently behaviors are evaluated as independent entities; conditional operators will be further evaluated post-GA.
Finding: In AWS IoT, customers organize their fleets via Thing groups. Participants expected to use Thing groups as a dimension by which to evaluate relationships between violations.
Impact: Given foundational technical constructs of the AWS IoT service, integrating an awareness of Thing groups into Detect would be non-trivial. Deeper investigation is required.
Finding: For every data table they encountered, participants expected to be able to sort, search, and filter by every column.
Impact: Additional APIs to provide filter, sort, and search capabilities are in the post-GA backlog.
Finding: Participants expected long-term access to every byte of data any device in their fleet had emitted.
Impact: Due to operating costs, the Detect service is structured to store only data related to behavior violations, and only for a relatively short period of time. This may be an opportunity for a premium offering. Deeper investigation is required.
Approvals
Device Defender was the first large project to go through the AWS Design Leadership’s QA process, and was specifically selected as a pilot because of my seniority and prior experience at IBM.
There were four checkpoints in the process, with the Design Leadership board enforcing a go / no go release gate based on a final fit and finish review on the live system. Each checkpoint had a structured agenda; while PM and Engineering were encouraged to attend, the Design Lead was accountable for presenting their service to the Board.
The Design Leadership board for any one product was composed of the design system and content leads, plus at least two design managers and two design leads. The same group of people reviewed every step of the process for a product. I served on the board for the initial launch of Elastic Container Service (ECS) and a major enhancement to Athena.
Deep Dive
The Deep Dive presentation was intended to give the Design Leadership board context on the project: its business value, personas, and user value, as well as a summary of any discovery work thus far. The Deep Dive was mandatory only for new Tier 1 products (high priority with significant strategic value), although it was encouraged for major new functionality in existing Tier 1 and Tier 2 products. Device Defender was a Tier 1 product.
Agenda
UX Sign Off
UX Sign Off was intended to be the gate before significant development work began on the product. In reality, most Tier 1 products were very far along in the development cycle, as design teams tended to be assigned after a proof of concept or alpha product was already built.
In practice, the UX Sign Off typically generated mandatory and optional feedback that required either a follow-up presentation (for major issues) or email responses (for minor issues). All major releases of a product were required to pass a UX Sign Off before the Fit & Finish review was scheduled.
Agenda
Below are some examples of feedback (re: the Violations Hub) from the Deep Dive that were addressed for UX Sign Off:
Feedback: The list view should have a time window attached to it.
Outcome: The views of violations have been refactored to explicitly identify the time window for the displayed violations.
Feedback: [In the data viz] The default view has “today” highlighted – the highlight state should match the selected time window.
Outcome: User research identified that any interaction with the data viz should be reflected in the data table below (and vice versa), including filters on time and content.
Feedback: Once a user has zeroed in on a specific device or time window, the system should support the transfer of the contextual data as the user navigates away from this page to take action.
Outcome: To the extent possible, the interaction model has been refined to maintain context (e.g., resource details, time windows, filter selections) across pages within the UI. As additional APIs are developed post-GA, transfer of context will be enhanced.
Fit & Finish
The Fit & Finish review was a hard gate on product release. While the Design Lead presented the front matter, the core of the presentation was a demo of the live system on staging by the Engineering Lead or Product Manager. Any issues identified in the Fit & Finish review were either remediated before launch (as proved via video clips) or put in the backlog as mandatory enhancements for the next release.
Agenda