Machine learning–assisted discovery of sensitive data in Amazon S3
Amazon Macie is a data security and privacy service that uses machine learning and pattern matching to discover and protect sensitive data stored in Amazon S3. It helps teams find buckets that are overly exposed, classify objects that may contain PII/PHI/financial data, and generate findings for governance workflows.
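Macie combines managed data identifiers (built-in detectors for many country- and region-specific identifier types, credentials, and financial data) with custom data identifiers that you tune to your business. A custom data identifier is a regular expression plus optional keywords, ignore words, and a maximum match distance between a keyword and a match. Some managed data identifiers use machine learning in addition to pattern matching, which helps them detect sensitive data in semi-structured and unstructured content such as logs, exports, and documents.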
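To make the custom-identifier criteria concrete, here is a minimal local sketch. It assumes Python's `re` behaves like Macie's regex engine for simple patterns. Macie's regex dialect differs from Python's in places, so validate real identifiers with Macie's `TestCustomDataIdentifier` operation before relying on them. `CustomIdentifier`, its matching rules, and the `employee-id` example are hypothetical. The keyword-distance check is a simplification of Macie's maximum match distance. `to_request()` only builds keyword arguments named after the `macie2` `CreateCustomDataIdentifier` parameters (`name`, `regex`, `keywords`, `ignoreWords`, `maximumMatchDistance`).

```python
import re
from dataclasses import dataclass, field


@dataclass
class CustomIdentifier:
    """Hypothetical local model of a Macie custom data identifier."""

    name: str
    regex: str
    keywords: list = field(default_factory=list)
    ignore_words: list = field(default_factory=list)
    maximum_match_distance: int = 50

    def to_request(self) -> dict:
        """Build kwargs shaped for boto3's macie2 create_custom_data_identifier()."""
        request = {
            "name": self.name,
            "regex": self.regex,
            "maximumMatchDistance": self.maximum_match_distance,
        }
        if self.keywords:
            request["keywords"] = list(self.keywords)
        if self.ignore_words:
            request["ignoreWords"] = list(self.ignore_words)
        return request

    def _keyword_near(self, text: str, match: re.Match) -> bool:
        # Simplification (assumption): a keyword must end before the match starts,
        # and the distance from the keyword's end to the match's end must not
        # exceed maximum_match_distance characters. Keywords ignore case here.
        for keyword in self.keywords:
            for hit in re.finditer(re.escape(keyword), text, re.IGNORECASE):
                if (hit.end() <= match.start()
                        and match.end() - hit.end() <= self.maximum_match_distance):
                    return True
        return False

    def find(self, text: str) -> list:
        """Approximate the matching criteria locally (not Macie's actual engine)."""
        results = []
        for match in re.finditer(self.regex, text):
            if match.group(0) in self.ignore_words:  # ignore words: exact, case-sensitive
                continue
            if self.keywords and not self._keyword_near(text, match):
                continue
            results.append(match.group(0))
        return results


employee_id = CustomIdentifier(
    name="employee-id",  # hypothetical identifier for illustration
    regex=r"EMP-\d{6}",
    keywords=["employee id"],
    ignore_words=["EMP-000000"],
)
sample = ("Employee ID: EMP-123456. Test fixture EMP-000000 is a placeholder. "
          "Unrelated order reference EMP-654321.")
print(employee_id.find(sample))  # ['EMP-123456']
```

The ignore word suppresses the test fixture. The third ID is dropped because no keyword appears within 50 characters before it. That is the precision gain keywords and proximity are meant to buy.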
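Macie automatically maintains an inventory of your S3 buckets and evaluates their security and access-control posture. It covers public access settings, default encryption, sharing with other AWS accounts, and whether bucket policies require encryption for new objects. Use this inventory to drive data perimeter projects and to decide which high-value buckets to scan first with sensitive data discovery jobs.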
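Prioritizing scans from the inventory can be sketched as a simple scoring pass. This assumes each record is shaped like an entry in the `buckets` list returned by boto3's `macie2` `describe_buckets()` (paginated with `nextToken`). Only the fields used below are assumed: `bucketName`, `publicAccess.effectivePermission`, `sharedAccess`, `serverSideEncryption.type`, and `classifiableSizeInBytes`. The weights and the `risk_score`/`prioritize` helpers are hypothetical, not a Macie feature.

```python
# Hypothetical weights; tune them to your own risk model.
RISK_WEIGHTS = {
    "public": 100,
    "shared_external": 50,
    "no_default_encryption": 20,
}


def risk_score(bucket: dict) -> int:
    """Score one bucket record shaped like a describe_buckets() entry (assumed)."""
    score = 0
    if bucket.get("publicAccess", {}).get("effectivePermission") == "PUBLIC":
        score += RISK_WEIGHTS["public"]
    if bucket.get("sharedAccess") == "EXTERNAL":
        score += RISK_WEIGHTS["shared_external"]
    # Only penalize an explicit "NONE"; a missing field means unknown, not unencrypted.
    if bucket.get("serverSideEncryption", {}).get("type") == "NONE":
        score += RISK_WEIGHTS["no_default_encryption"]
    return score


def prioritize(buckets: list) -> list:
    """Return bucket names riskiest first; larger classifiable size breaks ties."""
    ranked = sorted(
        buckets,
        key=lambda b: (risk_score(b), b.get("classifiableSizeInBytes", 0)),
        reverse=True,
    )
    return [b["bucketName"] for b in ranked]


inventory = [
    {"bucketName": "logs-archive", "publicAccess": {"effectivePermission": "NOT_PUBLIC"},
     "sharedAccess": "NOT_SHARED", "serverSideEncryption": {"type": "aws:kms"},
     "classifiableSizeInBytes": 5_000_000},
    {"bucketName": "partner-exports", "publicAccess": {"effectivePermission": "NOT_PUBLIC"},
     "sharedAccess": "EXTERNAL", "serverSideEncryption": {"type": "AES256"},
     "classifiableSizeInBytes": 200_000},
    {"bucketName": "marketing-site", "publicAccess": {"effectivePermission": "PUBLIC"},
     "sharedAccess": "NOT_SHARED", "serverSideEncryption": {"type": "NONE"},
     "classifiableSizeInBytes": 10_000},
]
print(prioritize(inventory))  # ['marketing-site', 'partner-exports', 'logs-archive']
```

The ranked names could then seed the bucket list of a sensitive data discovery job. Exposure outranks size, so a small public bucket is scanned before a large private one.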
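| Finding category | What it reports |
|---|---|
| Policy / access | A bucket's security settings are weakened: Block Public Access disabled, bucket made public through an ACL or bucket policy, or bucket shared or replicated with external accounts. |
| Sensitive data | Objects that match managed data identifiers (credentials, financial data, personal data including PII and PHI) or your custom data identifiers. |
| Anomalies | Not a current Macie finding type; unusual S3 data-access patterns are surfaced by Amazon GuardDuty S3 Protection instead. |
| Encryption | Default encryption disabled for a bucket (a policy finding); object-level encryption status appears in the bucket inventory rather than as findings. |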
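Macie automatically publishes new findings to Amazon EventBridge. EventBridge rules can route them to Amazon SNS, AWS Lambda, or chat and ticketing tools such as Slack and Jira (typically through SNS or Lambda) for response: tighten bucket policies, tag affected objects, or open a governance ticket. Pair this with AWS Step Functions for human-in-the-loop approval before remediation runs.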
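A routing sketch for that flow follows. The event pattern uses the `source` (`aws.macie`) and `detail-type` (`Macie Finding`) that Macie events carry. Its JSON form is what you would pass as `EventPattern` to EventBridge's `put_rule`. The severity filter, the action names, and the assumed finding fields (`type`, `severity.description`, `resourcesAffected.s3Bucket.name`) are illustrative. The handler only decides an action and does not call any AWS API.

```python
import json

# EventBridge event pattern for Macie findings; the severity filter is an example choice.
MACIE_RULE_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"severity": {"description": ["Medium", "High"]}},
}


def choose_action(event: dict) -> str:
    """Map a Macie finding event to a hypothetical remediation action name."""
    finding = event["detail"]
    finding_type = finding.get("type", "")
    severity = finding.get("severity", {}).get("description", "Low")
    if finding_type.startswith("Policy:"):
        # Exposure changes go to a human approval step (e.g. Step Functions) first.
        return "request_policy_tightening_approval"
    if finding_type.startswith("SensitiveData:") and severity == "High":
        return "tag_objects_and_open_ticket"
    return "open_governance_ticket"


def lambda_handler(event, context):
    """Lambda-style entry point (sketch): decide on an action, report the bucket."""
    bucket = (event["detail"].get("resourcesAffected", {})
              .get("s3Bucket", {}).get("name"))
    return {"action": choose_action(event), "bucket": bucket}


print(json.dumps(MACIE_RULE_PATTERN))  # string for put_rule(EventPattern=...)

sample_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {
        "type": "Policy:IAMUser/S3BucketPublic",
        "severity": {"description": "High"},
        "resourcesAffected": {"s3Bucket": {"name": "example-bucket"}},
    },
}
print(lambda_handler(sample_event, None))
```

Keeping the decision logic in a pure function like `choose_action` makes it easy to unit-test before wiring real remediation (policy changes, object tagging) behind the approval step.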
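Macie integrates with AWS Security Hub, so its findings appear beside Amazon GuardDuty findings and AWS Config rule evaluations. By default Macie publishes policy findings to Security Hub; publishing sensitive data findings is an opt-in setting. Security teams can then build a single severity-ranked queue and track mean time to remediate (MTTR) for each class of data exposure.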
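The queue and MTTR metric can be sketched over simplified finding records. This assumes you have already flattened Security Hub output (for example from boto3's `securityhub` `get_findings()`) into dicts with hypothetical keys `type`, `severity` (a Security Hub severity label), `created_at`, and `resolved_at` (ISO 8601 strings with an explicit offset, or `None` while open). The grouping by full Macie finding type is one possible definition of "exposure class".

```python
from collections import defaultdict
from datetime import datetime

# Security Hub severity labels, highest first.
SEVERITY_RANK = {"CRITICAL": 4, "HIGH": 3, "MEDIUM": 2, "LOW": 1, "INFORMATIONAL": 0}


def severity_queue(findings: list) -> list:
    """Open findings ordered by severity label, then oldest first."""
    open_findings = [f for f in findings if f.get("resolved_at") is None]
    return sorted(
        open_findings,
        key=lambda f: (-SEVERITY_RANK[f["severity"]],
                       datetime.fromisoformat(f["created_at"])),
    )


def mttr_hours(findings: list) -> dict:
    """Mean hours from creation to resolution, grouped by finding type."""
    durations = defaultdict(list)
    for f in findings:
        if f.get("resolved_at") is None:
            continue  # still open: excluded from MTTR
        created = datetime.fromisoformat(f["created_at"])
        resolved = datetime.fromisoformat(f["resolved_at"])
        durations[f["type"]].append((resolved - created).total_seconds() / 3600)
    return {t: sum(hours) / len(hours) for t, hours in durations.items()}


findings = [
    {"type": "Policy:IAMUser/S3BucketPublic", "severity": "HIGH",
     "created_at": "2024-05-01T00:00:00+00:00", "resolved_at": "2024-05-01T06:00:00+00:00"},
    {"type": "Policy:IAMUser/S3BucketPublic", "severity": "HIGH",
     "created_at": "2024-05-02T00:00:00+00:00", "resolved_at": "2024-05-02T10:00:00+00:00"},
    {"type": "SensitiveData:S3Object/Personal", "severity": "MEDIUM",
     "created_at": "2024-05-01T00:00:00+00:00", "resolved_at": "2024-05-03T00:00:00+00:00"},
    {"type": "SensitiveData:S3Object/Financial", "severity": "HIGH",
     "created_at": "2024-05-04T00:00:00+00:00", "resolved_at": None},
    {"type": "Policy:IAMUser/S3BucketSharedExternally", "severity": "MEDIUM",
     "created_at": "2024-05-03T00:00:00+00:00", "resolved_at": None},
]
print(mttr_hours(findings))
print([f["type"] for f in severity_queue(findings)])
```

Here the public-bucket class averages 8 hours to remediate and the personal-data class 48 hours. The open high-severity financial finding sits at the head of the queue.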