Amazon Macie — Sensitive Data Discovery

Machine learning–assisted discovery of sensitive data in Amazon S3

What is Macie?

Amazon Macie is a data security and privacy service that uses machine learning and pattern matching to discover and protect sensitive data stored in Amazon S3. It helps teams find buckets that are overly exposed, classify objects that may contain PII/PHI/financial data, and generate findings for governance workflows.

Machine learning for PII / PHI detection

Macie combines managed data identifiers (built-in detectors for many global identifier types) with custom data identifiers that you define for your business: a regular expression, optional keywords, and a maximum distance allowed between a keyword and a match. Machine learning supplements pattern matching to improve recall and precision on semi-structured and unstructured content such as logs, exports, and documents.

  • Example categories: government IDs, credit card numbers, credentials, and health information (availability varies by Region and feature).
  • Each finding reports the location (bucket and object), a severity, and metadata about the detected data for analyst review.
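
A custom data identifier can be sketched as follows. This is a minimal illustration, not Macie's implementation: the identifier name, the regex, and the helper functions are hypothetical, and the local matcher only approximates the keyword-proximity rule. Python's `re` syntax also differs from the PCRE subset Macie accepts, so treat the local check as a quick sanity test rather than a guarantee.

```python
import re


def build_custom_identifier_request(name, regex, keywords=None, max_match_distance=50):
    """Build keyword arguments for macie2.create_custom_data_identifier."""
    re.compile(regex)  # fail early on a malformed pattern (Python syntax only)
    request = {"name": name, "regex": regex}
    if keywords:
        request["keywords"] = list(keywords)
        request["maximumMatchDistance"] = max_match_distance
    return request


def local_matches(request, text):
    """Rough local approximation: keep a regex match only if a keyword
    appears within the allowed distance before it (case-insensitive)."""
    keywords = [k.lower() for k in request.get("keywords", [])]
    distance = request.get("maximumMatchDistance", 50)
    lowered = text.lower()
    hits = []
    for match in re.finditer(request["regex"], text):
        if keywords:
            window = lowered[max(0, match.start() - distance):match.end()]
            if not any(k in window for k in keywords):
                continue
        hits.append(match.group(0))
    return hits


def create_in_macie(request):
    """Submit the identifier to Macie. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above run without AWS access
    return boto3.client("macie2").create_custom_data_identifier(**request)


# Hypothetical employee ID format, used only for illustration.
request = build_custom_identifier_request(
    "employee-id", r"EMP-\d{6}", keywords=["employee"]
)
```

Testing the pattern locally against sample text before calling `create_in_macie` helps catch an overly broad regex early; keywords narrow matches to text that appears near relevant context.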

S3 bucket inventory

Macie maintains an inventory of buckets and evaluates their security and privacy posture: public access settings, encryption defaults, sharing, and policy conditions. Use this inventory to drive data perimeter projects and to prioritize scans for high-value buckets.
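
One way to turn the inventory into a scan priority list is sketched below. The scoring weights and function names are hypothetical choices for illustration. The field names (`publicAccess.effectivePermission`, `sharedAccess`, `serverSideEncryption.type`, `classifiableObjectCount`) follow the `DescribeBuckets` response as commonly documented; verify them against the current API reference before relying on them.

```python
def exposure_score(bucket):
    """Score a bucket record; higher means scan sooner. Weights are arbitrary."""
    score = 0
    if bucket.get("publicAccess", {}).get("effectivePermission") == "PUBLIC":
        score += 50  # publicly accessible
    if bucket.get("sharedAccess") == "EXTERNAL":
        score += 25  # shared outside the organization
    if bucket.get("serverSideEncryption", {}).get("type") == "NONE":
        score += 15  # no default encryption
    if bucket.get("classifiableObjectCount", 0) > 0:
        score += 10  # contains objects Macie can analyze
    return score


def prioritize(buckets):
    """Return bucket records ordered from most to least exposed."""
    return sorted(buckets, key=exposure_score, reverse=True)


def fetch_buckets():
    """Page through the Macie bucket inventory. Requires boto3 and credentials."""
    import boto3  # imported lazily so the scoring code runs without AWS access
    client = boto3.client("macie2")
    buckets, kwargs = [], {}
    while True:
        page = client.describe_buckets(**kwargs)
        buckets.extend(page.get("buckets", []))
        token = page.get("nextToken")
        if not token:
            return buckets
        kwargs = {"nextToken": token}
```

Calling `prioritize(fetch_buckets())` yields an ordered list that can seed classification jobs for the highest-scoring buckets first.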

Finding types (representative)

Type             Description
Policy / access  Bucket becomes public, ACL changes, risky cross-account access.
Sensitive data   Objects matching PII/PHI/financial or custom identifiers.
Anomalies        Unusual data volume or access patterns (where enabled).
Encryption       Objects stored without expected encryption context.

Alerting and automation

Macie publishes findings to Amazon EventBridge, where rules can route them to SNS, Slack, Jira, or a Lambda function that responds automatically: tightening bucket policies, tagging objects, or opening a governance ticket. Pair EventBridge with Step Functions to require human-in-the-loop approval before remediation.
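
The routing step can be sketched as an EventBridge event pattern plus a small Lambda handler. The pattern matches events with source `aws.macie` and detail type `Macie Finding`. The action names and the severity threshold are hypothetical, and the handler only decides what to do; the actual remediation calls are left out.

```python
import json

# Event pattern for an EventBridge rule that matches Macie findings.
MACIE_EVENT_PATTERN = {"source": ["aws.macie"], "detail-type": ["Macie Finding"]}

SEVERITY_RANK = {"Low": 1, "Medium": 2, "High": 3}


def plan_response(event):
    """Map a Macie finding event to a response plan (action names are illustrative)."""
    detail = event.get("detail", {})
    finding_type = detail.get("type", "")
    severity = detail.get("severity", {}).get("description", "Low")
    resources = detail.get("resourcesAffected", {})

    if finding_type.startswith("Policy:"):
        action = "review_bucket_policy"
    elif finding_type.startswith("SensitiveData:"):
        # Escalate only high-severity sensitive data findings (assumed threshold).
        high = SEVERITY_RANK.get(severity, 0) >= SEVERITY_RANK["High"]
        action = "tag_object_and_open_ticket" if high else "open_ticket"
    else:
        action = "ignore"

    return {
        "action": action,
        "severity": severity,
        "bucket": resources.get("s3Bucket", {}).get("name"),
        "key": resources.get("s3Object", {}).get("key"),
    }


def lambda_handler(event, context):
    """Lambda entry point: log the plan so a downstream step can act on it."""
    plan = plan_response(event)
    print(json.dumps(plan))
    return plan
```

Keeping the decision in a pure function such as `plan_response` makes it easy to unit test and to hand the plan to a Step Functions approval step before anything is changed.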

Integration with AWS Security Hub

Macie integrates with Security Hub so sensitive-data and policy findings appear beside GuardDuty and Config results. Security teams can build a single severity-ranked queue and track mean-time-to-remediate for data exposure classes.
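
A severity-ranked queue and a mean-time-to-remediate figure can be sketched over findings in the AWS Security Finding Format (ASFF). This sketch makes two assumptions: findings are already fetched as dictionaries, and `UpdatedAt` on a `RESOLVED` finding approximates the time of remediation, which may not hold if findings are updated after resolution.

```python
from datetime import datetime


def _parse(timestamp):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def open_queue(findings):
    """Open findings, highest normalized severity first, oldest first within a tie."""
    open_items = [
        f for f in findings if f["Workflow"]["Status"] in ("NEW", "NOTIFIED")
    ]
    return sorted(
        open_items,
        key=lambda f: (-f["Severity"]["Normalized"], _parse(f["CreatedAt"])),
    )


def mean_hours_to_remediate(findings):
    """Average hours from CreatedAt to UpdatedAt for RESOLVED findings, or None."""
    durations = [
        (_parse(f["UpdatedAt"]) - _parse(f["CreatedAt"])).total_seconds() / 3600
        for f in findings
        if f["Workflow"]["Status"] == "RESOLVED"
    ]
    return sum(durations) / len(durations) if durations else None
```

Grouping the input by finding type before calling `mean_hours_to_remediate` gives a separate figure per data exposure class.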

Use cases