User Activity Monitoring and User Behavior Analytics for Enterprise Security
Sensitive data (customer data – PII, PCI, PHI – intellectual property, etc.) is the most important asset of a company. A lack of, or even a lag in, the ability to detect and respond to security threats can be costly for businesses of all sizes.
Challenges for securing today’s IT environment
- The attack surface and threat landscape are ever increasing
- Conventional security techniques are typically focused on the perimeter, so they fail to identify insider threats or attacks in progress within the network
- A plethora of data sources – structured and unstructured – with new relationships and new entities
- Companies lack visibility into employee activity and usage across critical IT systems
- Time-consuming threat remediation
- Achieving auditable compliance across IT regulatory frameworks is not an easy task
- Advanced, large-scale analytics using machine learning and anomaly detection are becoming the most valued tools for mitigating internal threats and achieving compliance
At Informatica, we are focused on protecting our customers' environments from internal threats, and choosing techniques that can detect and address threats in a scalable way is challenging.
Also, security is a process, not an event. So, it is not just about detecting threats; rather, it's about changing the paradigm of how emerging threats are addressed!
Using log data to detect threats
- Identifying security incidents
- Monitoring policy violations
- Establishing baselines
Database logs are parsed, enriched, and analyzed to detect threats. These logs contain many fields, of which a few are important from a security perspective: User, Host, IP address, Resource, Action, and Timestamp.
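As a minimal sketch (assuming a hypothetical key=value audit-log format – real database log formats vary widely by vendor), extracting the security-relevant fields might look like this:

```python
import re
from datetime import datetime, timezone

# Hypothetical key=value audit-log line; real formats vary by database vendor.
SAMPLE = 'ts=2016-03-01T10:15:00Z user=jdoe host=db01 ip=10.0.0.5 resource=customers action=SELECT'

FIELD_RE = re.compile(r'(\w+)=(\S+)')

def parse_log_line(line):
    """Extract the security-relevant fields: User, Host, IP, Resource, Action, Timestamp."""
    fields = dict(FIELD_RE.findall(line))
    return {
        'user': fields.get('user'),
        'host': fields.get('host'),
        'ip': fields.get('ip'),
        'resource': fields.get('resource'),
        'action': fields.get('action'),
        'timestamp': datetime.strptime(fields['ts'], '%Y-%m-%dT%H:%M:%SZ')
                             .replace(tzinfo=timezone.utc),
    }

event = parse_log_line(SAMPLE)
```

In practice, a parser per source format would emit this same field set so that all downstream analytics see one shape of event.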
Another approach is to analyze activity on the network. This blog covers analysis at data store access points (for example, databases, file repositories, applications, etc.).
The pipeline involves capturing all the relevant data and then bringing it together in an appropriate processing environment. This data needs to be modeled in a uniform way to make it easy to read and process. Once these basic needs of capturing data in a uniform way are taken care of, it becomes reasonable to build infrastructure to process the data in various ways – Spark, query systems, and so on.
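One common way to achieve that uniform model is a shared event schema with one small adapter per source. A sketch in Python, where the schema and the source field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActivityEvent:
    """Uniform event model: a hypothetical schema shared by all downstream analytics."""
    user: str
    host: str
    ip: str
    resource: str
    action: str
    timestamp: float  # epoch seconds

# One small adapter per source maps native field names onto the uniform model.
def from_db_audit(rec):
    return ActivityEvent(user=rec['db_user'], host=rec['server'], ip=rec['client_ip'],
                         resource=rec['object'], action=rec['operation'], timestamp=rec['time'])

def from_file_repo(rec):
    return ActivityEvent(user=rec['account'], host=rec['node'], ip=rec['remote_addr'],
                         resource=rec['path'], action=rec['op'], timestamp=rec['ts'])
```

Adding a new source then means writing one adapter, not changing every consumer.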
Data collection and processing
A data collection framework should be chosen considering the following requirements:
- High guarantee that the service will be available and non-blocking under any circumstances
- High throughput
- Moderate resource usage
- Ability to replay messages and ease of replication for higher data availability
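Distributed commit logs such as Apache Kafka are typically chosen to meet the replay and replication requirements. To illustrate only the non-blocking property, here is a toy collector that degrades by counting and dropping events rather than ever stalling the instrumented application:

```python
import queue

class NonBlockingCollector:
    """Toy collector: producers never block; overflow is counted and dropped.
    A production pipeline would use a replicated, replayable log (e.g. Kafka) instead."""

    def __init__(self, capacity=10000):
        self._q = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def submit(self, event):
        try:
            self._q.put_nowait(event)  # never stalls the calling application
            return True
        except queue.Full:
            self.dropped += 1          # degrade by dropping, not by blocking
            return False

    def drain(self):
        events = []
        while True:
            try:
                events.append(self._q.get_nowait())
            except queue.Empty:
                return events
```

The dropped-event counter is itself a useful health metric: sustained drops mean the buffer capacity or downstream throughput needs attention.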
Data Processing involves correlating user activities against sensitive data.
- The parser should handle various formats in a performant way and cope with hierarchical structures. It should support query parsing across a plethora of data sources.
- Some of the log data might be meaningless unless it is enriched with information from external data. For instance, location can be determined using an IP lookup.
- Establish relationships with other types of data, such as user privileges, to enable specific types of analytics. User, group, and access information can be synced from LDAP (Active Directory, IBM Tivoli).
- Identity resolution – mapping multiple application user names to a single enterprise user name.
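A sketch of the enrichment and identity-resolution steps, using hypothetical in-memory lookup tables (in practice these would be backed by a GeoIP database and an LDAP directory sync):

```python
# Hypothetical lookup tables; real deployments use a GeoIP database and an LDAP sync.
GEOIP_PREFIXES = {'10.': 'internal-network', '198.51.100.': 'Springfield, US'}
IDENTITY_MAP = {('crm', 'jd'): 'jdoe', ('hr_app', 'john.doe'): 'jdoe'}

def enrich(event):
    """Add a location via IP-prefix lookup and resolve the enterprise user name."""
    prefix = next((p for p in GEOIP_PREFIXES if event['ip'].startswith(p)), None)
    event['location'] = GEOIP_PREFIXES.get(prefix, 'unknown')
    # Identity resolution: (application, app user name) -> single enterprise user name.
    event['enterprise_user'] = IDENTITY_MAP.get((event['app'], event['user']), event['user'])
    return event

enriched = enrich({'app': 'crm', 'user': 'jd', 'ip': '198.51.100.7'})
```

After this step, all of a person's activity across applications rolls up under one enterprise identity, which is what later per-user baselining depends on.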
Detecting abnormal behavior
A platform that looks at all the events in an organization can build baselines that differentiate between business as usual and catastrophic loss of data. This addresses the big challenge of security personnel having to comb through the huge volume of log data constantly emerging from the complete IT infrastructure, or manually filtering the true threats from the large number of false positives generated by SIEM tools.
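The baselining idea can be sketched with a simple per-user statistical model (a toy single-feature z-score check; real systems model many features and account for seasonality):

```python
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Flag today's activity count when it deviates more than `threshold`
    standard deviations from this user's own baseline (toy model)."""
    if len(history) < 2:
        return False  # not enough history to establish a baseline yet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

# Hypothetical daily record-access counts for one user over two weeks.
baseline = [120, 95, 110, 105, 130, 100, 115, 125, 98, 112, 107, 118, 103, 121]
```

The key property is that each user is compared against their own history, not a global average, which is what lets routine heavy users avoid triggering alerts.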
- To reduce false positives, intelligent solutions should examine elements beyond log entries and incorporate real world knowledge of the environment.
For example, increased data access during a year-end audit by a user in the finance department might be routine, whereas such activity from a user outside the finance department who has never performed such an action before might be flagged as an anomaly.
- Theft patterns are ever evolving, and hackers might alter their methods to avoid the usual pattern, so malicious activity can still fall within well-defined limits. Analytics founded on baselining past behavior to spotlight current and future threats is a core feature. A good technique can reduce the time and number of events needed to establish the baseline.
- Modeling requires a thorough understanding of how attackers operate. A good solution should be able to detect:
– Compromised accounts (for example, credential usage from an unexpected location)
– Snooping (collecting data that was never accessed before)
– Hoarding (gathering disproportionate amounts of data)
– Scanning (looking at many sources with the goal of understanding the information within)
– Ground-speed violations (events that are geographically far apart but occur within seconds or minutes of each other)
- The solution should facilitate deep dives into incidents, and the analysis time window should be configurable.
- High detection accuracy is needed to operationalize timely protection and remediation of potential threats.
- The threat report should be complete (context of the resource, host, IP, user, and domains involved).
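Of the patterns above, the ground-speed violation is the easiest to make concrete: compute the great-circle distance between two events for the same user and flag an impossible implied travel speed. A sketch, with a hypothetical 1,000 km/h (roughly airliner-speed) threshold:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def ground_speed_violation(ev_a, ev_b, max_kmh=1000):
    """Each event is (lat, lon, epoch_seconds); flag an impossible implied travel
    speed between them. The max_kmh threshold is a hypothetical default."""
    dist_km = haversine_km(ev_a[0], ev_a[1], ev_b[0], ev_b[1])
    hours = abs(ev_b[2] - ev_a[2]) / 3600.0
    return hours > 0 and dist_km / hours > max_kmh
```

A login from New York followed by one from London thirty minutes later implies a travel speed of roughly 11,000 km/h and would be flagged; two nearby logins an hour apart would not.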
Informatica Secure@Source applies analytics at data store access points and can therefore provide additional context that is not available at the network and endpoint layers. Because of this, it can better baseline an insider's normal vs. unusual behavior, detecting malicious insiders as well as activity from stolen credentials (threats that would not be apparent from analyzing network activity alone), and achieves high detection accuracy.
Alerting in real-time
An engine that provides security-incident awareness in real time, enabling a response to suspicious user activities, is a valuable feature. Alerting mechanisms can be as simple as alert pop-ups or emails.
A user's risk score is a metric calculated from the events associated with that user. It helps organizations keep track of who their insecure and privileged users are.
Creating a standardized basis for measuring risk, communicating status, and making decisions gives organizations a solid foundation for defining security objectives, and allows prioritization of data protection efforts, efficient allocation of resources, and monitoring of change and progress.
Data security controls are often platform-specific or technology-specific (for example, addressing distributed platforms but not mainframe or midrange platforms) and therefore leave gaps. A platform for aggregating data from multiple sources across a modern enterprise (cloud, Hadoop, unstructured data, traditional databases) is needed to get the clearest picture of potential threats to an enterprise environment.
While anomaly detection enables security and incident response teams to identify insider threats and compromised accounts quickly, it needs to be augmented by remediation action for risk management.