The crazy IMPACT of the Data Lake
When we began building the security data lake solution, we had no idea that this solution would evolve and meet so many important needs for the enterprise. Most importantly, we found the solution is changing the way we approach security engineering. We didn’t anticipate that we would be able to bring together multiple IT silos and security operations, as well as data and operations/infrastructure teams. It has been fascinating to see security teams recognize the potential of the cloud, especially the concept of building a data lake to support all aspects of their business at a cost that is easy to adopt.
Initially, we set out to solve two BIG problems:
Provide a solution that increases data aggregation and gives more visibility to the SOC analyst, while maintaining the security value of the data. Eliminate any reason for not logging a particular source.
Reduce the cost of SIEM by moving high-volume, low-value logs out of the system and into a storage account. This allows customers to allocate their security budget toward other important solutions, such as Microsoft Defender for Cloud, which can cover all workloads both on-premises and in the cloud.
What happened from there was truly magical. Here is a summary of the technical and organizational IMPACTS this work is having.
Technical Lessons of Data Lake
We decoupled COMPUTE from STORAGE to provide unprecedented control of costs and role-based access to data, and to enable data analytics when needed.
We reduced the cost of data aggregation by 50-70%. This allows for increased spend on defense-in-depth tools such as Cloud Workload Protection Platform and External Attack Surface monitoring.
We learned the importance of structured data (data warehouse) vs unstructured data (data lake) and the need to have both. ETL is a critical path for security operations and the entire enterprise.
We taught security and infrastructure ops teams how to use powerful cloud tools, such as Azure Data Explorer, for super-fast querying of the data lake.
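To make the structured-versus-unstructured point concrete, here is a minimal sketch of the "transform" step of an ETL pipeline: raw log lines go in, structured records come out, and anything that does not parse is set aside rather than dropped. The log format, field names, and function names are all assumptions made for illustration; they are not taken from our solution or from any Azure service.

```python
import re
from datetime import datetime, timezone

# ASSUMPTION: a hypothetical raw format "<ISO timestamp> <host> <app>: <message>".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+(?P<app>[^:\s]+):\s*(?P<message>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a structured record, or None if it does not fit."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    try:
        ts = datetime.fromisoformat(record["timestamp"])
    except ValueError:
        return None
    # Treat timestamps without an offset as UTC, then normalize everything to UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    record["timestamp"] = ts.astimezone(timezone.utc).isoformat()
    return record


def transform(lines):
    """Split raw lines into structured records and rejected (unparsed) lines."""
    structured, rejected = [], []
    for line in lines:
        record = parse_line(line)
        if record is None:
            rejected.append(line)
        else:
            structured.append(record)
    return structured, rejected


raw_lines = [
    "2024-01-15T10:30:00+00:00 fw01 sshd: Failed password for root",
    "not a structured line",
]
structured, rejected = transform(raw_lines)
```

Keeping the rejected lines is a deliberate choice in this sketch: the raw data still lands in cheap storage, and the parser can be improved later without losing anything.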
Organizational Lessons of Data Lake
We thought this was about a security data lake, but it almost instantly turned into a need of the whole enterprise: data warehousing and analytics serve everyone, not just security.
We provided enterprises with the ability to build a data estate that leverages Artificial Intelligence and Machine Learning, Data/Knowledge Mining, and Anomaly Analytics.
We provided a way to optimize data retention for long-term storage needs, like compliance or keeping data for historical security forensics and analysis. IT operations now has a place for ALL their logs!
We brought together data scientists/engineers, operations, and security teams to build a OneLake data capability. Shared access to data is very important.
We talked to CISOs across industries who affirmed this vision and recognized the importance of big data analytics for the future of security.
Security engineers realized that their roles and skills need to rapidly adapt to become “Data Engineers with a Security Specialty”.
Disconnecting your security data from your threat analytics and automation platform (SIEM) is not a good idea. Having the SIEM leverage the data lake, and the data lake leverage the SIEM, always yields better security value.
We gave compliance and business analysts access to enterprise data for real-time reporting and business intelligence.
Finally, we demonstrated the true power of Microsoft Azure Cloud and its wide range of tools, especially Azure OpenAI and Microsoft Fabric. This power highlights the importance of being able to create a solution that does not silo or constrain the enterprise. We encourage you to explore and try out these capabilities.
Azure Data Explorer – Powerful, scalable compute when you need it. Search with KQL like you do in Log Analytics.
Azure Storage Account Generation 2 – Cloud storage with many controls, from segmentation and RBAC to lifecycle management for data retention.
Event Hubs and Stream Analytics – Loading and transforming data into a data lake makes your data structured, which is the key to unlocking its value for analytics and retention.
Microsoft Fabric – SaaS platform for all things data. The future of data analytics platforms and multi-cloud data analytics.
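The retention controls mentioned for storage accounts above come down to a simple rule: the older the data, the cheaper the tier it should sit in. Below is a rough sketch of that decision logic in plain Python. The day thresholds and the seven-year deletion cutoff are hypothetical values chosen for illustration, and `retention_action` is our own helper name, not an Azure API; in practice you would express the same rules as a lifecycle management policy on the storage account.

```python
from datetime import date

# ASSUMPTION: hypothetical age thresholds (in days); tune these to your own
# compliance and forensics requirements.
TIER_RULES = [
    (30, "hot"),    # younger than 30 days: frequently queried by the SOC
    (180, "cool"),  # younger than 180 days: occasional investigations
]
ARCHIVE_TIER = "archive"   # everything older, until the deletion cutoff
DELETE_AFTER_DAYS = 2555   # roughly seven years, a hypothetical retention limit


def retention_action(last_modified, today):
    """Return the tier a log blob should be in, or 'delete' once retention ends."""
    age_days = (today - last_modified).days
    if age_days >= DELETE_AFTER_DAYS:
        return "delete"
    for max_age_days, tier in TIER_RULES:
        if age_days < max_age_days:
            return tier
    return ARCHIVE_TIER


today = date(2024, 1, 15)
recent = retention_action(date(2024, 1, 1), today)  # 14 days old
older = retention_action(date(2023, 10, 1), today)  # 106 days old
oldest = retention_action(date(2023, 1, 1), today)  # 379 days old
```

Because compute is decoupled from storage, moving data to a colder tier changes what it costs to keep, not whether you can analyze it later.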
We know the idea of applying data science and artificial intelligence can seem daunting, but technology can meet you where you are. Whether you are new to building your first SIEM or looking to replace your legacy logging solutions, now is the time to re-envision the future of your SOC. The biggest takeaway is that you control your data, and you can build your data estate to harness the power of our AI future.
What are we doing next, you ask? Stay tuned for more information on how to leverage these learnings to set yourself up for success in the artificial intelligence revolution to come. Read more about Data Lake and Security in this series.