Wednesday, July 3, 2024

Group 1001: An Active Metadata Pioneer – Atlan


Demystifying the Active Metadata Management Market

The Active Metadata Pioneers series features Atlan customers who have recently completed a thorough evaluation of the Active Metadata Management market. Paying forward what you've learned to the next data leader is the true spirit of the Atlan community! So they're here to share their hard-earned perspective on an evolving market, what makes up their modern data stack, innovative use cases for metadata, and more!

In this installment, we meet Gu Xie, Head of Data Engineering at Group 1001 and a two-time user of Atlan, who explains Group 1001's unique structure and how it shapes their data needs, his hard-earned perspective on the active metadata management market, and how he'll use Atlan to drive productivity and clarity across his team.

This interview has been edited for brevity and clarity.


Would you mind describing Group 1001 and your data team?

Our team is the data engineering team. Group 1001 is an insurance holding company that is actually an umbrella company for several different brands, including Delaware Life, Gainbridge, Clear Spring Life and Annuities, and several others.

What we're focused on within our team is the annuity side of the business. So we directly interface with our core policy administration system for provisioning and handling the entire annuities business. Our engineering team is responsible for ensuring that we can provide analytics on the data within the annuity side of the business, whether to our operations team, from a sales perspective, or from a marketing perspective.

Each business is a little bit different. Gainbridge is a direct-to-consumer business model, whereas Delaware Life revolves around more of a financial advisor-level business where we're doing more B2B2C. So two different businesses, different brands, different products, but we're providing the breadth of analytics across those perspectives.

And how about you? Could you tell us a bit about yourself, your background, and what drew you to Data & Analytics?

I've been working in data engineering and data & analytics since the very start of my career. I've been in this industry for… gosh, I think it's about 11-plus years now.

Right out of college, I had a great opportunity to dive into the world of CRM, but ended up doing anything but CRM and focused more on the data itself. That meant building out business intelligence, doing report migrations, doing data migrations, lots of work leading data warehouse teams, as well as leading and driving the modernization of data & analytics platforms as organizations moved to the cloud. That's where I've built my core competency: really enabling and stitching together the modern data stack for an organization, so that they can get really comprehensive data & analytics capabilities without hiring a massive team.

So I've done this before in my prior organization with a team of 40-plus engineers. In that organization, we chose and then implemented a traditional data catalog, but spent a ton of engineering hours integrating it, then had trouble getting it adopted by consumers and stewards. We weren't very happy with it. Then we migrated to Atlan and had much better luck activating the data stack we had all built together.

Here at Group 1001, we were able to build a complete end-to-end data analytics platform in under 10 months with a team of four. That just goes to show that if you have a really strong mental model of the modern data & analytics stack, and know where your organization needs to fit and piece things together, you don't need a massive engineering team. You can have a really small team that can actually build and enable this.

We're leveraging a lot of CI/CD and automation, and at the same time, we're able to get the benefits of the modern data stack, which is incredible end-to-end velocity from idea to insight. That's the focal point of the vision: idea-to-insight, and getting velocity there.

What does your stack look like?

We have data sources where data lives in databases, file storage, and SaaS applications like Zendesk, Google Analytics, and Salesforce. We have APIs, whether internal APIs or events and logs.

The way we started with this tech stack, we built around Snowflake as our core data platform. We were on GCP, so we did an extensive POC between BigQuery and Snowflake, and ended up choosing Snowflake.

Then we ran into a situation where, "Okay, we need to replicate our data into Snowflake," because previously we had been building ETL pipelines into Postgres, and that just doesn't scale. So we leveraged Fivetran for both our CDC replication and our SaaS replication. That way we can access the data on the database side of the fence, as well as tap into all the different SaaS applications that Fivetran supports. So we can onboard Google Analytics, Zendesk, Google Ads, and Salesforce data into Snowflake to have that holistic centralization of all of our data and assets.
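To make the CDC (change data capture) idea concrete, here is a minimal sketch of what replication does conceptually: change events captured from a source database are replayed, in order, onto a replica keyed by primary key. The `ChangeEvent` shape and `apply_changes` function are hypothetical illustrations, not Fivetran's internal format or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical change-event shape for illustration only; real CDC tools
# (including Fivetran) define their own event formats and metadata columns.
@dataclass
class ChangeEvent:
    op: str                                 # "insert", "update", or "delete"
    key: int                                # primary key of the source row
    row: Optional[dict[str, Any]] = None    # new row image; None for deletes


def apply_changes(target: dict[int, dict[str, Any]],
                  events: list[ChangeEvent]) -> dict[int, dict[str, Any]]:
    """Replay source change events, in order, onto a replica keyed by primary key."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: the latest row image for a key wins.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # Deleting a key that was never replicated is a no-op.
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return target


replica = apply_changes({}, [
    ChangeEvent("insert", 1, {"policy_id": 1, "status": "pending"}),
    ChangeEvent("update", 1, {"policy_id": 1, "status": "active"}),
    ChangeEvent("insert", 2, {"policy_id": 2, "status": "pending"}),
    ChangeEvent("delete", 2),
])
```

Only the latest state of each surviving row remains in `replica`, which is why replaying changes scales better than re-extracting whole tables on every run.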

Then we also went down the path of, "We need to model and shape this data so it's readily available for analytics, and unify the data model across our various lines of business." So we brought in Coalesce because it gave us the scale, the standardization, and the automation we need to create the data models and shape them for consumption. On top of that, we brought in Dagster as an orchestrator to fully replace Airflow. We spent a week setting up the infrastructure, and three days after that, we had migrated all 73 DAGs from Airflow over to Dagster. That was just huge.
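Orchestrators like Airflow and Dagster both describe pipelines as dependency graphs (DAGs), and the core guarantee they provide is that each step runs only after everything it depends on. The minimal sketch below shows that idea with Python's standard-library `graphlib`; the step names are hypothetical, and this is not Dagster's API.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps for illustration; each key maps to the
# set of steps it depends on.
pipeline = {
    "replicate_policies": set(),
    "replicate_salesforce": set(),
    "model_annuities": {"replicate_policies"},
    "model_sales": {"replicate_salesforce", "model_annuities"},
    "refresh_dashboards": {"model_annuities", "model_sales"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order in which every step follows its dependencies.

    Raises graphlib.CycleError if the graph contains a cycle, which is
    exactly the condition that would make a pipeline impossible to schedule.
    """
    return list(TopologicalSorter(graph).static_order())


order = run_order(pipeline)
```

Real orchestrators add scheduling, retries, and parallel execution of independent steps on top of this ordering.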

We also have Soda for building various data quality rules to make sure we have all the monitoring in place and to define what the quality criteria are: integrity, completeness, freshness, those kinds of aspects. We use Soda to enable our team to build quality rules. And then that's where Atlan comes into the journey. We see it as part of our data management suite: Soda from a quality monitoring perspective, and Atlan to enable data discovery.
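Two of the quality criteria mentioned here, completeness and freshness, reduce to simple threshold checks. The sketch below expresses them in plain Python under stated assumptions (rows as dictionaries, `None` meaning missing, timezone-aware load timestamps); it is a hypothetical illustration of the checks themselves, not Soda's API (Soda checks are written in its own SodaCL language).

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def check_completeness(rows: list[dict[str, Any]], column: str,
                       max_missing_pct: float) -> bool:
    """Pass if the share of rows with a missing (None) value in `column` is within the threshold."""
    if not rows:
        return False  # treat an empty table as a failure rather than vacuously complete
    missing = sum(1 for row in rows if row.get(column) is None)
    return 100.0 * missing / len(rows) <= max_missing_pct


def check_freshness(last_loaded_at: datetime, max_age: timedelta,
                    now: Optional[datetime] = None) -> bool:
    """Pass if the table was loaded within `max_age` of `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


rows = [
    {"policy_id": 1, "premium": 1200.0},
    {"policy_id": 2, "premium": None},
]
loaded = datetime(2024, 7, 3, 6, 0, tzinfo=timezone.utc)
checked = datetime(2024, 7, 3, 12, 0, tzinfo=timezone.utc)
```

Here half of the `premium` values are missing, so a 10% completeness threshold fails, while a table loaded six hours earlier passes a 24-hour freshness rule.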

So an engineer, an analyst, or even a business user can find out what data we have in the organization, who owns it, what it means, when it was last refreshed, and whether it can be trusted. And also where it's being used and how it's being sourced. Atlan provides that holistic picture of that journey.
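The questions listed above map naturally onto fields of a metadata record. The following is a minimal, hypothetical sketch of such a record plus a keyword search that can be restricted to trusted (certified) assets; the field names and functions are illustrative assumptions, not Atlan's data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CatalogAsset:
    """Hypothetical metadata record; field names are illustrative, not Atlan's data model."""
    name: str
    owner: str                      # who owns it
    description: str                # what it means
    last_refreshed: datetime        # when it was last refreshed
    certified: bool = False         # whether it can be trusted
    sourced_from: list[str] = field(default_factory=list)  # how it is sourced
    used_by: list[str] = field(default_factory=list)       # where it is being used


def search(assets: list[CatalogAsset], term: str,
           certified_only: bool = False) -> list[CatalogAsset]:
    """Case-insensitive keyword search over asset names and descriptions."""
    term = term.lower()
    return [
        a for a in assets
        if (term in a.name.lower() or term in a.description.lower())
        and (a.certified or not certified_only)
    ]


assets = [
    CatalogAsset("annuity_summary", "data-engineering",
                 "Daily annuity premiums by product", datetime(2024, 7, 3),
                 certified=True, sourced_from=["raw.policies"],
                 used_by=["annuity_report"]),
    CatalogAsset("annuity_scratch", "analyst-sandbox",
                 "Ad hoc annuity exploration", datetime(2024, 6, 1)),
]
```

Calling `search(assets, "annuity", certified_only=True)` would return only `annuity_summary`, which is the kind of self-service answer that saves a question to the engineering team.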

In terms of analytical outputs, we use Power BI as our current reporting platform. We also brought in Sigma for embedded and exploratory analytics use cases.

Why did you need an Active Metadata solution?

That's the hardest sell: "Why do we need a catalog solution? Why do we need an Active Metadata solution?"

And the way I approach this problem is through the underlying need. Data is always going to grow 2X every two years. That's been the industry trend since the 1970s: data doubles every two years.

So the problem I see is that as data grows, there's more metadata about that data, and that could be in the form of more database objects you're going to create, more files you have to process, more sources you ingest. Especially when you include more systems you have to support, more BI tools you have to enable, more everything. Think about that: doubling the data. The metadata is a multiplier on top of that.
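Taking the stated doubling rate at face value, it can be written as a simple exponential, where $D_0$ is today's data volume and $t$ is time in years (both symbols are introduced here for illustration):

```latex
D(t) = D_0 \cdot 2^{t/2}, \qquad \frac{D(10)}{D_0} = 2^{10/2} = 2^{5} = 32
```

So a decade at that rate means roughly 32 times the data, and any metadata that scales with the data inherits at least that growth before the additional multiplier described above.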

One of the biggest struggles in any data team is answering questions to and from a business user perspective: "How do I find X, Y, Z data? Where do I get this? Where do I find this report?" And even when data teams do have that, users will ask, "Well, where's it coming from? How do I get the underlying detail of that information?"

And when something goes wrong, which it inevitably will: "How do I troubleshoot that?" In my experience, if there's one little column on a report in Power BI that's broken, a user will come and ask me, "Okay, what happened?"

And I don't know, so I have to dig in. So you open up the report, and it's an archaeological exercise to excavate from the report to the pipelines, to the datasets, to the underlying source data to figure that out.
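That excavation is a graph traversal: starting from the broken report column, follow each "derived from" edge until you reach the source data. Column-level lineage in a catalog automates this walk. The minimal sketch below does it breadth-first over a hypothetical lineage map; the asset names and the `trace_upstream` helper are assumptions for illustration, not any product's API.

```python
from collections import deque

# Hypothetical column-level lineage: each node maps to the nodes it is derived from.
UPSTREAM = {
    "powerbi.annuity_report.total_premium": ["snowflake.marts.annuity_summary.total_premium"],
    "snowflake.marts.annuity_summary.total_premium": ["snowflake.raw.policies.premium"],
    "snowflake.raw.policies.premium": ["source_db.policy_admin.policies.premium"],
    "source_db.policy_admin.policies.premium": [],
}


def trace_upstream(node: str, upstream: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from a report column back to its sources, nearest hops first."""
    seen = {node}
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:  # guard against shared ancestors and cycles
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


path = trace_upstream("powerbi.annuity_report.total_premium", UPSTREAM)
```

Breadth-first order lists the closest dependencies first, which matches how troubleshooting usually proceeds: check the model feeding the report before the raw replica, and the replica before the source system.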

That's always been a challenge. And that, in my opinion, is the real technical debt that weighs on every single data team out there: there's never a good way of handling that metadata. And it rears its ugly head, just like every tech debt does, in the form of the team spending 80% of their time answering questions about the data, figuring out how people get access to data, and troubleshooting.

I've seen data teams spend upwards of 80% of their time in reactive mode. And if you average it out, I've seen it's usually a good 40% or 50% of their time spent answering questions. That is a fundamental sink on developer productivity across the organization.

How do you get more velocity? That's where Atlan comes into play. Maybe we can enable a business user to answer the question themselves, or someone like a data analyst could answer a question without involving engineering teams.

An engineering team can then focus on what they're really supposed to do: acquire more data, enable more insights, and sit down with the business users to collaborate in that discussion about, "Hey, I have this idea, how do I enable this insight?" rather than spending time answering the question of, "What went wrong here?" So that's the way I see it, that's the need, and selling that need can be difficult.

I brought in Atlan because it can help our team be better at handling data. Once we onboard Atlan, that's the productivity I want to get to: teams spending less time answering questions and more time collaborating on data.

We're also using Atlan as a way of creating an authoritative set of datasets so users know which data they can trust and use. We're expanding our team to collaborate with other business groups so that they can self-serve their data analytics, and Atlan will be key to enabling the collaboration model between engineering and business.

What made Atlan stand out in the market to you and your team?

Here's the problem I see in the market. Every single catalog solution seems focused on just the catalog, or on other product lines that are extensions of the catalog. In the case of traditional data catalogs like Alation, they focus on the idea that, "Hey, you can democratize data stewardship across the organization. Your entire organization could be stewarding data." That was the genesis of it. So it's the Wikipedia approach to data stewardship.

The reality is, there's no organization out there that has a data steward. Maybe in a large organization you have a few of them, but that's not a role you want to hire for. What's the value add, what's the ROI for the data team, or from a data governance perspective?

In the past, I worked at a large Financial Services firm, and we experienced all the challenges involved with a traditional catalog. We would spend a ton of engineering hours integrating with our existing systems, and then we would need an army of data stewards to build and maintain everything.

The reality with this approach is that you're forcing data stewardship onto every team, and they just don't have the bandwidth to do it. That's why I saw a huge retreat from Alation, with people going back to Confluence pages because it's just easier to edit Confluence than to update a catalog.

So I knew there had to be a better approach to this problem, and that's when I came across the article about "Data Catalog 3.0" by Prukalpa, and I was intrigued by this new approach. And I chose Atlan not just now, for Group 1001, but back in my previous role, too.

So one of the main reasons I chose Atlan is that Atlan is focused on a very strong mission. That's the core of it. Yes, it's Active Metadata Management, but the real kicker is that Atlan's vision is data collaboration between engineering, analyst, and business teams.

Alation is not that. Their business model is to catalog the data in their system, and that way they can sell you on Compose (a SQL editor). That's the bread-and-butter moneymaker, from what I've seen. Their core product, the cataloging solution? They've never improved it, and they focus on Compose instead. I didn't like that from a product development perspective.

And with Atlan, I see their journey is really about enabling collaboration with data, whether that's simplifying the amount of work, from an engineering perspective, to onboard the various data tools into Atlan. Or, from an analyst perspective, being able to see the datasets, see the lineage and leverage it, understanding where a dataset has been, or integrating Slack to enable communication about data across the organization.

So that's what I focus on, number one: the product vision and what their core mission is. And secondly, on top of that, is seeing the proof in the pudding: the developer velocity.

I know that in my previous organization we spent a ton of engineering hours integrating our existing systems with a traditional data catalog. With Atlan, I was able to get Group 1001 up and running in under two hours. So just the developer velocity of not having to spend all that time configuring and building integrations, because Atlan has out-of-the-box integrations with a lot of the core modern data stack? That is huge.

We could focus more on the higher-value ask, and the higher-value ask is to enable better collaboration across the organization around data. That's the real reason I chose Atlan.

What do you plan on creating with Atlan? Do you have an idea of what use cases you'll build, and the value you'll drive?

The use case we have Atlan serving right now is not the only use case we eventually want to build. The reason is that right now, we're really focused on our core analytics stack, which comprises Snowflake, Fivetran, Coalesce, Dagster, and the like. Sure, Atlan will solve that, but how can we extend Atlan across the enterprise? That means enabling cross-enterprise data governance, a holistic view of our enterprise's data assets, tracking PII, and applying governance and policies related to it.
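Tracking PII and applying policies to it usually comes down to two steps: tag the columns that hold personal data, then enforce a rule (such as masking) wherever tagged columns are read. The minimal sketch below uses name-based heuristics and a simple masking rule; the patterns, tag names, and functions are hypothetical assumptions, not Atlan's classification or policy features, and real classification would also inspect values and context.

```python
import re

# Hypothetical name-based heuristics; the (^|_)...($|_) form matches whole
# underscore-separated tokens, so "owner_ssn" matches but "lessons" does not.
PII_PATTERNS = {
    "email": re.compile(r"e_?mail", re.IGNORECASE),
    "ssn": re.compile(r"(^|_)ssn($|_)|social_security", re.IGNORECASE),
    "phone": re.compile(r"phone", re.IGNORECASE),
    "date_of_birth": re.compile(r"(^|_)dob($|_)|birth_?date|date_of_birth", re.IGNORECASE),
}


def tag_pii(columns: list[str]) -> dict[str, str]:
    """Return {column: pii_type} for columns whose names match a PII pattern."""
    tags = {}
    for column in columns:
        for pii_type, pattern in PII_PATTERNS.items():
            if pattern.search(column):
                tags[column] = pii_type
                break  # first matching type wins
    return tags


def apply_masking_policy(row: dict[str, object], tags: dict[str, str],
                         can_view_pii: bool) -> dict[str, object]:
    """Mask tagged PII columns unless the caller is allowed to see them."""
    if can_view_pii:
        return dict(row)
    return {col: ("***" if col in tags else val) for col, val in row.items()}


tags = tag_pii(["policy_id", "owner_email", "owner_ssn", "premium", "dob"])
row = {"policy_id": 7, "owner_email": "owner@example.com", "premium": 100.0}
```

Keeping the tags separate from the enforcement rule means a newly onboarded business only has to publish its columns for the same policy to apply, which fits the central governance idea described here.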

Any new business we onboard can come with its own data stack. So one of the core aspects from a data strategy perspective is that we can leverage Atlan as a central governance framework, where all organizations publish their data assets into Atlan to have one holistic umbrella.

Another key use case is enabling self-service analytics across our organization. We plan to leverage Atlan to document our newly curated data so other departments can discover it and understand what the dataset is, how to use it, and whether they can trust the data. This will be key to facilitating collaboration with data and enabling our organization to be data-centric.

Photo by Benjamin Child on Unsplash
