Examining Continuous Data Delivery (CDD Part 2)

If you google the term ‘Continuous Data Delivery’, the search results come up pretty thin. This sparsity of information and research reflects how new the field is and how quickly it is still evolving.

Since the term already exists on Google, we at Cherre can’t go so far as to say we’ve coined the phrase. However, we believe our current work lays the groundwork by clarifying the concept and outlining a blueprint for a successful Continuous Data Delivery pipeline.

Continuous Data Delivery applies the core components of Continuous Delivery, the practices used to push code deployments reliably and consistently, to the goal of data availability and continuity.

The Importance of Data

Data availability is an essential competitive edge in today’s business market, regardless of an organization’s size or sector. Ensuring the high availability of mission-critical data by shrinking the distance between users and their many data sources can be the thin line between success and market failure.

2020 has seen an incredible move online in response to COVID-19 across many different business sectors. According to April 2020 research by Domo, 59% of the world’s population accesses the internet daily, with 4.57 billion active users, roughly a 3% increase from January 2019. The infographic below highlights the staggering amount of data being generated worldwide every minute and demonstrates how data never sleeps.

#data never sleeps

Domo Resource – Data Never Sleeps 8.0. (2020)

All applications are driven by data, whether they are entertainment, eCommerce, or health apps. As such, the business quest to provide an ever more personalized user experience is driving a major surge in the adoption of ‘Big Data’. As of 2019, 53% of organizations already used big data technologies, and a further 38% planned to use them in the future.

#bigdata technology

(Statista, 2020)

In 2018, the Global Big Data Analytics Market was worth US$37.34 billion, and research forecasts it to reach US$105.08 billion by 2027. The increasing volume of data and the growing adoption of Big Data tools are predicted to be the main drivers of this revenue growth over the forecast period. (Research and Markets, 2020)

Big Data vs. Wide Data

The term Big Data is used broadly to cover large data sets that are processed regularly and typically come from a single source, e.g., Walmart processing every single one of its transactions worldwide. Big data can be complex or simple.

Wide Data is the process of taking complex data sets from multiple silos and transforming them from largely unusable, labour-intensive entries into meaningful, connected artifacts from which business analytics, decisions, and forecasts can be made. It can be Big Data, it can be simple, but typically it is complex.

At Cherre, our company mission is to transform real estate investing and turn underwriting into a science through this process. Our team connects data sets of all sizes (unlike ‘Big Data’, ‘Wide’ encompasses both large and small data sets) and unites them from diversified sources. Transforming our data pipeline with CDD practices helps us empower real estate professionals through a powerful and flexible application that connects ‘Wide’, disparate real estate data and makes it more accessible for better investment and underwriting decisions.
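To make the idea of connecting Wide Data concrete, here is a minimal, hypothetical sketch in Python: two unrelated sources (a public tax roll and a paid vendor feed), each keyed on its own identifier, are resolved onto a single parcel record via a crosswalk. The file names, column names, and identifiers are illustrative assumptions, not Cherre’s actual schema or code.

```python
import pandas as pd

# Hypothetical inputs: a public tax roll and a paid vendor feed,
# each keyed on its own identifier (illustrative columns only).
tax_roll = pd.read_csv("public_tax_roll.csv")       # columns: bbl, assessed_value, zoning
vendor_feed = pd.read_csv("vendor_lease_feed.csv")  # columns: vendor_id, lease_rate, foot_traffic

# A crosswalk that resolves both identifiers to a common parcel ID.
crosswalk = pd.read_csv("parcel_crosswalk.csv")     # columns: parcel_id, bbl, vendor_id

# Connect the silos into a single, analysis-ready record per parcel.
connected = (
    crosswalk
    .merge(tax_roll, on="bbl", how="left")
    .merge(vendor_feed, on="vendor_id", how="left")
)

connected.to_parquet("connected_parcels.parquet", index=False)
```

The interesting part is not the join itself but the crosswalk: connecting Wide Data is mostly the work of resolving many source-specific identifiers onto one shared entity.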

For Cherre, connecting data at this scale is all about providing turn-key access to unified public data, paid vendor feeds, and third-party application vendors. The goal of applying Continuous Delivery practices to Cherre’s data landscape is to accelerate data flow and accessibility for data sets of all sizes while reducing the time needed to support analytics and business intelligence processes.

Moreover, not only is the systematic collection and connection of disparate sources of real estate data (our CoreConnect role) at the core of our mission, but CDD enables us to support our real estate clients further by making the connected data usable, whether it be public data (taxes, zoning, mortgages, demographics, etc.), paid data (lease rolls, debts, weather, foot traffic, etc.), or internal data (deal management, leasing, CRM, energy, etc.).

As the adoption of Big/Wide Data increases, the complexity of the data pipeline grows to match. A reliable and agile data pipeline is the backbone that lets an organization move quickly, bring new features to market, win more clients, and improve satisfaction among current clients. This is why the principles of a Continuous Data Delivery pipeline are becoming more important than ever in data engineering.

As discussed in the first part of this blog series, Laying the Foundations: CD to CDD, the foundations of best-practice continuous delivery include testing, resiliency, and immutability. These are key elements in the final CDD alloy.

Data Pipelines: The Early Years

Let’s consider the progression from CD to CDD by following this journey through a hypothetical data pipeline: the connection for the flow of data between two places, A and B.

Essentially, our initial aim with our theoretical team’s pipeline is to make data available to our ‘clients’ for analysis and (eventually) visualization by transforming it through ETL/ELT. Our first ‘pipeline’ achieves its objective of transforming a data set into a simple artifact. However, this early in the process it’s essentially the barebones CD equivalent of an artifact; it’s basically a skateboard being pushed across a plank. It certainly moves our data packet/artifact from A (data source) to B (end user/client), but the whole thing can go awry. With a lack of testing, low availability, and poorly built infrastructure, the skateboard is an unreliable data pipeline.

#datapipeline
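As a toy illustration, the ‘skateboard’ stage might look like the following single, untested script: it pulls a CSV from a source, applies one transformation, and writes the result straight to wherever the client reads it. Every name and URL here is hypothetical; the point is what’s missing (no tests, no retries, no versioning), not the code itself.

```python
import pandas as pd

# A naive, single-shot ETL job: extract, transform, load.
# No validation, no retries, no versioning -- if anything goes awry,
# the client simply gets stale or broken data.

def run_pipeline() -> None:
    # Extract: read directly from the source (A).
    raw = pd.read_csv("https://example.com/source/properties.csv")

    # Transform: a single hard-coded cleanup step.
    raw["price_per_sqft"] = raw["sale_price"] / raw["square_feet"]

    # Load: overwrite the one file the client (B) reads from.
    raw.to_csv("/shared/reports/properties_latest.csv", index=False)

if __name__ == "__main__":
    run_pipeline()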

Agile methodologies teach us to capture and process feedback from our clients/end-users, streamline processes, and drive innovation. We start to refine and shape our pipeline by looking at what our clients need; in Cherre’s case, part of this is data connectivity and the other part is transforming entries into usable, actionable data. Our pipeline starts to look more like a bridge that can carry our latest data packet iteration across.

#Agilepipeline

A Continuous Cycle of Iteration and Improvement

As new feedback comes in, our data models may drift off target and we need to recalibrate or re-engineer with updated data sets. To build on the Agile success of responding to client feedback, DevOps and continuous delivery (CD) take us onward, connecting our team’s development and IT operations (through the introduction of automation, testing, and version control) to support and amplify our data pipeline’s agility, responsiveness, and availability.

#CDpipeline
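One concrete way this shows up is automated data checks that run against every new data set before it is promoted downstream, for example as part of a CI job. The sketch below uses pytest-style assertions against a hypothetical staged properties table; the file path, column names, and rules are illustrative assumptions rather than our actual test suite.

```python
import pandas as pd

# Hypothetical data-quality checks, runnable under pytest as part of CI.
# A failing check blocks the candidate data set from being promoted.

def load_candidate() -> pd.DataFrame:
    return pd.read_parquet("staging/properties_candidate.parquet")

def test_required_columns_present():
    df = load_candidate()
    assert {"parcel_id", "sale_price", "square_feet"} <= set(df.columns)

def test_no_duplicate_parcels():
    df = load_candidate()
    assert not df["parcel_id"].duplicated().any()

def test_prices_are_positive():
    df = load_candidate()
    assert (df["sale_price"] > 0).all()
```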

We know there’s still more work to be done; client feedback tells us as much. Scale and availability are becoming increasingly important to our end-users. We need to improve on our fragmented processes, inefficient operations, and functional silos (isolated organizational structures that create artificial barriers between departments and prevent the smooth flow of work and data downstream).

DevOps best practices of continuous loops of learning, collaboration, and feedback help our team reduce repetitive work, improve cross-team skilling, and create an environment where building and testing occur simultaneously. We start to see elements of immutability in our data pipeline, so data sets progress exactly the same way through testing, pre-production, and production. Our immutable artifacts give the team confidence that the artifact that was tested is the artifact that will be deployed to production.
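A simple way to get that guarantee is to make every published data artifact immutable, for instance content-addressed, so the exact bytes that passed testing are the bytes promoted to production. The snippet below is an illustrative sketch using local files and a made-up naming scheme, not a description of our production tooling.

```python
import hashlib
import shutil
from pathlib import Path

def publish_immutable(artifact: Path, store: Path) -> Path:
    """Copy an artifact into the store under a name derived from its content hash.

    The same bytes always map to the same name, so the artifact that was
    tested is provably the artifact that gets deployed.
    """
    digest = hashlib.sha256(artifact.read_bytes()).hexdigest()[:16]
    target = store / f"{artifact.stem}-{digest}{artifact.suffix}"
    if not target.exists():  # write-once: existing artifacts are never overwritten
        shutil.copy2(artifact, target)
    return target

# Hypothetical usage: publish a tested data set and record its immutable name
# so the deployment step can reference exactly that version.
# published = publish_immutable(Path("staging/properties.parquet"), Path("artifact-store"))
```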

By adopting a DevOps mindset, our team starts to shorten resolution times and improve collaboration between departments. We can combine version control with blue/green deployments and rollbacks to ensure our clients never experience downtime when they need to access data. We can expand on our foundations of testing, immutability, and resiliency through automation to create a CDD pipeline that is both swift and resilient.

#CDDpipeline
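For data, blue/green can be as simple as keeping two complete copies of a data set and atomically repointing the ‘live’ alias that clients query; rolling back just repoints it again. The sketch below illustrates the idea with a SQLite view (the database, table, and view names are hypothetical); the same pattern works with warehouse views, storage prefixes, or symlinks.

```python
import sqlite3

# Blue/green for data: clients always read from the `properties_live` view.
# Promoting a new data set repoints the view; rollback repoints it back.

def promote(conn: sqlite3.Connection, target_table: str) -> None:
    with conn:  # single transaction: readers never observe a missing view
        conn.execute("DROP VIEW IF EXISTS properties_live")
        conn.execute(f"CREATE VIEW properties_live AS SELECT * FROM {target_table}")

conn = sqlite3.connect("warehouse.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS properties_blue  (parcel_id TEXT, sale_price REAL);
    CREATE TABLE IF NOT EXISTS properties_green (parcel_id TEXT, sale_price REAL);
""")

promote(conn, "properties_green")   # cut clients over to the new (green) data set
# promote(conn, "properties_blue")  # rollback: repoint to the previous (blue) data set
```

Because the swap is a metadata change rather than a data copy, it is near-instant, and the previous version remains intact for rollback.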

Conclusion

At Cherre, we built our CDD pipeline around this very framework. Management processes like Agile and DevOps influence how we continuously construct, refine, and upgrade the pipeline and its functionality. For us, CDD has no defined ending or final destination; there is no point at which our pipeline will be considered ‘complete’.

The challenge with this CDD pipeline is the level of complexity involved in its construction, maintenance, and improvement. Agile and DevOps methodologies provide us with the road map to keep responding to end-user feedback by continuously iterating on and improving our process.


Cherre is the leader in real estate data and insight. We help companies connect and unify their disparate real estate data so they can make faster, smarter decisions. Contact us today to learn more.

Stefan Thorpe

▻ VP of Engineering @ Cherre ▻ Cloud Solutions Architect ▻ DevOps Evangelist 

Stefan is an IT professional with 20+ years of management and hands-on experience providing technical and DevOps solutions to support strategic business objectives.

References