Introduction
In the current trade war, governments have weaponized commerce through cycles of retaliatory tariffs, quotas, and export bans. The shockwaves have rippled across supply chain networks and forced companies to reroute sourcing, reshore manufacturing, and stockpile critical inputs, measures that extend lead times and erode once-lean, just-in-time operations. Every detour carries a cost: rising input prices, increased logistics expenses, and excess inventory tying up working capital. As a result, profit margins shrink, cash-flow volatility increases, and balance-sheet risks intensify.
Was the trade war a singular event that caught global supply chains off guard? Perhaps in its specifics, but the magnitude of disruption was hardly unprecedented. Over the span of just a few years, the COVID-19 pandemic, the 2021 Suez Canal blockage, and the ongoing Russo-Ukrainian war each delivered major shocks, occurring roughly a year apart. These events, difficult to foresee, have caused substantial disruption to global supply chains.
What can be done to prepare for such disruptive events? Instead of reacting in panic to last-minute changes, can companies make informed decisions and take proactive steps before a crisis unfolds? A well-cited paper by MIT professor David Simchi-Levi offers a compelling, data-driven approach to this problem. At the core of his method is the creation of a digital twin: a graph-based model where nodes represent sites and facilities in the supply chain, and edges represent the flow of materials between them. A range of disruption scenarios is then applied to the network, and its responses are measured. Through this process, companies can assess potential impacts, uncover hidden vulnerabilities, and identify redundant investments.
This process, known as stress testing, has been widely adopted across industries. Ford Motor Company, for example, applied this approach across its operations and supply network, which includes over 4,400 direct supplier sites, hundreds of thousands of lower-tier suppliers, more than 50 Ford-owned facilities, 130,000 unique parts, and over $80 billion in annual external procurement. Their analysis revealed that roughly 61% of supplier sites, if disrupted, would have no impact on earnings, while about 2% would have a significant impact. These insights fundamentally reshaped their approach to supply chain risk management.
The remainder of this blog post provides a high-level overview of how to implement such a solution and perform a comprehensive analysis on Databricks. The supporting notebooks are open-sourced and available here.
Stress Testing Supply Chain Networks on Databricks
Imagine a scenario where we are working for a global retailer or a consumer goods company and are tasked with improving supply chain resiliency. Specifically, this means ensuring that our supply chain network can meet customer demand during future disruptive events to the fullest extent possible. To achieve this, we must identify vulnerable sites and facilities within the network that could cause disproportionate damage if they fail, and reassess our investments to mitigate the associated risks. Identifying high-risk areas also helps us recognize low-risk ones. If we discover areas where we are overinvesting, we can either reallocate those resources to balance risk exposure or reduce unnecessary costs.
The first step toward achieving our goal is to construct a digital twin of our supply chain network. In this model, supplier sites, manufacturing facilities, warehouses, and distribution centers can be represented as nodes in a graph, while the edges between them capture the flow of materials throughout the network. Building this model requires operational data such as inventory levels, production capacities, bills of materials, and product demand. Using these data as inputs to a linear optimization program, designed to optimize a key metric such as profit or cost, we can determine the optimal configuration of the network for that objective. This allows us to determine how much material should be sourced from each sub-supplier, where it should be transported, and how it should move through to production sites to optimize the chosen metric, a supply chain optimization approach widely adopted by many organizations. Stress testing goes a step further, introducing the concepts of time-to-recover (TTR) and time-to-survive (TTS).
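To make the network-optimization step concrete, here is a minimal sketch using `scipy.optimize.linprog`. The two-supplier, single-plant network, its costs, and its capacities are invented for illustration; a real model would span multiple tiers, products, and bills of materials:

```python
from scipy.optimize import linprog

# Toy network: two supplier sites feeding one plant that must meet a
# fixed demand. We minimize total sourcing cost, the LP analogue of the
# profit/cost objective described above. All numbers are hypothetical.
unit_cost = [4.0, 6.0]           # cost per unit from supplier 1 / supplier 2
capacity = [(0, 70), (0, 80)]    # per-supplier capacity bounds
demand = 100

res = linprog(
    c=unit_cost,
    A_eq=[[1, 1]], b_eq=[demand],  # units sourced must sum to demand
    bounds=capacity,
    method="highs",
)
print(res.x, res.fun)
```

With these numbers, the solver exhausts the cheaper supplier first (70 units) and tops up from the other (30 units), for a total cost of 460. The same structure, with many more variables and constraints, yields the optimal network configuration described above.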

Time-to-recover (TTR)
TTR is one of the key inputs to the network. It indicates how long a node, or a group of nodes, takes to recover to its normal state after a disruption. For example, if one of your supplier's production sites experiences a fire and becomes non-operational, TTR represents the time required for that site to resume supplying at its previous capacity. TTR is typically obtained directly from suppliers or through internal assessments.
With TTR in hand, we begin simulating disruptive scenarios. Under the hood, this involves removing or limiting the capacity of a node, or a set of nodes, affected by the disruption and allowing the network to re-optimize its configuration to maximize profit or minimize cost across all products under the given constraints. We then assess the financial loss of operating under this new configuration and calculate the cumulative impact over the duration of the TTR. This gives us the estimated impact of the specific disruption. We repeat this process for thousands of scenarios in parallel using Databricks' distributed computing capabilities.
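The scenario loop can be sketched as follows on a toy two-supplier cost model (all numbers, including the TTR values, are made up): each supplier site is knocked out in turn, the network is re-solved, and the cost penalty is accumulated over that site's TTR.

```python
from scipy.optimize import linprog

def solve_cost(capacity, demand=100, unit_cost=(4.0, 6.0)):
    """Min-cost sourcing plan for a toy two-supplier network;
    returns None if demand cannot be met."""
    res = linprog(c=unit_cost, A_eq=[[1, 1]], b_eq=[demand],
                  bounds=[(0, cap) for cap in capacity], method="highs")
    return res.fun if res.success else None

base_capacity = [120, 110]
base_cost = solve_cost(base_capacity)

ttr_weeks = {0: 8, 1: 3}   # assumed time-to-recover per supplier site
impacts = {}
for site, weeks in ttr_weeks.items():
    disrupted = list(base_capacity)
    disrupted[site] = 0                   # knock the site out of the network
    new_cost = solve_cost(disrupted)      # let the network re-optimize
    extra = float("inf") if new_cost is None else new_cost - base_cost
    impacts[site] = extra * weeks         # cumulative loss over the TTR window

print(impacts)
```

Here losing supplier 0 forces all sourcing onto the expensive site for eight weeks (a large impact), while losing supplier 1 costs nothing because the cheap site can cover demand alone, the same pattern as Ford's finding that most sites have no earnings impact while a few are critical.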
Below is an example of an analysis performed on a multi-tier network producing 200 finished goods, with materials sourced from 500 tier-one suppliers and 1,000 tier-two suppliers. Operational data were randomly generated within reasonable constraints. For the disruptive scenarios, each supplier node was removed individually from the graph and assigned a random TTR. The scatter plot below displays total spend on supplier sites for risk mitigation on the vertical axis and lost profit on the horizontal axis. This visualization allows us to quickly identify areas where risk mitigation investment is undersized relative to the potential damage of a node failure (red box), as well as areas where investment is outsized compared to the risk (green box). Both areas present opportunities to revisit and optimize our investment strategy, either to enhance network resiliency or to reduce unnecessary costs.

Time-to-survive (TTS)
TTS offers another perspective on the risk associated with node failure. Unlike TTR, TTS is not an input but an output, a decision variable. When a disruption occurs and affects a node or a group of nodes, TTS indicates how long the reconfigured network can continue fulfilling customer demand without any loss. The risk becomes more pronounced when TTR is significantly longer than TTS.
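Under the simplifying assumption of a single product and a single failed node, TTS reduces to a closed-form calculation: how many periods can on-hand inventory cover the shortfall left by the surviving capacity? The real analysis obtains TTS as a decision variable of an optimization model over the whole network; this sketch only illustrates the idea.

```python
def time_to_survive(inventory, demand_rate, surviving_capacity):
    """Periods during which the reconfigured network covers full demand
    from remaining capacity plus on-hand inventory, before any loss."""
    shortfall = demand_rate - surviving_capacity  # units short per period
    if shortfall <= 0:
        return float("inf")   # surviving capacity alone covers demand
    return inventory / shortfall

# Hypothetical failure: demand of 100/week, 80/week of capacity survives,
# and 120 units of buffer stock are on hand.
tts = time_to_survive(inventory=120, demand_rate=100, surviving_capacity=80)
print(tts)  # 6.0 weeks
```

If this node's TTR is, say, 8 weeks, the network runs short for the final 2 weeks, exactly the TTR − TTS gap examined next.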
Below is another analysis performed on the same network. The histogram shows the distribution of differences between TTR and TTS for each node. Nodes with a negative TTR − TTS are not a concern, assuming the provided TTR values are accurate. However, nodes with a positive TTR − TTS may incur financial loss, especially those with a large gap. To enhance network resiliency, we can reassess and potentially reduce TTR by renegotiating terms with suppliers, increase TTS by building inventory buffers, or diversify the sourcing strategy.
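The flagging logic behind this histogram amounts to a few lines (the node names and values below are made up for illustration):

```python
# Per-node recovery and survival times, in weeks (hypothetical values).
ttr = {"s1": 8, "s2": 2, "s3": 5}   # time-to-recover (input)
tts = {"s1": 6, "s2": 4, "s3": 5}   # time-to-survive (computed)

gaps = {node: ttr[node] - tts[node] for node in ttr}
# Positive gaps mean the node recovers slower than the network can survive,
# so a disruption there is expected to cause financial loss.
at_risk = {node: gap for node, gap in gaps.items() if gap > 0}
print(at_risk)  # {'s1': 2}
```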

By combining TTR and TTS analysis, we can gain a deeper understanding of supply chain network resiliency. This exercise can be performed strategically on a yearly or quarterly basis to inform sourcing decisions, or more tactically on a weekly or daily basis to monitor fluctuating risk levels across the network, helping to ensure smooth and responsive supply chain operations.
On a lightweight four-node cluster, the TTR and TTS analyses completed in 5 and 40 minutes respectively on the network described above (1,700 nodes), all for under $10 in cloud spend. This highlights the solution's impressive speed and cost-effectiveness. However, as supply chain complexity and business requirements grow, with increased variability, interdependencies, and edge cases, the solution may require greater computational power and more simulations to maintain confidence in the results.
Why Databricks
Every data-driven solution relies on the quality and completeness of the input dataset, and stress testing is no exception. Companies need high-quality operational data from their suppliers and sub-suppliers, including information on bills of materials, inventory, production capacities, demand, TTR, and more. Collecting and curating this data is not trivial. Moreover, building a transparent and flexible stress-testing framework that reflects the unique aspects of your business requires access to a wide range of open-source and third-party tools, and the ability to select the right combination. In particular, this includes LP solvers and modeling frameworks. Finally, the effectiveness of stress testing hinges on the breadth of the disruption scenarios considered. Running such a comprehensive set of simulations demands access to highly scalable computing resources.
Databricks is the ideal platform for building such a solution. While there are many reasons, the most important include:
- Delta Sharing: Access to up-to-date operational data is essential for developing a resilient supply chain solution. Delta Sharing is a powerful capability that enables seamless data exchange between companies and their suppliers, even when one party is not using the Databricks platform. Once the data is available in Databricks, business analysts, data engineers, data scientists, statisticians, and managers can all collaborate on the solution within a unified, data intelligence platform.
- Open Standards: Databricks integrates seamlessly with a broad range of open-source and third-party technologies, enabling teams to leverage familiar tools and libraries with minimal friction. Users have the flexibility to define and model their own business problems, tailoring solutions to specific operational needs. Open-source tools provide full transparency into their internals, which is crucial for auditability, validation, and continuous improvement, while proprietary tools may offer performance advantages. On Databricks, you have the freedom to choose the tools that best suit your needs.
- Scalability: Solving optimization problems on networks with thousands of nodes is computationally intensive. Stress testing requires running simulations across tens of thousands of disruption scenarios, whether for strategic (yearly/quarterly) or tactical (weekly/daily) planning, which demands a highly scalable platform. Databricks excels in this area, offering horizontal scaling to efficiently handle complex workloads, powered by strong integration with distributed computing frameworks such as Ray and Spark.
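On Databricks, scenario evaluations of this kind would typically be fanned out with Spark or Ray across a cluster. As a local, dependency-free stand-in, the same fan-out pattern can be sketched with the standard library; the scenario function and its loss formula below are hypothetical placeholders for a real per-scenario network re-optimization.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_scenario(scenario_id):
    """Stand-in for one disruption run: a real implementation would knock
    out the scenario's nodes, re-solve the network LP, and return the loss
    over their TTR. A toy formula keeps this sketch self-contained."""
    loss = (scenario_id * 37) % 101 * 10.0   # hypothetical lost profit
    return scenario_id, loss

scenarios = range(1_000)                     # thousands of disruption cases
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(evaluate_scenario, scenarios))

worst = max(results, key=results.get)        # highest-impact scenario
print(worst, results[worst])
```

Because each scenario is independent, the loop is embarrassingly parallel: swapping the executor for a Spark job or Ray tasks scales the same logic from a laptop to tens of thousands of scenarios on a cluster.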
Summary
Global supply chains often lack visibility into network vulnerabilities and struggle to predict which supplier sites or facilities would cause the most damage during disruptions, leading to reactive crisis management. In this article, we presented an approach to build a digital twin of the supply chain network by leveraging operational data and running stress testing simulations that evaluate time-to-recover (TTR) and time-to-survive (TTS) metrics across thousands of disruption scenarios on Databricks' scalable platform. This method enables companies to optimize risk mitigation investments by identifying high-impact, vulnerable nodes (similar to Ford's discovery that only a small fraction of supplier sites significantly affect earnings) while avoiding overinvestment in low-risk areas. The result is preserved profit margins and reduced supply chain costs.
Databricks is ideally suited to this approach, thanks to its scalable architecture, Delta Sharing for real-time data exchange, and seamless integration with open-source and third-party tools for transparent, flexible, efficient, and cost-effective supply chain modeling. Download the notebooks to explore how stress testing of supply chain networks at scale can be implemented on Databricks.