**Tags**

data mining, Data Visualization, General thinking, uncertainty analysis, uncertainty quantification

Why are there so many methods for uncertainty quantification? Why does so much research seem far removed from actual industry practice? What causes the big gap between industry and academia in thinking about practical problems? And how can we close that gap and make research more meaningful?

In an earlier post, Practical Elements of “Decision Making under Uncertainty”, I introduced the topic of Decision Making Under Uncertainty (DMUU), the fundamental challenge in petroleum reservoir development planning. In this post, I want to go into more detail, partly based on our recently published paper Scenario Discovery Workflow for Robust Petroleum Reservoir Development under Uncertainty [1].

### How does the human mind handle uncertainty?

Uncertainty, as explained in the Practical Elements post, is a subjective state of not knowing the future. Human minds rely on something solid to make decisions, and hence naturally hate uncertainty. We feel fear when we do not know whether a company will offer us a job, whether a boyfriend or girlfriend is really serious about the relationship, or when and how a disaster will happen. That is why insurance companies exist.

For all human minds, this fundamental feeling of insecurity is resolved by establishing a clear boundary between “**my territory**”, something that is knowable, predictable, and controllable, and “**God’s territory**”, the realm of the unknown, unpredictable, and uncontrollable. People put faith in the former and let go of the latter. Examples of “my territory” include various rational and irrational elements: logic and reasoning, previous knowledge & experience, partial differential equations, various sciences, suggestions from experts, intuition, guessing, and maybe astrology or even a coin flip. “God’s territory” responds to the attitudes in “my territory” with an actuality: the actual validity of the reasoning, the actual suitability of the model, the actual correctness of the initial and boundary conditions, the actual head or tail of a coin flip, and finally the actual outcome of the whole situation. This dualistic thinking, separating the subject (I or me) from the object (the unknown situation), is the essential way the human mind functions (BTW, Zen or Dzogchen practice is to go beyond that!).

### Attitude taken by academia

Among the numerous ways to establish the boundary between the two territories, the most influential is the **framework of probability and statistics**, especially **Bayesian statistics**. It is the standard framework used by **academia**. Based on this framework, probabilities are something that we can at least try to be certain about. We can say, “the surgery has a 95% chance of success”. And it is up to “God” (meaning something we have no knowledge or control of) whether my dear friend’s surgery is actually successful. Moreover, before the surgery has finished, the probability of success will vary as I learn more about the actual progress. Maybe the probability changes to 99% when I see everything going well, or to 70% when I see the surgeons frowning from time to time.

Statisticians phrase uncertainty in the language of probability. In the ideal world viewed from the Bayesian framework, the state of knowledge about every little detail of the future is represented by probability distributions. The logical dependency between different events is represented by conditional probabilities. So, whenever an actual result comes out, the probabilities of all other events are updated according to Bayes’ rule. This idea is of crucial importance in modeling how knowledge about uncertain situations evolves. Many great tools have been developed based on Bayes’ rule. For example, my previous post explained the Probabilistic Graphical Model (PGM), an amazing representation of a complicated situation where many factors depend on each other, with an example illustrating a pricing model used by insurance companies.

In UQ (Uncertainty Quantification) for oil/gas reservoir development, the framework of Bayesian inference serves as the basis for the vast majority of academic papers. A few examples are [2-6]. Basically, the state of knowledge before an observation corresponds to the prior probability distribution. When new data come in, one uses the observation to update the probabilities of the model parameters, which gives the posterior probability distribution. For example, a newly drilled dry hole over the supposed sweet spot may suggest that the previous model is less likely to represent reality.
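
To make the prior-to-posterior update concrete, here is a minimal sketch of Bayes’ rule applied to the dry-hole example. The two candidate models and all the probabilities are made-up numbers for illustration only; they do not come from any of the cited papers.

```python
def bayes_update(prior, likelihood):
    """Return P(model | data) from P(model) and P(data | model)."""
    unnormalized = {m: prior[m] * likelihood[m] for m in prior}
    evidence = sum(unnormalized.values())  # P(data), summed over all models
    return {m: p / evidence for m, p in unnormalized.items()}


# Hypothetical prior belief in two competing geological models.
prior = {"sweet_spot_model": 0.7, "alternative_model": 0.3}

# Hypothetical chance of drilling a dry hole at this location under each model.
likelihood_dry_hole = {"sweet_spot_model": 0.1, "alternative_model": 0.6}

posterior = bayes_update(prior, likelihood_dry_hole)
for model, probability in posterior.items():
    print(f"{model}: {probability:.2f}")
```

With these assumed numbers, the dry hole shifts most of the belief from the sweet-spot model to the alternative, which is exactly the kind of update the paragraph above describes.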

History matching, according to the Schlumberger Oilfield Glossary, is “[t]he act of adjusting a model of a reservoir until it closely reproduces the past behavior of a reservoir”. Under the framework of Bayesian statistics, this process may be rephrased as “minimizing the mismatch between the model prediction and the actual observation”. Solving this problem automatically using a computer is called Assisted History Matching (AHM). When the program finishes running, the user obtains adjusted parameters such that the model prediction matches the history as closely as possible. Whether or not the adjusted parameters are realistic depends heavily on the prior distribution of those parameters and the reliability of the model assumptions. The computer algorithm may give a history matching result that is very unrealistic. However, there are also academic works that try to deal with this model realism problem. For example, [3] tried to filter out unrealistic history-matched models by computing their “affinity” to some training models that are “realistic”.
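
As a toy illustration of “minimizing the mismatch”, the sketch below history-matches a two-parameter exponential decline curve by brute-force search. The decline model, the synthetic “history”, and the candidate grid are all assumptions made for this example; real AHM tools work with full reservoir simulators and far more sophisticated optimizers.

```python
import math


def simulate(q0, decline, times):
    """Toy forward model: exponential decline of the production rate."""
    return [q0 * math.exp(-decline * t) for t in times]


def mismatch(params, times, observed):
    """Sum of squared differences between predicted and observed rates."""
    q0, decline = params
    predicted = simulate(q0, decline, times)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


times = [0, 1, 2, 3, 4, 5]
# Synthetic "history", generated from known parameters so the answer can be checked.
observed = simulate(1000.0, 0.30, times)

# Candidate parameter sets: initial rate 800..1200, decline rate 0.10..0.50.
candidates = [(q0, d / 100) for q0 in range(800, 1201, 50) for d in range(10, 51, 5)]
best = min(candidates, key=lambda params: mismatch(params, times, observed))
print("best-matching parameters:", best)
```

Note that the search only rewards a small mismatch. Nothing in it checks whether the winning parameters are geologically plausible, which is the realism problem mentioned above.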

In short, academia handles uncertainty by putting its faith in the statistical framework and its mathematical rigor.

### Attitude taken by industry

While academia is responsible for producing novel ideas and making them mathematically rigorous, it is industry that actually develops the field and makes things happen! Industrial practice involves a lot of dirty work, and one has to handle many irregular situations that never appear in the textbook. Black swans are not uncommon, much like the incredible comebacks people talk about in football and other sports. Nevertheless, one can still find many places where Bayesian statistics can be used to represent the situation, especially for tasks that tend to have a clear and consistent pattern. For example, Bayesian spam filtering is built on the probabilistic relationship between whether or not an email is spam and the various features it has (e.g. the appearance of “free money” in the email).

The probabilistic and Bayesian framework becomes much less attractive when it comes to reservoir modeling. The subsurface uncertainty is huge because we lack reliable ways to see underground. Modelers have to integrate different types of information and find the common thread, just like a detective: geologists speak in terms of “channel” or “lobe”; geophysicists talk about “amplitude variation” and are good at tracing lines on seismic images; petrophysicists talk about “quartz” and “organic matter”; well logs have “microresistivity”; and so on. It is hard to imagine that all these factors and the complicated reasoning behind them can be represented in a Bayesian inference framework with prior probabilities and conditional probabilities. In fact, in most oil companies, especially ExxonMobil, reservoir engineers don’t trust AHM (assisted history matching).

Then one may ask: how does the industry handle uncertainty? Where do they put their faith when facing uncertainty? How do they draw the line between what one can control and what one cannot?

It is their **past experience** that they rely on! They trust their previously successful approaches, working processes, established practices, etc. Large oil companies usually have their most experienced experts spend a huge amount of effort resolving issues in the geomodeling process.

To my knowledge, ExxonMobil does a fantastic job of meticulously building a single geological model that best conforms to the existing data and gives a very accurate estimated ultimate recovery (EUR). Because ExxonMobil has more faith in its past experience with geomodeling, it started to consider uncertainty quantification much later than other big companies. Its decision makers know that it is very easy to overlook practical factors and underestimate the uncertainty if one adopts a Bayesian framework too early.

### Closing the gap: decision making for green field development

Greater value comes from mutual understanding between academia and industry. As an example of collaboration, the paper Scenario Discovery Workflow for Robust Petroleum Reservoir Development under Uncertainty [1] is based on my ExxonMobil intern project, which was done in 2015. It is an attempt to take uncertainty into consideration. In the existing reservoir modeling framework at ExxonMobil, development plans were made based on a single model. I will try to explain the key ideas presented in that paper and connect them to the larger context of green field development planning.

#### “What if the assumptions are wrong?”

In green field oil & gas development, decisions made at an early stage can have a huge impact, both financially and from an engineering point of view. Oil companies sometimes have to make billion-dollar investments, conduct seismic acquisitions, build large facilities including pipelines and water treatment plants, and arrange drilling schedules. Many of those activities can cost millions of dollars each day.

However, in green fields where very few wells have been drilled, the data about the reservoir are very limited. Under this huge uncertainty, great caution has to be taken in making assumptions about the subsurface. Geoscientists often have to guess what the underground looks like, mostly based on analogous geologic concepts (that is to say, past experience), and see whether those assumptions have enough explanatory power when integrated with other sources of information. Then, the modeling team has to come up with a reservoir model that best represents the state of knowledge. That reservoir model will serve as the basis for making detailed development plans (how to drill, what facilities to use, etc.). The human mind simply cannot function without something concrete, tangible and controllable. A model serves that need well.

The impact of uncertainty is traditionally analyzed at a later stage using **parametric variation**, which involves perturbing parameters such as permeability multipliers for different zones, connectivity multipliers between different blocks, or strengths of the aquifer. By considering possible ranges of different uncertain parameters, the possible range of the oil recovery can be obtained.
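
A minimal sketch of parametric variation looks like the following. The `recovery_proxy` function is a hypothetical stand-in for a reservoir simulator, and the parameter levels are invented for illustration; in practice each combination would be a full simulation run.

```python
from itertools import product


def recovery_proxy(perm_mult, conn_mult, aquifer_strength):
    """Hypothetical stand-in for a reservoir simulator (arbitrary recovery units)."""
    return 100.0 * perm_mult ** 0.5 * (0.5 + 0.5 * conn_mult) * (1.0 + 0.2 * aquifer_strength)


# Assumed ranges for each uncertain parameter (low, mid, high).
perm_mults = [0.5, 1.0, 2.0]
conn_mults = [0.1, 0.5, 1.0]
aquifer_strengths = [0.0, 0.5, 1.0]

# Evaluate every combination of parameter levels.
outcomes = [
    recovery_proxy(p, c, a)
    for p, c, a in product(perm_mults, conn_mults, aquifer_strengths)
]
low, high = min(outcomes), max(outcomes)
print(f"{len(outcomes)} cases, recovery range: {low:.1f} to {high:.1f}")
```

The weakness of this approach, as discussed later in the post, is that every case is a perturbation of the same base model, so the range is only as good as the assumptions baked into that model.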

This figure is based on a realistic oilfield project. The left plot depicts the oil recovery estimate based on a set of assumptions made by the experts. It was believed that the entire reservoir had the same oil-water contact level, and that the two sides of the reservoir had a similar net pay (net thickness of oil-producing rock layers). Given the estimated amount of recoverable oil, it was economically viable to use an FPSO (floating production storage and offloading) vessel to develop the field.

However, after a new well was drilled, the asset team re-examined the reservoir data and found that those assumptions were actually wrong. The oil-water contacts were much higher, and the net pay was much less than initially expected. The adjusted estimated oil recovery became the red curve in the right plot. If they had signed the FPSO contract based on the previous estimate, it would have been a big financial loss. Given this context, our Scenario Discovery paper [1] tries to develop a way of handling uncertainty (summarizing uncertainty in the form shown in the right plot) while keeping as much geologic realism as possible.

#### Scenarios: consistent descriptions about possible futures

The paper [7] by Bryant and Lempert from the RAND Corporation (a famous global policy think tank) describes scenarios as “internally consistent and challenging descriptions of possible futures”. Here, the point is **consistency**. Arbitrarily varying parameters may undermine geologic realism and hence break the consistency. As a result, for robust decision making, the first step for the oilfield development team is to build many different models instead of one, while keeping each model as consistent and realistic as possible. This stage is called exploratory modeling (ExxonMobil used the phrase “scenario generation”).

For many companies, that may sound really difficult. How can we have multiple geologically consistent models if creating just one such model may take 10 people working hard for a year? However, it is gradually becoming possible for two reasons:

- Rapidly advancing computer technology greatly helps people build geo-models with automated workflows, better UI design, and customizable scripts. A new generation of modeling tools is coming! It will save tons of repetitive mouse clicks.
- Hints for alternative model descriptions are actually there, residing in geoscientists’ minds. While the modeling team may sit together and collectively figure out one single geologically consistent picture, individual geoscientists often have different opinions on how to interpret the information. It might be surprising, but you will get many different results if you interview those geoscientists individually. So, different models could be obtained with some changes to the team workflow.

In the Scenario Discovery paper [1], we used the SAIGUP dataset [8] created by a research group from University College Dublin to represent the range of uncertainty. SAIGUP is short for “Sensitivity Analysis of the Impact of Geological Uncertainties on Production”. It was a large research project involving tens of researchers (see the project page). The models in the SAIGUP project were carefully created to conform to the geology as closely as possible. This consistency and geologic realism can be seen in some of the plots shown in the paper [9].

This dataset is an extreme case in that it includes all possible combinations of the geological parameters. We chose it mainly to demonstrate our scenario discovery workflow. In a real project, the uncertainty would not be that extreme.

#### What does the decision maker want?

Now we have a series of realistic models that represent the different possible realities as well as we can. What is the next step to enable decision makers to make detailed plans for development?

Business decisions are oriented toward outcomes. Financially, the outcomes are the return on investment and the robustness of the cash flow. On the engineering side, the outcome is the cost behind numerous activities: planning, implementation, logistics, safety considerations, etc. The decision maker does not just want to know the uncertainty range of the financial outcomes, but also how different sources of uncertainty affect the outcome, what the engineering implications of those different possible outcomes are, and how to plan for recourse, that is, what actions to take in the event that a low-side or high-side outcome is actually realized. To summarize, we need a method of taking uncertainty into account that (adapted from [1]):

- Fully accounts for the best-estimated range of uncertainty in the subsurface by including all credible geological scenarios;
- Identifies geological origins of low-side or high-side outcomes to guide information gathering and recourse decisions;
- Can be practically implemented with existing or emerging reservoir modeling and simulation technologies combined with data mining and data visualization methods;
- Can be concretely and vividly communicated to decision-makers, who may not be simulation, statistics, or data analytics experts.

To address these questions, our paper [1] developed a post-processing workflow for different reservoir simulation models. The workflow is based on the idea of “scenario discovery”: a decision analysis procedure that identifies regions in the space of uncertain input parameters that lead to acceptable or unacceptable outcomes. The idea originated in the robust decision analysis community and has mainly been applied in government policy making. The original description of scenario discovery appears in [7]. For applications of scenario discovery in different areas, see [10-12].
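
To give a feel for what “identifying regions in the space of uncertain input parameters” means, here is a minimal sketch that scores one candidate region (a “box”) using two measures commonly used in the scenario discovery literature: coverage (how many of the cases of interest the box captures) and density (how many of the cases in the box are of interest). The ensemble, the parameter names, and the thresholds are all hypothetical, and this is not the algorithm used in our paper or in [7]; it only evaluates a box that someone has already proposed.

```python
def coverage_and_density(cases, in_box, of_interest):
    """Score a candidate box against an ensemble of simulated cases."""
    inside = [c for c in cases if in_box(c)]
    interesting = [c for c in cases if of_interest(c)]
    hits = [c for c in inside if of_interest(c)]
    coverage = len(hits) / len(interesting) if interesting else 0.0
    density = len(hits) / len(inside) if inside else 0.0
    return coverage, density


# Hypothetical ensemble: two uncertain inputs and one simulated outcome per case.
cases = [
    {"perm_mult": 0.2, "aquifer": 0.1, "recovery": 20},
    {"perm_mult": 0.3, "aquifer": 0.9, "recovery": 25},
    {"perm_mult": 0.4, "aquifer": 0.5, "recovery": 45},
    {"perm_mult": 0.8, "aquifer": 0.2, "recovery": 40},
    {"perm_mult": 1.0, "aquifer": 0.8, "recovery": 75},
    {"perm_mult": 1.5, "aquifer": 0.4, "recovery": 90},
]


def low_side(case):
    """Cases of interest: an assumed 'unacceptable' recovery threshold."""
    return case["recovery"] < 50


def low_perm_box(case):
    """Candidate box: a simple one-parameter rule on the permeability multiplier."""
    return case["perm_mult"] <= 0.5


coverage, density = coverage_and_density(cases, low_perm_box, low_side)
print(f"coverage = {coverage:.2f}, density = {density:.2f}")
```

In this made-up ensemble, every case inside the box is a low-side case, but the box misses one low-side case that has a higher permeability multiplier. A search procedure would trade off these two measures while keeping the rule simple enough to explain to a decision maker.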

The workflow in our paper serves as a helpful guide for decision making under uncertainty, and is designed to answer the following questions (excerpt from [1]):

- What are the possible production outcomes? How are they distributed?
- What are the main geological variables controlling outcomes?
- Which reservoir models can we select that best represent subsurface uncertainty relevant to business decisions?

This flowchart shows the steps that help decision makers better understand the uncertainty. In Step 1, decision makers define one or several performance metrics so that the main issue of concern can be quantified. Step 2 then gives a chance to examine the possible outcome ranges and the correlations between different aspects of the outcome (e.g. early water breakthrough may be correlated with low oil production). Step 3 demonstrates different ways of discovering patterns that link different aspects of the uncertainty to the outcome. Step 4 allows decision makers to examine the models individually, gain more insight, and finally choose a handful of representative models that best encode the uncertainty information for robust decision making. Finally, all those findings can be summarized so that they can be easily understood by other people in the team.
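
As a rough sketch of Steps 1 and 2, the snippet below takes two performance metrics for a small ensemble, reports their ranges, and computes the correlation between them. The metric names and values are invented for illustration and are not results from our paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Step 1: two hypothetical performance metrics, one value per reservoir model.
breakthrough_years = [2.0, 3.0, 5.0, 6.0, 8.0]   # water breakthrough time
recovery = [30.0, 35.0, 50.0, 55.0, 70.0]        # oil recovery, arbitrary units

# Step 2: outcome ranges and the correlation between the two metrics.
ranges = {
    "breakthrough_years": (min(breakthrough_years), max(breakthrough_years)),
    "recovery": (min(recovery), max(recovery)),
}
r = pearson(breakthrough_years, recovery)
print(ranges)
print(f"correlation = {r:.3f}")
```

In this toy ensemble, models with earlier breakthrough also have lower recovery, so the two metrics are strongly positively correlated.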

#### Example workflows and data mining and visualization techniques

Our paper gives two examples of the scenario discovery workflow. The first example clusters different types of field production behavior directly from the shape of the oil production curve. The second example focuses on different water breakthrough times and their geologic origins.
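
To illustrate the first example, here is a minimal sketch of clustering production curves by shape with plain k-means. The four curves and the choice of starting centroids are made up, and the procedure in the paper is not reproduced here; a real study would use a tested library implementation and many more models.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (squared Euclidean distance).
        labels = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels.append(distances.index(min(distances)))
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        centroids = new_centroids
    return labels, centroids


# Hypothetical oil rate curves (one value per year) for four reservoir models.
curves = [
    (100, 95, 90, 85),   # long plateau
    (98, 94, 88, 84),    # long plateau
    (100, 60, 35, 20),   # fast decline
    (95, 55, 30, 18),    # fast decline
]

# Start from one curve of each apparent type (an assumption made for reproducibility).
labels, centroids = kmeans(curves, [curves[0], curves[2]])
print(labels)
```

Each centroid is itself a curve, so it can be plotted as the “typical” production behavior of its cluster.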

With the background and the goal clarified, various standard data mining techniques can be used as tools to address different needs in different contexts. Our paper demonstrated techniques including K-means clustering, PCA (principal component analysis), decision tree algorithms and the random forest algorithm. Moreover, a special data visualization technique called the “**dimensional stacking plot**” is used to present multidimensional data on a 2-D plot. This type of plot has the benefit of simultaneously showing the geological parameters and the outcome uncertainties. Decision makers can tune the number of uncertain parameters (on the *x*-axes and *y*-axes) and their order, and can thus see the entire ensemble of reservoir models at different levels of detail. The dimensional stacking plot was first used to visualize reservoir geological uncertainty and its effect by Suzuki et al. [13].
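
The core of dimensional stacking is a nested index: several discrete parameters share one axis, with the outer parameter selecting a block and the inner parameter selecting a position within the block. Below is a minimal sketch of that mapping. The numbers of levels and the `outcome` function are hypothetical, and the plotting itself is left out.

```python
from itertools import product


def stacked_index(levels, sizes):
    """Combine nested level indices (outermost parameter first) into one axis position."""
    position = 0
    for level, size in zip(levels, sizes):
        position = position * size + level
    return position


# Assumed layout: two parameters stacked on the x-axis and two on the y-axis.
x_sizes = (3, 2)  # outer x parameter has 3 levels, inner x parameter has 2
y_sizes = (2, 2)  # outer y parameter has 2 levels, inner y parameter has 2


def outcome(x_levels, y_levels):
    """Made-up outcome value so that the grid has something to show."""
    return 10 * sum(x_levels) + sum(y_levels)


width = x_sizes[0] * x_sizes[1]
height = y_sizes[0] * y_sizes[1]
grid = [[None] * width for _ in range(height)]

# Every combination of parameter levels lands in exactly one cell of the 2-D grid.
for x_levels in product(*(range(n) for n in x_sizes)):
    for y_levels in product(*(range(n) for n in y_sizes)):
        row = stacked_index(y_levels, y_sizes)
        col = stacked_index(x_levels, x_sizes)
        grid[row][col] = outcome(x_levels, y_levels)

for row in grid:
    print(row)
```

Changing the order of the parameters in `x_sizes` or `y_sizes` changes which parameter forms the large blocks and which forms the fine structure, which corresponds to the tuning of parameter order mentioned above.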

The plots above are related to the scenario discovery workflow. Please read the full paper [1] if you are interested in more details on this topic.

### Summary of this post

We all face uncertainty and have to make decisions in uncertain situations. However, human minds react to uncertainty in very different ways, depending on one’s role, one’s personality, the potential risks, the context of the situation, etc. Academia focuses on proposing new ideas and providing their theoretical basis, and thus naturally puts more faith in the mathematical rigor of the statistical framework. Industry has to deal with the dirty work and face real risks. Hence, industry builds its trust on past experience.

In decision making under uncertainty for petroleum reservoir development, the division between academia and actual industrial practice is easy to see. Fortunately, there have been efforts to close that gap in recent years. I believe that the work behind our scenario discovery paper [1] is a part of that trend. It was rooted in a real business need and the context of actual industry practice in field development. At the same time, it incorporated ideas and techniques from decision analysis, data mining and data visualization. A real synergy came out of that.

I hope, with a mutual understanding of each other, there will be more and more helpful collaboration between the industry and academia in the future!

### References:

[1] “Scenario Discovery Workflow for Robust Petroleum Reservoir Development under Uncertainty.”

[2] Scheidt, Céline, et al. “Probabilistic falsification of prior geologic uncertainty with seismic amplitude data: Application to a turbidite reservoir case.” *Geophysics* 80.5 (2015): M89-M12.

[3] Rojas, S., et al. “Controlling the sedimentological realism of deltaic reservoir models by the use of intelligent sedimentological prior information.” *First Break* 32.10 (2014): 69-72.

[4] Josset, Laureline, et al. “Accelerating Monte Carlo Markov chains with proxy and error models.” *Computers & Geosciences* 85 (2015): 38-48.

[5] Shirangi, Mehrdad G., and Louis J. Durlofsky. “Closed-loop field development under uncertainty by use of optimization with sample validation.” *SPE Journal* 20.05 (2015): 908-922.

[6] Sun, Wenyue, and Louis J. Durlofsky. “A New Data-Space Inversion Procedure for Efficient Uncertainty Quantification in Subsurface Flow Problems.” *Mathematical Geosciences* (2016): 1-37.

[7] Bryant, Benjamin P., and Robert J. Lempert. “Thinking inside the box: a participatory, computer-assisted approach to scenario discovery.” *Technological Forecasting and Social Change* 77.1 (2010): 34-49.

[8] Manzocchi, Tom, et al. “Sensitivity of the impact of geological uncertainty on production from faulted and unfaulted shallow-marine oil reservoirs: objectives and methods.” *Petroleum Geoscience* 14.1 (2008): 3-15.

[9] Howell, John A., et al. “Sedimentological parameterization of shallow-marine reservoirs.” *Petroleum Geoscience* 14.1 (2008): 17-34.

[10] Groves, David G., et al. *Preparing for an uncertain future climate in the Inland Empire*. Rand Corporation, 2008.

[11] Kwakkel, Jan H., Willem L. Auping, and Erik Pruyt. “Dynamic scenario discovery under deep uncertainty: the future of copper.” *Technological Forecasting and Social Change* 80.4 (2013): 789-800.

[12] Gerst, Michael D., Peng Wang, and Mark E. Borsuk. “Discovering plausible energy and economic futures under global change using multidimensional scenario discovery.” *Environmental modelling & software* 44 (2013): 76-86.

[13] Suzuki, Satomi, Dave Stern, and Tom Manzocchi. “Using association rule mining and high-dimensional visualization to explore the impact of geological features on dynamic flow behavior.” *SPE Annual Technical Conference and Exhibition*. Society of Petroleum Engineers, 2015.