Machine Learning for Integrity & Risk Management
Oil, Gas & Water Pipeline Systems
Machine Learning Practice
Our machine learning solutions follow a CRISP-DM-based process (Cross-Industry Standard Process for Data Mining) tailored to integrity & risk management. As shown below, the practice is an iterative, business-objective-driven process whose key elements include learning & prediction data preparation, method selection, model validation, scoring & predictions, results analysis and continuous improvement. Check out our technical paper Machine Learning for Pipeline Integrity for more details.


Establish Business Case
Establish the business case and identify the expected role and value of incorporating the machine learning practice. Set clear metrics of success, outline the overall process, identify team members & stakeholders, assign roles & responsibilities, and secure financial commitments and leadership buy-in.

Define Data Requirements
Assess the data required to meet the business case. This is typically an iterative process leveraging domain expertise and early learning results. Create a clear metadata document identifying the source, cost & value of each data set, and begin integrating data into the learning process early so its actual value can be established quickly, as illustrated below.
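
As a minimal illustration of such a metadata document, the record below captures source, cost and value for each candidate data set in Python; the field names and example entries are hypothetical, not drawn from an actual project.

```python
from dataclasses import dataclass

@dataclass
class DataSourceRecord:
    """Hypothetical metadata entry for one candidate learning data source."""
    name: str                # e.g. "ILI magnetic-flux-leakage survey"
    source: str              # owning system or vendor
    acquisition_cost: float  # cost to obtain or refresh the data
    expected_value: str      # anticipated value to the business case
    coverage: str            # portion of the pipeline system covered

# Illustrative catalog used to decide which sources enter the learning process first
catalog = [
    DataSourceRecord("ILI MFL survey", "vendor A", 250_000.0, "high", "trunk lines"),
    DataSourceRecord("CP close-interval survey", "corrosion group", 40_000.0, "medium", "all segments"),
]
```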

Prepare, Integrate, DynSeg & QA Data
Often the most resource-intensive part of machine learning, the preparation, integration, dynamic segmentation and QA of data is an iterative process. It leverages statistical & feature analysis methods, learning curves to optimize learning data sets, outlier analysis and purpose-built quality metrics to ensure the data is ready for learning and subsequent prediction. A sketch of dynamic segmentation follows below.
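
The sketch below shows one simple way dynamic segmentation can be approached: breaking the pipeline at every stationing change across attribute layers, then joining each layer's attributes onto the resulting segments. The column names (begin, end) and the example layers are hypothetical, and the code is an illustrative sketch rather than production logic.

```python
import pandas as pd

def dynamic_segmentation(layers: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Split the pipeline into segments wherever any attribute layer changes,
    then join each layer's attributes onto the overlapping segments."""
    # Every stationing breakpoint across all layers defines the dynamic segments
    points = sorted({p for df in layers.values()
                     for p in df["begin"].tolist() + df["end"].tolist()})
    segments = pd.DataFrame({"begin": points[:-1], "end": points[1:]})

    # Attach each layer's attribute values to the segments its intervals cover
    for name, layer in layers.items():
        attrs = [c for c in layer.columns if c not in ("begin", "end")]
        for col in attrs:
            segments[f"{name}_{col}"] = None
        for _, row in layer.iterrows():
            covered = (segments["begin"] >= row["begin"]) & (segments["end"] <= row["end"])
            for col in attrs:
                segments.loc[covered, f"{name}_{col}"] = row[col]
    return segments

# Example: pipe attributes and soil data broken into common segments
pipe = pd.DataFrame({"begin": [0, 500], "end": [500, 1200], "wall_thk_mm": [7.1, 9.5]})
soil = pd.DataFrame({"begin": [0, 800], "end": [800, 1200], "soil_type": ["clay", "sand"]})
print(dynamic_segmentation({"pipe": pipe, "soil": soil}))
```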

Evaluate & Optimize Learning Methods
Hundreds of learning methods are available to capture the underlying patterns; selection depends on the required transparency, process performance, and performance on unseen data passed through the method. Learning curves support optimization of method hyper-parameters, sampling, feature selection and method type, as in the sketch below. Once a method is "learned" and "validated" it is treated as a model suitable to support scoring & predictions.
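
A minimal sketch of this step, assuming scikit-learn and a gradient-boosting classifier as one candidate method, combining a learning curve with a small hyper-parameter grid; the feature matrix X and target y are assumed to come from the prepared, segmented data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, learning_curve

def evaluate_method(X, y):
    """Evaluate one candidate method with a learning curve and a small grid search."""
    base = GradientBoostingClassifier(random_state=0)

    # Learning curve: does adding training data still improve held-out performance?
    sizes, train_scores, val_scores = learning_curve(
        base, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="roc_auc")

    # Hyper-parameter optimization over an illustrative grid
    search = GridSearchCV(
        base,
        param_grid={"n_estimators": [100, 300], "max_depth": [2, 3, 4]},
        cv=5, scoring="roc_auc")
    search.fit(X, y)

    return sizes, train_scores.mean(axis=1), val_scores.mean(axis=1), search.best_estimator_
```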

Model Validation
The "models" resulting from the learning methods process are validated thru cross-validation and testing with unseen observations. Prediction results are then uniquely associated with levels of confidence and other performance measures.

Model Application & Prediction Results
The "models" resulting from the learning process are validated thru cross-validation and testing with unseen observations. Prediction results are then uniquely associated with levels of confidence and other performance measures, and ready for use to support the business case.

Risk Analysis & Decision-Making
Results are normally expressed as levels of susceptibility to, or probability of, the target of interest, and may be monetized to support mitigation decision-making, maintenance and capital planning, as in the sketch below.
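
As a simple illustration of monetization, predicted probabilities can be turned into expected losses and compared against a mitigation cost. The per-segment cost figures here are hypothetical placeholders for segment-specific consequence models.

```python
import pandas as pd

def expected_risk_cost(scored: pd.DataFrame, consequence_cost: float,
                       mitigation_cost: float) -> pd.DataFrame:
    """Monetize predicted probabilities to rank mitigation candidates."""
    out = scored.copy()
    # Expected loss per segment: probability of the event times its consequence cost
    out["expected_loss"] = out["predicted_probability"] * consequence_cost
    # Simple decision rule: mitigate where expected loss exceeds the mitigation cost
    out["mitigate"] = out["expected_loss"] > mitigation_cost
    return out.sort_values("expected_loss", ascending=False)
```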