Data Mining Objectives

The data mining objectives associated with preparing a highly curated engineering data set for an AIM program typically consist of the following components:

  • Establishing a data schema which meets the needs of the engineering and risk analysis
  • Locating and managing high-value records across silos of physical and electronic repositories in a cost-efficient manner, where the records are usually poorly indexed and not searchable
  • Locating complementary data from current and legacy structured data which can be used as validation and/or lookup tables for correlating data
  • Identifying gaps in record coverage and establishing a remediation plan to identify or generate missing records
  • Applying business rules (e.g., converting fractions to decimals), low-level engineering tasks (e.g., looking up applicable ASME codes) and data normalization
  • Producing a database containing the data elements collected from the available documents and databases
  • Populating software applications, including risk-based inspection (RBI), plant management systems (PMS) and geographical information systems (GIS)
  • Performing trending and anomaly analysis of data such as vessel and pipeline inspection points and corrosion
  • Achieving ROI on investments in software and services to perform risk assessment
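
As an illustration of the fraction-to-decimal business rule mentioned above, the following is a minimal Python sketch. It is not taken from any specific tool; the function name and the accepted input formats (simple fractions, mixed numbers written with a space or hyphen, and an optional trailing inch mark) are assumptions chosen for the example.

```python
from fractions import Fraction


def fraction_to_decimal(text: str) -> float:
    """Convert a dimension such as '3/8', '1 1/2', '1-1/2"' or '0.375' to a decimal.

    Hypothetical helper: assumes non-negative values, so a hyphen is read as
    the separator between the whole number and the fraction, not a minus sign.
    """
    # Drop surrounding whitespace and an optional trailing inch mark.
    cleaned = text.strip().rstrip('"').strip()
    # Treat '1-1/2' the same as '1 1/2'.
    parts = cleaned.replace("-", " ").split()
    if not parts:
        raise ValueError("no numeric value found in input")
    # Fraction parses '3/8', '1' and '0.375' exactly; sum the whole and fractional parts.
    total = sum(Fraction(part) for part in parts)
    return float(total)


if __name__ == "__main__":
    for raw in ["3/8", "1 1/2", '1-1/2"', "0.25"]:
        print(raw, "->", fraction_to_decimal(raw))
```

Values that do not parse raise a ValueError, which lets a quality control step route them to manual review rather than storing a guess.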

The Data Gathering Processes

Our observations of the difficulties associated with current data gathering processes include the following:

  • Poor knowledge of available records needed to complete the engineering and risk assessment
  • Silos of on-site physical records and corporate repositories of images which are poorly organized and indexed
  • Poor quality record images (PDF and TIFF) which are not searchable
  • Large volumes of components (vessels, piping, valves, meter stations, pipeline) which require engineering review
  • Engineering resources are used to perform manual tasks to locate records and perform data entry
  • Poorly defined quality control processes to ensure data integrity

The table below illustrates the amount of missing data (white columns) needed to perform the risk evaluation (blue columns). Missing data (red cells) may reside in historical records, but operators typically lack good methods for locating the relevant records.
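
A tally of missing data of this kind can be produced with a simple coverage check. The following Python sketch is illustrative only: the field names, the sample records and the report layout are hypothetical and do not come from an actual operator data set.

```python
# Hypothetical list of data elements required for the risk evaluation.
REQUIRED_FIELDS = ["design_pressure", "wall_thickness", "material_spec"]


def is_missing(value) -> bool:
    """A value counts as missing when it is absent or blank."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def coverage_report(records, required=REQUIRED_FIELDS):
    """Count, per required field, how many component records lack a value."""
    total = len(records)
    report = {}
    for field in required:
        missing = sum(1 for record in records if is_missing(record.get(field)))
        percent = 100.0 * missing / total if total else 0.0
        report[field] = {"missing": missing, "percent_missing": percent}
    return report


if __name__ == "__main__":
    # Invented sample records for three components.
    sample = [
        {"tag": "V-101", "design_pressure": 150, "wall_thickness": 0.375, "material_spec": "SA-516-70"},
        {"tag": "V-102", "design_pressure": None, "wall_thickness": 0.5, "material_spec": ""},
        {"tag": "V-103", "wall_thickness": None, "material_spec": "SA-106-B"},
    ]
    for field, stats in coverage_report(sample).items():
        print(field, stats)
```

Fields with a high share of missing values are the natural starting point when searching historical records.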

RADIX DATA, LLC
1773 Westborough Dr.
Katy, TX 77449
info@radixdata.com