Frequently Asked Questions (FAQ)
This page answers common questions we receive about PipelineML. If your question is not addressed here, please post it in the comments section and we will consider including it.
- The ability to discover asset information at the earliest possible point (when it is cheapest to discover) and then persist that information for the duration of the asset's life
- The means to significantly reduce the cost of information discovery
- The facility to get information into the hands of decision-makers faster and more reliably
- The mechanism to check the quality of data as it goes out and comes back so that issues can be discovered as quickly as possible, before they have the opportunity to negatively impact operational decisions
- A breadcrumb trail of asset information every time information changes hands, which can be used in numerous processes such as TVC (traceable, verifiable, and complete) compliance
- A PipelineML file can be opened on any platform, software, version, or device (server, PC, Mac, tablet, or smartphone) today or 20 years in the future and its meaning will remain unchanged, providing a solid foundation for highly durable and available data persistence upon which sound business decisions can be made for decades into the future.
Every pipeline operator, department, service provider, and software application has its own terminology and code values for the same things. Hence, the biggest challenge in information sharing is how parties who use dissimilar terminologies and values can convey their information to one another in a meaningful way that does not cause ambiguity or confusion.
Prior to PipelineML, if oil and gas stakeholders wanted to share information, someone needed to create seed files or translation maps that defined how to translate every value used by the sending party into the values used by the receiving party. This is a tedious and onerous (i.e., slow and expensive) process that creates bottlenecks in information sharing. Hence, if 7 parties wanted to exchange data with one another, they had to create 42 seed files or translation maps (see Figure 1). This created a multiplicative exchange model in which the number of mappings, n × (n − 1), grows every time a new party wants to exchange data.
PipelineML changes this paradigm by establishing a common industry vocabulary composed of reference code lists and values, known technically as a controlled vocabulary. This allows everyone who shares data via PipelineML to map their internal codes and values to a single standard that everyone else uses, which drastically reduces the work and preparation time required to share information. Once a software application translates its internal values and codes into those used by PipelineML, it can import and export data with an unlimited number of parties. This creates an additive exchange model (n mappings) that allows 7 parties to exchange data with only 7 translation maps instead of 42.
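To make the arithmetic concrete, here is a minimal Python sketch of the two growth models; the party count of 7 simply mirrors the example above.

```python
def pairwise_mappings(parties: int) -> int:
    """Point-to-point model: every party builds a translation map to every other party."""
    return parties * (parties - 1)


def hub_and_spoke_mappings(parties: int) -> int:
    """Controlled-vocabulary model: each party maps once to the shared standard."""
    return parties


print(pairwise_mappings(7))       # 42 translation maps
print(hub_and_spoke_mappings(7))  # 7 translation maps
```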
[Figure 1. Non-PipelineML data exchange model: party-to-party mapping (web) vs. PipelineML data exchange model: universal mapping (hub and spoke).]

A data interchange standard without a controlled vocabulary is like two people trying to communicate who do not speak the same language. It can be done, but not efficiently. The goal of a controlled vocabulary is to eliminate ambiguity, as ambiguity is the enemy of interoperability. If two systems want to share information, all ambiguity must be eliminated so that every term has a universally agreed-upon meaning, spelling, and discrete set of code values. This does not mean that everyone must change their internal terms, definitions, and code values to match those used by this data interchange standard in order to utilize it. It simply means that when an application wants to communicate with another system via PipelineML (whether across operator departments or between an operator and its service providers), it must have the capacity to translate its internal terminologies and codes into the universal ones (the controlled vocabulary) so it can speak a common language with all other PipelineML-compliant applications.
PipelineML was designed to overcome the biggest efficiency bottleneck in oil and gas data sharing. Existing data interchange solutions require a person to manually handhold the exchange process. Someone needs to create a seed file to map data between two specific sources. A database administrator must back up a database and someone else must restore the copy and perform transformations on the data. Someone maps columns of data in a spreadsheet and manipulates the data to suit a business unit’s needs. A GIS specialist must import a Shapefile, analyze it, and export data for another application.
There is nothing wrong with any of these approaches to exchanging data, as each is well suited to a given set of business needs (and PipelineML will not replace any of them). These are all effective data management tools in the belts of professionals getting work done every day. The opportunity PipelineML provides is to remove the most onerous and time-consuming aspect of information sharing: translating and mapping data between parties. Because everyone agrees to use a common controlled vocabulary, a person is no longer required to map or transform data between systems. Since all systems know the exact meaning of every term and value, machines can move the information natively from one system to another. This streamlines the process of sharing information so people can focus on the important task of analyzing the data and making critical decisions based on it.
Hence, PipelineML facilitates machine-readability and machine-to-machine data sharing. This positions the industry to move into a new era of data processing scalability. This capability undergirds PipelineML’s objective to help people move information quickly and easily between disparate platforms, systems, applications, and devices without the need for people to interpret its meaning. There will always be some data that require subject matter expertise for interpretation. However, the majority of pipeline information (anecdotally, perhaps 80%) that needs to change hands is very simple in nature. Most individuals making decisions that keep product safely flowing through pipes and satisfy regulatory requirements simply need answers to basic questions. Where is this asset located? When was it installed? What is the maximum operating pressure? Who was the manufacturer? When was the last pressure test performed? Where are the control valves located on this line? What is the spill volume of this segment? What products are flowing through this line and at what operating pressures? Where is the MTR? What is the average diameter of pipe running through this county? How close does this line come to a high consequence area? What foreign lines utilize this section of a shared right of way? What remediation activities are scheduled for this system? Which integrity engineer is responsible for this section of pipe?
Currently, someone in integrity needs to call someone in the field to determine the current operating pressure of a particular system. This information is already known, and its status is up to date in an operational software application somewhere. Yet the integrity department is running one application and the field operations group is using another. Even though both data silos may exist in the same building, they might as well be worlds apart. An integrity engineer's software cannot communicate with the operations system, and even if it could, the vernacular used by these two groups differs significantly. What one group calls “wall thickness,” another calls “NWT.” Many such interchange hurdles exist throughout the various data systems managed by every operator.
PipelineML was engineered to address these lightweight data exchange use cases and to support full automation, so that every application in the enterprise has the ability to send and receive basic information about pipeline assets and the activities being performed on them. PipelineML lays a solid foundation for machines to communicate with other machines, understand the context of inquiries, find answers, and process requests without involving a human being, and the entire process can be completed in seconds. Automating the processing and flow of lightweight information exchanges takes the workload off subject matter experts, whose time is better spent solving problems machines are not capable of solving. GIS specialists can concentrate on performing real value-add analytics instead of menial tasks that can be offloaded to automated processing, such as converting files between formats or projection systems.
Every software application that wants to exchange information using PipelineML simply needs to conform to explicit naming conventions and a well-defined set of data rules. For example, if two software applications want to exchange information about a pipeline, they must communicate wall thickness as NominalWallThickness, and the unit of measure must be clearly articulated as either inches or millimeters (PipelineML was engineered by an international consortium of subject matter experts and supports multiple systems of measurement and languages). This strict vocabulary allows machines to understand the meaning of data without involving a human being.
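As a rough illustration of pairing a strictly named attribute with an explicit unit of measure, here is a minimal Python sketch; the class and method names are hypothetical, and the actual PipelineML encoding is defined by the standard's schemas.

```python
from dataclasses import dataclass


# Hypothetical in-memory representation; the real PipelineML encoding is
# defined by the standard and its schemas, not by this sketch.
@dataclass
class NominalWallThickness:
    value: float
    uom: str  # "in" or "mm" -- the unit of measure must always be stated explicitly

    def to_millimeters(self) -> float:
        """Normalize to millimeters so two systems can compare values safely."""
        return self.value * 25.4 if self.uom == "in" else self.value


wt = NominalWallThickness(value=0.375, uom="in")
print(wt.to_millimeters())  # 9.525
```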
Besides enabling automation, PipelineML can speed up the process of manually moving data between systems. Someone can export data out of one system and immediately import it into another with a few clicks (provided that both applications have PipelineML import and export features). This has the potential to unify operator data submissions to regulatory agencies. If all regulatory agencies standardized on this data interchange standard, any software in the industry could be retrofitted to output data so that the filing process could be completed quickly, easily, and in a unified manner (even automatically). This would expedite reporting for operators, who could fulfill requirements for all agencies using a single standard, and would empower regulatory agencies to utilize machine-readable data imports from all operators using a controlled vocabulary (ready-made scalability). This translates into significant cost savings for all parties involved.
The design characteristics that shaped PipelineML allow it to support another feature capable of expediting the movement of information between disparate systems. Today, operators are typically faced with two options when it comes to receiving information: they can let it come in as is and clean it up later, or they must clean it up before accepting it. Both options are problematic. PipelineML enables a third option. Because the data format, structure, rules, and vocabulary are so well defined, computers can test the data automatically and immediately provide a list of warnings, errors, and validation ratings back to the submitting party. The current version of the PipelineML standard includes 354 tests that can be performed on a PipelineML file, each of which can produce warnings or errors if data is out of compliance with the standard. A free PipelineML data validation service is in the works that will automatically perform all of these tests in seconds or minutes (depending on the size of the PipelineML file) and provide machine-readable validation results. This allows operators to set automated error thresholds that determine which data is received, accepted, or earmarked as requiring additional processing before it can be trusted and used in critical business processes.
Automated data validation allows someone in the field to submit data as soon as connectivity is available and know the quality of that data in seconds or minutes, long before the ditch is covered up. The receiving party can also enable automatic validation on their end to determine whether the data meets their quality assurance standards. Each operator can pre-define how to respond to incoming data of various qualities (e.g., automatically reject incoming data with an error rate higher than 5%). Likewise, some business processes could be configured to make use of data with an error rate above 10%, while others require the data to contain no errors (but permit any number of warnings) before it can be utilized. PipelineML provides the mechanisms to facilitate such advanced business process automation.
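The threshold logic described above might look something like the following sketch, assuming a validation service that reports record, error, and warning counts; the function name, parameters, and thresholds are illustrative assumptions, not part of the standard.

```python
def route_incoming_file(record_count: int, error_count: int,
                        warning_count: int, max_error_rate: float = 0.05) -> str:
    """Decide what to do with an incoming PipelineML file based on its
    validation results. The thresholds are operator-defined policy."""
    error_rate = error_count / record_count if record_count else 1.0
    if error_rate > max_error_rate:
        return "reject"                 # e.g., refuse anything above a 5% error rate
    if error_count > 0:
        return "hold-for-cleanup"       # usable only after additional processing
    return "accept"                     # warnings alone do not block acceptance


print(route_incoming_file(record_count=2000, error_count=30, warning_count=120))
# -> "hold-for-cleanup" (1.5% error rate, but errors are present)
```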
- New Pipeline Construction Projects Data Intake
- Pipeline Rehabilitation Projects Data Intake
- Asset Acquisitions and Divestitures
- Internal Business Units - To keep product flowing safely, numerous business units within an operator must continually exchange information. This includes such business units as system design, commercial, construction, field operations, land administration, GIS, SCADA, facility management, risk assessment, integrity management, regulatory compliance, etc.
- Service Providers - Pipeline operators need to provide information to their service providers as well as to receive information back from them at completion of work. This includes such service providers as construction management companies, survey and mapping companies, pipeline rehab vendors, pressure testing companies, ILI pigging vendors, alignment sheet vendors, etc.
- Regulatory Agencies - Pipeline operators are required to submit up-to-date information about assets being managed to various regulatory agencies such as PHMSA, PPTS, TRRC, etc.
- Emergency Responders - Pipeline operators have the opportunity to share information about operated assets with emergency responders.
- Other Pipeline Operators - Pipeline operators frequently exchange pipeline data during acquisitions, divestitures, and transfers of operating responsibility.
- Step 1: The operator asks its service providers to download the latest PipelineML industry-standard codes to use for the duration of the project. The service providers download the latest PipelineML codes and load them into their construction management software and field collection devices.
- Step 2: The operator uses its operational software to determine the section of pipe needing remediation. They output the inventory of affected components, their attributes, and weld location information into a PipelineML file. They run the PipelineML file through automated validation and find no errors. They provide the PipelineML file to the rehab vendor as part of a digital dig package, as well as to the survey and mapping company for review.
- Step 3: The rehabilitation vendor uses the free PipelineML validation service to validate the incoming PipelineML file. They determine that it contains no errors. They import the PipelineML file into their construction management application. They use the component and weld location information to locate and verify the affected components. They perform the necessary remediation on the components and then output a PipelineML file containing all work completed. They validate it for accuracy and when no errors are found, they hand off a copy to the survey and mapping company as well as the operator.
- Step 4: The survey and mapping vendor checks the PipelineML file by running it through the free PipelineML validation service and determines that it is valid and contains no errors. They complete the survey and mapping process and consolidate the results into their proprietary software in preparation for delivery. They export a PipelineML file from their internal software and then validate it to ensure it contains no warnings or errors. It passes. The survey and mapping vendor sends the file to the operator.
- Step 5: The operator receives the PipelineML file and runs it through the validation process, where it passes all tests. They import the PipelineML file into their GIS software. They run a comparison between the incoming PipelineML file and the original outgoing PipelineML file and find no anomalies (a sketch of this kind of comparison appears after these steps). They review the rehab work as well as the survey and mapping work and approve the results. The operator retains a copy of the initial outgoing PipelineML file, the rehab PipelineML file, and the survey and mapping PipelineML file in their project records repository (with metadata identifying the affected components and locations so the files are discoverable by analytics engines in the future). The project is completed and closed.
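Step 5's comparison of the original outgoing file against the returned file could work along the lines of the following Python sketch. The idea of keying components by an identifier and the attribute names shown are assumptions for illustration only.

```python
def compare_inventories(outgoing: dict, incoming: dict) -> list:
    """Compare two component inventories keyed by component ID and report
    attribute-level differences (anomalies) between what was sent and returned."""
    anomalies = []
    for component_id, sent in outgoing.items():
        received = incoming.get(component_id)
        if received is None:
            anomalies.append((component_id, "missing from returned file"))
            continue
        for name, sent_value in sent.items():
            if received.get(name) != sent_value:
                anomalies.append(
                    (component_id, f"{name} changed: {sent_value} -> {received.get(name)}"))
    return anomalies


outgoing = {"W-1001": {"NominalWallThickness": 0.375, "Grade": "X-60"}}
incoming = {"W-1001": {"NominalWallThickness": 0.375, "Grade": "X-60"}}
print(compare_inventories(outgoing, incoming))  # [] -- no anomalies found
```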

The goal of PipelineML is to provide a data interchange standard capable of supporting the entire lifecycle of pipeline assets from cradle to grave. The diagram below illustrates a simplified model of the asset lifecycle containing just 3 stages and a dozen vendor offerings in each category. In reality, the lifecycle stages might include design, land/right-of-way management, parts acquisition and inventory management, construction management, accounting, as-built survey and mapping, field operations, SCADA, regulatory compliance, integrity management, risk assessment and management, pressure testing, ILI pigging, aerial dispersion and flow simulation (computational fluid dynamics), public awareness, divestiture, etc. Within each of these categories exist dozens of vendor product offerings.

A large operator may have over a hundred software applications deployed at various stages of asset lifecycle management. Many of these applications touch or manipulate asset information at some point. Without a means to share data between these applications, data silos develop in which critical business decision information is isolated in numerous disparate platforms and software applications. The end result is that the people who need information from many of these systems cannot easily get access to it. PipelineML was designed from the ground up to solve this issue. The yellow arrows in the diagram show the movement of information between these various systems. In reality, every application in the diagram could freely move information to any other application (within the same stage or downstream) using PipelineML.
The ideal approach to managing data is to gather information at the earliest point in the life of assets in the most accurate condition possible and then to persist it in that condition for the duration of its life. The alternatives are just too slow and costly to justify. Printing a digital document from one system only to scan it back into digital format in another system is too inefficient to sustain, especially given the projected growth curves of data. Manually re-keying information between systems is the most common entry point for data errors in the enterprise. Capturing data in digital format and then persisting it in that pristine condition is the most economical way to manage information. The cost to rediscover information that was once known and lost in data silos is too high to be considered a valid means of managing information.
This means that when a pipeline system is first designed using AutoCAD, Bentley, etc., the asset information can be exported out of those system design applications as PipelineML and fed downstream to other stakeholders. For example, the group responsible for purchasing the thousands of components that will make up that new system could take the PipelineML file sent from the system designer and import it into their proprietary parts inventory management system. Once the parts have been purchased, additional information will be known, such as manufacturers, model numbers, specifications, MTRs, etc. Then the parts acquisition group can export the detailed asset information out of their proprietary system as PipelineML and pass it along to the construction group.
This process can continue through each stage of the asset lifecycle so that information is never lost into data silos known only to select groups. Because the information is captured in digital form at the beginning and passed along through each stage of the pipeline system's development, the pedigree of the data is known to be high and accurate. Instead of the asset information having to be rediscovered through costly processes downstream, everyone gets the information they need to make informed decisions in a timely, cost-effective manner, and that information carries a high pedigree of accuracy, detail, and completeness.
Foundational to PipelineML is a set of reference code lists and standardized values known technically as a controlled vocabulary. To understand what a reference code list is and what its corresponding values look like, consider the following example: the code list known as LinePipeBoundSpecification (shown in the table below).
At the time of this writing, this code list contains 18 sets of values. As many of these codes as possible have been compiled from existing industry standards, such as those defined by the American Petroleum Institute (API). These values are not directly embedded in the PipelineML standard but are managed as external resources that the standard uses for validation. This separation allows the codes to change independently of the standard: whereas the PipelineML standard may only be revised every few years, these code values may be changed whenever industry stakeholders require.
At the time of this writing there are 8,756 codes in PipelineML's controlled vocabulary. These values have undergone numerous phases of review by many pipeline engineers. They are expected to remain in a state of continual evolution for years to come with the continued help of subject matter experts from both the pipeline engineering and data standards professions. The OGC PipelineML SWG currently manages these code lists, though this responsibility may eventually be offloaded to an authoritative body capable of judiciously administering them.
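Because the code lists are managed outside the standard, a validator can load the current list at run time and test incoming values against it. Below is a minimal Python sketch of that idea; the CSV file name and column layout are hypothetical, not an official distribution format.

```python
import csv


def load_code_list(path: str) -> set:
    """Load the current set of valid codes from an externally managed code
    list (e.g., LinePipeBoundSpecification) exported as a simple CSV file."""
    with open(path, newline="") as f:
        return {row["Code"] for row in csv.DictReader(f)}


def validate_code(value: str, valid_codes: set) -> bool:
    """Return True if the value appears in the controlled vocabulary."""
    return value in valid_codes


codes = load_code_list("LinePipeBoundSpecification.csv")  # hypothetical export
print(validate_code("6171", codes))  # True when code 6171 (API-5L; X-60; 60000) is listed
```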
Coupling PipelineML with a controlled vocabulary opens the door to numerous opportunities in the future. For example, a pipeline designer creates a set of schematics for a new pipeline system in AutoCAD and then exports the data out of AutoCAD as a PipelineML file. They hand it off to someone in the parts acquisition department who imports it into their proprietary inventory management system. It contains all the component specifications and attributes chosen by the designer. Once the components have been purchased, the results are output from the acquisition software as a new PipelineML file that now contains additional manufacturer information and details relevant to each individual component instance. This PipelineML file is sent to the construction management team who imports it into their proprietary construction management software. The survey and mapping company imports the PipelineML file into their internal software. Once their work is complete, the survey and mapping company returns their as-built results to the operator as a PipelineML file. The operator now has the ability to automatically compare the pipeline design PipelineML file, parts acquisition PipelineML file, construction PipelineML file, and as-built survey PipelineML file to see any problematic delta points from concept to full execution. This presents new business opportunities.
Because we define a controlled vocabulary that does not change across the asset’s life, PipelineML is designed to facilitate the interchange of data through the entire life cycle of pipeline assets (design, parts acquisition, construction, as-built survey and mapping, operations, integrity, rehabilitation, risk assessment, regulatory reporting, ILI tool runs, corrosion prevention, NDE, recoating, divestiture, etc.). When a pipeline designer calls for a 24” O.D.; API-5L; X-60; 60000 section of linepipe, that standardized definition will remain constant and machine readable across every software application that reads it across the enterprise over the course of the life of that asset. This consistency and durability of data open the door for an array of new products and services to emerge within the pipeline ecosystem. Every application that reads this data has the flexibility to label it differently based on the vernacular of a given operator, department, and its selected software applications. Yet, each of those applications knows how to define terminologies and values when sharing information with another system using PipelineML. This provides true interoperability that leaves no room for ambiguity.
LinePipeBoundSpecification Reference Code List and Values
ID | Code | Specification | Grade | Yield Strength (psi) | Bound Specification |
---|---|---|---|---|---|
DB55DD29-E754-48AC-A455-7F9D0A086194 | 0 | No Data | No Data | NULL | No Data |
8EF2D185-0C01-E811-80E8-38EAA735D691 | 6164 | API-5L | A | 30000 | API-5L; A; 30000 |
8FF2D185-0C01-E811-80E8-38EAA735D691 | 6165 | API-5L | A-25 | 25000 | API-5L; A-25; 25000 |
90F2D185-0C01-E811-80E8-38EAA735D691 | 6166 | API-5L | B | 35000 | API-5L; B; 35000 |
91F2D185-0C01-E811-80E8-38EAA735D691 | 6167 | API-5L | X-42 | 42000 | API-5L; X-42; 42000 |
92F2D185-0C01-E811-80E8-38EAA735D691 | 6168 | API-5L | X-46 | 46000 | API-5L; X-46; 46000 |
93F2D185-0C01-E811-80E8-38EAA735D691 | 6169 | API-5L | X-52 | 52000 | API-5L; X-52; 52000 |
94F2D185-0C01-E811-80E8-38EAA735D691 | 6170 | API-5L | X-56 | 56000 | API-5L; X-56; 56000 |
95F2D185-0C01-E811-80E8-38EAA735D691 | 6171 | API-5L | X-60 | 60000 | API-5L; X-60; 60000 |
96F2D185-0C01-E811-80E8-38EAA735D691 | 6172 | API-5L | X-65 | 65000 | API-5L; X-65; 65000 |
97F2D185-0C01-E811-80E8-38EAA735D691 | 6173 | API-5L | X-70 | 70000 | API-5L; X-70; 70000 |
98F2D185-0C01-E811-80E8-38EAA735D691 | 6174 | ASTM A-106 | A | 30000 | ASTM A-106; A; 30000 |
99F2D185-0C01-E811-80E8-38EAA735D691 | 6175 | ASTM A-106 | B | 35000 | ASTM A-106; B; 35000 |
9AF2D185-0C01-E811-80E8-38EAA735D691 | 6176 | ASTM A-135 | A | 30000 | ASTM A-135; A; 30000 |
9BF2D185-0C01-E811-80E8-38EAA735D691 | 6177 | ASTM A-135 | B | 35000 | ASTM A-135; B; 35000 |
9CF2D185-0C01-E811-80E8-38EAA735D691 | 6178 | ASTM A-53 | A | 30000 | ASTM A-53; A; 30000 |
9DF2D185-0C01-E811-80E8-38EAA735D691 | 6179 | ASTM A-53 | B | 35000 | ASTM A-53; B; 35000 |
9EF2D185-0C01-E811-80E8-38EAA735D691 | 6180 | API-5LS | X-70 | 70000 | API-5LS; X-70; 70000 |
No. Although PipelineML uses a common set of codes and values (collectively known as a controlled vocabulary), no one needs to change the terms, names, codes, and values they use internally. All that is required is that your software vendor(s) develop import/export routines that convert your internal codes to the universal ones used by PipelineML. This crucial part of a data interchange standard ensures that everyone speaks a common language that leaves no room for confusion or ambiguity.
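Conceptually, such an import/export routine is little more than a lookup between internal values and the PipelineML codes. A minimal Python sketch follows; the internal value strings are made up, while the PipelineML codes are taken from the LinePipeBoundSpecification table above.

```python
# Hypothetical internal vocabulary mapped to PipelineML controlled-vocabulary codes.
INTERNAL_TO_PIPELINEML = {
    "X60 PIPE": "6171",  # API-5L; X-60; 60000
    "X52 PIPE": "6169",  # API-5L; X-52; 52000
    "A53-B":    "6179",  # ASTM A-53; B; 35000
}
PIPELINEML_TO_INTERNAL = {code: internal for internal, code in INTERNAL_TO_PIPELINEML.items()}


def export_code(internal_value: str) -> str:
    """Translate an internal value to the universal PipelineML code on export."""
    return INTERNAL_TO_PIPELINEML[internal_value]


def import_code(pipelineml_code: str) -> str:
    """Translate a universal PipelineML code back to the internal value on import."""
    return PIPELINEML_TO_INTERNAL[pipelineml_code]


print(export_code("X60 PIPE"))  # "6171"
print(import_code("6179"))      # "A53-B"
```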
- Platforms (operating systems like Unix, Windows, Apple, Android, Google)
- Information Systems (software applications and suites, data models)
- Devices (mobile, tablet, desktop, servers, etc.)
- Pipeline system information
- Collections of components
- Individual components (with a full complement of attributes including precise geographic location).
- Casing
- Elbow
- Linepipe
- Reducer
- Tee
- Coating
- Flange
- Meter
- Sleeve
- Valve
- Compressor
- Launcher Receiver
- Pipe Connector (Weld)
- Tap
- A pipeline system summary including the location of its centerline
- Package containing the locations and attributes of all welds on a section of pipe
- A small segment of a pipeline (of any size) and its attributes
- An inventory of all known individual components that are located in a pipeline system or segment (with or without weld information)
- A single component such as a meter, valve, compressor, etc. along with its attributes and geospatial location.
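As a rough illustration of the last two bullets, a component record with its attributes and geospatial location might be held in memory like the following before being exported as PipelineML. All of the field names here are hypothetical; the actual element names come from the standard's schemas.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical in-memory component record prior to PipelineML export."""
    component_type: str                 # e.g., "Valve", "Meter", "Linepipe"
    component_id: str
    attributes: dict = field(default_factory=dict)
    longitude: float = 0.0
    latitude: float = 0.0


inventory = [
    Component(
        component_type="Valve",
        component_id="V-0042",
        attributes={"Manufacturer": "Example Co", "InstallDate": "2018-06-14"},
        longitude=-95.3698,
        latitude=29.7604,
    ),
]
print(len(inventory), "component(s) ready for export")
```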
JSON is a newer data encoding solution than XML that provides the capability to embed the schema (data structure rules) along with the encoded data as a single file or data stream. This is a powerful feature that abbreviates and streamlines the process of sharing information. However, there is one use case where this capability is not only unhelpful but actually works against the goal: the definition of data interchange standards.
In a data interchange standard, the data structure rules are established as a well-defined, universal, and rigid set of rules that do not change. These rules are negotiated by an international body of data standards experts and industry subject matter experts. Once all parties agree on these rules, they are carried through a vetting and voting process where they become ratified as an international standard. These rules can only change when this process is repeated and all parties agree on the new rules.
This creates a solid, rigid set of data rules that vendors can build into their product suites (data import and export routines). Data files that conform to the standard can be tested to determine whether they adhere to those rules, producing errors and warnings that do not vary from one product implementation to another. This ensures that every piece of software implementing this data interchange standard will always interpret the meaning of that data the same way. This consistency is essential to creating a solid international data interchange standard capable of supporting the needs of an industry such as oil and gas.
The flexibility JSON provides works against these goals, and hence it is not supported in the current PipelineML standard. Additionally, the JSON ecosystem is missing some key elements needed to support an interchange standard. Most prominent among them is the ability to define a set of external reference code values that data in a JSON file can be tested against. The XML ecosystem provides tools such as Schematron and GML dictionaries (among others) that fulfill this need. As the JSON ecosystem continues to mature, we will continue to assess its ability to support our data encoding needs. If that maturity emerges, we will consider adding JSON encoding support to the PipelineML standard.
If you are an oil and gas stakeholder, all you need to do to begin leveraging the opportunities afforded by PipelineML is to contact your software vendor and ask them to build PipelineML data import and export capabilities into your software holdings. This will allow you to pull information into your internal systems from other parties as well as to get information out of your systems and into the hands of other stakeholders who utilize other software products. We are in the process of building an assortment of resources designed to help you adopt PipelineML as quickly and easily as possible. This includes resources for software developers to create code that imports and exports PipelineML data. Stay tuned for additional announcements and resources.
- Sign up for the PipelineML newsletter and keep abreast of ongoing developments (see bottom of this webpage).
- Use the contact page to submit your questions and inquiries.
- Join the OGC, attend PipelineML meetings, and vote on issues.
- Write papers on how you implemented PipelineML, what lessons you learned through that process, and how it benefits your company.
- Contribute open source code you write that utilizes PipelineML for use by the greater PipelineML community.