Well Structured File Extractor (WSFE)

The well-structured-file-extractor is currently open for testing. It is considered experimental, and its behavior and API are still subject to change.

The objective of this extractor is to give Solutions Architects and Data Engineers tools for ingesting and parsing industry-standard files when the targeted source of subsurface data does not have a specialized extractor in place. The supported file formats provide a more predictable set of metadata and improve our capacity to contextualize subsurface data efficiently.

The service reads standard oil and gas industry files from CDF, parses them using the Petroware Log I/O library, and creates sequences. The sequences contain metadata found in the files.

We currently support these file formats:

  • LIS

  • DLIS

  • LAS

  • ASC (ASCII files): ASC is a collective term used in the industry for any ASCII-based file that may contain well log information. The ASC reader is implemented towards the structure described at https://petroware.no/html/logio.html#asc, but can be adapted.

Please read the file format specifications at the Log I/O library link above before deciding which format works best for your specific use case.

Overview of current features:

  1. Load Files from a Folder Location into a Dataset in a CDF Project

  2. Get the files in CDF to a WSFE Queue and manage the process status

  3. Create Sequences Resource Types based on the extracted data

    1. Since some of these files can contain multiple tables based on Sampling rate and Top/Bottom Intervals, the WSFE may create multiple sequences for each file.

    2. Sequences support a maximum of 200 columns, so the WSFE will split larger tables into multiple sequences. The created sequences will be linked together with the nextSequence and previousSequence metadata values (a traversal sketch follows below the list).
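
As an illustration, here is a minimal sketch of following a chain of split sequences with the cognite-sdk. It assumes the nextSequence metadata value holds the external id of the next sequence in the chain; check your own data to confirm what the link values contain. The external id below is hypothetical.

>>> from cognite.client import CogniteClient
>>> client = CogniteClient()
>>> # Start from the first sequence of a split table (external id is hypothetical).
>>> seq = client.sequences.retrieve(external_id="my-wsfe-sequence-part-1")
>>> parts = [seq]
>>> while seq.metadata.get("nextSequence"):
...     # Assumption: nextSequence stores the external id of the next part.
...     seq = client.sequences.retrieve(external_id=seq.metadata["nextSequence"])
...     parts.append(seq)
>>> len(parts)  # all parts of the original table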

Communicate your issues through Slack in the #help-wells channel.
Resident Product Manager - Subsurface Engineering Drilling & Wells: @Javier Leal.
Resident Tech Lead Subsurface Engineering: @Sigurd Holsen.

Metadata

WSFE attempts to find metadata about the files it processes. This metadata can be found in the table below.

Due to the way metadata is stored in these file formats, the same information may be found under different names/tags/locations. WSFE uses the LogIO library to look in the places where each piece of metadata is usually found. Still, some files store metadata in unpredictable and exotic ways, so the WSFE will not necessarily find all the values listed in the table.

For more technical information about how the WSFE/LogIO searches for a specific metadata property, contact the Wells Team in the #wells-data-integration-team Slack channel.

| CDF Key              | File types     | Comments                                    |
|----------------------|----------------|---------------------------------------------|
| fileId               | Asc, Las, Dlis | File id of the extracted CDF file           |
| fileExternalId       | Asc, Las, Dlis | File external id of the extracted CDF file  |
| fileName             | Asc, Las, Dlis | File name of the extracted CDF file         |
| creator              | Asc, Las, Dlis | “well-structured-file-extractor”            |
| wsfeVersion          | Asc, Las, Dlis | WSFE build number                           |
| wellName             | Asc, Las, Dlis |                                             |
| fieldName            | Asc, Las, Dlis |                                             |
| runNumber            | Asc, Las, Dlis |                                             |
| rigName              | Asc, Las, Dlis |                                             |
| company              | Asc, Las, Dlis |                                             |
| serviceCompany       | Asc, Las, Dlis |                                             |
| country              | Asc, Las, Dlis |                                             |
| minInterval          | Asc, Las, Dlis | Minimum value of index column               |
| maxInterval          | Asc, Las, Dlis | Maximum value of index column               |
| step                 | Asc, Las       |                                             |
| bitSize              | Asc, Las       |                                             |
| date                 | Las            | Date for log file (log/start)               |
| dateComment          | Las            | Date description, often date/time format    |
| md                   | Las, Dlis      | Max measured depth                          |
| startDate            | Dlis           | Start date/time of the measurement          |
| frameName            | Dlis           |                                             |
| sequenceNumber       | Dlis           |                                             |
| headerIdentifier     | Dlis           |                                             |
| genericToolNames     | Dlis           |                                             |
| latitude             | Dlis           |                                             |
| longitude            | Dlis           |                                             |
| location             | Dlis           |                                             |
| permanentDatum       | Dlis           |                                             |
| loggingMeasuredFrom  | Dlis           |                                             |
| drillingMeasuredFrom | Dlis           |                                             |
| abovePermanentDatum  | Dlis           |                                             |
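
For illustration, a minimal sketch of finding WSFE-created sequences and inspecting their metadata with the cognite-sdk, assuming the creator key is populated as listed in the table above:

>>> from cognite.client import CogniteClient
>>> client = CogniteClient()
>>> # List a few sequences created by the WSFE and print selected metadata keys.
>>> for seq in client.sequences.list(
...     metadata={"creator": "well-structured-file-extractor"}, limit=5
... ):
...     print(seq.name, seq.metadata.get("wellName"), seq.metadata.get("maxInterval"))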

Time Measurements

WSFE has experimental opt-in support for time-indexed data. If enabled, the WSFE will check whether the file is time indexed and create a set of time series instead of a sequence. Depth-indexed data are still ingested as sequences.

A time-indexed file will often contain more than one data column. Unfortunately, at the time of writing, CDF does not support multi-column time series. The WSFE will therefore create one time series per curve, all with the same index and with a common value for metadata.group.
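
As an illustrative sketch, the curves from one file can be fetched back by their shared group value using the cognite-sdk (the group value below is a hypothetical placeholder):

>>> from cognite.client import CogniteClient
>>> client = CogniteClient()
>>> # All curves from the same file share the same metadata.group value.
>>> curves = client.time_series.list(metadata={"group": "some-group-value"})
>>> [ts.external_id for ts in curves]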

Enable Time Series

Time measurement support is enabled by setting enable_time_series to True:

>>> from cognite.well_model.wsfe import WellLogExtractorClient
>>> # get_token is assumed to be your own function returning a valid OIDC token.
>>> wlec = WellLogExtractorClient(
...     cluster="bluefield",
...     client_name="volve-files-extractor",
...     project="wells-test",
...     token=get_token(),
... )
>>> wlec.submit_multiple(items=[...], enable_time_series=True)

Time values

There are two ways to represent a point in time in the DLIS standard: DTIME and TIME. DTIME is a formatted string describing a point in time by date and time, for example an ISO 8601 string such as 2022-02-25T20:17:41Z. TIME is an amount of time (seconds, milliseconds, or half-seconds) that has passed since the start of the measurement, for example 1500 ms. According to the standard specification, the start of the measurement is denoted by the CREATION-TIME attribute in the ORIGIN set of a frame.

WSFE uses LogIO to parse the DTIME and TIME values. If no time zone information is present, WSFE assumes the time is in UTC. For more technical information about the date-time string formats LogIO and WSFE support, contact the Wells Team in the #help-wells Slack channel.

To convert TIME to a timestamp (milliseconds since epoch), WSFE depends on the CREATION-TIME attribute in the ORIGIN set of a frame. When converting a TIME value to a timestamp, WSFE uses the formula CREATION-TIME + TIME * factor, where factor is set based on the unit of time.

Currently, WSFE supports only two units of time: seconds and milliseconds.
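
As an illustration, a minimal sketch of this conversion (the function name and structure are hypothetical, not the WSFE internals; CREATION-TIME is assumed to be in UTC):

>>> from datetime import datetime, timezone
>>> def time_to_timestamp_ms(creation_time, time_value, unit):
...     """Convert a DLIS TIME offset to milliseconds since epoch (hypothetical helper)."""
...     factor_ms = {"s": 1000, "ms": 1}[unit]  # only seconds and milliseconds are supported
...     creation_ms = int(creation_time.timestamp() * 1000)
...     return creation_ms + int(time_value * factor_ms)
>>> creation = datetime(2022, 2, 25, 20, 17, 41, tzinfo=timezone.utc)  # CREATION-TIME
>>> time_to_timestamp_ms(creation, 1500, "ms")
1645820262500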

WellLogExtractorClient

Just as the CogniteWellsClient is the entrypoint for the well-data-layer, the WellLogExtractorClient is the entrypoint for the well-structured-file-extractor.

Instantiating a client should feel similar to the CogniteWellsClient and CogniteClient. However, since the WSFE isn’t running inside a CDF cluster, we must set base_url to the URL of the WSFE service and set cluster to the CDF cluster you will use. You should generally just set the cluster value and let base_url keep its default value.

>>> from cognite.well_model.wsfe import WellLogExtractorClient
>>> # get_token is assumed to be your own function returning a valid OIDC token.
>>> wlec = WellLogExtractorClient(
...     cluster="bluefield",
...     client_name="volve-files-extractor",
...     project="wells-test",
...     token=get_token(),
... )
>>>
>>> # It is recommended to enable logging when working with the WSFE SDK.
>>> import logging
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s [%(levelname)s] %(message)s",
...     handlers=[logging.StreamHandler()],
... )

The example above assumes you have a token. Please check the Getting started page for more examples. You should also check out Cognite’s python-oidc-authentication examples.
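
Alternatively, instead of supplying a token, you can let the client fetch tokens itself through the token_* parameters documented below (all values in this sketch are placeholders):

>>> wlec = WellLogExtractorClient(
...     cluster="bluefield",
...     client_name="volve-files-extractor",
...     project="wells-test",
...     token_url="https://login.microsoftonline.com/my-tenant-id/oauth2/v2.0/token",  # placeholder
...     token_client_id="my-client-id",          # placeholder
...     token_client_secret="my-client-secret",  # placeholder
...     token_scopes=["https://bluefield.cognitedata.com/.default"],  # placeholder
... )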

class cognite.well_model.wsfe.WellLogExtractorClient(api_key: Optional[str] = None, project: Optional[str] = None, cluster: str = 'api', client_name: Optional[str] = None, base_url: Optional[str] = 'https://wsfe.cognitedata-production.cognite.ai', max_workers: Optional[int] = None, headers: Optional[Dict[str, str]] = None, timeout: Optional[int] = None, token: Optional[Union[str, Callable[[], str]]] = None, token_url: Optional[str] = None, token_client_id: Optional[str] = None, token_client_secret: Optional[str] = None, token_scopes: Optional[List[str]] = None, token_custom_args: Optional[Dict[str, str]] = None)

Entrypoint to everything about the WSFE.

The Well structured file extractor is currently experimental. The API might change at any time.

Parameters
  • api_key (Optional[str], optional) – API key

  • project (Optional[str], optional) – Project

  • cluster (str, optional) – api, greenfield, bluefield, azure-dev, etc. Defaults to “api”.

  • client_name (str, optional) – A user-defined name for the client. Used to identify number of unique applications/scripts running on top of CDF.

  • base_url (Optional[str], optional) – Defaults to “https://wsfe.cognitedata-production.cognite.ai”.

  • max_workers (int) – Max number of workers to spawn when parallelizing data fetching. Defaults to 10.

  • headers (Dict) – Additional headers to add to all requests.

  • timeout (int) – Timeout on requests sent to the api. Defaults to 60 seconds.

  • token (Union[str, Callable[[], str]]) – A JWT, or a method which takes no arguments and returns a JWT, to use for authentication.

  • token_url (str) – Optional URL to use for token generation.

  • token_client_id (str) – Optional client id to use for token generation.

  • token_client_secret (str) – Optional client secret to use for token generation.

  • token_scopes (list) – Optional list of scopes to use for token generation.

  • token_custom_args (Dict) – Optional additional arguments to use for token generation.

Examples

Create a client:
>>> from cognite.well_model.wsfe import WellLogExtractorClient
>>> wlec = WellLogExtractorClient()
submit_multiple(items: List[SubmitJob], overwrite: bool = False, enable_time_series: bool = False) → ProcessStateList

Submit a set of files to extraction.

This call will block until everything has been processed.

Parameters
  • items (List[SubmitJob]) – items to extract

  • overwrite (bool, optional) – Set to true to overwrite resources if they already exist in CDF.

  • enable_time_series (bool, optional) – Set to true to enable creation of Time Series from time indexed files.

Returns

Status object

Return type

ProcessStateList

Examples

Submit a file for extraction and wait
>>> import logging
>>> from cognite.well_model.wsfe import WellLogExtractorClient
>>> from cognite.well_model.wsfe.models import (
...    FileType,
...    JobDestination,
...    JobSource,
...    SubmitJob,
... )
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s [%(levelname)s] %(message)s",
...     handlers=[logging.StreamHandler()],
... )
>>> wlec = WellLogExtractorClient()
>>> job = SubmitJob(
...     source=JobSource(
...         file_external_id="WEL_Laverda_East_1_S1R2_CMR_Main_Pass_025PUC.dlis",
...         file_type=FileType.dlis,
...     ),
...     destination=JobDestination(
...         data_set_external_id="volve",
...     ),
...     contains_trajectory=False,
... )
>>> status = wlec.submit_multiple([job]) 
submit(items: List[SubmitJob], overwrite: bool = False, enable_time_series: bool = False, overwrite_sequences: bool = False) → ProcessStateList

Submit a set of files to extraction.

If you’re sending many objects at once, please use submit_multiple.

Parameters
  • items (List[SubmitJob]) – items to extract

  • overwrite (bool, optional) – Set to true to overwrite resources if they already exist in CDF.

  • enable_time_series (bool, optional) – Set to true to enable creation of Time Series from time indexed files.

  • overwrite_sequences (bool, optional) – Deprecated; use overwrite instead.

Returns

Status object

Return type

ProcessStateList

Examples

Submit a file for extraction and wait
>>> import logging
>>> from cognite.well_model.wsfe import WellLogExtractorClient
>>> from cognite.well_model.wsfe.models import (
...    FileType,
...    JobDestination,
...    JobSource,
...    SubmitJob,
... )
>>> logging.basicConfig(
...     level=logging.INFO,
...     format="%(asctime)s [%(levelname)s] %(message)s",
...     handlers=[logging.StreamHandler()],
... )
>>> wlec = WellLogExtractorClient()
>>> job = SubmitJob(
...     source=JobSource(
...         file_external_id="WEL_Laverda_East_1_S1R2_CMR_Main_Pass_025PUC.dlis",
...         file_type=FileType.dlis,
...     ),
...     destination=JobDestination(
...         data_set_external_id="volve",
...     ),
...     contains_trajectory=False,
... )
>>> status = wlec.submit([job]) 
>>> status.wait() 
The valid file types are:
>>> from cognite.well_model.wsfe.models import FileType
>>> file_types = [
...     FileType.dlis,
...     FileType.las,
...     FileType.asc
... ]
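For example, a small hypothetical helper (not part of the SDK) that picks the FileType from a file name by simple suffix matching:
>>> from cognite.well_model.wsfe.models import FileType
>>> def file_type_for(name: str) -> FileType:
...     # Hypothetical helper; raises KeyError for unsupported suffixes.
...     suffix = name.lower().rsplit(".", 1)[-1]
...     return {"dlis": FileType.dlis, "las": FileType.las, "asc": FileType.asc}[suffix]
>>> file_type_for("WEL_Laverda_East_1_S1R2_CMR_Main_Pass_025PUC.dlis") == FileType.dlis
True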
status(process_ids: List[int]) → ProcessStateList

Retrieve the status of a set of items previously submitted for extraction.

status_report(statuses: List[ProcessState]) → Dict[str, int]

Partition the set of statuses based on whether they are ‘ready’, ‘processing’, ‘done’ or ‘error’.

Data classes

class cognite.well_model.wsfe.models.FileType(value)

An enumeration.

class cognite.well_model.wsfe.models.ProcessStatus(value)

An enumeration.

class cognite.well_model.wsfe.models.Severity(value)

An enumeration.

ProcessStateList

class cognite.well_model.wsfe.process_state_list.ProcessStateList(client, resources: List[ProcessState])
dump(camel_case: bool = False) → List[Dict[str, Any]]

Dump the instance into a json serializable Python data type.

Parameters

camel_case (bool) – Use camelCase for attribute names. Defaults to False.

Returns

A list of dicts representing the instance.

Return type

List[Dict[str, Any]]

to_pandas(camel_case=True) → DataFrame

Generate a Pandas DataFrame.

Parameters

camel_case (bool, optional) – snake_case if False and camelCase if True. Defaults to True.

Return type

DataFrame

wait()

Wait until all jobs have completed.

While waiting, it will poll the service and print updates.

refresh_status()

Refresh the statuses.
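
Putting these together, a short sketch of checking up on previously submitted jobs (the process ids are hypothetical; wlec is a client as constructed above):

>>> states = wlec.status(process_ids=[123, 456])  # hypothetical ids from an earlier submission
>>> states.wait()              # blocks, polling the service and printing updates
>>> print(states.to_pandas())  # inspect the final state of each job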