Understanding the libopenapi data model

Learn how libopenapi constructs a model out of an OpenAPI spec.

Before getting into the code, let’s understand the key differences between OpenAPI versions 3.0 and 3.1.

Almost JSON Schema

OpenAPI 3.0 is loosely based on JSON Schema, in the sense that the Schema object used by pretty much everything in OpenAPI is similar to JSON Schema.

It’s very close, but it’s not actually valid JSON Schema. There are a number of variations and mismatches.

Don’t get us wrong, it’s way better than Swagger.

Fixed in 3.1

Hurrah! OpenAPI 3.1 was introduced to tweak the standard to be compliant with JSON Schema. Which means everything is all good, right?

Actually no, it’s not all good. Even though the changes are small, they break a large number of models because of things like exclusiveMinimum and exclusiveMaximum changing from boolean to numeric types.

These may seem simple, but for strongly typed languages, this kind of multi-type value between versions creates mayhem.
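To make the problem concrete, here is a minimal sketch that uses only Go’s standard encoding/json package, not libopenapi. The schema30 struct and the decode30 helper are hypothetical names invented for this illustration. The struct is shaped for OpenAPI 3.0, so a 3.1 fragment that carries a number in exclusiveMinimum cannot be decoded into it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schema30 is a hypothetical struct modeled on OpenAPI 3.0,
// where exclusiveMinimum is a boolean modifier of minimum.
type schema30 struct {
	Minimum          *float64 `json:"minimum,omitempty"`
	ExclusiveMinimum *bool    `json:"exclusiveMinimum,omitempty"`
}

// decode30 tries to decode a schema fragment into the 3.0-shaped struct.
func decode30(raw string) (schema30, error) {
	var s schema30
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	// OpenAPI 3.0 style: a boolean flag next to minimum.
	_, err := decode30(`{"minimum": 5, "exclusiveMinimum": true}`)
	fmt.Println("3.0 fragment decodes cleanly:", err == nil)

	// OpenAPI 3.1 style: exclusiveMinimum is the numeric bound itself,
	// so decoding into a *bool field returns a type error.
	_, err = decode30(`{"exclusiveMinimum": 5}`)
	fmt.Println("3.1 fragment decodes cleanly:", err == nil)
}
```

The first fragment decodes without error and the second does not, which is exactly the kind of breakage a single fixed field type causes across versions.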

Other tools have struggled to support OpenAPI 3.1, considering how deeply these types of changes can break models.

libopenapi was designed specifically to avoid these types of issues.

libopenapi grows with the standard

When we were thinking about the design of the library, we wanted to ensure it could grow with the OpenAPI standard as it changes, and gracefully support all previous versions.

There are a few principles we have employed to ensure this is possible…

A single schema

Every single object that is a Schema in libopenapi shares the same base model across all versions of the standard. This means that every single property is available to you, from every single version across time.

It’s a variation of the graceful degradation pattern that we use in front-end applications.

Dynamic values

The jump to version 3.1 means that Schema types can be multiple things: a bool, or an int, or a string, or a []string.

This might seem insignificant, but it really screws things up when using structs to define a concrete type for a schema in a strongly typed language.

To combat this ‘could be lunch-meat, it could be peaches - who knows?…’ problem, we have implemented a container that allows the contents to be salami or peaches.

This container is called DynamicValue at the high-level, and SchemaDynamicValue at the low-level.
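The idea behind such a container can be sketched with Go generics. The dynamicValue type below is a simplified stand-in written for this page, not libopenapi’s actual DynamicValue definition, and the schema struct and describe helper are likewise hypothetical. It holds either an A or a B and remembers which one it decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dynamicValue is a simplified stand-in, not libopenapi's DynamicValue.
// It holds either an A or a B, and remembers which one was set.
type dynamicValue[A any, B any] struct {
	isB bool
	a   A
	b   B
}

// IsA reports whether the container holds the first type.
func (d *dynamicValue[A, B]) IsA() bool { return !d.isB }

// IsB reports whether the container holds the second type.
func (d *dynamicValue[A, B]) IsB() bool { return d.isB }

// UnmarshalJSON tries to decode as A first, then falls back to B.
func (d *dynamicValue[A, B]) UnmarshalJSON(data []byte) error {
	var a A
	if err := json.Unmarshal(data, &a); err == nil {
		d.a, d.isB = a, false
		return nil
	}
	var b B
	if err := json.Unmarshal(data, &b); err != nil {
		return err
	}
	d.b, d.isB = b, true
	return nil
}

// schema uses one field for both the 3.0 (bool) and 3.1 (number) forms.
type schema struct {
	ExclusiveMinimum *dynamicValue[bool, float64] `json:"exclusiveMinimum"`
}

// describe decodes a fragment and reports which form it contained.
func describe(raw string) string {
	var s schema
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return "error: " + err.Error()
	}
	if s.ExclusiveMinimum == nil {
		return "not set"
	}
	if s.ExclusiveMinimum.IsA() {
		return fmt.Sprintf("bool %v", s.ExclusiveMinimum.a)
	}
	return fmt.Sprintf("number %v", s.ExclusiveMinimum.b)
}

func main() {
	fmt.Println(describe(`{"exclusiveMinimum": true}`)) // 3.0 form
	fmt.Println(describe(`{"exclusiveMinimum": 5}`))    // 3.1 form
}
```

With a container like this, one struct field can carry both the 3.0 and the 3.1 form, and the caller checks which side is populated before reading it.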

Diagram of the model hierarchy

Moving forward, we’re only going to talk about OpenAPI. We don’t recommend using Swagger, so please don’t use it.

Model layer

The model layer is the first split in the design of the model. Once a new Document has been created from a []byte slice (the OpenAPI specification bytes), there are two methods available to create models.

The BuildV2Model method is for creating a Swagger model, and the BuildV3Model method is for creating an OpenAPI model.

The porcelain layer

The porcelain layer is a complete representation of the OpenAPI model, with easy to use and navigate data structs. Maps and slices can be easily iterated over and the entire tree can be explored with minimal code.

For example, here is the high-level struct Operation that represents the OpenAPI Operation object:

type Operation struct {
  Tags         []string
  Summary      string
  Description  string
  ExternalDocs *base.ExternalDoc
  OperationId  string
  Parameters   []*Parameter
  RequestBody  *RequestBody
  Responses    *Responses
  Callbacks    map[string]*Callback
  Deprecated   *bool
  Security     []*base.SecurityRequirement
  Servers      []*Server
  Extensions   map[string]any
  low          *low.Operation
}

Pretty simple, right? All the high-level models are simple and easy to navigate.

High-level models are lossy and drop all the low-level details.

The low-level details are contained in the low-level model, which is what is used to construct each high-level model.

For most use-cases, the high-level models will be what most folks are looking for. There won’t be a need to know line and column numbers, or raw text node details.

At this point, most people can stop reading. Enjoy!

Going low into the plumbing layer

Sometimes, there is a need to peek down into where the model came from: on which line number and at which column position does each key and value for each object exist in the original specification?

All high-level models in libopenapi implement the GoesLow interface. All models at any point in the hierarchy can Go Low and drop down into the low-level version of the model.

By calling the GoLow() method on each high-level model, you can enter the plumbing.
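The shape of this pattern can be sketched in a few lines. Everything below is a stand-in written for illustration: goesLow, highTag, lowTag, newHighTag and lineOf are hypothetical names, not libopenapi types. The point is only that a high-level value keeps a private pointer to the low-level value it was built from, and hands it back on request.

```go
package main

import "fmt"

// goesLow is a hypothetical interface in the spirit of GoesLow:
// a high-level model that can hand back its low-level counterpart.
type goesLow[T any] interface {
	GoLow() T
}

// lowTag is a stand-in low-level model that remembers where it was defined.
type lowTag struct {
	Name string
	Line int
}

// highTag is a stand-in high-level model: simple fields plus a
// private pointer back to the low-level model it was built from.
type highTag struct {
	Name string
	low  *lowTag
}

// GoLow returns the low-level model behind this high-level one.
func (h *highTag) GoLow() *lowTag { return h.low }

// newHighTag builds the lossy high-level view from the low-level model.
func newHighTag(l *lowTag) *highTag {
	return &highTag{Name: l.Name, low: l}
}

// lineOf works with any model that can go low to a *lowTag.
func lineOf(m goesLow[*lowTag]) int { return m.GoLow().Line }

func main() {
	tag := newHighTag(&lowTag{Name: "pet", Line: 173})
	fmt.Println(tag.Name, "defined on line", lineOf(tag))
}
```

The high-level value stays small and easy to use, while nothing from the original parse is thrown away.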

For comparison with the high-level Operation that represents the OpenAPI Operation object, here is the low-level version of the same object:

type Operation struct {
  Tags         low.NodeReference[[]low.ValueReference[string]]
  Summary      low.NodeReference[string]
  Description  low.NodeReference[string]
  ExternalDocs low.NodeReference[*base.ExternalDoc]
  OperationId  low.NodeReference[string]
  Parameters   low.NodeReference[[]low.ValueReference[*Parameter]]
  RequestBody  low.NodeReference[*RequestBody]
  Responses    low.NodeReference[*Responses]
  Callbacks    low.NodeReference[*orderedmap.Map[low.KeyReference[string], low.ValueReference[*Callback]]]
  Deprecated   low.NodeReference[bool]
  Security     low.NodeReference[[]low.ValueReference[*base.SecurityRequirement]]
  Servers      low.NodeReference[[]low.ValueReference[*Server]]
  Extensions   map[low.KeyReference[string]]low.ValueReference[any]
}

It looks similar; however, everything is contained within NodeReference, KeyReference or ValueReference containers.

These containers encapsulate the original low-level text node that was extracted from the raw specification.

NodeReference

The NodeReference struct is generic and accepts a type T, which is the type of the Value held by the node.

type NodeReference[T any] struct {
  Value T
  ValueNode *yaml.Node
  KeyNode *yaml.Node
  IsReference bool
  Reference string
}

Property	Type	Description
`Value`	T	The actual value captured by the node
`ValueNode`	*yaml.Node	The `*yaml.Node` that holds the value
`KeyNode`	*yaml.Node	The `*yaml.Node` that is the key, that contains the value
`IsReference`	bool	Is this value actually a reference ($ref) in the original tree?
`Reference`	string	If `IsReference` is true, then `Reference` contains the original $ref value.

The pointers to KeyNode and ValueNode are the original *yaml.Node values that were extracted from the OpenAPI specification when it was parsed.

The use of YAML

When dropping down to the low model, there is an extraordinarily heavy use of the *yaml.Node struct and API. We didn’t do this because we love YAML. In fact, it’s the powerful design of the YAML library that we love, and the huge value that *yaml.Node provides.

libopenapi supports both JSON and YAML as inputs; we just use the very useful YAML API under the hood.

KeyReference

The KeyReference is a subset of the NodeReference struct; it only contains two properties.

Property	Type	Description
`Value`	T	The actual value of the key captured by the node
`KeyNode`	*yaml.Node	The `*yaml.Node` that is the key, that contains the value

KeyReference is used to represent a key in some kind of map or array used in the spec.

ValueReference

Like its sibling, ValueReference is a subset of the NodeReference struct. Its main purpose is to point to the value of a node held in a map or an array.

Property	Type	Description
`Value`	T	The actual value of the value captured by the node
`ValueNode`	*yaml.Node	The `*yaml.Node` that contains the original value
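To show how the two smaller containers fit together for a single map entry, here is a minimal sketch. The node type is a stand-in for *yaml.Node that keeps only a position and text, and keyReference, valueReference, entry and locate are simplified, hypothetical versions written for this page. The key name and positions are made up.

```go
package main

import "fmt"

// node is a stand-in for *yaml.Node, keeping only position and text.
type node struct {
	Line, Column int
	Value        string
}

// keyReference is a simplified stand-in for the KeyReference container.
type keyReference[T any] struct {
	Value   T
	KeyNode *node
}

// valueReference is a simplified stand-in for the ValueReference container.
type valueReference[T any] struct {
	Value     T
	ValueNode *node
}

// entry pairs a key with its value, each pointing at its own node.
type entry struct {
	Key   keyReference[string]
	Value valueReference[string]
}

// locate formats where the key and the value of an entry were defined.
func locate(e entry) string {
	return fmt.Sprintf("key %q at %d:%d, value %q at %d:%d",
		e.Key.Value, e.Key.KeyNode.Line, e.Key.KeyNode.Column,
		e.Value.Value, e.Value.ValueNode.Line, e.Value.ValueNode.Column)
}

func main() {
	// Hypothetical positions for an indented line such as `x-owner: platform`.
	e := entry{
		Key: keyReference[string]{
			Value:   "x-owner",
			KeyNode: &node{Line: 12, Column: 3, Value: "x-owner"},
		},
		Value: valueReference[string]{
			Value:     "platform",
			ValueNode: &node{Line: 12, Column: 12, Value: "platform"},
		},
	}
	fmt.Println(locate(e))
}
```

Because the key and the value each carry their own node, a tool can report the position of either one independently.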

An example of navigating from high to low

Below is an example of iterating over a list of tags in an Operation in the high-level model, and then dropping down to the low-level model to perform the same action, while also printing out some line numbers.


package main

import (
  "fmt"
  "os"

  "github.com/pb33f/libopenapi"
  "github.com/pb33f/libopenapi/datamodel/low"
)

func main() {

  // load an OpenAPI 3 specification from bytes
  petstore, _ := os.ReadFile("petstorev3.json")

  // create a new document from specification bytes,
  // ignore the errors for the sake of brevity
  doc, _ := libopenapi.NewDocument(petstore)

  // because we know this is a v3 spec, we can build a ready to go
  // model from it - also ignore the errors.
  v3Model, _ := doc.BuildV3Model()

  // in the porcelain layer (high-level)
  // loop through paths and then for each operation
  // extract the GET operation tags.

  // high level tags extracted from the porcelain layer.
  highTags := make(map[string]int)

  // low level tags wrapped in a reference.
  lowTags := make(map[string][]*low.ValueReference[string])
    
  // iterate over the ordered map composed of path pairs.
  for pathPairs := v3Model.Model.Paths.PathItems.First(); pathPairs != nil; pathPairs = pathPairs.Next() {
      pathItem := pathPairs.Value()
      if pathItem.Get != nil {
          // count each tag; a missing key starts at zero.
          for _, tag := range pathItem.Get.Tags {
              highTags[tag]++
          }

          // now drop down to the low level plumbing
          // and extract low level tags from the GET operation.
          lowOperation := pathItem.Get.GoLow()

          // make sure there are tags.
          if !lowOperation.Tags.IsEmpty() {
              // index into the slice so each stored pointer refers to its
              // own element; before Go 1.22, taking the address of the
              // range variable would store the same pointer every time.
              for i := range lowOperation.Tags.Value {
                  tag := &lowOperation.Tags.Value[i]
                  lowTags[tag.Value] = append(lowTags[tag.Value], tag)
              }
          }
      }
  }

  // iterate through the high level tags and print them all out.
  fmt.Printf("%d tags extracted from high "+
      "level GET operations:\n", len(highTags))
  for x := range highTags {
      fmt.Printf("High Tag: '%s', used %d time(s)\n", x, highTags[x])
  }

  // now do the same for the low-level,
  // but print out the line/col and the node type
  // from where they came. We have more power
  // with the low level data.

  fmt.Printf("\n%d tags extracted from low "+
      "level GET operations:\n", len(lowTags))
  for x := range lowTags {
      fmt.Printf("\nTag: %s has %d instances:\n", x, len(lowTags[x]))

      for _, lowTag := range lowTags[x] {
          fmt.Printf("--> '%s' defined on (line: %d, col: %d, nodeType: %v)\n",
              lowTag.Value,
              lowTag.ValueNode.Line,
              lowTag.ValueNode.Column,
              lowTag.ValueNode.Tag)
      }
  }
}

This will print out:

3 tags extracted from high level GET operations:
High Tag: 'user', used 3 time(s)
High Tag: 'pet', used 3 time(s)
High Tag: 'store', used 2 time(s)

3 tags extracted from low level GET operations:

Tag: user has 3 instances:
--> 'user' defined on (line: 739, col: 11, nodeType: !!str)
--> 'user' defined on (line: 821, col: 11, nodeType: !!str)
--> 'user' defined on (line: 805, col: 11, nodeType: !!str)

Tag: pet has 3 instances:
--> 'pet' defined on (line: 235, col: 11, nodeType: !!str)
--> 'pet' defined on (line: 294, col: 11, nodeType: !!str)
--> 'pet' defined on (line: 173, col: 11, nodeType: !!str)

Tag: store has 2 instances:
--> 'store' defined on (line: 502, col: 11, nodeType: !!str)
--> 'store' defined on (line: 577, col: 11, nodeType: !!str)

The low-level plumbing API provides much more power for engineers looking to build tools on top of the OpenAPI model.