Design and development principles
This section describes the design and development methodologies followed for the architecture definition of the OCDS Open Source Central Unit of the electronic public procurement system. The general architectural approach for the eProcurement system combines Service-Oriented Architecture (SOA) with Event-Driven Architecture (EDA). Together, these two approaches provide a richer, more robust model: event patterns recognized across services can surface previously unknown causal relationships, and such a pattern can trigger further automated processing that injects value-added information back into the system, which neither approach achieves on its own.
Architectural approach
The system follows an event-driven SOA, which combines the intelligence and proactiveness of EDA with the organizational capabilities found in service offerings.
Service-oriented architecture
SOA is a style of software design in which application components provide services to other components through a communication protocol over a network. The basic principles of service-oriented architecture are independent of vendors, products and technologies. A service is a discrete unit of functionality that can be accessed remotely and acted upon and updated independently, such as retrieving a credit card statement online.
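To make this concrete, the following is a minimal Kotlin sketch (all names hypothetical) of such a discrete, independently consumable unit of functionality; in a real SOA deployment the interface would be exposed over a network protocol such as HTTP rather than called in-process:

```kotlin
// Hypothetical sketch: a discrete, remotely accessible unit of functionality.
// The consumer depends only on the service contract, not on any vendor,
// product or technology behind it.

data class Statement(val cardNumber: String, val balance: Double)

interface StatementService {
    fun retrieveStatement(cardNumber: String): Statement
}

// A trivial in-memory implementation standing in for the real provider.
class InMemoryStatementService : StatementService {
    private val balances = mapOf("4111-****" to 1250.75)

    override fun retrieveStatement(cardNumber: String): Statement =
        Statement(cardNumber, balances[cardNumber] ?: 0.0)
}

fun main() {
    val service: StatementService = InMemoryStatementService()
    println(service.retrieveStatement("4111-****"))
}
```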
Event-driven architecture
EDA is a software architecture pattern promoting the production, detection, consumption of, and reaction to events. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, not the event itself, which is the state change that triggered the message emission. Events do not travel; they just occur. However, the term event is often used metonymically to denote the notification message itself, which may lead to some confusion.

Building systems around an event-driven architecture simplifies horizontal scalability in distributed computing models and makes them more resilient to failure, because application state can be copied across multiple parallel snapshots for high availability. New events can be initiated anywhere and, more importantly, propagated across the network of data stores, updating each as they arrive. Adding extra nodes also becomes easier: you can simply take a copy of the application state, feed it a stream of events and run with it.

Event-driven architecture can complement SOA because services can be activated by triggers fired on incoming events. This paradigm is particularly useful whenever the sink does not provide any self-contained executable.
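The production/subscription cycle can be sketched in a few lines of Kotlin (hypothetical names; a production system would propagate notifications through a message broker rather than an in-memory bus):

```kotlin
// Hypothetical sketch of an event notification and its consumers.
// The event object is the notification message; the state change it
// describes has already occurred by the time it is published.

data class TenderPublished(val tenderId: String, val occurredAt: Long)

class EventBus {
    private val handlers = mutableListOf<(TenderPublished) -> Unit>()

    // Consumers register interest; the producer knows nothing about them.
    fun subscribe(handler: (TenderPublished) -> Unit) { handlers += handler }

    // Publishing propagates the notification to every registered consumer.
    fun publish(event: TenderPublished) = handlers.forEach { it(event) }
}

fun main() {
    val bus = EventBus()
    bus.subscribe { println("Indexer saw ${it.tenderId}") }
    bus.subscribe { println("Notifier saw ${it.tenderId}") }
    bus.publish(TenderPublished("tender-001", System.currentTimeMillis()))
}
```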
CQRS architecture style: command-query separation
In traditional architectures, the same data model is used to query and update a database. That's simple and works well for basic CRUD operations. In more complex applications, however, this approach can become unwieldy. For example, on the read side, the application may perform many different queries, returning data transfer objects (DTOs) with different shapes.
Object mapping can become complicated. On the write side, the model may implement complex validation and business logic. As a result, you can end up with an overly complex model that does too much. Another potential problem is that read and write workloads are often asymmetrical, with very different performance and scale requirements.
Command and Query Responsibility Segregation (CQRS) is an architectural pattern that addresses these problems by separating reads and writes into separate models, using commands to update data and queries to read data.
- Commands should be task based, rather than data centric. Commands may be placed on a queue for asynchronous processing, rather than being processed synchronously.
- Queries never modify the database. A query returns a DTO that does not encapsulate any domain knowledge.
Compared to the single data model used in CRUD-based systems, the use of separate query and update models for the data in CQRS-based systems simplifies design and implementation.
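As an illustration, here is a hedged Kotlin sketch of the separation (all names hypothetical): the command is task based and placed on a queue, while the query returns a plain DTO and modifies nothing.

```kotlin
// Hypothetical CQRS sketch: task-based commands mutate state, queries
// return DTOs and never modify anything.

// Write side: a task-based command, not a data-centric "update row" call.
data class SubmitBidCommand(val tenderId: String, val supplier: String, val amount: Double)

class CommandHandler(private val queue: ArrayDeque<SubmitBidCommand> = ArrayDeque()) {
    // Commands may be queued for asynchronous processing.
    fun enqueue(command: SubmitBidCommand) = queue.addLast(command)
}

// Read side: a flat DTO shaped for one screen, carrying no domain behaviour.
data class BidSummaryDto(val tenderId: String, val bidCount: Int, val lowestAmount: Double?)

class QueryHandler(private val readStore: Map<String, List<Double>>) {
    fun bidSummary(tenderId: String): BidSummaryDto {
        val bids = readStore[tenderId].orEmpty()
        return BidSummaryDto(tenderId, bids.size, bids.minOrNull())
    }
}

fun main() {
    CommandHandler().enqueue(SubmitBidCommand("tender-001", "ACME", 9_500.0))
    println(QueryHandler(mapOf("tender-001" to listOf(9_500.0))).bidSummary("tender-001"))
}
```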
Figure 1. General scheme for CQRS-approach
For greater isolation, the read and the write data could be physically separated. In that case, the read database can use its own data schema that is optimized for queries.
For example, it can store a materialized view of the data, in order to avoid complex joins or mappings. It might even use a different type of data store. If separate read and write databases are used, they must be kept in sync. Typically this is accomplished by having the write model publish an event whenever it updates the database, as sketched below.
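A minimal Kotlin sketch of that synchronization follows (hypothetical names; in practice the event would travel through a broker rather than a direct callback):

```kotlin
// Hypothetical sketch: physically separated write and read stores,
// kept in sync by an event published on every update.

data class OrderLineAdded(val orderId: String, val product: String, val value: Double)

class WriteModel(private val publish: (OrderLineAdded) -> Unit) {
    private val orders = mutableMapOf<String, MutableList<Pair<String, Double>>>()

    fun addLine(orderId: String, product: String, value: Double) {
        orders.getOrPut(orderId) { mutableListOf() } += product to value
        publish(OrderLineAdded(orderId, product, value)) // notify the read side
    }
}

// The read side keeps a denormalized view optimized for its query:
// the running total per order, with no joins at query time.
class ReadModel {
    private val totals = mutableMapOf<String, Double>()
    fun on(event: OrderLineAdded) {
        totals[event.orderId] = (totals[event.orderId] ?: 0.0) + event.value
    }
    fun total(orderId: String): Double = totals[orderId] ?: 0.0
}

fun main() {
    val read = ReadModel()
    val write = WriteModel(read::on)
    write.addLine("order-1", "laptop", 1200.0)
    write.addLine("order-1", "mouse", 25.0)
    println(read.total("order-1")) // 1225.0
}
```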
Design by Contract: contract programming approach
Design by contract (DbC), also known as contract programming, programming by contract and design-by-contract programming, is an approach to designing software that prescribes that software designers should define formal, precise and verifiable interface specifications for software components, which extend the ordinary definition of abstract data types with preconditions, postconditions and invariants. These specifications are referred to as "contracts", in accordance with a conceptual metaphor with the conditions and obligations of business contracts.
The DbC approach assumes all client components that invoke an operation on a server component will meet the preconditions specified as required for that operation. Where this assumption is considered too risky (as in multi-channel client-server or distributed computing) the opposite "defensive design" approach is taken, meaning that a server component tests (before or while processing a client's request) that all relevant preconditions hold true, and replies with a suitable error message if not.
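Kotlin expresses such contracts directly with the standard-library `require` (preconditions) and `check` (postconditions and invariants) functions. A brief sketch over a hypothetical domain:

```kotlin
// Hypothetical DbC sketch using Kotlin's built-in contract helpers:
// require(...) checks preconditions (the caller's obligation),
// check(...) checks postconditions and invariants (the supplier's obligation).

class Account(initialBalance: Double) {
    var balance: Double = initialBalance
        private set

    fun withdraw(amount: Double): Double {
        // Preconditions: the client must invoke the operation with valid arguments.
        require(amount > 0) { "amount must be positive" }
        require(amount <= balance) { "insufficient funds" }

        balance -= amount

        // Postcondition / invariant: the supplier guarantees the resulting state.
        check(balance >= 0) { "invariant violated: negative balance" }
        return balance
    }
}

fun main() {
    val account = Account(100.0)
    println(account.withdraw(40.0)) // 60.0
    // Under "defensive design", a server component would catch the
    // IllegalArgumentException and reply with a suitable error message.
    runCatching { account.withdraw(500.0) }.onFailure { println(it.message) }
}
```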
Patterns used
Materialized View pattern
Materialized views (MV) are used to generate pre-populated views of the data in one or more data stores when the data isn't ideally formatted for required query operations. This can help support efficient querying and data extraction, and improve application performance.
Context and problem
When storing data, the priority for developers and data administrators is often focused on how the data is stored, as opposed to how it's read. The chosen storage format is usually closely related to the format of the data, requirements for managing data size and data integrity, and the kind of store in use. For example, when using a NoSQL document store, the data is often represented as a series of aggregates, each containing all of the information for that entity. However, this can have a negative effect on queries. When a query only needs a subset of the data from some entities, such as a summary of orders for several customers without all of the order details, it must extract all of the data for the relevant entities in order to obtain the required information.
Solution
To support efficient querying, a common solution is to generate, in advance, a view that materializes the data in a format suited to the required result set. The MV pattern describes generating pre-populated views of data in environments where the source data isn't in a suitable format for querying, where generating a suitable query is difficult, or where query performance is poor due to the nature of the data or the data store. These MVs, which contain only the data required by a query, allow applications to quickly obtain the information they need.

In addition to joining tables or combining data entities, MVs can include the current values of calculated columns or data items, the results of combining values or executing transformations on the data items, and values specified as part of the query. An MV can even be optimized for just a single query. A key point is that an MV and the data it contains are completely disposable, because they can be entirely rebuilt from the source data stores.
Figure 2a. General scheme of “Materialized View” pattern
A materialized view is never updated directly by an application; in that sense, it is a specialized cache.
When the source data for the view changes, the view must be updated to include the new information. You can schedule this to happen automatically, or when the system detects a change to the original data. In some cases it might be necessary to regenerate the view manually. The figure shows an example of how the Materialized View pattern might be used.
How it works
The following figure shows an example of using the Materialized View pattern to generate a summary of sales. Data in the Order, OrderItem, and Customer tables in separate partitions in an Azure storage account are combined to generate a view containing the total sales value for each product in the Electronics category, along with a count of the number of customers who made purchases of each item.
Figure 2b. Tables and materialized view
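A hedged Kotlin sketch of building that summary view follows, with in-memory stand-ins for the Order and OrderItem tables (names are illustrative, not the actual MTender schema):

```kotlin
// Hypothetical sketch of the sales-summary materialized view described above.
// The view is disposable: it can always be rebuilt from the source data.

data class Order(val orderId: Int, val customerId: Int)
data class OrderItem(val orderId: Int, val product: String, val category: String, val value: Double)

data class SalesSummaryRow(val product: String, val totalSales: Double, val customerCount: Int)

fun buildSalesSummary(orders: List<Order>, items: List<OrderItem>): List<SalesSummaryRow> {
    val customerByOrder = orders.associate { it.orderId to it.customerId }
    return items
        .filter { it.category == "Electronics" }   // only the data the query needs
        .groupBy { it.product }
        .map { (product, rows) ->
            SalesSummaryRow(
                product = product,
                totalSales = rows.sumOf { it.value },
                customerCount = rows.mapNotNull { customerByOrder[it.orderId] }.distinct().size,
            )
        }
}

fun main() {
    val orders = listOf(Order(1, 10), Order(2, 11), Order(3, 10))
    val items = listOf(
        OrderItem(1, "laptop", "Electronics", 1200.0),
        OrderItem(2, "laptop", "Electronics", 1150.0),
        OrderItem(3, "desk", "Furniture", 300.0),
    )
    buildSalesSummary(orders, items).forEach(::println)
}
```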
Event Sourcing pattern
Instead of storing just the current state of the data in a domain, Event Sourcing uses an append-only store to record the full series of actions taken on that data. The store acts as the system of record and can be used to materialize the domain objects. This can simplify tasks in complex domains, by avoiding the need to synchronize the data model and the business domain, while improving performance, scalability, and responsiveness. It can also provide consistency for transactional data, and maintain full audit trails and history that can enable compensating actions.
Context and problem
Most applications work with data, and the typical approach is for the application to maintain the current state of the data by updating it as users work with it. For example, in the traditional Create, Read, Update, and Delete (CRUD) model a typical data process is to read data from the store, make some modifications to it, and update the current state of the data with the new values—often by using transactions that lock the data. The CRUD approach has some limitations:
- CRUD systems perform update operations directly against a data store, which can slow down performance and responsiveness, and limit scalability, due to the processing overhead it requires.
- In a collaborative domain with many concurrent users, data update conflicts are more likely because the update operations take place on a single item of data.
- Unless there's an additional auditing mechanism that records the details of each operation in a separate log, history is lost.
Solution
The Event Sourcing pattern defines an approach to handling operations on data that's driven by a sequence of events, each of which is recorded in an append-only store. Application code sends a series of events that imperatively describe each action that has occurred on the data to the event store, where they're persisted. Each event represents a set of changes to the data. The events are persisted in an event store that acts as the system of record (the authoritative data source) about the current state of the data. The event store typically publishes these events so that consumers can be notified and can handle them if needed. Consumers could, for example, initiate tasks that apply the operations in the events to other systems, or perform any other associated action that's required to complete the operation. Notice that the application code that generates the events is decoupled from the systems that subscribe to the events.
Typical uses of the events published by the event store are to maintain materialized views of entities as actions in the application change them, and for integration with external systems. For example, a system can maintain a materialized view of all customer orders that's used to populate parts of the UI. As the application adds new orders, adds or removes items on the order, and adds shipping information, the events that describe these changes can be handled and used to update the materialized view.
In addition, at any point it's possible for applications to read the history of events, and use it to materialize the current state of an entity by playing back and consuming all the events related to that entity. This can occur on demand to materialize a domain object when handling a request, or through a scheduled task so that the state of the entity can be stored as a materialized view to support the presentation layer.
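A minimal Kotlin sketch of the pattern over a hypothetical domain: events are appended to a store, and the current state of an entity is materialized by folding over the replayed history.

```kotlin
// Hypothetical event-sourcing sketch: an append-only store records
// immutable events; current state is materialized by replaying them.

sealed interface CartEvent
data class ItemAdded(val item: String, val quantity: Int) : CartEvent
data class ItemRemoved(val item: String) : CartEvent

class EventStore {
    private val log = mutableListOf<CartEvent>()
    fun append(event: CartEvent) { log += event }   // append-only; never updated in place
    fun replay(): List<CartEvent> = log.toList()    // full history for projections
}

// Materialize the current state by folding over the event history.
fun currentCart(events: List<CartEvent>): Map<String, Int> =
    events.fold(mutableMapOf<String, Int>()) { cart, event ->
        when (event) {
            is ItemAdded -> cart[event.item] = (cart[event.item] ?: 0) + event.quantity
            is ItemRemoved -> cart.remove(event.item)
        }
        cart
    }

fun main() {
    val store = EventStore()
    store.append(ItemAdded("laptop", 1))
    store.append(ItemAdded("mouse", 2))
    store.append(ItemRemoved("mouse"))
    println(currentCart(store.replay())) // {laptop=1}
}
```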
Figure 3 below shows an overview of the pattern, including some of the options for using the event stream, such as creating a materialized view, integrating events with external applications and systems, and replaying events to create projections of the current state of specific entities.
The Event Sourcing pattern provides the following advantages:
- Events are immutable and can be stored using an append-only operation. The user interface, workflow, or process that initiated an event can continue, and tasks that handle the events can run in the background.
- Events are simple objects that describe some action that occurred, together with any associated data required to describe the action represented by the event. Events don't directly update a data store. They're simply recorded for handling at the appropriate time.
- Events typically have meaning for a domain expert, whereas object-relational impedance mismatch can make complex database tables hard to understand. Tables are artificial constructs that represent the current state of the system, not the events that occurred.
- Event sourcing can help prevent concurrent updates from causing conflicts because it avoids the requirement to directly update objects in the data store.
- The append-only storage of events provides an audit trail that can be used to monitor actions taken against a data store, regenerate the current state as materialized views or projections by replaying the events at any time, and assist in testing and debugging the system. In addition, the requirement to use compensating events to cancel changes provides a history of changes that were reversed, which wouldn't be the case if the model simply stored the current state.
- The event store raises events, and tasks perform operations in response to those events. This decoupling of the tasks from the events provides flexibility and extensibility. Tasks know about the type of event and the event data, but not about the operation that triggered the event. In addition, multiple tasks can handle each event. This enables easy integration with other services and systems that only listen for new events raised by the event store.
Figure 3. General scheme of “Event sourcing” pattern
Technologies
This section describes the development framework and technologies used by the MTender system. The technology choice has been made according to the MTender principles of open source, open data and the open contracting data standard. The MTender CDU is based on open source applications and fosters transparency and accountability of public procurement by incorporating open data and the Open Contracting Data Standard (OCDS).
Spring Framework
The Spring Framework provides a comprehensive programming and configuration model for modern Java-based enterprise applications, on any kind of deployment platform. Spring focuses on the "plumbing" of enterprise applications so that teams can focus on application-level business logic, without unnecessary ties to specific deployment environments.

Kotlin
Kotlin is a statically typed programming language that runs on the Java virtual machine and can also be compiled to JavaScript source code or use the LLVM compiler infrastructure.

Camunda
Camunda is an open source platform for workflow and business process management. It can model and execute BPMN 2.0, CMMN 1.1 and DMN 1.1.

Apache Kafka
Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

Cassandra DB
Apache Cassandra is a database solution for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it well suited to mission-critical data.

JSON Web Tokens
JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way of securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed.
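To illustrate the signing and verification just described, here is a hedged Kotlin sketch assuming the widely used jjwt library (io.jsonwebtoken, 0.11.x); the MTender documentation does not mandate this particular library:

```kotlin
// Hypothetical sketch assuming the jjwt library (io.jsonwebtoken:jjwt-api 0.11.x).
import io.jsonwebtoken.Jwts
import io.jsonwebtoken.SignatureAlgorithm
import io.jsonwebtoken.security.Keys

fun main() {
    // Generate an HMAC-SHA256 key for signing.
    val key = Keys.secretKeyFor(SignatureAlgorithm.HS256)

    // Issue a compact, self-contained, signed token carrying a subject claim.
    val token = Jwts.builder()
        .setSubject("procuring-entity-42")
        .signWith(key)
        .compact()

    // Verification throws if the token was tampered with or the key differs.
    val subject = Jwts.parserBuilder()
        .setSigningKey(key)
        .build()
        .parseClaimsJws(token)
        .body
        .subject
    println(subject) // procuring-entity-42
}
```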