How do core banking systems achieve high availability and fault tolerance?

High availability and fault tolerance are two related but distinct properties of a core banking system. High availability refers to the system’s ability to remain operational and accessible to users for a defined proportion of time, typically expressed as a percentage uptime target such as 99.9 percent or 99.99 percent. Fault tolerance refers to the system’s ability to continue functioning correctly when one or more of its components fail, without that failure making the entire system unavailable or producing incorrect results.

A highly available system is not necessarily fault tolerant, and a fault tolerant system is not necessarily highly available, but in practice, modern core banking systems must achieve both properties simultaneously to meet the operational and regulatory expectations placed on them.
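To make the uptime targets above concrete, the following minimal sketch converts a percentage target into a yearly downtime budget. The function name `downtime_budget_minutes` and the 365-day year are illustrative assumptions, not part of any scheme rule or regulation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years (assumption)


def downtime_budget_minutes(uptime_percent: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Return the maximum downtime, in minutes, permitted by an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_minutes * (100 - uptime_percent) / 100


for target in (99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, a 99.9 percent target allows about 8.76 hours of downtime per year, while a 99.99 percent target allows about 52.6 minutes.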

For payment institutions and e-money institutions operating on real-time payment rails such as SEPA Instant Credit Transfer and Faster Payments, these properties are particularly critical. Payment schemes impose strict processing time requirements, and a core banking system that becomes unavailable or produces incorrect results during payment processing can cause scheme rule violations, financial losses, and regulatory incidents. Under DORA, financial institutions are additionally required to demonstrate that their ICT systems are designed and operated with digital operational resilience as a core design requirement rather than an afterthought.

In a typical high availability core banking architecture, the system is deployed across multiple availability zones within a cloud region, with each zone operating independent power, networking, and physical infrastructure. Each service runs as multiple instances distributed across these zones, with a load balancer distributing incoming requests across healthy instances.

A health monitoring system continuously checks the status of each instance, and if an instance fails its health check, the load balancer automatically removes it from the pool and stops routing traffic to it. The remaining instances absorb the traffic without service interruption. If an entire availability zone becomes unavailable, the instances in the other zones continue to serve requests, and the failed zone’s instances are automatically replaced in the remaining zones.

Key Takeaways:
  • High availability in core banking systems is achieved through a combination of redundant infrastructure, automated failover, load balancing, and health monitoring, designed to ensure that the system remains operational even when individual components fail
  • Fault tolerance is the system’s ability to continue functioning correctly in the presence of component failures, achieved through techniques including replication, circuit breakers, graceful degradation, and stateless service design that prevent individual failures from cascading across the platform
  • Under DORA, payment institutions and e-money institutions must define recovery time objectives and recovery point objectives for their critical ICT systems, conduct regular resilience testing, and demonstrate to their national competent authority that their core banking infrastructure can withstand and recover from ICT incidents including infrastructure failures, cyber incidents, and third-party service disruptions

How Core Banking High Availability Is Achieved

Redundant infrastructure and multi-zone deployment: The foundational requirement for high availability is the elimination of single points of failure across all layers of the core banking infrastructure. A single point of failure is any component whose failure would render the entire system unavailable. In a cloud-based core banking deployment, single points of failure are eliminated by deploying each infrastructure component, including application servers, databases, message brokers, load balancers, and network gateways, with redundant instances across multiple availability zones. The failure of any single instance or any single availability zone does not cause system unavailability, as redundant instances in other zones continue to operate.
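The effect of redundancy can be estimated with simple arithmetic. The sketch below assumes that instance failures are statistically independent and that the group is available while at least one instance is up. Real deployments only approximate both conditions, and the function name `combined_availability` is hypothetical.

```python
def combined_availability(instance_availability: float, instances: int) -> float:
    """Availability of a redundant group that is up while any one instance is up.

    Assumes independent failures, so this is an upper-bound style estimate.
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # The group fails only if every instance fails at the same time.
    return 1.0 - (1.0 - instance_availability) ** instances


for n in (1, 2, 3):
    print(f"{n} instance(s) at 99% each: {combined_availability(0.99, n):.6f}")
```

Under the independence assumption, two instances that are each available 99 percent of the time yield a combined availability of 99.99 percent. Correlated failures, such as a shared network dependency, reduce this figure, which is why instances are spread across availability zones.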

Load balancing and traffic distribution: Load balancers distribute incoming requests across multiple instances of each service, ensuring that no single instance is overwhelmed by traffic and that requests are automatically redirected to healthy instances when a failure occurs. In core banking systems, load balancing operates at multiple layers, including the network layer for routing incoming API and user traffic, the application layer for distributing processing requests across service instances, and the database layer for distributing read queries across database replicas. Cloud-native load balancers provide health check integration, automatically removing unhealthy instances from the traffic pool and adding them back when they recover.
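The routing behaviour described above can be sketched as a round-robin balancer that skips unhealthy instances. This is an illustration only: `Instance` and `RoundRobinBalancer` are hypothetical names, and production load balancers implement considerably more (connection draining, weighting, zone awareness).

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    zone: str
    healthy: bool = True


class RoundRobinBalancer:
    """Cycles through instances in order, returning only healthy ones."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._next = 0

    def pick(self) -> Instance:
        # Try each instance at most once per call, starting where we left off.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


pool = [Instance("api-1", "zone-a"), Instance("api-2", "zone-b"), Instance("api-3", "zone-c")]
balancer = RoundRobinBalancer(pool)
pool[1].healthy = False  # simulate a failed health check on api-2
print([balancer.pick().name for _ in range(4)])
```

Once `api-2` is marked unhealthy, requests alternate between the remaining instances, and they return to `api-2` as soon as its `healthy` flag is set again.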

Automated failover: Automated failover is the mechanism by which a core banking system automatically switches from a failed primary component to a standby component without requiring manual intervention. In database infrastructure, automated failover is achieved through primary-replica replication, in which a replica database maintains a continuously updated copy of the primary database’s data and is automatically promoted to primary if the primary becomes unavailable. The failover process, including the detection of the primary failure, the promotion of the replica, and the redirection of database connections, must complete within the institution’s defined recovery time objective for the database layer. For core banking data stores where account balance accuracy is critical and the recovery point objective is zero, the replication must be synchronous, meaning that each write to the primary is confirmed only after it has been replicated to the standby, so that no acknowledged write is lost in a failover scenario.
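The detection, promotion, and redirection steps can be sketched with an in-memory model. `DatabaseNode` and `ReplicatedDatabase` are hypothetical names, and real database failover is handled by the database or cloud platform rather than by application code like this. The sketch also assumes the standby is always reachable during writes.

```python
class DatabaseNode:
    def __init__(self, name):
        self.name = name
        self.available = True
        self.log = []  # committed entries, in order


class ReplicatedDatabase:
    """Primary plus one synchronous standby (simplified, in-memory)."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby  # becomes None once promoted

    def write(self, entry):
        if not self.primary.available:
            raise ConnectionError("primary unavailable")
        self.primary.log.append(entry)
        if self.standby is not None:
            # Synchronous replication: the standby holds the entry
            # before the write is acknowledged to the caller.
            self.standby.log.append(entry)

    def check_and_fail_over(self) -> bool:
        """Promote the standby if the primary is down. Returns True if promoted."""
        if self.primary.available or self.standby is None:
            return False
        # Later writes are redirected because self.primary now points at the old standby.
        self.primary, self.standby = self.standby, None
        return True


db = ReplicatedDatabase(DatabaseNode("db-a"), DatabaseNode("db-b"))
db.write("credit acc-1 100")
db.primary.available = False          # simulate primary failure
print(db.check_and_fail_over(), db.primary.name, db.primary.log)
```

Because every acknowledged write already exists on the standby, the promoted node starts with a complete log. After promotion the sketch runs without a standby, so a real platform would provision a replacement replica at this point.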

Health monitoring and alerting: Continuous health monitoring is the mechanism by which the core banking system detects component failures and triggers automated remediation. Each service instance exposes health check endpoints that the monitoring infrastructure queries at defined intervals, assessing whether the instance is running, whether it can connect to its dependencies, and whether it is performing within acceptable response time thresholds. Instances that fail health checks are automatically removed from the load balancer pool and replaced. Monitoring data is aggregated into a centralised observability platform that provides real-time visibility into the health, performance, and resource utilisation of all system components, enabling operations teams to identify and respond to emerging issues before they affect system availability.
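A health check of the kind described above can be sketched as follows. The 500 ms response threshold, the three-failure threshold, and the names `HealthReport`, `is_healthy`, and `HealthTracker` are all illustrative assumptions. Requiring several consecutive failures is one common way to avoid removing an instance because of a single transient error.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    process_running: bool
    dependencies_reachable: bool
    response_time_ms: float


def is_healthy(report: HealthReport, max_response_time_ms: float = 500.0) -> bool:
    """Apply the three checks: running, dependencies reachable, fast enough."""
    return (report.process_running
            and report.dependencies_reachable
            and report.response_time_ms <= max_response_time_ms)


class HealthTracker:
    """Keeps an instance in the pool until N consecutive checks have failed."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, report: HealthReport) -> bool:
        """Record one check; return True if the instance should stay in the pool."""
        if is_healthy(report):
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.failure_threshold


tracker = HealthTracker()
slow = HealthReport(process_running=True, dependencies_reachable=True, response_time_ms=900.0)
print([tracker.record(slow) for _ in range(3)])
```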

How Fault Tolerance Is Achieved

Service replication and stateless design: Fault tolerance at the service level is achieved by running multiple instances of each service simultaneously and designing each service to be stateless, meaning that it does not store session data or processing state locally. Because each request contains all the information required to process it, any instance of the service can handle any request, and the failure of one instance does not affect the ability of other instances to continue processing. Stateless service design is a prerequisite for effective horizontal scaling and automated failover, as it ensures that traffic can be redistributed across instances without loss of processing context.
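The following sketch illustrates the stateless idea: two service instances keep nothing between requests, so either can handle any request. `SharedLedgerStore` stands in for an external database, and `TransferService` is a hypothetical name. Concurrency control and validation are deliberately omitted, and amounts are held in minor units as an assumption.

```python
class SharedLedgerStore:
    """Stands in for an external database shared by all service instances."""

    def __init__(self):
        self.balances = {}


class TransferService:
    """Holds no per-request or per-session state of its own."""

    def __init__(self, instance_id, store):
        self.instance_id = instance_id
        self._store = store  # shared dependency, not local processing state

    def handle(self, request: dict) -> dict:
        # Everything needed to process the request arrives in the request itself.
        account = request["account"]
        amount_minor = request["amount_minor"]
        new_balance = self._store.balances.get(account, 0) + amount_minor
        self._store.balances[account] = new_balance
        return {"handled_by": self.instance_id,
                "account": account,
                "balance_minor": new_balance}


store = SharedLedgerStore()
instance_a = TransferService("instance-a", store)
instance_b = TransferService("instance-b", store)
instance_a.handle({"account": "acc-1", "amount_minor": 1000})
print(instance_b.handle({"account": "acc-1", "amount_minor": -250}))
```

If `instance-a` fails after the first request, `instance-b` continues from the same balance because the state lives in the shared store rather than in either instance.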

Circuit breakers: A circuit breaker is a fault tolerance pattern that prevents a failing dependency from causing a cascade of failures across the payment platform. When a service makes repeated requests to a dependency, such as an external payment network or a downstream microservice, and those requests consistently fail or time out, the circuit breaker opens, stopping further requests to the failing dependency for a defined period. During this open period, the service applies fallback logic, such as queuing the request for later processing or returning a defined error response, rather than continuing to make failing requests that consume resources and increase latency. After the timeout period elapses, the circuit breaker enters a half-open state, allowing a small number of test requests through to determine whether the dependency has recovered. It closes fully if those requests succeed and reopens if they fail.
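The closed, open, and half-open states can be sketched in a few lines. This is a simplified, single-threaded illustration with hypothetical names (`CircuitBreaker`, `CircuitOpenError`) and assumed defaults of three failures and a 30-second timeout. It admits one trial request in the half-open state, and a production service would normally use an established resilience library instead.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open
        self._clock = clock                 # injectable for testing
        self._failures = 0
        self._opened_at = None
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("calls to the dependency are suspended")
            self.state = "half-open"  # timeout elapsed: allow a trial request
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self.state = "closed"
        return result

    def _record_failure(self):
        self._failures += 1
        # A failed trial request reopens immediately; otherwise wait for the threshold.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

A caller would wrap each dependency request as `breaker.call(send_to_network, payment)` (where `send_to_network` is a placeholder) and apply its fallback, such as queuing the payment, when `CircuitOpenError` is raised.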

Graceful degradation: Graceful degradation is the ability of a core banking system to continue providing a reduced but functional service when some components are unavailable, rather than failing completely. In a payment platform, graceful degradation might mean continuing to process payment instructions and update account balances when the notification service is unavailable, even though customers do not receive real-time payment confirmations until the notification service recovers. The system prioritises the critical payment processing and ledger posting functions, while non-critical functions such as notification delivery, analytics processing, and report generation are queued for processing when the affected services recover.
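The notification example above can be sketched as follows. `PaymentProcessor`, `NotificationUnavailable`, and the `notifier` callable are hypothetical, and the in-memory queue stands in for a durable message queue.

```python
from collections import deque


class NotificationUnavailable(Exception):
    """Raised by the notifier when the notification service is down."""


class PaymentProcessor:
    def __init__(self, notifier):
        self.balances = {}
        self.pending_notifications = deque()
        self._notifier = notifier

    def process(self, account: str, amount_minor: int) -> int:
        # Critical path: the ledger posting always happens.
        self.balances[account] = self.balances.get(account, 0) + amount_minor
        message = f"{account}: posted {amount_minor}"
        try:
            # Non-critical path: failure here must not fail the payment.
            self._notifier(message)
        except NotificationUnavailable:
            self.pending_notifications.append(message)
        return self.balances[account]

    def flush_notifications(self) -> int:
        """Deliver queued notifications in order; stop at the first failure."""
        sent = 0
        while self.pending_notifications:
            try:
                self._notifier(self.pending_notifications[0])
            except NotificationUnavailable:
                break
            self.pending_notifications.popleft()
            sent += 1
        return sent
```

The payment is posted whether or not the notifier succeeds. Calling `flush_notifications()` after the notification service recovers delivers the backlog in its original order.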

Data replication and consistency: Fault tolerance at the data layer requires that data is replicated across multiple storage nodes so that the failure of any individual node does not result in data loss. For core banking systems, where the integrity and completeness of financial records are regulatory requirements, data replication must be configured with appropriate consistency guarantees. Synchronous replication, in which a write is confirmed only after it has been committed to the replicas, provides the strongest consistency guarantee but introduces latency on write operations. Asynchronous replication provides lower write latency but introduces the risk of a small amount of data loss if the primary fails before the replica has received the latest writes. The appropriate replication model for each data store depends on the criticality of the data and the institution’s recovery point objective for that data layer.
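The trade-off between the two replication models can be sketched with a toy store that counts how many acknowledged writes would be lost if the primary failed at a given moment. `ReplicatedStore` and its methods are hypothetical, and the write latency cost of synchronous mode is not modelled.

```python
class ReplicatedStore:
    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []
        self.replica = []
        self._unshipped = []  # acknowledged writes not yet on the replica

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            # Acknowledge only after the replica also holds the record.
            self.replica.append(record)
        else:
            # Acknowledge immediately; the replica catches up later.
            self._unshipped.append(record)

    def ship(self):
        """Background replication step used in asynchronous mode."""
        self.replica.extend(self._unshipped)
        self._unshipped.clear()

    def records_lost_if_primary_fails_now(self) -> int:
        return len(self.primary) - len(self.replica)


sync_store = ReplicatedStore(synchronous=True)
async_store = ReplicatedStore(synchronous=False)
for store in (sync_store, async_store):
    store.write("posting-1")
    store.write("posting-2")
print(sync_store.records_lost_if_primary_fails_now(),
      async_store.records_lost_if_primary_fails_now())
```

In this model the synchronous store never has acknowledged writes missing from the replica, while the asynchronous store is exposed to loss for any writes made since the last `ship()` call.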

High Availability, Fault Tolerance, and DORA

DORA requires financial institutions to implement ICT systems that meet defined standards of digital operational resilience, and the high availability and fault tolerance architecture of the core banking system is central to meeting these requirements. Institutions must define recovery time objectives and recovery point objectives for each critical ICT system, implement and test the technical controls required to meet those objectives, and conduct regular resilience testing to verify that the controls function as designed under realistic failure scenarios.

DORA’s resilience testing requirements include basic vulnerability assessments and scenario-based testing for all financial entities, and Threat-Led Penetration Testing for significant entities. Scenario-based resilience testing for core banking systems should include infrastructure failure scenarios such as availability zone outages, database primary failures, and message broker unavailability, as well as application-level failure scenarios such as service instance failures and circuit breaker activation. The results of resilience testing must be documented and used to drive improvements to the system’s high availability and fault tolerance architecture on an ongoing basis.

FAQ:

What is the difference between recovery time objective (RTO) and recovery point objective (RPO)?

  • Recovery time objective (RTO) is the maximum acceptable duration of downtime for a system or service following a failure or incident, measured from the point at which the failure occurs to the point at which the system is restored to operational status. A core banking system with an RTO of one hour must be restorable to operational status within one hour of a failure.
  • Recovery point objective (RPO) is the maximum acceptable amount of data loss following a failure, measured as the time elapsed between the last successful data backup or replication and the point of failure. A system with an RPO of zero requires synchronous replication so that no data is lost in a failover scenario.
  • Both metrics must be defined for each critical ICT system under DORA, and the technical controls implemented must be demonstrably capable of meeting the defined objectives under realistic failure conditions.
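As an illustration of how the two metrics are measured, the sketch below compares the timestamps of a hypothetical incident against defined objectives. The function name `assess_recovery` and the example timestamps are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def assess_recovery(last_replicated_at: datetime, failed_at: datetime,
                    restored_at: datetime, rto: timedelta, rpo: timedelta) -> dict:
    """Compare one incident's downtime and data-loss window with RTO and RPO."""
    downtime = restored_at - failed_at                 # measured against the RTO
    data_loss_window = failed_at - last_replicated_at  # measured against the RPO
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


failure = datetime(2026, 1, 15, 12, 0, 0)
result = assess_recovery(
    last_replicated_at=failure - timedelta(seconds=30),
    failed_at=failure,
    restored_at=failure + timedelta(minutes=45),
    rto=timedelta(hours=1),
    rpo=timedelta(0),
)
print(result["rto_met"], result["rpo_met"])
```

In this example the system is restored within the one-hour RTO, but the 30-second gap since the last replicated write breaches an RPO of zero, which is the situation synchronous replication is intended to prevent.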

How does high availability differ between on-premises and cloud-based core banking deployments?

  • In an on-premises deployment, high availability requires the institution to procure, install, and maintain redundant physical hardware across multiple data centre locations, manage network connectivity between those locations, and operate the monitoring and failover infrastructure independently. The capital cost of on-premises high availability infrastructure is substantial, and the time required to provision additional capacity in response to failures or demand spikes is measured in days or weeks rather than seconds.
  • In a cloud-based deployment, high availability is achieved through the cloud provider’s multi-availability-zone infrastructure, with redundant instances provisioned and managed through software configuration rather than physical hardware procurement. Failover, scaling, and health monitoring are automated through cloud-native services, and additional capacity can typically be provisioned within seconds to minutes. For payment institutions and e-money institutions subject to DORA’s operational resilience requirements, cloud-based high availability architecture is generally more cost-effective and more reliably testable than on-premises equivalents.
Updated on April 29, 2026
Copyright © 2026 Baseella Ltd
