JumpServer Application Practice in Multi-Data Centers of Cloud Wisdom World

Published December 17, 2024

Editor's note: At the "2021 JumpServer Open Source Bastion Host City Meet · Chengdu Station" event held on October 16, 2021, Zhou Zhengjun, R&D Manager of Cloud Wisdom World, delivered a speech titled "JumpServer Application Practice in Multi-Data Centers of Cloud Wisdom World". The following content is compiled from this speech.

Chengdu Cloud Wisdom World is a high-tech enterprise focused on the construction and operation of Internet towns and smart cities. Relying on strong technical capabilities in communication networks, radio and television networks, Internet and IoT, as well as full-chain implementation capabilities in construction, management, service, and operation, Cloud Wisdom World provides Internet infrastructure installation, maintenance, and value-added services for scenarios such as Internet towns, iChengdu, smart scenic areas, smart communities, smart campuses, and smart industrial parks.

In 2016, Cloud Wisdom World built Phase I of the Internet town project in Chengdu, providing IT infrastructure build-out for 29 towns around Chengdu. In 2018, Cloud Wisdom World built multiple data centers and launched a data communication traffic platform. In 2019, Cloud Wisdom World's business expanded nationwide, covering Xinjiang and Hainan.

The construction of multiple data centers means rapid expansion of IT infrastructure scale. Currently, Cloud Wisdom World has formed an IT architecture of "public cloud + private cloud + edge cloud", with 7 data centers, over 700 cloud-based services, and more than 400 computing nodes.

We adopt a cloud-network integrated architecture. A series of applications run in the cloud to collect data from, and issue instructions to, smart communities, Internet towns, and the traffic platform. On the network side, Cloud Wisdom World currently operates three networks: the IoT network serving smart communities, the home broadband and data communication network built from APs and PONs for Internet towns, and the IDC data center network.

Cloud Wisdom World R&D Manager Zhou Zhengjun

Business Pain Points

■ Cloud Scenarios

Cloud Wisdom World's cloud platform is layered: physical resources sit at the bottom; above them, Kubernetes plus some KVM form a cloud virtual resource pool; next come the middle-platform capabilities we defined; and the application layer is at the top. In cloud scenarios, our IT operation and maintenance faces two pain points. First, there are many resources, including cloud resources and virtual machines, which makes management complex. Second, because the IT architecture spans multiple data centers and regions, the pressure of daily operation and maintenance is significant.

■ Data Communication Network Architecture

Cloud Wisdom World's data communication network takes in various network resources at the access layer, connects public WiFi and other services through aggregation switches to the core network nodes, and finally hands traffic off to the Internet. Daily operation and maintenance of this network has two major pain points. First, peak traffic is very high, requiring more than 100G of bandwidth. Second, there are many network devices, including core and access devices, PONs, switches, and various APs, which complicates management.

■ Smart Community

Cloud Wisdom World's smart community adopts a "Cloud-Management-Edge-End" architecture. "Cloud" refers to scenario applications at the application layer, such as property owner management and community management, and is mainly used for interface and data storage. "Management" refers to the network access layer, such as ZigBee, LoRa, WiFi, and Bluetooth, through which the corresponding devices connect. "Edge" refers to edge computing nodes: smart communities have many security requirements, such as face recognition and behavior recognition, that need edge nodes for computation and logic processing. "End" refers to specific terminal devices, such as access control devices, parking lot barriers, ground locks, smart home devices, and smart curtains. In the smart community scenario, our operation and maintenance pain point is how to manage the edge computing nodes once they are deployed in a community's low-voltage equipment room.

How to Solve Operation and Maintenance Pain Points with JumpServer?

After discussing the design and operation of Cloud Wisdom World's multi-data center technical architecture, our operation and maintenance pain points can be summarized as follows:

■ Assets in multiple data centers are too scattered, lacking unified management. Different project business requirements are not unified, without a flexible and unified management mechanism;

■ There are many types of equipment, including cloud servers, physical servers, switches, and routers, while our operation and maintenance tooling is relatively weak;

■ Asset permission management is coarse-grained, with root accounts "handling everything" and no fine-grained resource permission management system.

After adopting the JumpServer bastion host, we quickly established unified organization management, unified asset management, and unified audit capabilities, and deployed JumpServer in a distributed, highly available way. This gave us unified operation and maintenance management of cloud servers, physical machines, network devices, and edge computing nodes.

1. Unified Organization Management

In daily operations, the R&D department performs software-side operation and maintenance, the data communication business department operates network devices, and property owners and some third-party operation and maintenance teams have their own needs as well. JumpServer's organization management covers all of these: we create organizations on the JumpServer platform and map each department's personnel to the corresponding organization. This lets us manage the whole organization flexibly and assign permissions at fine granularity.
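The mapping described above, with departments mapped to organizations and permissions granted inside each organization, can be sketched as a small data model. This is only an illustration under stated assumptions: it is not JumpServer's data model or API, and the class, user, and asset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Organization:
    """Hypothetical organization that holds its own per-user asset grants."""
    name: str
    # user name -> asset names that user may access within this organization
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, user: str, asset: str) -> None:
        self.grants.setdefault(user, set()).add(asset)

    def can_access(self, user: str, asset: str) -> bool:
        return asset in self.grants.get(user, set())


# Hypothetical departments mapped to organizations.
rnd = Organization("rnd")
datacom = Organization("datacom")

rnd.grant("alice", "k8s-node-01")
datacom.grant("bob", "core-switch-01")

print(rnd.can_access("alice", "k8s-node-01"))       # True
print(rnd.can_access("alice", "core-switch-01"))    # False
print(datacom.can_access("bob", "core-switch-01"))  # True
```

Keeping grants inside each organization means a user of one department has no implicit access to another department's assets, which is the opposite of a shared root account.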

2. Unified Asset Management

In daily operation and maintenance work, we need to manage many types of IT assets. These include cloud resources, edge computing nodes, and physical equipment in Cloud Wisdom World's IDC rooms in Hainan, Xinjiang, and other locations.

In our current multi-data center architecture, some edge computer rooms hold only a few devices. For these rooms we chose not to deploy JumpServer Agents; instead, we built VPN networks from the central computer rooms to the edge computer rooms, which gives layer-3 reachability and lets us manage the assets centrally over the VPN.

The other scenario is an edge computer room that has many IT devices and is far from the central computer room, so that connections over the VPN suffer high latency. For this situation, we adopted the "Core+Proxy" solution recommended by the JumpServer open source community: JumpServer Core nodes are deployed in the central computer room for resource and permission management, while JumpServer Agent (proxy) nodes running the Koko component are deployed in the remote computer rooms. Requests from a remote computer room reach the local Koko nodes through load balancing, go to the Core component in the central computer room for permission verification, and then return to the Koko component, which opens the authorized connection to the target device. The data interaction therefore happens at the remote end of the network and does not consume the central computer room's network resources.
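The two access modes and the proxied request path can be sketched as follows. This is a minimal sketch, not JumpServer code: the thresholds, room names, and function names are hypothetical (the talk gives no concrete numbers), and rooms that fall between the two described cases default to VPN here purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical thresholds; the talk gives no concrete numbers.
MANY_DEVICES = 50
HIGH_RTT_MS = 50.0


@dataclass
class EdgeRoom:
    name: str
    device_count: int
    rtt_to_center_ms: float


def choose_access_mode(room: EdgeRoom) -> str:
    """Pick 'proxy' for large, distant rooms; otherwise fall back to 'vpn'."""
    if room.device_count >= MANY_DEVICES and room.rtt_to_center_ms >= HIGH_RTT_MS:
        return "proxy"
    return "vpn"


def session_path(user: str, asset: str, core_grants: Dict[str, Set[str]]) -> List[str]:
    """List the hops of one proxied session, following the description in the text."""
    path = [
        "client -> load balancer -> remote Koko",
        "remote Koko -> central Core (permission check)",
    ]
    if asset in core_grants.get(user, set()):
        path.append(f"remote Koko -> {asset} (session stays in remote room)")
    else:
        path.append("denied by Core")
    return path


small_room = EdgeRoom("edge-a", device_count=5, rtt_to_center_ms=80.0)
large_room = EdgeRoom("edge-b", device_count=200, rtt_to_center_ms=80.0)
print(choose_access_mode(small_room))  # vpn
print(choose_access_mode(large_room))  # proxy
print(session_path("alice", "switch-01", {"alice": {"switch-01"}})[-1])
```

Only the permission check crosses the link to the central computer room; the session itself runs between the remote Koko node and the device.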

3. Distributed High-Availability Deployment

Many core devices in our data centers must be available more than 99.99% of the time, so the supporting systems, such as the bastion host and the log system, also need high-availability deployments.
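To make the 99.99% figure concrete, the short calculation below converts an availability target into allowed downtime per year and shows how a less available component in the access path lowers the combined figure. It is a back-of-the-envelope sketch: the 99.9% value for the bastion host is a hypothetical input, and the serial model assumes a device is only operable when the bastion host is also up.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Downtime budget per (365-day) year for a given availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*parts: float) -> float:
    """Availability of components that must all be up at the same time."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


print(round(allowed_downtime_minutes(0.9999), 2))    # 52.56
# Hypothetical: a 99.99% device reached through a 99.9% bastion host.
print(round(serial_availability(0.9999, 0.999), 6))  # 0.9989
```

A 99.99% target leaves under an hour of downtime per year, which is why a single-node bastion host in front of such devices would undermine the target.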

For JumpServer's high-availability deployment, we use a dual-Master (master-master) architecture, based on two considerations: the high-availability requirements of our core IT devices themselves, and the network layer. Because our two computer rooms are both in Chengdu and connected through professional splitting devices, the network latency between them is very small, basically negligible.

Of the two Master computer rooms, one implements load balancing with a "VIP+HAProxy" deployment. The computer rooms host the JumpServer components as well as Redis and MySQL storage. For MySQL, we chose the PXC (Percona XtraDB Cluster) high-availability solution. In practice, PXC costs some throughput because it enforces strong consistency when data is committed, so when network latency is high or the data centers are far apart, its impact needs attention. Additionally, our image files are stored in distributed clusters: we already run MinIO and Elasticsearch clusters, so we can connect to them directly.
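The latency sensitivity of a strongly consistent cluster can be estimated with a simple rule of thumb. The sketch below assumes each commit must wait at least one network round trip between nodes before it is acknowledged, and it ignores local processing time; the function name and the example round-trip times are hypothetical, not measurements from this deployment.

```python
def max_serial_commits_per_second(rtt_ms: float) -> float:
    """Upper bound on back-to-back commits from one connection,
    assuming every commit waits at least one inter-node round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


# Hypothetical round-trip times: same-city rooms vs. distant rooms.
print(max_serial_commits_per_second(0.5))   # 2000.0
print(max_serial_commits_per_second(40.0))  # 25.0
```

The bound applies to commits issued one after another on a single connection; parallel connections raise total throughput, but per-transaction latency still grows with the distance between data centers.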

4. Unified Audit

Specific operations performed on devices by operation and maintenance teams and third-party partners must be traceable and auditable, and when a failure occurs we need a single place to replay what happened. Based on JumpServer's security audit capabilities, we audit the operations of internal and external users, replay operation recordings for Linux and Windows assets, and audit commands for assets and applications.

In summary, during actual use we discovered more and more highlights of this open-source bastion host, such as the organization management mentioned earlier, database connections through the Web UI, ticket (work order) management, and login review. Two functions are particularly practical and frequently used by us:

① Multi-cloud asset management: Since we have many cloud assets distributed across different public cloud platforms, with the help of multi-cloud asset management functionality, we can import cloud assets into JumpServer with one click;

② Password change plan: China's classified protection of cybersecurity (MLPS, "Dengbao") standards and ISO-related certifications place requirements on password lifetime, requiring passwords to be changed every week or two. With the password change plan, JumpServer changes passwords automatically on a schedule, saving a lot of operation and maintenance work.
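The scheduling idea behind a periodic password change plan can be sketched in a few lines. This is not JumpServer's implementation: the function names, account names, and dates are hypothetical, and the sketch only decides which accounts are due, without changing any password.

```python
from datetime import date, timedelta
from typing import Dict, List


def next_rotation(last_changed: date, interval_days: int) -> date:
    """Date on which a password is due to be changed again."""
    return last_changed + timedelta(days=interval_days)


def due_for_rotation(accounts: Dict[str, date], today: date, interval_days: int) -> List[str]:
    """Return the accounts whose rotation date has been reached, sorted by name."""
    return sorted(
        name
        for name, last_changed in accounts.items()
        if next_rotation(last_changed, interval_days) <= today
    )


# Hypothetical accounts and last-change dates, with a two-week policy.
accounts = {
    "root@web-01": date(2021, 10, 1),
    "root@db-01": date(2021, 10, 10),
}
print(due_for_rotation(accounts, today=date(2021, 10, 16), interval_days=14))
# ['root@web-01']
```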

JumpServer Benefits Different Teams

The value of the JumpServer bastion host is not limited to the operation and maintenance department. JumpServer has improved the experience of Cloud Wisdom World's infrastructure team, developers, operation and maintenance team, and managers, each from a different perspective:

1. For the Infrastructure Team: The infrastructure team's requirement is very direct: solve problems at minimal cost. JumpServer is open-source software and is more reasonably priced than offerings from other security vendors. It delivers a distributed, secure, and reliable bastion host solution while improving resource utilization and reducing basic operation and maintenance costs.

2. For Developers: JumpServer's community-driven approach has always appealed to developers. The project's Star growth and community activity are strong, giving developers a good place to communicate. Under JumpServer's open source license, we can also fork the code and integrate it to meet enterprise-specific needs. JumpServer also provides open APIs, so unified asset management, or linking our OA system with the corresponding assets, can be achieved through API integration.

3. For the Operation and Maintenance Team: JumpServer provides a security audit system for operation and maintenance, and its operation review and failure playback functions are very useful in daily work. The interface offers a very good experience, simpler and more convenient than some traditional bastion hosts. Additionally, JumpServer adopts a distributed architecture in which every component can be scaled out, which facilitates our subsequent business expansion.

4. For Managers: JumpServer gives enterprise IT managers a unified resource view, so they can check the operation and maintenance status of IT systems at any time. Its unified management of enterprise IT resources underpins organization management. In addition, comprehensive log and operation audits make incident playback more convenient and improve the security of the whole operation and maintenance system.

Future Plans for JumpServer Application

Finally, let's talk about our plans for using the JumpServer bastion host. In the future, we plan to migrate the entire JumpServer cluster to the Kubernetes platform, and during the migration process, we also have higher expectations for JumpServer.

First, JumpServer has many components, and more components mean a higher probability of errors. We hope to have a relatively complete operation and maintenance infrastructure to manage JumpServer components.

Second, the JumpServer community already provides a way to deploy JumpServer to Kubernetes with a Helm chart, which is very convenient for users. We hope that JumpServer, as an open source project, can offer stronger support for high availability, high scalability, and self-healing.