Understanding Modern Service Discovery with Docker

Over the next few posts, I'm going to be exploring the concepts of service discovery in modern service-oriented architectures, specifically around Docker. Many people aren't familiar with service discovery, so I have to start from the beginning. In this post I'm going to be explaining the problem and providing some historical context around solutions so far in this domain.

Ultimately, we're trying to get Docker containers to easily communicate across hosts. This is seen by some as one of the next big challenges in the Docker ecosystem. Some are waiting for software-defined networking (SDN) to come and save the day. I'm also excited by SDN, but I believe that well-executed service discovery is the right answer today, and will continue to be useful in a world with cheap and easy software networking.

What is service discovery?

Service discovery tools manage how processes and services in a cluster find and talk to one another. This involves a directory of services, registering services in that directory, and then being able to look up and connect to services in that directory.

At its core, service discovery is about knowing when any process in the cluster is listening on a TCP or UDP port, and being able to look up and connect to that port by name.
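
To make the idea concrete, here is a minimal sketch of a service directory that maps names to listening endpoints. It is only an illustration: the `Endpoint` and `ServiceDirectory` names, the addresses, and the ports are all made up for this example, and a real directory would be replicated across machines rather than held in one process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int
    protocol: str = "tcp"  # "tcp" or "udp"


class ServiceDirectory:
    """Toy in-memory directory mapping service names to listening endpoints."""

    def __init__(self) -> None:
        self._services: dict[str, set[Endpoint]] = {}

    def register(self, name: str, endpoint: Endpoint) -> None:
        # Record that some process is listening for this service.
        self._services.setdefault(name, set()).add(endpoint)

    def deregister(self, name: str, endpoint: Endpoint) -> None:
        # Forget an endpoint once the process has gone away.
        self._services.get(name, set()).discard(endpoint)

    def lookup(self, name: str) -> list[Endpoint]:
        # Return every known endpoint for the name, in a stable order.
        return sorted(self._services.get(name, set()), key=lambda e: (e.host, e.port))


directory = ServiceDirectory()
directory.register("web", Endpoint("10.0.0.5", 49153))
directory.register("web", Endpoint("10.0.0.6", 49170))
directory.register("statsd", Endpoint("10.0.0.5", 8125, "udp"))
```

A client that wants to talk to `web` calls `directory.lookup("web")` and connects to one of the returned host and port pairs, without knowing ahead of time where the service runs.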

Service discovery is a general idea, not specific to Docker, but it is steadily gaining mindshare in mainstream system architecture. Traditionally associated with zero-configuration networking, its more modern use can be summarized as facilitating connections to dynamic, sometimes ephemeral services.

This is particularly relevant today not just because of service-oriented architecture and microservices, but also because of the increasingly dynamic compute environments that support these architectures. Already dynamic VM-based platforms like EC2 are slowly giving way to even more dynamic higher-level compute frameworks like Mesos. Docker is only contributing to this trend.

Name Resolution and DNS

You might think, "Looking up by name? Sounds like DNS." Yes, name resolution is a big part of service discovery, but DNS alone is insufficient for a number of reasons.

A key reason is that DNS was not originally optimized for closed systems with real-time changes in name resolution. You can get away with setting TTLs to 0 in a closed environment, but this also means you need to serve and manage your own internal DNS. What highly available DNS datastore will you use? What creates and destroys DNS records for your services? Are you prepared for the archaic world of DNS RFCs and server implementations?

Actually, one of the biggest drawbacks of DNS for service discovery is that DNS was designed for a world in which we used standard ports for our services. HTTP is on port 80, SSH is on port 22, and so on. In that world, all you need is the IP of the host for the service, which is what an A record gives you. Today, even with private NATs and in some cases with IPv6, our services will listen on completely non-standard, sometimes random ports. Especially with Docker, we have many applications running on the same host.

You may be familiar with SRV records, or "service" records, which were designed to address this problem by providing the port as well as the IP in query responses. At least in terms of a data model, this brings DNS closer to addressing modern service discovery.

Unfortunately, SRV records alone are basically dead on arrival. Have you ever used a library or API to create a socket connection that didn't ask for the port? Where do you tell it to do an SRV record lookup? You don't. You can't. It's too late. Either software explicitly supports SRV records, or DNS is effectively just a tool for resolving names to host IPs.
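
The sketch below shows what an SRV-aware client has to do before it can open a socket: take the records from the answer, choose one, and only then hand a host and port to the usual connection API. Everything here is illustrative. The `SrvRecord` class, the `pick_target` helper, and the hostnames are hypothetical, and the selection rule is a simplification of RFC 2782, which calls for a weighted random choice among the lowest-priority records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower values are preferred
    weight: int    # relative share among records of equal priority
    port: int      # the piece an A record cannot give you
    target: str    # hostname, which still needs an address lookup


def pick_target(records: list[SrvRecord]) -> tuple[str, int]:
    """Return (host, port) for the preferred record.

    Simplification: picks the heaviest record in the lowest priority
    group deterministically instead of doing a weighted random choice.
    """
    if not records:
        raise LookupError("no SRV records for this service")
    best = min(records, key=lambda r: (r.priority, -r.weight))
    return best.target, best.port


# Hypothetical answer for a query such as _redis._tcp.example.internal
answer = [
    SrvRecord(priority=10, weight=60, port=49154, target="node1.example.internal"),
    SrvRecord(priority=10, weight=40, port=49160, target="node2.example.internal"),
    SrvRecord(priority=20, weight=0, port=6379, target="backup.example.internal"),
]

host, port = pick_target(answer)
# Only at this point could a client call socket.create_connection((host, port)).
```

Software that was written to take a hostname and a fixed port never runs a step like `pick_target`, which is why SRV support has to be built into each client explicitly.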

Despite all this, DNS is still a marvel of engineering, and even SRV records will be useful to us yet. But for all these reasons, on top of the demands of building distributed systems, most large tech companies went down a different path.

Rise of the Lock Service

In 2006, Google released a paper describing Chubby, their distributed lock service. It implemented distributed consensus based on Paxos to provide a consistent, partition-tolerant (CP in CAP theorem) key-value store that could be used for coordinating leader elections, resource locking, and reliable low-volume storage. They began to use this for internal name resolution instead of DNS.
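
As a rough illustration of how a lock service supports leader election, the sketch below uses a lock with a lease: whoever holds the key is the leader, and the lock frees itself if the holder stops renewing. This is not Chubby's API. The `LockStore` class, the key path, and the node names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and the store lives in a single process, so it shows none of the consensus machinery that makes the real thing reliable.

```python
from typing import Optional


class LockStore:
    """Single-process stand-in for a replicated lock service (no consensus here)."""

    def __init__(self) -> None:
        # key -> (owner, time at which the lease expires)
        self._locks: dict[str, tuple[str, float]] = {}

    def acquire(self, key: str, owner: str, ttl: float, now: float) -> bool:
        """Take or renew the lock if it is free, expired, or already ours."""
        entry = self._locks.get(key)
        if entry is not None:
            current_owner, expires_at = entry
            if current_owner != owner and expires_at > now:
                return False  # someone else holds a live lease
        self._locks[key] = (owner, now + ttl)
        return True

    def holder(self, key: str, now: float) -> Optional[str]:
        """Return the current owner, or None if the lock is free or expired."""
        entry = self._locks.get(key)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


store = LockStore()
won_a = store.acquire("/election/leader", "node-a", ttl=10.0, now=0.0)
won_b = store.acquire("/election/leader", "node-b", ttl=10.0, now=5.0)
# node-a stops renewing, so its lease lapses at t=10 and node-b can take over.
took_over = store.acquire("/election/leader", "node-b", ttl=10.0, now=12.0)
```

The same primitive covers resource locking, and storing a small value next to the owner turns it into the kind of low-volume key-value store that can hold names and addresses.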

Eventually, the paper inspired an open source equivalent of Chubby called Zookeeper that spun out of the Apache Hadoop project. This became the de facto standard lock server in the open source world, mainly because there were no alternatives that likewise favored high availability and reliability over performance. The Paxos consensus algorithm was also non-trivial to implement.

Zookeeper provides semantics similar to Chubby's for coordinating distributed systems, and being a consistent and highly available key-value store makes it an ideal cluster configuration store and directory of services. It has become a dependency of many major projects that require distributed coordination, including Hadoop, Storm, Mesos, Kafka, and others. Not surprisingly, it is mostly used in other Apache projects and is often deployed in larger tech companies. It is quite heavyweight and not terribly accessible to "everyday" developers.

About a year ago, a simpler alternative to the Paxos algorithm called Raft was published. This set the stage for a real Zookeeper alternative and, sure enough, etcd was soon introduced by CoreOS. Besides being based on a simpler consensus algorithm, etcd is simpler overall. It's written in Go and lets you use HTTP to interact with it. I was extremely excited by etcd and used it in the initial architecture for Flynn.

Today there's also Consul by HashiCorp, which builds on the ideas of etcd. I explore Consul and lock servers in more detail in my next post.

Service Discovery Solutions

Both Consul and etcd advertise themselves as service discovery solutions. Unfortunately, that's not entirely true. They're great service directories. But this is just part of a service discovery solution. So what's missing?

We're missing exactly how to get all our software, whether custom services or off-the-shelf software, to integrate with and use the service directory. This is particularly interesting to the Docker community, which ideally has portable solutions for anything that can run in a container.

A comprehensive solution to service discovery will have three legs:

  • A consistent (ideally), highly available service directory

  • A mechanism to register services and monitor service health

  • A mechanism to look up and connect to services
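
A small sketch may help show how the three legs fit together. In it, registration doubles as a heartbeat, and lookups return only instances that have reported recently. The `Registry` class, its method names, the TTL value, and the addresses are all assumptions made for illustration; the directory here is a plain dictionary in one process, so it is neither consistent nor highly available in the sense of the first leg.

```python
class Registry:
    """Toy directory whose entries expire unless they keep being refreshed."""

    def __init__(self, ttl: float) -> None:
        self._ttl = ttl
        # Leg 1 (stand-in): (name, host, port) -> time of last heartbeat
        self._entries: dict[tuple[str, str, int], float] = {}

    # Leg 2: registering and reporting health are the same call.
    def heartbeat(self, name: str, host: str, port: int, now: float) -> None:
        self._entries[(name, host, port)] = now

    # Leg 3: lookup returns only instances seen within the TTL.
    def healthy(self, name: str, now: float) -> list[tuple[str, int]]:
        return sorted(
            (host, port)
            for (svc, host, port), seen in self._entries.items()
            if svc == name and now - seen <= self._ttl
        )


registry = Registry(ttl=15.0)
registry.heartbeat("api", "10.0.0.5", 49153, now=0.0)
registry.heartbeat("api", "10.0.0.6", 49170, now=0.0)
# Only the first instance keeps reporting; the second has gone quiet.
registry.heartbeat("api", "10.0.0.5", 49153, now=10.0)

alive = registry.healthy("api", now=20.0)
```

The explicit `now` argument keeps the example deterministic. The heartbeat calls could come from a separate agent watching the service rather than from the service itself, which is one way to keep registration "non-invasive".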

We've got good technology for the first leg, but the remaining legs, despite how they sound, aren't exactly trivial, especially when you ideally want them to be automatic and "non-invasive." In other words, they should work with non-cooperating software that was not designed for a service discovery system. Luckily, Docker has both increased the demand for these properties and made them easier to achieve.

In a world where you have lots of services coming and going across many hosts, service discovery is extremely valuable, if not necessary. Even in smaller systems, a solid service discovery system should reduce the effort of configuring and connecting services to nearly nothing. Adding the responsibility of service discovery to configuration management tools and using a centralized message queue for everything are all-too-common alternatives that we know just don't scale.

My goal with these posts is to help you understand and arrive at a good idea of what a service discovery system should actually encompass. The next few posts will take a deeper look at each of the above-mentioned legs, touching on various approaches, and ultimately explaining what I ended up doing for my soon-to-be-released project, Consulate.