This is a really complex topic. I had no idea how complex for quite a few years. Have had to learn a good bit of it over the last decade, and what I now know is that I don't know very much.
This will feel a bit disjointed for a while; I hope it gets better by the time I finish writing it.
My previous experience with networking was not on this scale, and that's where the complexity comes from.
I would like to have the entire Center be on IPv6, but at the moment that's not going to happen. My casual impression is that v6 continues to be not quite ready.
I need to make some pictures for this as well.
One important thing to note is that this whole problem is a large multi-machine problem, spanning network gateways, network switches, and servers.
Step one is knowing how to configure a gateway whose WAN side faces the outside internet (as opposed to a corporate upstream), the pass-through mechanisms (aka NAT rules), how to configure (or not configure) network switches, and how to configure the actual computers and virtual machines.
Gateways and NAT rules.
The gateway serves as the security interface between the outside world (WAN) and the inside world (LAN).
I have been using Ubiquiti devices for this, because they do a lot more than I need for Data Center behavior (e.g., wifi networks).
NAT rules are the mechanism by which WAN and LAN addresses translate to each other. For outbound traffic that originates inside, default behavior already exists and works fine, until you want isolation so that two computers can't bump into each other for bad reasons. A pair of NAT rules, one "Source" and one "Destination", will allow traffic originating outside to talk to one specific machine inside. This works fine, and I have used it a number of times so far in testing. (I'm not providing a tutorial on NAT rules here; the key piece of information is that you use Source/Destination NAT rules, which took me a good while to find out, along with an explanation.) NAT rules simply rewrite the IP addresses in network packets; make the numbers line up properly and you're good.
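For concreteness, here is roughly what such a Source/Destination pair boils down to, sketched as raw Linux iptables rules rather than the Ubiquiti UI (the WAN address 203.0.113.10, the inside machine 192.168.1.50, port 443, and the interface name eth0 are all made-up examples):

# Destination NAT: traffic arriving at the WAN IP on port 443 is rewritten toward one inside machine
iptables -t nat -A PREROUTING -d 203.0.113.10 -p tcp --dport 443 -j DNAT --to-destination 192.168.1.50
# Source NAT: outbound traffic from that machine leaves stamped with the WAN IP
iptables -t nat -A POSTROUTING -s 192.168.1.50 -o eth0 -j SNAT --to-source 203.0.113.10

The gateway's UI hides the syntax, but conceptually that is all the pair does: rewrite the destination address on the way in and the source address on the way out.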
The gateway(s) I am using have fiber ports for WAN and LAN, as well as 8 RJ45 ports for LAN (plus one more, but that's a 1G uplink port, so not good at large scale). I am planning for 10Gig fiber uplink WAN and 1Gig LAN activity, until I find that inadequate.
On the LAN side you can create multiple subnetworks in the various LAN sub-ranges. I don't know if there's an upper limit; I expect there's a practical limit.
Going along with my DC philosophy of sharding everything, the network gets sharded too. The Center as a whole has multiple incoming fibers; each fiber feeds one row, each row has 20 racks, each rack has 20 servers, and each server could have 20 VMs. OK, that's 8,000 VMs per row, probably over the practical limit. We'll see if we get there. The shard is probably at the rack level, with ~400 VMs.
400 VMs is a lot of activity at the gateway level, but not really a problem regarding the IP addressing, and very straightforward.
What is less straightforward is the network definition for packet routing, for both security and efficiency.
Networking inside a server.
It took me months of casual reading, and then weeks of heavy work creating and testing different configurations, until I learned enough to do half of what I want.
What I want: VMs effectively isolated from each other on the network, preventing security problems, and routing that is efficient after that, when VMs are busy doing network I/O.
The picture is like this:
I have servers, HP DL360/380, with multiple network ports on the back:
1) ILO, which is separate machine management. 1 port.
2) RJ-45 copper ethernet. 4 ports.
3) SFP/Gbic fiber ethernet. 2 ports.
I want the ports used like this:
1) Host CPU. Single IP address.
2) Half the VMs. Effectively multiple IPs.
3) Half the VMs. Effectively multiple IPs.
4) VM backup management. Single IP.
5) Fiber, for big data.
6) Fiber, for big data.
I do not yet have this, and there may well be a further breakdown that I need. It's certainly possible that #1 and #4 should be combined. #5 and #6 perhaps could be as well.
This is not what I have at the moment. What I have works, but doesn't have the isolation I really want.
Here's what I did that works:
RJ45 port 1 is separate, as desired: it allows host access, has one IP, and doesn't hit the VMs.
RJ45 ports 2 and 3 are bonded (i.e., shared). That bond group supports a bridge. The VMs use that bridge.
Relatively speaking, this configuration is pretty trivial, but learning exactly what it is and does took me longer than I wanted.
How it works:
The Linux kernel includes Layer-2 network bridging (software switching) behavior. You can define various control concepts for this in huge detail, well beyond what I understand.
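As a rough illustration of what that kernel feature is (ProxMox drives the equivalent from its config file, so you don't type this there; the names br0 and nic1 are examples):

ip link add name br0 type bridge     # create a software switch in the kernel
ip link set nic1 master br0          # plug a physical port into that switch
ip link set br0 up                   # bring the switch online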
Because the OS/VM-manager I am using is ProxMox, the network definitions are in the file /etc/network/interfaces. You edit this with "nano", like this:
nano /etc/network/interfaces
ProxMox, upon installation, creates one "bridge" as your machine's network interface, with an IP you give it (possibly originating with DHCP). The bridge definition names which network port it will use (it can be the fiber port) and the IP address. The host OS is what you reach on this IP (VMs will of course get their own IPs, or you will give them static ones). VMs, as created, will default to using this bridge to get to the outside world (or to other VMs, if that is what you are doing). This is all the default and works fine, except that most of the physical network connections are going unused.
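A fresh install's version of that file looks roughly like the following sketch (from memory, so treat it as approximate; the port name eno1 and the addresses are examples, not mine): one physical port, itself left unconfigured, carrying one bridge that holds the host's IP.

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0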
So an alternative would be to create a bond instead of a bridge, and tell the bond to use all the ports (or a subset, leaving the other ports unused). Using the other ports requires giving them their own IPs.
What I had hoped to do was create one IP (#1) and two bridges (port #2 and port #3), where port #1 would be on one subnet, port #2 would be a separate subnet, port #3 would be a third subnet, port #4 a fourth IP, etc., such that #2 and #3 are supporting VM external I/O. That seemed not to work properly at all; two bridges seemed problematic. It's entirely possible I goofed up something else; I have some further experiments to do here.
Here's my final "interfaces" file content:
auto lo
iface lo inet loopback

auto nic0
iface nic0 inet static
    address 192.168.1.140/24
    gateway 192.168.1.1

auto nic1
iface nic1 inet manual

auto nic2
iface nic2 inet manual

iface nic3 inet manual
iface nic4 inet manual
iface nic5 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves nic1 nic2
    bond-miimon 100
    bond-mode balance-tlb

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.141/24
    gateway 192.168.1.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0

source /etc/network/interfaces.d/*
You can see "nic0" (copper port 0) has its own IP. This is for host use. The "bond" has two "slaves", copper ethernet ports nic1 and nic2, "balance-tlb" means the sharing I want here (there are other options but they seemed wrong). The "bridge" uses the bonded pair of nic1 and nic2 for I/O; VMs will be created using this bridge.
It took me a while to understand enough to use this definition properly, such that it worked the way I wanted. I had wanted to have two bridges, with one copper ethernet port for each. ProxMox will not let you create two bridges and have different VMs use either one, which WOULD work the way I wanted; it aborts creating the second bridge because that tries to create the same "default gateway" over again. You can create a second bridge by editing the file, and it works how you want after that, almost. This is where I want to do a little more experimentation that I haven't gotten to yet, such that the gateway has more sub-networks defined, they connect to a specific nic1 or nic2, have different IP address ranges, etc., achieving a greater measure of VM isolation for security.
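The hand-edited two-bridge layout I'm describing would look something like this sketch (the addresses and the second subnet are examples; the point is that only one bridge carries the default gateway, which is exactly what the duplicate-gateway complaint is about):

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.141/24
    gateway 192.168.1.1
    bridge-ports nic1
    bridge-stp off
    bridge-fd 0

auto vmbr1
iface vmbr1 inet static
    address 192.168.2.141/24
    # note: no "gateway" line on the second bridge
    bridge-ports nic2
    bridge-stp off
    bridge-fd 0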
I can't do the next stage of learning/experimentation without buying more network switches for the physical isolation I want, where ILO is one physical network, port 1 is one physical network, port 2 is another physical network, etc. This helps with security, and loading efficiency.
The casual IP assignment right now is like this:
.1.20/24 = host machine N
.1.21/24 = bridge
.1.22-.39 = VMs, with some spares
.1.40/24 = host machine N+1
etc. Numbered like this, ".1.X" covers roughly ten host machines in the rack, and ".2.X" covers the other ten.
I have the feeling I'm going to grow to dislike that, but for the moment I am able to remember it all properly, although I do have a chart on the wall.
What I WANTED originally probably looks like this, and I may have to do it sooner rather than later:
.1.16 network ID (/30)
.1.17 local gateway
.1.18 usable IP
.1.19 broadcast
.1.20 network ID (/30)
.1.21 local gateway
.1.22 usable IP
.1.23 broadcast
where each VM gets its own little micro subnet. This would require the gateway to have a lot of subnetworks defined, in these four-address (/30) groups. (A /24 holds only 64 such /30 blocks, so ~400 VMs per rack would need roughly seven /24 ranges' worth.) Again, here I think we are bumping into practical limits, but I don't know.
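Inside a VM, one of those micro subnets would look something like this (a sketch using the .1.16/30 block from the list above, with a made-up 192.168 prefix in front):

auto eth0
iface eth0 inet static
    # .18 is the single usable IP in the .16/30 block
    address 192.168.1.18/30
    # .17 is the local gateway for this micro subnet
    gateway 192.168.1.17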
More work, and more writing, on the way.