Why do I care about IPv6?

Let me spend a few lines to explain why I care about this topic.
As many of my friends know, I'm an IPv6 lover and I like to give presentations to teach how it works and how to set it up.
The IPv6 standard was published nearly 20 years ago, and until ~10 years ago there were very few sites enabling it, as well as few autonomous systems able to route it.
February 2011 marked the date when IANA (the non-profit organization managing the IPv4/IPv6 address space) exhausted its free pool of IPv4 /8 blocks for assignment to the regional registries. This didn't change much the availability of IPv4 for ordinary customers, but many regional registries and big corporations started asking themselves about the future of IPv4 address availability.
So, IPv6 deployment increased drastically in the year following the IPv6 World Launch in 2012, when several big companies (Facebook, Google, Cisco, …) deployed and enabled IPv6 for all their customers. Many network operators started to follow the wave and the number of deployments kept increasing, especially when ARIN (the North American Internet registry) completely exhausted its free pool of IPv4 addresses assignable to customers.
Google is continuously measuring the percentage of visitors connecting to google.com via IPv6. Google's data shows that IPv6 adoption is currently (February 2016) around 10% worldwide, ~23% in the United States alone, and it keeps growing exponentially.
This means that, like it or not, we have to move to IPv6 now. It's not a matter of setting up shitty networks just to put the "IPv6 ready" logo on your homepage. It's a matter of getting it working.

IPv6 is not complicated. It has many similarities to IPv4, but in the end it's a new protocol, with new features and new methods for address assignment, multicast and so on.

While some hosting providers (Online.net) understood how to set it up and operate it flawlessly, others are doing a really awful job (OVH).

So, let’s see what’s wrong with OVH and why Online.net is better.

OVH

First, let me be very clear: I've been an OVH customer for more than 6 years with dedicated servers, RPS, kimsufi (old one, new one), domains, and I've even tested new beta services. I enjoyed many of their services and offers. However, they deeply disappointed me with the following pitfalls, especially considering that they were one of the first hosting providers in Europe to offer IPv6.
IPv6 on OVH works. The problem is how "it works" and the number of limitations you have to live with (especially compared to a different hosting provider).

Kimsufi

It all started when I bought two kimsufi servers from OVH. One was personal (I was hosting a research website: www.rpki.me), the other one was for my university club.

/128

The first thing I noticed was the subnet size: all kimsufi servers are sold with a /128 IPv6 prefix.
OVH is currently announcing the following prefix in public BGP for the European region: 2001:41d0::/32. The latest RFC on IPv6 address assignment (RFC 6177) suggests assigning a /48 prefix to each end site. Let's assume OVH assigned a /48 to each server. This means there would be 2^(48-32) = 65,536 subnets available.
Does OVH have more than 65k servers? Probably yes. Then they could get another /32 prefix and address 65k more servers. A /32 IPv6 prefix is not that expensive, and if they have that many servers, money should not be an issue. In fact they have three /32s: one for Asia, one for Europe and one for Canada.

OK, so maybe the /48s should be reserved for big customers, while the others get a /56 instead: less than what the RFC suggests, but still enough to let each server split its subnet into multiple /64s (for example for VM networks, VPNs, etc.). In this case they would have 2^(56-32) = 16,777,216 /56 subnets. Does OVH have more than 16M servers? I don't think so. So a /56 would be reasonable, maybe even a larger allocation for each one.

But let's say they completely ignore the possibility that someone might be interested in having multiple subnets, while of course still letting you use multiple addresses on your server. They could then choose a /64 assignment for each site, which would also let them switch the address assignment method to SLAAC in the future.
With /64 assignments they would have 2^(64-32) = 4,294,967,296 /64 subnets. Does OVH have more than 4 billion servers? No. Will they ever have that many? No.

What did they do? A /128. A single IPv6 address. Do you need another address on your server to handle multiple websites or more services on the same port? They don't care. Do you need to set up a VM using IPv6? Why don't you go for the idiotic idea of doing NAT on IPv6? Why not? (Just kidding, but NAT is bad.)

RFC 6177 also specifically says: "It is no longer recommended that /128s be given out. While there may be some cases where assigning only a single address may be justified, a site, by definition, implies multiple subnets and multiple devices". You may argue that a dedicated server is not "a site". However, I would consider it a site, since the datacenter is not yours and you get addresses as a service from another authority (which doesn't know what you want to do with those addresses).

The only explanation I can come up with for the /128 is this: someone generated a lot of traffic using a lot of addresses in a /64, and some crappy router of theirs crashed or had problems because its NDP cache table filled up. So they thought it was a good idea to solve the problem OVH-style. The problem could have been solved instead by buying decent networking gear, setting limits on the caches, or delegating the subnet instead of generating NDP neighbor solicitations (as we will see later).

Fortunately, it is a lie. For some reason all kimsufi servers are set up with a /128 IPv6 address, but you can use the whole /64 containing the provided address just by changing the prefix length on your interface. It's easy to see that it's a lie, because they assign the IPv6 address ending in ::1 to each customer. How could they use the same /64 for two customers, if both of them need that specific address of the subnet?

Why did they lie? I don't know. Probably to let kimsufi customers know that they don't deserve a decent service and that those servers are only for "testing purposes", as they like to stress.

Static addresses and strange routers

While with IPv4 you are stuck with static addresses or DHCP, with IPv6 you have more choices: static configuration, DHCPv6 for addresses plus router advertisements for routes, SLAAC, both DHCPv6 and SLAAC (or forcing one of the two), or DHCPv6-PD (prefix delegation).
What did OVH choose? Everything static. It's a server, so it could make sense. It's annoying because you have to look up your own address in the manager, but you only have to do it once.
OK, but given that I know my address, what is the address of the router?
A guide on the OVH website illustrates how to assign the address and the default route. Let's ignore the fact that all the commands are based on the deprecated ifconfig tool. In order to set up a default route, you have to create a static route to a specific address (the router's) and then route all the traffic through it.
The router address is computed by taking your own /64, finding the /56 that contains it (even though the "/56" concept is never mentioned in the guide) and appending ff:ff:ff:ff:ff, or, written more properly, by adding ff:00ff:00ff:00ff:00ff to the /56 prefix.
The example on the page is misleading because it considers a particular case (where the fourth group of the host's address has a leading zero).
The page is even more misleading when it says "5x FF", which is not correct, otherwise the address would be ….:ffff:ffff:ff00:0000.
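
To make the rule concrete, here is a minimal sketch using iproute2 instead of the deprecated ifconfig, assuming the example prefix used later in this post (2001:41d0:8:17d8::/64): the containing /56 is 2001:41d0:8:1700::/56, so the gateway derived from the rule above would be 2001:41d0:8:17ff:ff:ff:ff:ff; eth0 is just a placeholder interface name.

$ sudo ip -6 addr add 2001:41d0:8:17d8::1/64 dev eth0
$ sudo ip -6 route add 2001:41d0:8:17ff:ff:ff:ff:ff dev eth0                 # on-link route to the gateway
$ sudo ip -6 route add default via 2001:41d0:8:17ff:ff:ff:ff:ff dev eth0     # then route everything through it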

My understanding is that each server has its own /64, there is a /56 containing a number of these /64s, and there is a single IPv6 router per /56 subnet. The router is located in the last /64 of the /56, and each host can reach it directly at layer 2.

Why not use the all-zeros address, or the address ending in ::1, of that last /64 for the router?
And if the router is on the same layer 2 segment, answering neighbor solicitations, why not use a link-local (fe80::/10) address?

As a side note, OVH completely forgot to mention which IPv6-reachable DNS servers to use.

Security? Isn’t that only for v4?

A friend of mine wrote to me: "you told me that the /128 is a /64, but I'm actually able to use the whole /56!". My friend was partially correct. You don't have the whole /56, but nobody will stop you from stealing it. All servers in the same /56 seem to be on the same layer 2 network, and there is no protection in place preventing you from assigning yourself any address in the /56. The router will send a neighbor solicitation for that address and you can reply to it. This is not an IPv6 security issue. This is a shitty setup issue.
I'm not sure whether the same problem exists on IPv4 in the same network.

NDP vs. delegating the prefix

One of the main issues with the OVH IPv6 setup is that they do not delegate the IPv6 prefix. I will show later that this also applies to servers other than the kimsufi.

When you send an IPv6 packet from the Internet to one of the IPv6 addresses of your server, the router responsible for forwarding the packet to your server generates a neighbor solicitation (part of the NDP protocol), asking for the MAC address associated with the destination address of the packet. Your server receives the solicitation and replies with a neighbor advertisement.

This approach might be fine for an access network, but it's not good for a server. A server might receive traffic on a lot of addresses; this process does not scale and can eventually cause problems in the router's NDP cache (especially if it's not properly implemented).
Let's say your server has 2001:41d0:8:17d8::1/64 and you want to run several virtual machines inside the server using IPv6. There are only two ways of doing this: bridging the server's network interface with the VM's one (probably a bad idea), or routing the traffic for a slice of the subnet to the VM (a better idea).
So you carve out a slice of your subnet to use only for the VMs, and you add a static route pointing 2001:41d0:8:17d8:8000::/65 (the upper half of your /64) at the VM interface on the hypervisor.
However, when a packet destined for the VM arrives from the Internet through the OVH router, the router sends a neighbor solicitation to find out who has that address. Nobody replies, since the VM is routed behind the server and does not receive neighbor solicitations.
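
For reference, the routed setup just described would look roughly like this (a sketch only: the /65 split follows the example above, and vmbr0/eth0 are placeholder interface names):

$ sudo sysctl -w net.ipv6.conf.all.forwarding=1                     # the server must forward IPv6
$ sudo ip -6 addr add 2001:41d0:8:17d8:8000::1/65 dev vmbr0         # gateway address for the VM slice
(inside the VM)
$ sudo ip -6 addr add 2001:41d0:8:17d8:8000::2/65 dev eth0
$ sudo ip -6 route add default via 2001:41d0:8:17d8:8000::1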

The dirty workaround is to use the NDP proxy function of the Linux kernel, so that the server can reply on behalf of the VM.
The Linux kernel only lets us proxy individual addresses. So if we want to reply to neighbor solicitations for the whole /65, we have to compile and configure an NDP proxy daemon, just to answer these useless requests.
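
For a single address, the kernel-only workaround looks like this (eth0 is a placeholder and the proxied address is the VM address from the sketch above); for a whole prefix you need a daemon, ndppd for example:

$ sudo sysctl -w net.ipv6.conf.eth0.proxy_ndp=1
$ sudo ip -6 neigh add proxy 2001:41d0:8:17d8:8000::2 dev eth0      # answer solicitations for this one address only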

If we want to set up a VPN providing IPv4 and IPv6 connectivity, using the address space of the server, we run into the exact same problem.

The solution to this problem, and also to the security problem mentioned earlier, is to have the server's prefix delegated by the router. This can be done with DHCPv6-PD, or more simply by creating a static route on the datacenter router, sending all the traffic for the /64 prefix to the server's link-local address, for example. Setting up such static routes with an automatic script is a matter of minutes. It would also solve the NDP cache exhaustion problem that probably convinced OVH to use /128s in the first place.
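
Just to show how little it takes, this is what the delegation amounts to if the datacenter router were a Linux box (purely illustrative: OVH's actual gear, the VLAN name and the server's link-local address below are all made up):

$ sudo ip -6 route add 2001:41d0:8:17d8::/64 via fe80::be30:5bff:fe01:2345 dev vlan42   # route the whole /64 to the server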

Reverse DNS delegation

This can be useful for a number of things, such as mail servers, which need a correct reverse DNS record for the address they use.

From the kimsufi administration web interface it's possible to configure reverse DNS for each IPv4 address. It is also possible to configure reverse DNS for IPv6, but only for a single address, not for the whole subnet. OK, the server is sold as a /128, so it could make sense, but as we will see later this doesn't change much even for a server sold with a /64.

Kimsufi's broken IPv6 connectivity

As I said, I had more than one kimsufi, purchased at different times. Sadly, the second one came with broken IPv6 connectivity.

A friend told me: "yes, it's probably broken, but nobody will notice except you".
Wrong: as soon as I booted the freshly installed system pre-configured by OVH and ran apt-get update, the command started and immediately hung waiting for a connection to the repositories. APT tries IPv6 first by default, and since IPv6 was broken I was not even able to install anything without forcing the use of IPv4.
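
For the record, on reasonably recent APT versions a single run can be forced over IPv4 like this:

$ sudo apt-get -o Acquire::ForceIPv4=true update      # skip the broken IPv6 connections for this run only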

The problem

After some troubleshooting I quickly found out that the IPv6 gateway (…ff:ff:ff:ff:ff) was not replying to neighbor solicitations. I believe this was very likely a VRRP/HSRP or loopback address configuration error on their router.
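
If you want to check this yourself, a quick way is to solicit the gateway directly (ndisc6 comes from the ndisc6 package; the gateway address here is the illustrative one derived from the rule described earlier, and eth0 is a placeholder):

$ sudo ndisc6 2001:41d0:8:17ff:ff:ff:ff:ff eth0       # sends a neighbor solicitation and waits for the advertisement
$ ping6 -c 3 2001:41d0:8:17ff:ff:ff:ff:ff             # does the gateway answer at all?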

The ticket

I went through the "standard" process. On the kimsufi you can't open a proper ticket, but you can send a message to technical assistance. So I did, and I waited for 5 days. No reply.
I sent it again.
After 6 days they replied: "Please include the complete IP configuration in your reply. Once we can rule out any configuration errors I can then forward this on to our technicians and have them correct the issue."

I thought it was already clear enough. Anyway, I copied the output of all the commands, clearly showing where the problem was.

In the meantime a thread appeared on the kimsufi forum with several people reporting the exact same issue. Many of them wrote to support and got no reply.

After 12 days they replied: "The IPv4 configuration is incorrect. Your IP is isolated /32 shoud should be […]". My problem was with IPv6: what were they talking about? I double-checked what they wrote and replied immediately, saying that they hadn't understood the problem, and I also linked the thread on the forum.

After a few hours they replied: "I can ask a technician to have a look into this but first I need to confirm you have a full backup of your data and can authorise an intervention before we proceed?". I agreed and waited.

Two days later: "Please configure the IP as described in the following link. It will then work fine. http://help.ovh.co.uk/Ipv4Ipv6". Of course they hadn't solved anything and following the guide was useless. I decided to give up on the problem and set up an IPv6 tunnel with Hurricane Electric (perfectly working, with a /48, but slower because the traffic is tunneled far away).

So: IPv6 broken for more than a month, and I gave up trying.

OVH-style solution

In the meantime, the number of people experiencing the same IPv6 problem kept increasing. Probably someone at OVH noticed and decided to apply an OVH-style solution: filter all AAAA records out of the replies sent by OVH's DNS resolvers. This way everyone can buy a server, boot it and run apt-get update without noticing that IPv6 is completely broken.

# host -t AAAA google.com
google.com has no AAAA record

# host -t AAAA google.com 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:
google.com has IPv6 address 2a00:1450:4007:807::1001

I have no words.

SoYouStart

Now you may think: "OK, but you are talking about Kimsufi servers. Those are completely unreliable test servers, meant for experiments, not for real stuff."
Sadly, I've also had to manage a SoYouStart E3-SAT-1, which is a production-grade server with an SLA and support.

So a /48 finally?

No. The SoYouStart IPv6 setup is basically identical to the kimsufi one, except that they officially provide a /64. Yes: you are paying 30€/month of rent and you only get a single /64.

SLAAC? What is that? Ah yes, but it’s broken

When I was setting up the server I did the partitioning and the distro install myself, in order to tune a couple of things, so the IPv6 setup was not already in place.

After booting the server, I noticed that as soon as the network interface came up, I had a default route in the IPv6 routing table. Good! So they enabled SLAAC? No, because I only got the route, not an automatically assigned address. Anyway, I was getting router advertisements from the router, so I just had to assign the IPv6 address and I was ready.

Everything worked great. For about 30 minutes. After 30 minutes I didn't have a default route anymore. I brought the interface down and up again, and the default route was back.
I noticed, however, that the default route learned through router advertisements has an expiry time: it can only be used for 30 minutes. Within that period the route should be re-announced periodically by the router. However, for some reason, OVH's routers do not send router advertisements periodically: they only reply when a router solicitation is sent.
But this is not DHCPv4. This is IPv6: a host complying with the standard only sends a router solicitation when it brings the interface up. That should be enough, since router advertisements are supposed to be sent periodically.

So, in the end, you can't use router advertisements, because they are broken at OVH.

So I disabled accepting router advertisements via sysctl and configured the static address as explained in the old guide.
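
In practice that boils down to something like this (eth0 is a placeholder, the addresses are the illustrative ones used earlier, and to make it persistent you put the sysctl in /etc/sysctl.conf and the addresses in your distribution's network configuration):

$ sudo sysctl -w net.ipv6.conf.eth0.accept_ra=0                              # stop accepting the broken RAs
$ sudo ip -6 addr add 2001:41d0:8:17d8::1/64 dev eth0
$ sudo ip -6 route add 2001:41d0:8:17ff:ff:ff:ff:ff dev eth0
$ sudo ip -6 route add default via 2001:41d0:8:17ff:ff:ff:ff:ff dev eth0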

The question here is: why on earth send broken router advertisements if you don't support them? Just disable them completely; otherwise people without the right sysctl will get their IPv6 broken by OVH's routers.

Security

Security hasn't changed from the Kimsufi: you can still use addresses from other people's /64s.

NDP vs. Prefix delegation

Does OVH delegate the /64 IPv6 prefix on the SoYouStart, instead of relying on NDP? No. Same as the kimsufi.

Reverse DNS delegation

Do they allow you to delegate the reverse DNS zone of your /64? No. If you want, you can add each address through the web interface or the API, but they don't delegate the zone.


Online.net

I recently purchased a couple of servers from Online.net, and the setup I found there is much better:

Addressing and prefix delegation

Every user with an online.net account can request a /48 IPv6 prefix. This applies to all users, whether you rent a 30€/month server or a 5€/month one. From the manager, this prefix can be split into several /56s, and each /56 you slice off the /48 is associated with its own unique DUID.

The server receives its default route normally via router advertisements, then sends a DHCPv6 request containing the DUID of the subnet it wants to use. Online.net's DHCPv6 server looks up the subnet associated with that DUID and replies by delegating the entire prefix to the host (as discussed before). So it's possible to set up VPNs, VMs and other things without having to deal with NDP proxying.

The DHCPv6 client keeps running to renew the delegation of the prefix, and router advertisements keep refreshing the routing table.

Online.net also has good documentation explaining how to set up the DHCPv6 client correctly.
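
As a rough sketch of what this looks like with ISC dhclient (the DUID below is a made-up placeholder for the one shown in the manager, eth0 is a placeholder too, and Online.net's own documentation is the reference for the exact procedure), the configuration file would contain something like:

interface "eth0" {
    send dhcp6.client-id 00:03:00:01:aa:bb:cc:dd:ee:ff;   # DUID of the /56 chosen in the manager (placeholder)
}

and the client would be started with:

$ sudo dhclient -6 -P -cf /etc/dhcp/dhclient6.conf eth0   # -P requests prefix delegation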

Reverse DNS delegation

In the manager interface it is possible to specify the addresses of two DNS servers authoritative for the reverse DNS of the /48.

Security

As far as I can tell, since no neighbor solicitations are involved, it's not possible to steal other servers' addresses, nor can I see any "public" NDP traffic on the network interface.



There are not that many IPv4 addresses. When the Internet started, a long time ago, 2³² addresses (more than 4 billion) seemed like a lot. Today IANA (the organization that administers the addresses and "numbers" of the Internet) has already assigned all IPv4 address blocks to the "regional" registries covering the various parts of the planet.
For this and other reasons we are now in the middle of a migration to the new IPv6 protocol, which provides 128-bit addresses. Many estimates show that these new addresses will be enough for a very long time, even if we use them in a very "wasteful" way.
The migration process officially started in 2008 and will go on for several years. Many migration techniques are already in use on networks that provide both IPv4 and IPv6 connectivity. Since June 2011, Facebook, Google and several other Internet "bigs" have been offering IPv6 services to customers around the world. As of today, the T-Mobile mobile operator in the US provides IPv6 connectivity to every mobile phone customer.

Actually, this shortage of IPv4 addresses was already predicted and well known during the '90s, when NAT (Network Address Translation) was created to "temporarily" solve the problem. That "temporarily" has lasted a very long time, because we still use it today, more than ever.

But first of all, let me clarify some definitions. Plain "NAT" means translating a public IPv4 prefix into a "private" IPv4 prefix and vice versa (where "private" means a prefix taken from the private blocks specified in RFC 1918). This is a one-to-one association between public and private addresses: each public address is mapped to one private address.
For example, let's suppose we have a router with two interfaces, one towards the Internet and one towards a local LAN that uses private IPv4 addresses.
Let's also suppose we have 10 computers on the LAN and 10 public IP addresses available for Internet use. Plain NAT means we use 10 private addresses on the LAN computers and program the router so that, for each incoming IPv4 packet, it replaces the destination IP address with the corresponding private address (and vice versa for outgoing packets), based on an association table. This makes it easy to change the public IP addresses of all the machines (for example when we change ISP).
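
On a Linux router, this one-to-one mapping can be sketched with the iptables NETMAP target (the public prefix below is a documentation prefix, used purely as an example):

$ sudo iptables -t nat -A PREROUTING  -d 203.0.113.0/28 -j NETMAP --to 192.168.0.0/28   # public -> private
$ sudo iptables -t nat -A POSTROUTING -s 192.168.0.0/28 -j NETMAP --to 203.0.113.0/28   # private -> public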

But this "plain NAT" is almost never used today. What I want to talk about is NAPT (Network Address and Port Translation), which is often just called "NAT", which is where the ambiguity comes from.

NAPT works like this: let's suppose we again have a local network and a router as before, but we only have one public IPv4 address and 10 computers on the local network. How can we connect all the computers to the Internet? Let's first take a step back and look, in a very simplified way, at how a connection works.
Every computer wants to send and receive TCP and UDP packets. Each TCP or UDP packet carries two "port" numbers, a source port and a destination port. The source port identifies the application on the host that sent the packet, while the destination port identifies the application to reach on the destination host. The destination host swaps the source and destination ports in order to send back responses.

So, we use private IP addresses on the local network as before and we keep the public IP address on the router. The router performs NAPT: when the first packet of a connection comes from the local network, the router checks the source port of the TCP packet and looks for a free source port on its public IPv4 address. It then picks a free port and notes in a table: "the local host with address x.y.z.w has established a connection to the remote host k.h.j.l, and all packets coming from x.y.z.w with source port X must be rewritten with my source port Y and my public address as source". The same mapping is read in the opposite direction when a packet comes from the Internet: the router checks whether the destination address is its public IP, then checks whether the destination port is used by some NAPT mapping. If it is, the router finds the row in the table, changes the destination port Y back to the original port X, and rewrites the destination address to the private address x.y.z.w of the LAN host. All of this can also be deployed with more than one public address on the router, in order to increase the number of parallel connections (we will see that later).
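
On a Linux router this is the familiar "masquerading" setup; a minimal sketch (interface name and prefix are placeholders), with the per-connection mapping table described above visible through the conntrack tool (from conntrack-tools):

$ sudo sysctl -w net.ipv4.ip_forward=1
$ sudo iptables -t nat -A POSTROUTING -o eth0 -s 192.168.1.0/24 -j MASQUERADE
$ sudo conntrack -L                                   # shows the address/port translation table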

Even though this system lets you "solve" the IPv4 exhaustion problem, many network administrators and people in the Internet community think it is a real abomination and horror for the TCP/IP protocol suite. In the following I will explain why I agree with them:

10 reasons why NAPT (NAT) should not be used:

  1. NAPT goes against the most important rule of protocol layering, which says that a protocol at layer k should not make any assumptions about protocols at layer k+1. A router should just take incoming IP packets and route them using its routing table. With NAPT, instead, the router also has to inspect the IP packet payload (the TCP/UDP header) to read the port numbers. According to the layering rule this should not happen, and it also causes unexpected overhead on an embedded device that should just forward packets but now has to track every TCP/UDP connection going through it.
  2. NAPT goes against the hierarchical model of IP, which says that any host connected to the Internet is globally and uniquely identified by an IP address. In some cases this may be good for our anonymity, but it is certainly no guarantee that we cannot be traced in several other ways.
  3. Internet applications should not be forced to use TCP and UDP. If a user behind a NAT chooses to use a different (layer 4) transport protocol, he will not be able to, because NAPT works only on TCP and UDP by inspecting port numbers. Some transport protocols have no port numbers at all. ICMP, for example, the control protocol used for IP operation and network diagnostics, does not use port numbers but is necessary anyway. For this reason NAPT routers employ several workarounds to send and receive ICMP packets correctly (looking at other identifiers inside the ICMP packets). This is obviously more hard work for the router. We should not forget that every time we add work on the router side, the per-packet load increases, and some links may slowly see more latency and congestion.
    This means that manufacturers need more powerful hardware, and therefore higher costs (and energy consumption). I understand this is a somewhat exaggerated view of the problem, but if we really have a lot of traffic (and not just 10 computers), we should start considering these problems too.
  4. With NAPT, the router has to recompute the IP and TCP/UDP/ICMP checksums. The IPv4 header contains a checksum of itself: since NAPT changes the IP address in the header, this checksum has to be recalculated. TCP also has a checksum, which is computed over the source and destination addresses of the IP header. The UDP checksum is usually not enabled, but ICMP has a checksum too. This is not a big problem, but if we have to NAPT a large network there is a lot of overhead in processing so many checksum operations at wire speed, so once again we need more powerful hardware.
  5. Handling IP fragments is hard! TCP/UDP port numbers are present only in the initial fragment of a transport-level segment. If a TCP/UDP packet gets fragmented along the path, the NAPT router needs to keep track of fragment identifiers. If two senders happen to choose the same Fragment Identifier, the router may not be able to tell which translation is the correct one for a fragment (it has two possible translations). It's a fairly rare case, but it is still possible.
  6. The number of parallel connections decreases. This is because the NAPT router has to use one of its free TCP/UDP source ports for each connection. Since there are 2^16 ports (and actually the first 4096 should not be used for NAPT), we can't have more than about 65,000 connections per public IP address used by the NAPT router. You may think that's a big number, but if we use just one IP address for the network of a small company or a university campus (and maybe someone is doing port scanning or other nasty things), the router will run out of port numbers and the network will stop accepting new connections. If we have 100 computers behind the NAPT, each one can only open about 655 concurrent connections. Try opening a p2p application on a few of them and see what happens.
    For these reasons, mainly on corporate networks, more than one public IPv4 address is used: a pool of public IPv4 addresses, in order to have more parallel connections available. At that point, was it really necessary to use NAPT if we had other public addresses anyway? Maybe for a university campus the answer would be "yes", but for a small office it could be "no".
  7. NAPT transforms the Internet from a connectionless network into a connection-oriented one. The NAPT router must keep track of all the state related to every connection going through it. So, if a NAPT router goes down and its mapping table is lost, no TCP/UDP/ICMP connection can continue when the router comes back online. Without NAPT, instead, when a router goes down and then comes back up, the endpoints of a TCP connection just see a short lag (the router's reboot time) in the communication. UDP has problems too: let's say we are on a VoIP call when someone reboots the router. VoIP usually runs over RTP (which runs over UDP). With a normal router we just get a short silence, then the audio comes back. Behind NAPT, instead, the VoIP call is lost and we have to call again. I know, it's not such a common scenario, but it shows how evil NAPT is.
  8. Usually every host connected to the Internet can both connect to servers and expose services on the network so that other people can connect to it. Sadly, behind NAPT we cannot receive incoming connections. You may object that it's possible to configure the NAPT router to forward some ports. So let's say I have two hosts on the LAN that want to expose an HTTP web server (TCP port 80). We have just one public IP address and we can use TCP port 80 on it only once. How can we make both hosts reachable? We are forced to use a non-standard port on the other host. I think that (also) because of NAPT, the Internet is becoming more and more of a "download-only" network every day. This may be a philosophical problem more than a technical one, but I think the good side of the Internet (since its very first days) is the possibility for anyone to expose a self-hosted service to anyone else, anywhere in the world, without having to ask a third party to do it for them.
  9. A lot of application-level protocols (DCC, or FTP in active mode, for example) carry hosts' IP addresses inside application-level messages (to tell the other end where to send some data). NAPT knows nothing about this, and the result is that these protocols don't work anymore today. Some routers ship deep packet inspection systems (Application Level Gateways) that recognize these application messages and rewrite them while opening ports in the NAPT mapping, to provide some form of "retro-compatibility" (a lot of NAPT routers do this for FTP active mode, for example). I think this is even worse than what we were discussing before: now the router doesn't just have to check layer-4 headers, but even layer-7 messages! Reassembling TCP streams to search for a byte sequence is not a fast or easy thing for a router to do; it adds a lot of overhead. If you still aren't convinced about this problem, go read about the workarounds the IPsec designers had to invent to make IPsec work across NAPT. IPsec can sign or encrypt the payload of IP packets to make them secure and authenticated, and to add this feature it signs the whole IP payload. This is a problem because fields that should stay fixed (the port numbers) are changed by the NAPT, so the signature check fails. And if the packet is encrypted, the NAPT can't even read the port numbers!
  10. NAPT doesn't let you keep idle connections open. It can sometimes be useful to keep a connection open with no data exchanged for hours or weeks (think of today's popular websockets, or IMAP with the IDLE command). TCP/IP supports this without problems, since the connection state is kept only by the endpoints. The NAPT router has few free port numbers, so it can't keep them allocated forever (because of badly closed connections and DoS attacks against the NAPT itself). For these reasons, the NAPT closes connections that are not "active" and reallocates their port numbers to new connections. The worst part is that (in some NAPT implementations) the two hosts involved get no notification of this "disconnection", so they still believe they are connected even though they aren't, and they only find out when they try to send something to the other host.
    On the other hand, without NAPT, we could shut down the router, change communication lines, change the way we connect to the router, put the host in standby, cut Ethernet cables, reconnect and wait. We can do whatever we want: when we are back online, if the endpoints' addresses haven't changed, we are still "connected" to the other host and we can still send data on the same connection, thanks to TCP/IP.
    Someone may ask: "so how can I keep idle connections open across NAPT?". There are many answers; perhaps the most popular is to use "ping/pong" messages like IRC sometimes does. Another is the workaround called TCP keepalive, which periodically sends empty TCP packets inside a TCP connection to show the NAPT that we are still active and that it should not delete the mapping rule (see the sketch right after this list).
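
On Linux, the keepalive timers mentioned in point 10 are exposed as sysctls (the values below are the usual kernel defaults; applications still have to enable SO_KEEPALIVE on their sockets, and lowering the first timer is the typical trick to keep NAPT mappings alive):

$ sysctl net.ipv4.tcp_keepalive_time net.ipv4.tcp_keepalive_intvl net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9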

NAT is not a firewall

A lot of people think that NAPT, because of how it works, can be used (improperly) as a firewall technique. Let's imagine, for example, a university campus network. We don't want students to turn on SMTP servers and start spamming, or other stuff like that. How can we prevent this? Some people say: "let's turn on NAPT so they can only connect outwards". This may work, but that is not what NAPT is for, and I consider it a bad idea, particularly on small networks where we have public addresses for everyone. It's always possible to use a stateful firewall on the router, or on a separate machine, to prevent LAN hosts from exposing services to the Internet.
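
Just to make the point concrete, an "outbound only" policy needs nothing more than a couple of stateful rules, with no address translation involved (eth0 as the Internet-facing interface is a placeholder):

$ sudo iptables -A FORWARD -i eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT   # replies to outbound traffic
$ sudo iptables -A FORWARD -i eth0 -j DROP                                                # no new inbound connections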

Some people say that NAPT protects them from network attacks. Really?

  • On some cheap NAPT routers it can happen that, for various reasons, the WAN and LAN interfaces are actually on the same physical card, so if you send a packet to the WAN interface it can reach the LAN too.
  • Some routers have "modules" to make FTP active mode and DCC work (as mentioned before). This is nice, but it also creates security issues: with some special attack techniques, an attacker can use this mechanism to open an arbitrary port on the NAPT towards an arbitrary host in the local network, for example via a special web page with a Java or Flash application (see for example here).
  • Some local networks implement the UPnP protocol on the main router (the so-called "IGD" service). Using the IGD service, a host on the local network can ask the router to add a port-mapping rule in order to receive incoming connections from outside. It turns out that some routers accept this kind of request from the WAN interface too (see this paper).

Other bad things

Did you know? Some ISPs have started giving you private addresses instead of public ones. This is called Carrier-Grade NAT and it's just as bad, because it again breaks end-to-end connectivity and you also have no way to open up a port, since the NAT is performed inside the carrier's network.


The other day I had to inspect the traffic generated by a program (Skype). My machine obviously spits out a lot of data onto the network (Firefox open with several tabs making HTTP connections, chat clients, feed readers).

The classic approach for this kind of thing is: open tcpdump/wireshark, think a bit about which TCP/UDP ports and addresses I roughly expect the program to use, and filter on those. Or start capturing everything with no filter and hope to spot what I'm interested in within the avalanche of stuff that comes out (good luck when you have a lot of traffic…). Or, if I really want to waste resources, start a virtual machine (just for this? really?).

Another approach is to launch the program in question with a GID (created specifically for that application) different from the usual one and then, using the owner module of iptables (-m owner --gid-owner), match all the packets of that group and extract a pcap file from them with the ULOG target.
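
Just to sketch that approach (the group name and nflog group number are made up, on modern kernels the NFLOG target has replaced ULOG, and your libpcap must support the nflog: capture interface; note that the owner match only sees locally generated packets, so you only capture the outgoing side):

$ sudo groupadd skype-capture
$ sudo iptables -A OUTPUT -m owner --gid-owner skype-capture -j NFLOG --nflog-group 5
$ sudo tcpdump -i nflog:5 -w skype.pcap &        # write the matched packets to a pcap file
$ sg skype-capture -c skype                      # run the program under the dedicated group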

Honestly though, I didn't really feel like doing that, and I had just learned how to use network namespaces on Linux (thanks Max!), so I tried another way, a bit as an "exercise in style" (cit.). Don't take it as something for final users, but rather for p0w3r u5erZ.

Linux has had several namespaces for quite a while now. In particular, as far as networking is concerned, we can start a process in a network namespace different from the default one. That way the process sees no interfaces, no routing rules, no IP addresses, no iptables rules. In practice it's as if, exclusively from the networking point of view, the process were running on a "fresh" system. By the way, this also applies to all of its child processes. There are also PID, UID, mount, … namespaces; using all of them together we can build more or less a "lightweight" virtual machine, which still runs on the same kernel.

That said, you should know that it's also possible to create "tunnel" network interfaces between one network namespace and another.

We'll use a tool to "handle" namespaces in the Linux kernel (a kernel version >= 2.6.26 is enough). The tool is "unshare" and you should find it already installed in your distribution; it's usually part of the util-linux package. A very similar (alternative) tool with a few more features is "vspace" from the "util-vserver" package (which you'll almost certainly have to install separately).

At this point we can start a bash process inside a new namespace:
$ sudo unshare --net /bin/bash
sudo is needed because a syscall is made (specifically, clone with flags=CLONE_VFORK|CLONE_NEWNET|SIGCHLD) that requires superuser privileges.

OK, now in the bash we just launched we will notice the absence of network interfaces (except loopback, which will be missing its IP address anyway):

$ ip link show
5: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

No routing rules:

$ ip route show
$

And no iptables rules:

$ iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination


$ ip6tables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination

First of all, we can assign an address (IPv4 and IPv6) to the loopback interface, which might be needed for IPC between processes:

$ ip addr add 127.0.0.1/8 dev lo
$ ip addr add ::1/128 dev lo
$ ip link set dev lo up

Now we need to bring up some communication with the outside, so we'll use two virtual network interfaces, which will act as a "bridge" between the namespace we created and the "global" one (the "usual" one, that is).

So, from another terminal outside the namespace we created, we run:
$ sudo ip link add name antani0 type veth peer name antani1
This creates two interfaces in the "global" namespace: antani0 and antani1, "fake" Ethernet interfaces connected to each other in a single collision domain. Now we can move one of the interfaces we created into the new namespace.
To identify the network namespace we created, we can look for the PID of our "special" bash in the process list and note it down. At this point, by running:
$ sudo ip link set dev antani1 netns /proc/$PID_BASH/ns/net
we tell the kernel to move the antani1 interface into the network namespace used by our "special" bash process.

And now everything is easy, since we have a link to the world "outside" the namespace. We can bridge the interface with a physical one, or do NAT. I will show the second option (which is the one I tried, precisely to run experiments with NAT and Skype). This doesn't mean that doing NAT is a good idea.

Let's give an address to the interface outside the namespace and run a few commands to enable NAT:
$ sudo ip addr add 172.16.0.1/24 dev antani0

$ sudo ip link set dev antani0 up

$ sudo su -c "echo 1 > /proc/sys/net/ipv4/ip_forward"

$ sudo iptables -P FORWARD ACCEPT      # for simplicity; if your policy is DROP, add the two extra rules instead

$ sudo iptables -t nat -A POSTROUTING -s 172.16.0.0/24 -j MASQUERADE
And inside our special bash, where we'll see the new interface appear, let's assign another IP and set the default gateway:
$ sudo ip addr add 172.16.0.2/24 dev antani1

$ sudo ip link set dev antani1 up

$ sudo ip route add default via 172.16.0.1 dev antani1
At this point we just need to start tcpdump or wireshark outside the namespace, listening on the antani0 interface, and launch Skype (or any other program whose traffic we want to capture) from our "special" bash.
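
For example (the output file name is just an example):

$ sudo tcpdump -i antani0 -w skype-traffic.pcap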

Actually, if you look closely, I left one step implicit: Skype, and any other program you launch in the new namespace, will keep using the usual DNS information, the one stored in /etc/resolv.conf and /etc/hosts. That information lives on the filesystem, so it is not affected by the network namespace and will be the same as in the "global" namespace.
I, however, needed Skype to use DNS servers different from the global system ones. On the other hand, I didn't want to change the system ones, because that would have slowed down name resolution for all the other programs.

Fortunately there is a way to solve this problem too! And it's thanks to the mount namespace!
Besides the network namespace, you can also enable the mount namespace. This means that when you launch a program in a new mount namespace, you see all the "usual" mounts, but the new mounts you perform from inside the namespace are visible only to you and not to the outside.

So let's redo the procedure above, changing the first command to:
sudo unshare --net --mount /bin/bash
And we'll have both a new network namespace and a new mount namespace.

But why do I want to use the mount namespace? Few people know (I didn't either) that it's possible to mount a single file on top of another existing one, using mount --bind. By mounting a file on top of an existing one, the result is that accessing that file shows only the last mounted "version".
So we can create a replacement version of /etc/resolv.conf somewhere, for example in /tmp/, write whatever the hell we want into it, and then, from inside our mount (and network) namespace, do something like:
mount --bind /tmp/resolv.conf /etc/resolv.conf
And that's it! This way we have put a "new" version of /etc/resolv.conf on top of the old one, hiding it, and this change is visible only from inside our special bash, since we are operating, as already said, in a different mount namespace. Outside the new mount namespace we keep seeing the old version of the file. Do we need to do the same for /etc/hosts or /etc/nsswitch.conf (another file consulted during name resolution)? Just repeat the mount for those files too!

By the way, if you're interested, it's also possible to move an existing network interface, such as eth0, using the same command we saw earlier for "moving" interfaces! This can also come in very handy.

Isn't all this fantastic?

Also, by combining these things with the PID and user namespaces too, I think it's possible to build a serious sandbox for Skype. If anyone is wondering why you would want to sandbox Skype, they should probably go read up on what happened a few months ago when a de-obfuscated version of the program circulated on the net… Besides that, it remains proprietary software with obfuscated code that works in a rather strange way. Running it in a VM is perhaps overkill (and inconvenient), but with these little tricks you can do (in my opinion) very interesting things, both to sandbox programs and to run various networking tests.


Between October 2011 and January 2012, at the Politecnico di Milano, I finally got to study the fundamentals of telecommunication networks and the Internet (Ethernet, IP, TCP, etc.). During the course I realized that the textbooks were all different and all somewhat incomplete (especially for the second part of the course), so in the end I decided to prepare my own "properly done" notes, which turned into a sort of handout.
Since I put a lot of work into it and I think it may be useful to others as well, I decided to publish it here, even though its last parts are a bit incomplete. I hope to find the time to extend it a bit!

You can find it here