RV's blog: 2008

Saturday, May 24, 2008

Cisco CSG: extensive checksum errors in "sh mod csg X stats"

Two CSGs in an FT (fault-tolerant, active-standby) scenario. During normal operation, "sh mod csg X stats" shows a large number of checksum errors:

cisco-csg1#sh mod csg 2 stats
Connections Created: 220120
Connections Destroyed: 219713
Connections Current: 407
Connections Timed-Out: 0
Connections Failed: 0
Server initiated Connections:
Created: 0, Current: 0, Failed: 0
L4 Load-Balanced Decisions: 220120
L4 Rejected Connections: 0
L7 Load-Balanced Decisions: 0
L7 Rejected Connections:
Total: 0, Parser: 0,
Reached max parse len: 0, Cookie out of mem: 0,
Cfg version mismatch: 0, Bad SSL2 format: 0
L4/L7 Rejected Connections:
No policy: 0, No policy match 0,
No real: 0, ACL denied 0,
Server initiated: 0
Checksum Failures: IP: 6770821, TCP: 0
Redirect Connections: 0, Redirect Dropped: 0
FTP Connections: 0
MAC Frames:
Tx: Unicast: 0, Multicast: 118, Broadcast: 0,
Underflow Errors: 6815443
Rx: Unicast: 751, Multicast: 10, Broadcast: 0,
Overflow Errors: 0, CRC Errors: 46627
cisco-csg1#

Cisco TAC explains it this way:
"As you are running replication, it is normal to see packets with a wrong checksum on the FT VLAN. This comes from the way the CSG communicates new sessions and teardowns to the standby CSG. This is done through "fake" SYN and RST packets sent over the FT VLAN, and these packets don't have a correct checksum. If you can collect sniffer traces with VLAN tags in them, you should see that these are only frames on the FT VLAN."
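The "Checksum Failures: IP" counter above counts packets whose IPv4 header checksum does not verify. If you want to check sniffer traces from the FT VLAN yourself, here is a minimal Python sketch of the standard RFC 1071 one's-complement check. It only illustrates the checksum rule. It is not how the CSG computes its counter.

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum over big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def ipv4_header_checksum(header: bytes) -> int:
    """Compute the checksum field for an IPv4 header (checksum bytes treated as zero)."""
    ihl = (header[0] & 0x0F) * 4  # header length in bytes
    zeroed = header[:10] + b"\x00\x00" + header[12:ihl]
    return ~ones_complement_sum16(zeroed) & 0xFFFF


def ipv4_header_checksum_ok(header: bytes) -> bool:
    """A valid header sums to 0xFFFF when the checksum field is included."""
    ihl = (header[0] & 0x0F) * 4
    return ones_complement_sum16(header[:ihl]) == 0xFFFF


# Example header (192.168.0.1 -> 192.168.0.199, UDP) carrying checksum 0xB861.
hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)), ipv4_header_checksum_ok(hdr))
```

Following the TAC explanation, every frame in an FT pair that fails this check should turn out to be a replication frame on the FT VLAN.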

Thursday, May 22, 2008

CSG in action

Sandwich configuration (two CSG cards per chassis): RLB (RADIUS Load Balancing) on the client side and FWLB (Firewall Load Balancing) on the server side. Load is balanced across two cards, and every card has a standby card in the other chassis.
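Why does this sandwich work? The following is one plausible reading, assumed here rather than taken from the CSG documentation. Both load balancers must send a subscriber's RADIUS accounting and that subscriber's data traffic to the same CSG card. To do that, both sides key on the subscriber IP: Framed-IP-Address on the RADIUS side, and on the FWLB side the source IP upstream and the destination IP downstream. Below is a minimal Python sketch. The card names are hypothetical, and CRC32 stands in for whatever hash the real load balancers use.

```python
import ipaddress
import zlib

CARDS = ["csg-a", "csg-b"]  # hypothetical names for the two active CSG cards


def pick_card(subscriber_ip: str) -> str:
    """Map a subscriber IP to one card with a stable hash (CRC32 is an assumption)."""
    key = ipaddress.ip_address(subscriber_ip).packed
    return CARDS[zlib.crc32(key) % len(CARDS)]


def rlb_pick(radius_attrs: dict) -> str:
    """RADIUS side: balance accounting on the subscriber's Framed-IP-Address."""
    return pick_card(radius_attrs["Framed-IP-Address"])


def fwlb_pick(src_ip: str, dst_ip: str, from_client: bool) -> str:
    """Firewall LB side: use the subscriber end of the flow in both directions."""
    return pick_card(src_ip if from_client else dst_ip)


subscriber, web_server = "10.20.0.15", "203.0.113.80"
print(rlb_pick({"Framed-IP-Address": subscriber}),
      fwlb_pick(subscriber, web_server, from_client=True),
      fwlb_pick(web_server, subscriber, from_client=False))
```

All three calls select the same card, which is the property the sandwich needs. If either side hashed on a different field, a card could receive traffic for a user it has no RADIUS record for.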

Sunday, May 18, 2008

Cisco CSG part III

How does traffic quoting work? First of all, the CSG configuration has several parts (I will not show all of them):

1. contents (+policy+maps)
2. services
3. billing

Contents: this is where you define content providers. There can be up to 4000 contents (as I remember). Note that the Internet as a whole is also a content provider.
Services: a traffic definition together with a method of accounting for that traffic. Services include contents with policy definitions (different pages from one content provider can be included in different services).
Billing: billing plan definitions. In practice, we can use prepaid or postpaid.
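To illustrate how the three parts relate, here is a hypothetical Python data model. The class names, fields, and first-match lookup are assumptions made for illustration. They are not CSG configuration syntax. The model shows one content provider's pages falling into two different services, with a catch-all "Internet" content at the end.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Content:
    name: str
    server_ip: Optional[str]  # None means any server, i.e. "the whole Internet"


@dataclass
class Policy:
    url_pattern: str  # regex matched against the request URL path


@dataclass
class Service:
    name: str
    members: List[Tuple[Content, Policy]]


@dataclass
class BillingPlan:
    name: str
    mode: str  # "prepaid" or "postpaid"
    services: List[Service]

    def match(self, dst_ip: str, url: str) -> Optional[str]:
        """Return the first service whose content and policy both match."""
        for service in self.services:
            for content, policy in service.members:
                ip_ok = content.server_ip is None or content.server_ip == dst_ip
                if ip_ok and re.search(policy.url_pattern, url):
                    return service.name
        return None


portal = Content("portal", "192.0.2.10")
internet = Content("internet", None)
plan = BillingPlan("basic", "prepaid", [
    Service("portal-news", [(portal, Policy(r"^/news/"))]),
    Service("portal-games", [(portal, Policy(r"^/games/"))]),
    Service("internet", [(internet, Policy(r".*"))]),
])
```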

The most interesting case for us is prepaid. When a user opens a TCP session (or sends any other traffic), the CSG opens a service for that user. You can see the open services per user with the "sh ip csg account user ...." command. The CSG sends a service authorization request to the quota server, asking for some volume of quota. The server answers, and the user starts working. You can watch the user consume quota with the same command. When the remaining quota runs low, the CSG sends a service reauthorization and asks for more quota. If no more quota is available, the server returns zero, and the CSG redirects HTTP/WAP traffic to a configured page or drops any other traffic.
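The prepaid flow above can be sketched as a small state machine. This minimal Python sketch rests on several assumptions. The quota server grants fixed-size chunks from a balance. Reauthorization is triggered at a hypothetical low-watermark threshold. A zero grant means there is no more quota. The sketch does not reproduce the real CSG/quota server protocol.

```python
REDIRECTABLE = {"http", "wap"}  # traffic that gets redirected instead of dropped


class QuotaServer:
    """Hypothetical quota server: grants fixed-size chunks from a prepaid balance."""

    def __init__(self, balance: int, chunk: int):
        self.balance = balance
        self.chunk = chunk

    def authorize(self) -> int:
        grant = min(self.chunk, self.balance)
        self.balance -= grant
        return grant  # 0 means no more quota


class PrepaidService:
    """One open service for one user, as seen by the gateway."""

    def __init__(self, server: QuotaServer, low_watermark: int):
        self.server = server
        self.low_watermark = low_watermark
        self.quota = server.authorize()  # initial service authorization
        self.reauth_count = 0
        self.denied = self.quota == 0

    def _reauthorize(self) -> None:
        self.reauth_count += 1
        grant = self.server.authorize()
        if grant == 0:
            self.denied = True  # stop asking once the server says "no more"
        self.quota += grant

    def on_traffic(self, units: int, protocol: str) -> str:
        if self.quota <= self.low_watermark and not self.denied:
            self._reauthorize()
        if units > self.quota:
            return "redirect" if protocol in REDIRECTABLE else "drop"
        self.quota -= units
        return "forward"


svc = PrepaidService(QuotaServer(balance=3000, chunk=1000), low_watermark=200)
print([svc.on_traffic(500, "http") for _ in range(7)])
```

With a 3000-unit balance in 1000-unit chunks, six 500-unit requests are forwarded. The seventh triggers a reauthorization that returns zero, so that HTTP request is redirected.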

Quota is metered in quadrans, a virtual resource. In your services you can define what counts as one quadran (IP bytes, TCP bytes, events, or time) and also set a quadran weight (one page might consume 1 quadran per byte, another 1 quadran per 10 bytes, or 10 quadrans per byte).
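Here is a quick sketch of how quadran weights turn metered usage into quota consumption. The service names, the time-based weight, and rounding up to whole quadrans are assumptions made for illustration.

```python
import math
from fractions import Fraction

# Hypothetical services: (what one unit is, quadrans charged per unit)
WEIGHTS = {
    "portal":  ("ip_bytes", Fraction(1, 1)),    # 1 quadran per byte
    "news":    ("ip_bytes", Fraction(1, 10)),   # 1 quadran per 10 bytes
    "premium": ("ip_bytes", Fraction(10, 1)),   # 10 quadrans per byte
    "voice":   ("seconds",  Fraction(2, 1)),    # time-based, 2 quadrans per second
}


def quadrans_used(service: str, amount: int) -> int:
    """Convert metered usage into quadrans, rounding up to a whole quadran."""
    _unit, weight = WEIGHTS[service]
    return math.ceil(amount * weight)


for name in ("portal", "news", "premium"):
    print(name, quadrans_used(name, 1505))
```

Exact fractions avoid floating-point surprises when weights like 1/10 are applied to large byte counts.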

Friday, May 16, 2008

CSG load balancing IP address plan (RLB/FWLB sandwich)

This is a picture of the RLB/FWLB sandwich for CSG load balancing.

Thursday, May 1, 2008

Cisco CSG1 performance

One result from my lab. It was obtained with a traffic exchange of about 8 KB in each direction per TCP session, two CSGs in FT (content without replication), 150 RADIUS events per second, and 3000 active users. Content authorization was enabled, and all users were PREPAID (the worst case). There were 4 load-balanced Quota/BMA servers. Input is traffic from the clients; output is traffic to the clients (the server sends slightly more data per session).
This is not an official result, only my own investigation.

Sunday, April 27, 2008

Servers redundancy without DNS and NAT

I had a task: develop a server redundancy strategy for a customer (across globally distributed datacenters) for an application where I can't use DNS (the client software doesn't understand FQDNs) and can't use NAT (the application uses a highly proprietary protocol). Usually we use DNS (for example, a Cisco CSS replying to DNS queries with the right server IP) or NAT together with RHI (Route Health Injection) on the CSS (see the Cisco SRND for this solution). Neither works for this application.
So I think the following solution could work: use RHI, but with loopback interfaces on the servers that share the same IP address.
First, we create a loopback interface on every server with the same IP, say 10.1.1.1/32, and run the server application on this loopback interface.
Second, we configure routes on the gateway routers to reach 10.1.1.1 via each server's network interface IP.
Third, we use Object Tracking to check server health and to control the propagation of the 10.1.1.1/32 route to the core (how exactly to implement this still needs to be checked).
Finally, we announce the 10.1.1.1/32 route from every site to the core with a different metric.
If a server fails, Object Tracking suppresses the route announcement, and clients are switched to another server. Clients simply set 10.1.1.1 as the server address in their client application.
[Figure: simple diagram of the solution]
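The failover logic can be modeled in a few lines. This Python sketch makes two assumptions: each site announces 10.1.1.1/32 only while its tracked server is healthy, and the core simply prefers the lowest metric. The site names and metrics are hypothetical. The sketch models routing decisions only, not the actual IOS configuration.

```python
from dataclasses import dataclass
from typing import List, Optional

ANYCAST_PREFIX = "10.1.1.1/32"  # the shared loopback address from the text


@dataclass
class Site:
    name: str             # hypothetical datacenter name
    metric: int           # metric used when announcing the prefix to the core
    server_healthy: bool  # result of the Object Tracking probe

    def announces_prefix(self) -> bool:
        # Tracking suppresses the announcement when the server is down.
        return self.server_healthy


def core_next_site(sites: List[Site]) -> Optional[str]:
    """Pick the site the core routes ANYCAST_PREFIX to: lowest metric among announcers."""
    candidates = [s for s in sites if s.announces_prefix()]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (s.metric, s.name)).name


sites = [Site("site-a", 10, True), Site("site-b", 20, True), Site("site-c", 30, True)]
print(core_next_site(sites))
```

Marking site-a's server unhealthy moves traffic to site-b without any change on the clients, which keep using 10.1.1.1.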

Wednesday, April 23, 2008

Cisco CSG part II

Before we start, a few words about the CSG. There are two versions available: CSG1 and CSG2.
First of all, it is a service module for the Catalyst 6500 or Cisco 7600 chassis.
It is a traffic inspection system with billing capabilities.
CSG1 has 6 processors: one PowerPC 405 and five Intel IXPs. The PPC handles common tasks: RADIUS traffic processing, communication with Quota/BMA servers, statistics collection, and preparing data and rules for the IXP processors.
CSG2 is based on the Cisco SAMI card. It has 5 CPUs, each running its own Cisco IOS instance, plus Intel IXP network processors that load-balance traffic among them (apart from the load-balancing hardware, it is quite close to the Cisco MWAM board). It has two IXP processors, but one is not enabled in the current software. I think Cisco plans to use it for load balancing in the reverse direction to increase load-balancing performance.
CSG1 forwards packets either serially across all IXPs (slow path) or with fast switching between the first and last IXPs (for example, the first packet of a TCP session goes through the slow path and the rest through the fast path), with the central CPU preparing and sending instructions to the IXPs. CSG2, in contrast, forwards in parallel across all CPUs, with C-coded software performing traffic inspection on every CPU.
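The slow-path/fast-path idea is a common network-processor pattern: do the expensive work once for the first packet of a flow, cache the result, and let later packets use only the cached decision. Below is a minimal Python sketch of that general pattern. It is not CSG1 microcode, and the classification rule is hypothetical.

```python
from typing import Dict, Tuple

FiveTuple = Tuple[str, int, str, int, str]  # src ip, src port, dst ip, dst port, protocol


class FlowCache:
    """Illustrative slow path / fast path split keyed on the connection 5-tuple."""

    def __init__(self) -> None:
        self.flows: Dict[FiveTuple, str] = {}
        self.slow_path_packets = 0
        self.fast_path_packets = 0

    def classify(self, flow: FiveTuple) -> str:
        # Stand-in for the expensive per-flow work (policy lookup, content match, billing).
        return "http-service" if flow[3] == 80 else "default-service"

    def forward(self, flow: FiveTuple) -> str:
        action = self.flows.get(flow)
        if action is None:  # first packet of the flow: slow path
            self.slow_path_packets += 1
            action = self.classify(flow)
            self.flows[flow] = action  # install the decision for later packets
        else:  # subsequent packets: fast path, cached decision only
            self.fast_path_packets += 1
        return action


cache = FlowCache()
flow = ("10.0.0.5", 40000, "192.0.2.10", 80, "tcp")
print([cache.forward(flow) for _ in range(3)], cache.slow_path_packets, cache.fast_path_packets)
```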
From a technology point of view, I think CSG1 is the more interesting solution, but much harder to develop and support. CSG2 uses modern, powerful CPUs and performs better than CSG1.
It is an L3 routing device, separate from the 6500/7600 supervisor. CSG1 (I don't know about CSG2) doesn't send ARP requests. It only copies ARP information from the chassis (so I heard from a Cisco engineer), but I think it actually inspects passing traffic and learns from the ARP packets in it. I think so because the CSG has a shorter ARP timeout than the supervisor, so in a typical situation it can lose a MAC address from its ARP table. If the CSG loses an ARP entry, it can't forward traffic, and it doesn't renew the entry from the supervisor's ARP table. So my advice from experience is to set a small ARP timeout on every interface connected to the CSG. My value is 60 seconds.
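The reasoning behind the 60-second advice can be checked with a tiny model. The assumptions are mine, not Cisco's. The supervisor refreshes the neighbor's ARP entry once per its own timeout (the IOS default is 4 hours, 14400 s), and each refresh is visible to the CSG. The CSG keeps a snooped entry for a shorter, hypothetical TTL (120 s below). Any interval in which the CSG's entry expires before the next refresh is an outage window.

```python
from typing import List, Tuple


def snooped_arp_gaps(refresh_s: int, entry_ttl_s: int, horizon_s: int) -> List[Tuple[int, int]]:
    """Intervals (start, end) in which a device that only learns ARP by snooping has no entry.

    Assumes the supervisor re-ARPs the neighbor every refresh_s seconds starting at t=0,
    and the snooping device keeps each learned entry for entry_ttl_s seconds.
    """
    if refresh_s <= 0:
        raise ValueError("refresh_s must be positive")
    gaps = []
    t = 0
    while t < horizon_s:
        expires = t + entry_ttl_s
        next_refresh = t + refresh_s
        if expires < next_refresh and expires < horizon_s:
            gaps.append((expires, min(next_refresh, horizon_s)))
        t = next_refresh
    return gaps


print(snooped_arp_gaps(14400, 120, 3600))  # default supervisor timeout, hypothetical CSG TTL
print(snooped_arp_gaps(60, 120, 3600))     # 60 s interface ARP timeout
```

With a 60-second interface ARP timeout, every refresh arrives before the snooped entry can expire, so no gaps remain under these assumptions.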
How does it all work? First of all, we need to know not only our users' IP addresses but also their identities. The only way to learn them is RADIUS accounting from the BRAS, GGSN, or other access device. So the CSG inspects or proxies RADIUS accounting, or acts as a RADIUS accounting server. From this information it gets the user's IP address and the login or Caller-ID for the current session.
The CSG stores all this information in the Known User Table (KUT). You can see it with the "show ip csg account users all det" command.
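Here is a minimal Python sketch of the idea behind such a table. It maps a subscriber IP to the identity learned from RADIUS accounting. Start and Interim-Update records create or refresh the binding, and Stop removes it. The Acct-Status-Type values and attribute names come from the RADIUS RFCs. The class itself is hypothetical and is not the CSG's internal structure.

```python
from typing import Dict, Optional

# Acct-Status-Type values from RFC 2866
ACCT_START, ACCT_STOP, ACCT_INTERIM_UPDATE = 1, 2, 3


class KnownUserTable:
    """Maps a subscriber IP to identity learned from RADIUS accounting."""

    def __init__(self) -> None:
        self.by_ip: Dict[str, dict] = {}

    def on_accounting(self, attrs: dict) -> None:
        ip = str(attrs["Framed-IP-Address"])
        status = attrs["Acct-Status-Type"]
        if status in (ACCT_START, ACCT_INTERIM_UPDATE):
            self.by_ip[ip] = {
                "user": attrs.get("User-Name"),
                "calling_station_id": attrs.get("Calling-Station-Id"),
            }
        elif status == ACCT_STOP:
            self.by_ip.pop(ip, None)  # session ended: forget the binding

    def who_pays(self, src_ip: str) -> Optional[str]:
        """Return the user to bill for a packet from src_ip, if known."""
        entry = self.by_ip.get(src_ip)
        return entry["user"] if entry else None
```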
After that, we can inspect packets and know which user will pay for them :)
In the next post I will write about the billing/quota system.

Tuesday, April 22, 2008

Cisco CSG part I

Do you know Cisco's Content Services Gateway (CSG)? Cisco created a strange device. I believe the engineers created something more than they intended. As usual, such a device has a lot of issues and problems. But together with the SCE, it can change our view of an SP's services for end users.
First of all, how can we run this device? Unlike many other Cisco products, the charging system can't work without external components. If you have a Cisco GGSN, the system can connect to the billing system via Diameter DCCA. But in most cases you don't have the whole system. For example, you may have a GGSN from another vendor, or you may not use a GGSN at all because you are a systems integrator or an xDSL ISP.
I will publish several articles about my experience with CSG testing, implementation, and maintenance.
First, let's create a lab. We need a RADIUS client, a GTP' (GTP prime) server (or Quota/BMA server), a client side (a traffic generator on the user side), and a server side (the responding part).

From my experience, a small mobile operator in a city with fewer than 1 million people sees approximately 400-600 RADIUS accounting events per second and about 30000-60000 active GPRS (or PPP for CDMA) sessions. Our solution must simulate such a city. You can easily estimate the load needed for your environment from this data.
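The reference numbers work out to one accounting event per session every 75 seconds (30000/400) to 100 seconds (60000/600). The following quick Python estimate for another subscriber base assumes that the rate scales linearly with the number of active sessions. That is an assumption: your session mix and interim-update interval may differ.

```python
from typing import Tuple

# Reference points from the text: (accounting events per second, active sessions)
REFERENCE = [(400, 30_000), (600, 60_000)]


def estimate_radius_rate(active_sessions: int) -> Tuple[float, float]:
    """Scale the per-session event rate linearly to a different subscriber base."""
    rates = [events / sessions for events, sessions in REFERENCE]
    return active_sessions * min(rates), active_sessions * max(rates)


low, high = estimate_radius_rate(100_000)
print(f"{low:.0f}-{high:.0f} accounting events/s")
```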
We could use good and expensive testing tools, but I don't have any :) So my goal is to use free tools.
First, let's describe all the parts of our solution:
1. RADIUS client: radclient from FreeRADIUS, shell scripts, and the SeaGull (http://gull.sourceforge.net) traffic generator to simulate load
2. GTP' server (Quota/BMA server): the server simulator from http://ipantenna.com
3. Client-side traffic generator: the SeaGull traffic generator and the Siege HTTP traffic generator
4. Server-side traffic generator: a web server, an FTP server, and any other server (for whichever protocol we test)
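radclient already builds and sends RADIUS packets, but if you write a custom load generator, it helps to see what an Accounting-Request actually contains. Below is a minimal Python sketch following RFC 2866: the packet uses code 4, and the Request Authenticator is the MD5 of the packet with a zeroed authenticator, followed by the shared secret. The secret, user name, addresses, and session ID are placeholders.

```python
import hashlib
import ipaddress
import socket
import struct

ACCOUNTING_REQUEST = 4
USER_NAME, FRAMED_IP_ADDRESS = 1, 8
CALLING_STATION_ID, ACCT_STATUS_TYPE, ACCT_SESSION_ID = 31, 40, 44


def attr(attr_type: int, value: bytes) -> bytes:
    """Encode one RADIUS attribute as Type, Length, Value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def accounting_request(identifier: int, secret: bytes, user: str, framed_ip: str,
                       calling_station_id: str, session_id: str, status_type: int) -> bytes:
    attrs = b"".join([
        attr(ACCT_STATUS_TYPE, struct.pack("!I", status_type)),  # 1=Start, 2=Stop, 3=Interim
        attr(USER_NAME, user.encode()),
        attr(FRAMED_IP_ADDRESS, ipaddress.IPv4Address(framed_ip).packed),
        attr(CALLING_STATION_ID, calling_station_id.encode()),
        attr(ACCT_SESSION_ID, session_id.encode()),
    ])
    header = struct.pack("!BBH", ACCOUNTING_REQUEST, identifier, 20 + len(attrs))
    # Request Authenticator: MD5(header + 16 zero octets + attributes + secret)
    authenticator = hashlib.md5(header + b"\x00" * 16 + attrs + secret).digest()
    return header + authenticator + attrs


def send(packet: bytes, host: str, port: int = 1813) -> None:
    """Fire-and-forget UDP send to a RADIUS accounting port (1813 by default)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

For example, `send(accounting_request(1, b"secret", "alice", "10.0.0.5", "79001234567", "sess-1", 1), "192.0.2.1")` would deliver a Start record to a placeholder CSG address. The sketch does not wait for or validate the Accounting-Response.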
In the next posts I will describe all parts of this lab, and we will install and configure everything.
