One of the biggest causes of issues is configuration parameters. I try to drill the three Cs into all the young engineers I work with: configuration, configuration and configuration.
I was approached by one of the operations staff recently about a performance problem that had started occurring in the production environment. Some of the queries that validate data were intermittently taking a lot longer than they should. The guy was concerned because they had recently migrated the system from a physical server to a new virtual server, and he was worried there was some unexpected issue with the server setup. The problem was new, so I knew something must have changed. I didn't think it was the virtual server itself, because we have other deployments running on virtual servers.
I wanted to have a look at the log files for the server because we had introduced performance logging around the operations in question. This would allow me to confirm what the customer was saying. He wanted to look at some other things first and I had other work to do, so I left it. A while later he approached me again saying that he needed more help.
So I opened up the log files and was able to confirm that some of the operations were running slowly intermittently. The operations normally complete in about 300ms so when they start taking 25 seconds people notice. It also looked like the customer operators were hitting the buttons again because they thought the first button press had not worked.
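For anyone who does not have this kind of performance logging yet, here is a minimal sketch of the idea in Java. It is not our actual logging code: the class name, the `timed` helper and the one-second threshold are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper: wraps an operation, measures it, and flags slow runs.
final class TimedOperation {
    // Assumed threshold. The operations in this story normally take about 300ms.
    static final long SLOW_THRESHOLD_MS = 1_000;

    static boolean isSlow(long elapsedMs) {
        return elapsedMs > SLOW_THRESHOLD_MS;
    }

    static <T> T timed(String name, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            // Logged even if the operation throws, so failures are timed too.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            String level = isSlow(elapsedMs) ? "SLOW" : "ok";
            System.out.println(level + " " + name + " took " + elapsedMs + "ms");
        }
    }

    public static void main(String[] args) {
        String result = timed("validateData", () -> "validated");
        System.out.println(result);
    }
}
```

With a line like this written for every operation, an intermittent jump from a few hundred milliseconds to tens of seconds is easy to spot and to line up against other events in the log.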
What I noticed was that every time the operation ran slowly, a new connection to the database was also being opened. Our database server is configured to do reverse DNS lookups for client connections. I don't know why it is set up that way, but it does mean a new connection can take 20 seconds or more when the client machine does not have a DNS entry. The operations guy said all the production servers had DNS entries, the DNS was configured correctly, and other database connections were happening quickly. So I had a look at the JDBC connection string. It was wrongly configured to point to the old database server, not the new one, and the old database server does not use the same DNS server as the new one. When the system was migrated, the database parameters had not been updated. Issue resolved. Once again the culprit was the three Cs.
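A check like the following would have caught the stale connection string before any user noticed. It is only a sketch: the host names are made up, and it assumes a JDBC URL of the form `jdbc:<driver>://host:port/database`, which not every driver uses.

```java
import java.net.URI;

// Hypothetical startup check: does the JDBC URL target the host we expect?
final class JdbcTargetCheck {
    /** Extracts the host from a URL shaped like jdbc:<driver>://host[:port]/db. */
    static String hostOf(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        // Dropping the "jdbc:" prefix leaves an ordinary hierarchical URI.
        URI uri = URI.create(jdbcUrl.substring("jdbc:".length()));
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host found in: " + jdbcUrl);
        }
        return host;
    }

    static boolean pointsAt(String jdbcUrl, String expectedHost) {
        return hostOf(jdbcUrl).equalsIgnoreCase(expectedHost);
    }

    public static void main(String[] args) {
        // Made-up values standing in for the migrated system's settings.
        String url = "jdbc:postgresql://db-old.example.internal:5432/app";
        String expected = "db-new.example.internal";
        if (!pointsAt(url, expected)) {
            System.err.println("WARNING: " + url + " does not target " + expected);
        }
    }
}
```

Logging the resolved database host at startup, or refusing to start when it differs from the expected one, turns a silent misconfiguration into something visible on day one of a migration.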
It reminded me of another issue that had come up recently, when a test server was being upgraded. Someone had made a mistake and configured the server with the configuration files for the production server. I was called over when they could not work out why all the data was going into the production database when no users were connecting to the production server and all of them seemed to be connecting to the test server. Again the culprit was the three Cs.
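One cheap defence against this kind of mix-up is to have each configuration file declare which environment it belongs to, and have the server refuse to start if that does not match what the machine expects. The sketch below assumes a hypothetical `app.environment` property and an expected value supplied by the host. Neither exists in the systems described here.

```java
import java.util.Properties;

// Hypothetical fail-fast guard against loading another environment's config.
final class EnvironmentGuard {
    static void verify(Properties config, String expectedEnvironment) {
        // "app.environment" is an assumed key, not a standard one.
        String declared = config.getProperty("app.environment");
        if (declared == null) {
            throw new IllegalStateException("Configuration does not declare app.environment");
        }
        if (!declared.equals(expectedEnvironment)) {
            throw new IllegalStateException("Configuration is for '" + declared
                    + "' but this server expects '" + expectedEnvironment + "'");
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("app.environment", "production"); // wrong file copied over
        try {
            verify(config, "test"); // what the test server expects
        } catch (IllegalStateException e) {
            System.err.println("Refusing to start: " + e.getMessage());
        }
    }
}
```

The expected value has to come from somewhere other than the configuration files themselves, for example a system property or environment variable set on the machine, otherwise copying the wrong files copies the wrong expectation along with them.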