Below are answers to questions asked during the recent webinar. If you have additional questions, please contact Dan Young at youngdj@iu.edu
Q: How many people are on your team at IU?
A: The Indiana University Enterprise Database Team consists of 4 full-time Oracle Database Administrators, 2 SQL Server Database Administrators and their manager, me! (I also try to act like I know a little something about databases.)
Q: True or False: If we create an Oracle (Standard) database on one of our VMware clusters, Oracle requires us to have Processor-based licensing for every processor on the VM cluster.
A: How I wish this were a simple 'True or False' answer. I am by no means an Oracle licensing expert, but I do know that there has been a great deal of controversy on this particular subject. As Oracle does not recognize VMware as acceptable 'hard-partitioning', all processing cores on any one physical host where Oracle resides must be licensed. This is true whether running a database within VMware or natively on physical hardware. However, the question "Is Oracle required to be licensed for the entire VMware cluster?" is open for debate. Each individual customer situation is likely unique, but it is probable that Oracle licensing costs can be limited within a cluster by using mandatory 'DRS Host Affinity' rules provided by VMware. For a deep read of the topic, I would refer to the work and analysis of Dave Welch from House of Brick, Jeff Browning of EMC and VMware themselves.
Q: Oracle support clearly told us this year that any Oracle DB running on non-OracleVM will not be supported. Any input?
A: Wow! Unfortunately, I am not surprised to hear that certain Oracle representatives are still perpetuating Fear, Uncertainty and Doubt around Oracle database support. I would begin to tackle this problem by requesting that Oracle clarify the meaning of My Oracle Support (MOS) NOTE: 249212.1, which states that the Oracle database, including Oracle RAC, is fully supported within VMware. Secondly, I would be certain that I fully understood the difference between Oracle's definition of 'Certified' vs. 'Supported' and, if necessary, I would explain this distinction to the Oracle representative. Finally, I would be very clear in my service expectations from Oracle support and I would not hesitate to escalate uncooperative or unhelpful responses. Like many customers, the University pays a great deal of money for continued support and it is my expectation that Oracle will indeed offer me lucid and competent solutions to our issues.
Q: What are single points of failure for having multiple databases on a VM cluster?
A: In terms of points of failure, any solution, VMware or otherwise, is only as resilient as the architecture and hardware design behind it (i.e., traditional N+1 redundancy). In our University environment, a VMware cluster is composed of multiple physical machines spread across different hardware racks within our hardened datacenter. Each hardware rack has redundant Network Switching, Power Supply and Storage Switching components which are uplinked and terminated on separate physical cable runs. In addition, our Storage Area Network (SAN) and backup technologies are natively resilient. Through our architecture and design decisions, we strive to insulate ourselves from any single point of hardware failure. That said, and VMware vSphere Fault Tolerance notwithstanding, if a physical machine in the cluster fails, all VM guests on the failed host will experience an interruption in service when migrating to another running physical host. We have indeed experienced such failures with a typical time to recovery of less than 5 minutes, which affords us very highly available database services without the need to purchase Oracle RAC.
Q: Why VMware? Your OVM stance?
A: The University is a long-time partner with VMware, dating back to our initial foray with the technology in 2003. Although virtualization competitors have gained some ground in the last 24 months, VMware continues as the clear market leader with an ever-expanding and robust feature set. Of course, OVM, without a support contract, is a very cost-appealing option as well, with certain ease-of-deployment advantages, especially being native to the 'Red Stack' (e.g., Templates). As OVM matures, it would be reasonable for the University to perform a formal analysis comparing the products to better understand the benefits of each.
Q: Is there a clustering service for Oracle 10g that integrates with VMware HA, like SQL Server FCI?
A: Using Microsoft Clustering within a VMware HA environment is not without well-documented caveats and requirements. It is, nonetheless, possible. Depending upon your particular availability requirements, Oracle Data Guard (HA) or Oracle Real Application Clusters (Continuous Operations) can most certainly be utilized atop VMware. Please note that if you are specifically tied to Oracle 10g, only Data Guard is supported. Oracle RAC on VMware is supported from version 11.2 onward.
Q: How does one deal with licensing costs in a multi-tenant VMware cluster?
A: For full disclosure, we at the University carry an enterprise campus license. If I install Oracle once or one-thousand times, the price is the same to me. Now I’m sure that this makes you say ‘A-HA! I knew there was a hitch!’, but I do not believe that the licensing requirement imposed by Oracle necessarily excludes traditionally licensed installations. For certain, it is plausible that license costs could increase, especially at institutions with relatively small Oracle deployments. However, when fully considering the cost, I believe it is necessary to consider two important factors:
1) Can all Oracle databases be separated into their own VMware cluster? We currently do this for several reasons, including I/O isolation. The bulk of our VMware installations run within a larger, general cluster of physical hardware, but database servers are resident within their own cluster of physical resources. In essence, we are minimizing the physical footprint of Oracle and in doing so, we would be minimizing license cost. Also, if you only have the option to run Oracle systems in a single cluster, the following explanation may be of help:
http://oraclestorageguy.typepad.com/oraclestorageguy/2012/09/oracle-throws-in-the-towel-on-vmware-licensing-reprise.html
2) What is the projected consolidation ratio of existing Oracle databases within a VMware environment? Obviously, mileage varies here, but if you have many small physical Oracle servers you may be able to achieve very dense consolidation rates, thus minimizing the number of necessary licenses. Confio Ignite can gather some of the data necessary to help assess consolidation possibilities.
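To make the two factors above concrete, here is a minimal sketch of the arithmetic, not a licensing calculator. It assumes per-core Processor licensing in which every core on every host in the dedicated cluster is licensed. The core factor of 0.5, the 48 small four-core servers, and the one spare host are hypothetical inputs chosen for illustration; confirm the real values against your own contract and inventory.

```python
import math


def hosts_required(databases: int, guests_per_host: int, spare_hosts: int = 1) -> int:
    """Physical hosts needed for a dedicated database cluster, plus spare capacity."""
    return math.ceil(databases / guests_per_host) + spare_hosts


def processor_licenses(hosts: int, cores_per_host: int, core_factor: float) -> int:
    """Processor licenses when every core on every listed host must be licensed."""
    return math.ceil(hosts * cores_per_host * core_factor)


CORE_FACTOR = 0.5  # assumption for illustration; check your own agreement

# Hypothetical "before": 48 small physical database servers with 4 cores each.
before = processor_licenses(hosts=48, cores_per_host=4, core_factor=CORE_FACTOR)

# Hypothetical "after": the same 48 databases at 12 guests per 16-core host,
# with one extra host held back for N+1 redundancy.
cluster_hosts = hosts_required(databases=48, guests_per_host=12)
after = processor_licenses(hosts=cluster_hosts, cores_per_host=16, core_factor=CORE_FACTOR)

print(f"before: {before} licenses, after: {after} licenses on {cluster_hosts} hosts")
# prints "before: 96 licenses, after: 40 licenses on 5 hosts"
```

The result is only as good as the consolidation ratio fed into it, which is why measured workload data matters before committing to a host count.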
Q: What is the largest size database that is run on VMware?
A: From a raw disk perspective, the largest Oracle database running on VMware is approximately 3 TB. From a memory perspective, our largest system has an 'SGA Target' of 40 GB with an 'SGA Max' of 64 GB. Finally, transactionally, our largest system supports an application running between 15 and 20 million queries per hour under peak demand, with about 13 'Average active sessions' at any point during the day (that is, 13 different active database processes).
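For readers who think in per-second rates, the peak figure above converts as follows. This is simple arithmetic on the numbers quoted in the answer, nothing more.

```python
def per_second(queries_per_hour: int) -> float:
    """Convert an hourly query count to an average per-second rate."""
    return queries_per_hour / 3600


low = per_second(15_000_000)
high = per_second(20_000_000)
print(f"{low:,.0f} to {high:,.0f} queries per second")
# prints "4,167 to 5,556 queries per second"
```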
Q: How many Oracle VM guests are run per VM cluster?
A: We are fortunate in that we have 2 geographically separate data centers and a physical Oracle cluster located within each. Our average density is approximately 12 VM guests per physical host, with a maximum density of about 35 VM guests per physical host. However, in this case, the calculated average is a bit skewed in that we have a small number of databases which have 'special' computing requirements and we have intentionally chosen to limit the density on these particular systems to one or two VM guests.
Q: Why would you virtualize an Oracle database that uses 190 GB of RAM and 16 or more CPUs?
A: An excellent question with a simple answer, or so I hope. Firstly, VMware is fully capable of supporting a server of this particular size, so no problem! I would run a database of this size on VMware to take advantage of VMware's HA and DRS functionality in an N+1 redundancy configuration. Additionally, the fact that system administrators could perform future hardware maintenance and upgrades without the intervention of a database administrator seems like a good deal to me. I would rather sleep soundly and never need to lay hands on a physical server again during a hardware lifecycle replacement project!
Q: Need your advice: Application vendors don't normally provide qualification testing in VM environments, hence convincing stakeholders is difficult.
A: This may sound harsh, but once the University established our vision of entirely virtualized enterprise applications, and after we had several 'Quick Wins' in moving Web and Application servers to VMware, we met with reluctant vendors and displayed our strategic plan. Essentially, the vendors were put on notice that their solution was either to work within a virtual infrastructure or we would seek alternatives. Obviously, this is not the type of approach that can work in all organizations. More practically, we have "coached" more than one vendor through the process, educating them on the merits of virtualization and helping them become comfortable running their software within a virtual infrastructure. Generally, they comply.
Q: Can you give examples of the commodity equipment in your VMware environment?
A: In general, and in accordance with our University approach to "Ruthless Standardization", we strive to keep identical commodity hardware in our VMware clusters. (Such a feat becomes more difficult as the equipment ages and vendors retire certain models, but we try to do our best.) As a rule, all of our servers run in an 8 core, dual socket configuration (Intel) with 256 GB RAM per host. (Please contact me directly for exact vendor specifications.)
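One benefit of identical hosts is that N+1 capacity checks reduce to simple arithmetic. The sketch below assumes the 256 GB, 16-core host described above and asks whether all guest memory still fits after one host is lost. The guest sizes are hypothetical, and the check deliberately ignores hypervisor overhead, memory overcommit and CPU contention, so treat it as a first-pass sanity test only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Host:
    cores: int = 16    # dual socket x 8 cores, per the answer above
    ram_gb: int = 256


def survives_host_failure(host_count: int, guest_ram_gb: Sequence[int],
                          host: Host = Host(), failures: int = 1) -> bool:
    """True if the surviving hosts can hold all guest memory after `failures` hosts are lost."""
    surviving = host_count - failures
    if surviving <= 0:
        return False
    return sum(guest_ram_gb) <= surviving * host.ram_gb


# Hypothetical guest mix: ten 64 GB guests and twenty 16 GB guests (960 GB in total).
guests = [64] * 10 + [16] * 20

print(survives_host_failure(4, guests))  # prints "False": 3 x 256 GB = 768 GB is too small
print(survives_host_failure(5, guests))  # prints "True": 4 x 256 GB = 1024 GB is enough
```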
Q: Do you have non-Production Oracle instances running on hosts in clusters separate from Production?
A: We mix both Production and Non-Production instances within a single database cluster. We do, however, isolate our "Production" servers to one datacenter and our full-sized "Pre-Production" servers to the opposing datacenter, such that we reserve full computing capacity within each location in case of a massive disaster. Otherwise, there is a mix of other development and test databases within each cluster. We do occasionally enforce some 'Host Affinity Rules' to associate certain legacy workloads with a particular physical server as an optimization technique.
Q: How is storage configured for Oracle databases running on VMware? Is ASM used for databases on VMware, and if so, how are disks presented to ASM?
A: We run ASM with all of our Oracle databases. As a rule, 99.5% of our ASM volumes are presented using native VMFS (VMDK files). Given its size, we do use RDM (Raw Device Mappings) for our Decision Support System so that we can leverage the ‘snapshot’ capability of our SAN to quickly clone the database, but this is our only system configured in this manner. From a management perspective, however, both VMDK and RDM devices behave in the same fashion as far as Oracle is concerned. (If it is helpful, we do not use ASMLIB, but rather present disks as block devices.)
Q: Did you use an outside company (like House of Brick) or Oracle to evaluate your licenses (ie. what you would need when moving to VM)?
A: The only external consulting we have utilized as part of our efforts was a brief four-week engagement with House of Brick (HoB) in 2008. After designing our virtualized Oracle infrastructure, we engaged HoB as a sanity check to validate our assumptions and approach and, lastly, to perform some load testing and load validation of our Learning Management System. At the end of the day, HoB produced the following case study: http://www.houseofbrick.com/
Q: What was the largest database you moved from physical to virtual hardware? How long did it take?
A: The largest migration we performed would easily be our Decision Support System. Around 2.5 TB at the time of migration, we completed the process in approximately 8 hours, using Oracle's Data Pump utilities. We had several strategic reasons to leverage Oracle Data Pump as our primary migration vehicle, but we also utilized Oracle Streams when high system availability was of particular importance. (Oracle RMAN and/or Data Guard can also be used to manage the transition.) As a rule, we always try to select the best tool for our particular requirements.
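As a rough planning aid, the figures above imply an average end-to-end rate that can be scaled to other database sizes. The sketch assumes decimal units (1 TB = 1,000,000 MB) and, more importantly, that duration scales linearly with size. Real Data Pump runs depend on index builds, parallelism and network, so the 3 TB example is a hypothetical estimate and not a measurement.

```python
def average_rate_mb_s(size_tb: float, hours: float) -> float:
    """Average throughput in MB/s needed to move size_tb (decimal TB) in `hours`."""
    return size_tb * 1_000_000 / (hours * 3600)


def estimated_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours to move size_tb at a constant rate (linear scaling is an assumption)."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


rate = average_rate_mb_s(2.5, 8)
print(f"{rate:.0f} MB/s")  # prints "87 MB/s"

# Hypothetical: a 3 TB database moved at the same average rate.
print(f"{estimated_hours(3.0, rate):.1f} hours")  # prints "9.6 hours"
```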
Q: Why RMAN? We run on NetApp and we backup using snapshots. We then copy the snap to a Data Domain backup unit which compresses and dedups. If we need to refresh or clone a database, we simply snap clone it. In our experience, it takes RMAN much longer to produce a backup set than creating a snap.
A: I would not disagree that storage snapshots are much faster than RMAN backups, and as a matter of fact we do use them with our data warehouse, where the speed of backup operations is particularly important. On all other systems, however, we take an RMAN Level 0 backup twice a week and a Level 1 (incremental) backup on all other days. Of course, we back up archive logs as necessary as well. We chose RMAN for several reasons. Firstly, we get an approximately 3 to 1 compression ratio with backups. As our development organization requires us to have ‘full copies of the data’ on all non-production databases, compression is important to maintain a reasonable storage footprint. Secondly, we do a great deal of database cloning using the RMAN duplicate function, and we have found the flexibility of this tool preferable to coding the custom scripts and processes necessary with the use of snapshots. Finally, and perhaps most importantly, using RMAN will assist us with future SAN migrations. By using RMAN, we are not tied to any one vendor's approach to storage snapshots. If, down the road, we were to choose to change vendors, neither my unit nor my system administrators would be required to do any customized setup in support of the snapshot technology. (Our previous use of storage snapshots has been a limiting factor in performing seamless SAN migrations.)
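The backup rotation described above can be sketched as a small scheduling helper. The choice of Sunday and Wednesday for the Level 0 runs is an assumption, since the answer only says "twice a week", and the helper names are hypothetical. The generated text uses standard RMAN syntax for a compressed incremental backup with archive logs, but it is not the script we run in production.

```python
import datetime as dt

# Assumption: Level 0 on Wednesday (2) and Sunday (6); Monday is 0 in date.weekday().
LEVEL0_WEEKDAYS = {2, 6}


def backup_level(day: dt.date) -> int:
    """Return 0 on the two weekly full-baseline days, otherwise 1 (incremental)."""
    return 0 if day.weekday() in LEVEL0_WEEKDAYS else 1


def rman_command(day: dt.date) -> str:
    """Build the RMAN backup command text for the given day."""
    level = backup_level(day)
    return f"BACKUP AS COMPRESSED BACKUPSET INCREMENTAL LEVEL {level} DATABASE PLUS ARCHIVELOG;"


def compressed_size_gb(raw_gb: float, ratio: float = 3.0) -> float:
    """Estimated backup size at the roughly 3 to 1 compression ratio quoted above."""
    return raw_gb / ratio


print(rman_command(dt.date(2024, 1, 7)))  # a Sunday, so LEVEL 0
print(rman_command(dt.date(2024, 1, 4)))  # a Thursday, so LEVEL 1
print(compressed_size_gb(3000))           # prints "1000.0"
```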
Q: Why ASM? Storage arrays such as NetApp and EMC VNX are highly optimized for I/O.
A: It is not entirely clear to me why a particular storage platform would preclude me from using ASM. We use ASM primarily as a management tool for ease of data file management and growth. When we create a database, we effectively have two ASM volumes, one for Data and one for Redo. All datafile naming is handled using OMF, and files are set as ‘big files’, allowing for virtually unlimited growth. When the time comes to add more storage, we simply present a new block device to ASM from our SAN and add it dynamically. (The same holds true when we need to remove a block device because a database is reduced in size, or we desire to change device sizes, etc. This is a fairly rare occurrence, however.) In essence, I want my staff to spend as little time as possible doing data file management. It is not a value-added task. We have moved from managing database storage at the file/device level to the ASM volume level, which is a much better use of time.
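To illustrate how small the "add storage" step becomes at the ASM level, here is a sketch that builds the statement a DBA would run once the new block device is visible to the server. The disk group name DATA and the device path /dev/sdf1 are hypothetical, and the helper itself is illustrative rather than part of our tooling. It only generates the ALTER DISKGROUP text and never connects to a database.

```python
import re

# Conservative patterns so only plain names and device paths end up in the SQL text.
_DISKGROUP_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
_DEVICE_RE = re.compile(r"/[A-Za-z0-9_./-]+")


def add_disk_sql(diskgroup: str, device_path: str) -> str:
    """Build an ALTER DISKGROUP ... ADD DISK statement for a newly presented device."""
    if not _DISKGROUP_RE.fullmatch(diskgroup):
        raise ValueError(f"unexpected disk group name: {diskgroup!r}")
    if not _DEVICE_RE.fullmatch(device_path):
        raise ValueError(f"unexpected device path: {device_path!r}")
    return f"ALTER DISKGROUP {diskgroup} ADD DISK '{device_path}';"


# Hypothetical names, for illustration only.
print(add_disk_sql("DATA", "/dev/sdf1"))
# prints "ALTER DISKGROUP DATA ADD DISK '/dev/sdf1';"
```

ASM rebalances existing data across the enlarged disk group on its own, which is why no per-datafile work is needed afterwards.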