All about Database Administration, Tips & Tricks

The Insights of a Geek DBA !!!

All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove. Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only. This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only. If you see any issues with Content and copy write issues, I am happy to remove if you notify me. Contact Geek DBA Team, via geeksinsights@gmail.com

Cassandra for Oracle DBA’s Part 5 : Read and Write Consistency

With the Data not shared model, how does a client or query get a consistent data and how does a write ensure that it is writing right data.

Obvious answer is the maintaining consistency in a database involves in adhering the ACID Properties.

Cassandra follows Atomicity, Isolation, Durability but consistency is an eventual consistency or a Tunable Consistency.

Having said that, data will eventually become consistent among nodes.

What is eventual consistency?

ANS: The data will become consistency when required or by on Demand over a period of time.

But, Does client need to wait until it become consistent?

ANS: No, when client send a request for data, the underlying query have a consistency level for a session level or query level , for which ALL, ANY, ONE etc. should be mentioned,

When replication factor is set and read happens the co-ordinator that receives the client request will send a request to the nodes that contain the partition range of data and nodes will revert back with data to coordinator node. Whilst its the coordinator node to merge all the parts of the data and check with digest it received from each node and see the data is latest. If any node data is not matched with digest then that node data will be repaired in backend and so forth become consistent eventually.

I do not want to waste any efforts in rewriting this stuff since Datastax documentation has very good page for Read/Write Paths in Cassandra with all scenarios.

https://docs.datastax.com/en/cassandra_win/2.2/cassandra/dml/dmlClientRequestsReadExp.html

https://docs.datastax.com/en/cassandra_win/2.2/cassandra/dml/dmlClientRequestsWrite.html

So with Consistency Levels in place and the digest matching and read repair online, the Data that retrieves Cassandra ensures consistency of data can be achieved eventually and depends on your requirement.

Next Post High Availability Comparing with Oracle RAC, cluster interconnect, node evictions etc.

12c Upgrade Gotchas

The following are various gotchas during upgrade or post upgrade to 12c collect from various sources and also few of them in my environment.

1. Change of resource_limit parameter to true by default

Impact: Resource Manager will be invoked automatically and push the resource limits to the sessions and users may receive cpu/io restriction messages.

2. Global Temporary Tables Statistics session level

Impact: Earlier versions of Oracle uses shared statistics for global temporary tables, where in 12c the statistics on global temporary tables are session level.

that means other users cannot see it

3. HISTGRM$ add column

Impact: n 12.1, optimizer group added a column to histgrm$ which is a clustered table. For upgrade from 11.2 to 12.1, this column ep_repeat_count is added as a default value column with a not null constraint. Since histgrm$ table on a database could have millions of rows, this addition of a default value column could take a long time since we need to update all existing rows with a default value. This in turn slows down the upgrade significantly to apply the patch 19333670 on 12.1.0.2

4. Drop column with null default value can cause corruption bug 17325413

5. Dynamic Statistics - Performance Problems

Impact:- Dynamic statistics has been enhanced in Oracle 12c, but this can lead to a longer parse time.

Automatic dynamic statistics is used more often in 12c which can lead to a parse time increase in the more cases than before.

6. Wri$_optstat_histhead_history Size can cause much time to upgrade, before upgrade purge or reorg the table.

7. Wrong results in group by with outjoin for a bind variable

alter system set "_optimizer_aggr_groupby_elim"=false scope=both;

alter system set "_optimizer_reduce_groupby_key"=false scope=both;

8. In memory database - Do not enable unless you have a license

9. DBA_REGISTRY also reports CATALOG and CATPROC in "UPGRADED" status as shown below:

Solution: run utlrp.sql

10. SPD & Dynamic Sampling more CPU Utilisation & Cause result cache also fill.

11. Oracle 12c has deprecated the sec_case_sensitive_logon parameter. Changing to case sensitive passwords could break your application.

12. If you have client/JDBC version below 10g , ensure you must update the sqlnet.ora in database host SQLNET.ALLOWED_LOGON_VERSION_SERVER to 10 or 8 or 9.

This is due to in earlier versions it was 8 where in 12c it was 11, that means 11g client can connect to db. By keeping it 10 or 8 or 9 you are allowing other version clients to connect to database.

13. Be aware of, By default, In 12c version the auditing runs in mixed-mode, which means you can combine traditional auditing with unified auditing. There is one gotcha

to be aware of:

– if the database has been upgraded, then there will be no Unified Policy enabled!

– if the database has been newly created, then there will be 2 Unified Policies enabled : ORA_SECURECONFIG and ORA_LOGON_FAILURES.

Thanks

GEEK DBA

Cassandra for Oracle DBA’s Part 4 – Data Distribution across Nodes

Shared Disk architecture in RAC where the data is shared across the nodes and at any point of time , every instance will give the same data using cache fusion.

But in Cassandra, the important part is storage is not shared storage, instead its sharded means partitioned and stored in each node.

And when you see RAC architecture we see an horizontal landscape (the hub nodes), like below figure 1, where in Cassandra nodes formed as ring (the peer architecture or hub-leaf spoke technology). In a ring architecture every node acts as coordinator there is no master in cluster unlike RAC.

Note: In 12c we use the same mechanism called Flex Cluster, but still the Hub Nodes aka database uses shared storage. Do not confuse with it.

Diagrams RAC & Cassandra

Image result for flex cluster oracle rac architecture

RAC 12c Flex Cluster Architecture Cassandra Sharded Cluster Nodes

Well, if the data is not shared , we as a DBA the common questions that come to our mind are, 🙂

1. How does the data is distributed to the nodes?

2. How does nodes see a consistent data if a read happens?

3. How does node knows if a write happen in one node the other node knows about it?

4. How about data availability?What if the one of the node corrupted or disk failed?

Before to answer this questions, Lets catch up our ASM Diskgroup Mirroring and Striping methods

ASM uses diskgrup level redundancy aka mirroring with Normal (two copies) and High (three copies), so at any time it maintains the data that stored in data diskgroup that many copies. Even a diskgroup/disk failure will not have any data loss.

ASM uses disk level striping so it evenly distribute the data to the number of disks in that diskgroup, when a disk added or deleted the data is distributed to other disk first called re-balancing and then allow disk operations on it. This ensures the data can be read / write evenly to avoid a single point of contention on disks.

Keeping those in mind, Just mark yourself rather diskgroup level, in Cassandra the data is maintained at node level replication and striping. So each node store its own set of data and may be a replicated data of other nodes.

In Cassandra, Mirroring is achieved by Replication Factor and Striping is achieved by Token Range of the nodes called Partitioner ( I am not covering Vnodes here )

Partitioner/Tokens:- Cassandra uses a hashing mechanism to assign a token range for a node. When a node is added and deleted one must re-assign the tokens (not the vnodes configuration) so that data will be redistributed to the other/new nodes as like the ASM Diskgroup Rebalancing. Now with token in place for each node, there is another thing that required is table level partition key, which is based on a column to store certain partitioned data into a separate node. And then while any DML operation performed that row will be hashed and checked with node token range and insert into that particular node. In this way each node has a non shared data means a part of data for a table.

Cassandra uses different types of hash mechanism to provide token to nodes and A partitioner determines how data is distributed across the nodes in the cluster (including replicas). Basically a partitioner is a function for deriving a token representing a row from its partion key, typically by hashing. Each row of data is then distributed across the cluster by the value of the token. Both the Murmur3Partitioner and RandomPartitioner use tokens to help assign equal portions of data to each node and evenly distribute data from all the tables throughout the ring or other grouping, such as a keyspace. This is true even if the tables use different partition key, such as usernames or timestamps. Moreover, the read and write requests to the cluster are also evenly distributed and load balancing is simplified because each part of the hash range receives an equal number of rows on average.

You can set the partitioner type in cassandra.yaml file using

Murmur3Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
RandomPartitioner: org.apache.cassandra.dht.RandomPartitioner
ByteOrderedPartitioner: org.apache.cassandra.dht.ByteOrderedPartitioner

Mirroring/Replication :- Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor.

Replication Factor can be configured at Key Space Level. i.e A tablespace/schema level, and then each object/table in that Keyspace will maintain that Replicas in nodes. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes later.

Replication Strategies can be determined by Using class option at Key Space Level so that each data center will maintain a set of data replicas.

Two replication strategies are available:

SimpleStrategy: Use only for a single data center and one rack. If you ever intend more than one data center, use the NetworkTopologyStrategy.
NetworkTopologyStrategy: Highly recommended for most deployments because it is much easier to expand to multiple data centers when required by future expansion.

With above, look at the example with replication factor 3 with single and multiple datacenter.

Mutli-datacenter clusters for geograpically separate data

Now lets answer the questions above.

1. How does the data is distributed to the nodes?

Ans: Data Distribution of nodes is based on Token Range assigned to the node. So each node stores different data.

2. How about data availability?What if the one of the node corrupted or disk failed?

Ans: As such the data is replicated to other nodes, even if a node / disk failed the other nodes can get the data. This way data high availability is maintained.

3. As if the data is not shared how about read consistency and write consistency.

Ans: Cassandra Maintains the eventual consistency in forms of multiple read requests to the nodes if it sees the data digest is not matching. So it try to read the most recent data from other replicated nodes.

Cassandra for Oracle DBA’s Part 3 – Architecture

In Oracle (infact for any RDBMS) , will have a memory structures and processes (or thread) to process the data.

If we take Oracle as an example, lets look the similarities

1. Memory Architecture

Oracle	Cassandra (NO-SQL DB)	Parameter to Set in Cassandra.yaml file	Comments
SGA	Java Heap Memory	cassandraenv.sh MAX_HEAP_SIZE	SGA is shared global area , where in the Java Heap Memory is used for Cassandra Core Engine
BufferPool	Memtables	memtable_total_space_in_mb	Memtables in Cassandra used for each single table to store the recently used or modifying data, unlike a large buffer cache suppose you have DML on 4 tables , it will create 4 Memtables in the memory
Result Cache	Row Cache	row_cache_size_in_mb	As like result cache in Oracle, you can store certain rows (entire rows) in table in memory by using rowcache
RowCache	Partition Key Cache	key_cache_size_in_mb	When an operation has to perform on a block, one must know the Block location in terms of Block Number, Disk number, node name etc. To achieve this, Oracle relies on Data Dictionary Cache and get block number and file number and then read from the disk. In Cassandra, Data Stores on disk based on Partition Key Value and this Key will be stored in Partition Key Cache to effectively get the data from disk
Hash Buckets	Bloom Filter	bloom_filter_fp_chance	In Oracle to find the data block in buffer cache, Oracle uses mod function to find the right bucket for the data. This will help process not to traverse entire block chain. Similarly, Cassandra checks the Bloom filter to discover which SSTables are likely to have the request partition data. The Bloom filter is stored in off-heap memory. Each SSTable has a Bloom filter associated with it. Create table ..... (bloom_filter_fp_chance = 0.01)
NA	PartitionSummary	min_index_interval 128 and max_index_interval 2048, or index_interval 128	The partition summary is an off-heap in-memory structure that stores a sampling of the partition index. A partition index contains all partition keys, whereas a partition summary samples every X keys, and maps the location of every Xth key's location in the index file. For example, if the partition summary is set to sample every 20 keys, it will store the location of the first key as the beginning of the SSTable file, the 20th key and its location in the file, and so on.
NA	CompressionOffsets		Compression offset maps holds the offset information for compressed blocks. By default all tables in Cassandra are compressed and more the compression ratio larger the compression offset table.

2. Physical Structures

Oracle

Cassandra

Parameter

Comments

RedoLog

CommitLog

commitlog_total_space_in_mb

commitlog_segment_size_in_mb

To maintain the durability of the Transaction any database should write the transaction to off a file like redo log or transaction log or a commit log in case of cassandra. . The purpose of the commitlog is to be able to recreate the memtable after a node crashes or gets rebooted. This is important, since the memtable only gets flushed to disk when it's 'full'

Datafile

SSTables

sstable_size_in_mb

sstable_min_threshold

In oracle the table is stored in form of segments and extents and blocks in a physical datafile. And it is contiguous or non-contigous and fits into a single or multiple large datafiles.

Where the SSTables

Each table will have a physical SSTable on disk
SSTables are immutable, well immutable means it cannot be written again, kind of Write and then Readonly (like temp)
So how about a write that need to modify the Table aka SSTable?
A transaction will create a new SSTable with new data on same disk with increment the file number
Both of this SSTables (aka files) will be merged together called Compaction when required or when reached the number of SSTables threshold
In multinode environment, this SSTables are spawned across nodes with data distributed based on Token Range of that node.

SSTables consists of different types:-

BloomFilter
Index
Compression File
Statistics File
Datafile
Summary File
TOC.txt

General Architecture

How Read Occurs in Cassandra?

Check the memtable
Check row cache, if enabled
Checks Bloom filter
Checks partition key cache, if enabled
Goes directly to the compression offset map if a partition key is found in the partition key cache, or checks the partition summary if not
If the partition summary is checked, then the partition index is accessed
Locates the data on disk using the compression offset map
Fetches the data from the SSTable on disk

How Write Occurs in Cassandra

Cassandra processes data at several stages on the write path, starting with the immediate logging of a write and ending in compaction:

Logging data in the commit log
Writing data to the memtable
Flushing data from the memtable
Storing data on disk in SSTables
Compaction, by merging the SSTables

3. Logical Structures

Oracle	Cassandra	Command	Comments
Schema	Keyspace	CREATE KEYSPACE TEST WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : n };	Keyspace is a schema a collection of Tables a top level namespace Replication Factor is used to determine the Mirroring of data in a multinode environment
Table	ColumnFamily	CREATE TABLE TEST.Users( ID text, firstname text, lastname text, PRIMARY KEY (ID) )	A table is often called as ColumnFamily in NO-SQL , created with primary key ,
ParitionKey	PartitionKey	CREATE TABLE TEST.Users( ID text, firstname text, lastname text, PRIMARY KEY (ID) )	primary key means the partition key to store the data in column pair not the primary key in Oracle. As like Oracle's Composite Primary Key aka Partitioning you can set multiple columns to the Primary Key
Indexes	Index	CREATE INDEX user_state ON TEST.Users (lastname);	Creating a simple index whichi inturn creates a file on disk.
Sequence	Counter	CREATE TABLE TEST.page_view_counts (counter_value counter, url_name varchar, page_name varchar, PRIMARY KEY (url_name, page_name) );	Here the datatype counter depicts , its a sequence based value, so you just can increment in your dml and code ex: UPDATE counterks.page_view_counts SET counter_value = counter_value + 1 WHERE url_name='www.datastax.com' AND page_name='home';
Trigger	Trigger	CREATE TRIGGER TEST.myTrigger ON myTable USING 'org.apache.cassandra.triggers.INDEXCHECK'	In RDBMS, Triggers are stored as compiled code in the memory of the Database called SharedPool/Library Cache, where in the cassandra implementation, To create a trigger, you must first build a jar with a class implementing the ITrigger interface and put it into the triggers directory on every node, then perform a CQL3 CREATE TRIGGER request to tie your trigger to a Cassandra table (or several tables)
Type	Type	CREATE TYPE TEST.address ( street text, city text, zip_code int, phones set<text> )	Facilitates handling multiple fields of related information, such as address information: street, city, and postal code.
Undo	Tombstones	gc_grace_seconds	Deletes do not remove the rows and columns immediately and in-place. Instead, Cassandra writes a special marker, called a tombstone, indicating that a row, column, or range of columns was deleted. These tombstones are kept for at least the period of time defined by the gc_grace_seconds per-table setting

Image Courtesy: Datastax

-Thanks

GeekDBA

Cassandra for Oracle DBA’s Part 2 – Three things you need to know

Cassandra has three important differences if compared with RDBMS,

NO-SQL
Data Stored as Columnar Structure
Distributed Systems with Nothing Shared Architecture

Cassandra is combination of the Distributed Processing Systems (for high availability and scalability) with NO-SQL (to process unstructured data & schema less structure) & with Column Store(to store wide variety and velocity of data), an adoption of Google BigTable & Amazon DynomoDB.

Let's delve into each of these things.

1. NOSQL - Not Only SQL

NoSQL technologies address narrow yet important business needs. Most NoSQL vendors support structured, semi-structured, or non-structured data which can be very useful indeed. The real value, comes in the fact that NoSQL can ingest HUGE amounts of data, very fast. Forget Gigabytes, and even Terabytes, we are talking Petabytes! Gobs and gobs of data! With clustering support and multi-threaded inner-workings, scaling to the future expected explosion of data will seem a no-brainer with a NoSQL environment in play. Let’s get excited, but temper it with the understanding that NoSQL is COMPLIMENTARY and not COMPETITIVE to ROW and COLUMN based databases. And also note that NoSQL is NOT A DATABASE but a high performance distributed file system and really great at dealing with lots and lots of data

There are three variants of NOSQL,

Key Value – which support fast transaction inserts (like an internet shopping cart); Generally stores data in memory and great for web. applications that need considerable in/out operations
Document Store (ex. MongoDB) – which stores highly unstructured data as named value pairs; great for web traffic analysis, detailed information, and applications that look at user behavior, actions, and logs in real time.
Column Store (Cassandra) – which is focused upon massive amounts of unstructured data across distributed systems (think Facebook & Google); great for shallow but wide based data relationships yet fails miserably at ad-hoc queries .

2. Data Stored as Columnar Structure

Imagine a Row vs Column Structure

Creating a Table in RDBMS Creating a Table using CQL (Thirft naming convention: Column Family) with Column Database like Cassandra

Create table emp Create table emp

(id number primary key, (id int,

name varchar2(10), name text,

age number, age number,

interests varchar2(100)); interests text,

PRIMARY KEY((id), name, age, interests)

The Nomenclature is as follows,

id: The Partition Key is responsible for data distribution across your nodes. Data sorted in the Columns with Key value in each separate files on the disk.
name:age:interests: The column key values that store for each partition

3. Distributed Systems with Shared Nothing Architecture

For Cassandra, the usage of distributed system is Shared Nothing Architecture where in Oracle RAC Architecture uses Shared Disk Architecture.
As you see, the data is stored in each node locally
And unlike RAC master-slave architecture(cluster & database instance level), this is peer-peer architecture and every node act as coordinator

As such the data is stored locally , data will be evenly distributed (stripes) to each node when data writes into tables.
Each Node contains the Token Range to accept and store data
Token Range is determined by Partition Schemes used by Cassandra Random Partitioner, Murmur partitioner, OrderPreserving Partitioner etc.
If a node is added/deleted, as like in ASM Diskgroup Re-balancing, Cassandra will re-balance node data to other nodes by rearranging the token assigned to each node.

But as such the data stores locally, there is likely hood that Data may lost in case of failure
To avoid this, while inserting the data , Cassandra uses Replicaton Factor for each Schema/Keyspace
So that all tables in that schema/keyspace will have the data mirrored (ASM Disk Mirroring) across nodes
This avoids single point of failure and as well as contention on Read/Write I/O.

Cassandra mirrors Data to Nodes based on Two things, Strategy and replication factor

SimpleStrategy (Single Data Centre), NetworkTopology Strategy with multiple DC or Single DC with specific identification of IP & Servers, so that data mirrors across data centres.
Replication Factor: How many copies of data should be maintained in cluster.

In the below diagram, with 5 Node Cassandra , we have created a Keyspace Called Messages with replication placement strategy of Simple & Replication Factor 1

Replication Factor: 1 : In this case 1, means the data will be not be mirrored and you see the IL, KY with their county has been written to different nodes

In case of failure of a node, the data in that node will not be available to the clients means a data loss/unavailability.

Example of a cluster with a composite partition key

Take another example, In the below diagram, with 5 Node Cassandra , we have created a Keyspace Called Messages with replication placement strategy of Simple & Replication Factor 3

Replication Factor: 3 : In this case 3, means the data will be mirrored to three nodes at least in the cluster and you see the IL, KY with their county has been written to different node,

In case of failure of a node, the data in that node will not be available but the co-ordinator will get the data from other replicas.

But in case of data centre loss, it is likely hood that whole cluster is not available.

Take another example, In the below diagram, with 5 Node Cassandra , we have created a Keyspace Called Messages with replication placement strategy of NetworkTopology Strategy & Replication Factor as, for us-west data center is 3 and us-east is 2.

As you see below, the data is spawned/mirrored three copies in us-west and two copies in us-east

Mutli-datacenter clusters for geograpically separate data

That's all about in this post, There's nothing new I have written above most of things available in Internet, but to have a series of posts I would like to have an introduction first and then dig into more for DBA's.

Lets take a closer look at architecture of Cassandra and its components for each node in next post.

Image Courtesy: Random Blogs in internet & Apache Cassandra website.

GeekDBA

Cassandra for Oracle DBA’s Part 6 – Installing Cassandra

Installing Cassandra is straight forward, With CCM, is a tool to simulate a Cassandra cluster in a single host, Ideal for test environments or you want to make your hands on in Cassandra.

Further to install Cassandra in production size, you can directly download the Cassandra from Apache.org or Datastax for an enterprise version.

I cover here both of these methods.

For CCM Installation,

Step 1: Upgrade Python to 2.7 (this is must for ccm based installations also we have to do alternate install for python to upgrade 2.7 if not this will break your libraries.)

cd /usr/src

wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz

tar xzf Python-2.7.10.tgz

cd Python-2.7.10

./configure

make altinstall

ls -ltr /usr/bin/python*

ls -ltr /usr/local/bin/python*

ln -s /usr/local/bin/python2.7 /usr/local/bin/python

wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py

sudo /usr/local/bin/python2.7 ez_setup.py

sudo /usr/local/bin/easy_install-2.7 pip

sudo easy_install six

sudo easy_install pyyaml

Step 2: Download the CCM Package from Git hub

git clone https://github.com/pcmanus/ccm.git

pip install ccm

change the setup.py to point to /usr/local/bin/python instead of /usr/bin/env python (this is important if not CCM will not work)

cd ccm; sudo ./setup.py install; cd ..

Step 3: Creating a CCM cluster for DSE stack (this command only download the Cassandra to the directory you are running into)

ccm create --version 1.2.0 --nodes 3 --start test

ccm create dse_cluster --dse --dse-username=emaildi --dse-password=password -v 4.5.2 -n 3 -s

Step 4: Verify the installation

root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node4 nodetool ring

Note: Ownership information does not include topology; for complete information, specify a keyspace

Datacenter: Cassandra

==============

Address Rack Status State Load Owns Token

127.0.0.1 rack1 Up Normal 3.05 MB 33.33% -9223372036854775808

127.0.0.2 rack1 Up Normal 2.99 MB 33.33% -3074457345618258603

127.0.0.3 rack1 Up Normal 3.5 MB 33.33% 3074457345618258602

Installing Cassandra Cluster Manually on each of the node

Step 1: Downloading and Installing Cassandra

sudo su - cassandra

wget http://mirrors.ae-online.de/apache/cassandra/2.1.3/apache-cassandra-2.1.3-bin.tar.gz

sudo tar -xvf /home/cassandra/apache-cassandra-2.1.3-bin.tar.gz -C /usr/local/

cd /usr/local/

sudo mv apache-cassandra-2.1.3 cassandra-2.1.3

sudo ln -s /usr/local/cassandra-2.1.3 /usr/local/cassandra

sudo chown -R cassandra:cassandra /usr/local/cassandra-2.1.3

xport JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")

export CASSANDRA_INSTALL=/usr/local/cassandra

export CASSANDRA_HOME=$CASSANDRA_INSTALL

Just edit the $CASSANDRA_HOME/conf/cassandra.yaml file vim $CASSANDRA_HOME/conf/cassandra.yaml and replace the following keys with proper values:

cluster_name: 'MyFirstCluster'

seeds: "10.64.200.47, 10.64.200.48"

listen_address: 10.64.200.47

rpc_address: 10.64.200.47

Note: Seeds address are those nodes performing as seed, Listen and RPC address should be the node IP which you have installed.

Step 2: Start Cassandra on all nodes

$CASSANDRA_HOME/bin/cassandra -f

Step 3: Check status of cluster

$CASSANDRA_HOME/bin/nodetool status

-Thanks

Geek DBA

Cassandra for Oracle DBA’s – Part 1 : Nomenclature or Commands or Features Comparision

Although, there are heaps of fundamental changes between Oracle & Cassandra,

Just for fun, or for nomenclature or for commands or similar features are listed here , which may provide handy information to Oracle DBA's

Oracle Keyword	Cassandra Feature / Nomenclature
SGA	Java Heap Memory
Buffer Cache	Memtables
LRU List	Parition Index Summary Cache
Buffer Cache	Partition Key Cache
KeepPool/Result Cache	Row Cache
Background Process	Nothing Like that
SQLPLUS	CQLSH
TNS	THRIFT
Dictionary Cache	Nothing Like That
Tablespace	Keyspace
Table	Column Family/Table
Datafiles	SSTABLES
Index	Index for partition key
Partition Key for row level	Partition Key as column level to store relevant rows
Merge Partition	Compaction of SSTABLES
sql_trace = on	tracing on
C and ANSI SQL	Java and CQL
Schema	Keyspace
User	User
SYSDBA	SuperUser
Backup Copies	SNAPSHOTS
Incremental	Incremental Backups
redolog	commitlog
undo	tombstone
Temp	Data Actually Stores in Order no need of temp
Interconnect Protocol RDP/UDP	Gossip Protocol
Network /Disk Heartbeat	Only one heartbeat network heartbeat in messages
Master / Slave Nodes	Peer Nodes (no master) indeed every node act as coordinator
Shared Storage	Local Storage to each node
ASM Striping	Tokens: Data Stripes using token range at node level
ASM Mirroring	Replication Factor: Data Mirrors across nodes using RF=ONE, ALL, N..
Disk Repair Time	Hinted Handoffs
Prefered_read_failgroup, diskgroup level	table level / query level CL=QUORUM, LOCAL, ONE, ALL
init.ora or spfile.ora	cassandra.yaml in /softwarelocation/conf/
redo log sizes & location	commitlog size determined by parameters in cassandra.yaml file commitlog_directory
redo log flush	commitlog flush determined by parameter commitlog_segment_size_in_mb,commitlog_sync_period_in_ms,commitlog_sync
archives	commitlog will be copied as archives determined by parameter cassandra/conf/commitlog_archiving.properties
alert log/crsd.log/node.log	Log located at /cassandrasoftware/clustername/nodename/logs/system.log
crsctl / srvctl	nodetool
Rebalance of Data in diskgroups	nodetool repair or nodetool repair -st -et
private network or private ips	no private ip's
OCR/Voting Disk	Nothing Like that, but maintains list of peer nodes to communicate using seeds parameter in cassandra.yaml
scn_to_timestamp or rowscn functions	writetime function
alter session set schema	use keyspace
dbms_metadata.get_ddl	desc tablename or describe keyspace shows metadata
v$asm_operation, rebalance operation	nodetool netstats or nodetool tpstats
cssd.log (disk heartbeat and network heartbeat)	nodetool gossipinfo
size of table: bytes in dba_tables	Because the table data is striped across the nodes, For each node, calculate Space used (total), bytes: output from nodetool cfstats keyspace.tablename
num of rows in table: num_rows in dba_tables	Because the table row/columns are striped across the nodes, for each node, calculate Number of Keys (estimate) from Number of keys (estimate): Number of keys (estimate): 10112
size of Tablespace / Schema: sum(bytes) in dba_segments	For each node, calculate the Space Used (total), bytes: output from nodetool cfstats system \| grep "Space Used (total)" \| awk '{ SUM += $5} END { print SUM/1024/1024 }'
Size of Buffer Cache	For each node, calculate the each keyspace memtables count nodetool cfstats \| grep "Memtable data size" \| awk '{ SUM += $5} END { print SUM/1024/1024 }'
Flush Buffer Cache	nodetool flush keyspace tablename
Size of result cache	Row Cache
undo_retention	gc_grace_seconds for holding tombstones
addnode.sh	nodetool join
crsctl delete node	nodetool decomission and then when status show decommissioned , nodetool remove token
result cache size / keep pool size	row_cache_size_in_mb
multiple dbwr writers	memtable_flush_writers parameter in cassandra.yaml parameter file
buffer cache flushing (3 seconds,ckpt etc)	memtable_cleanup_threshold parameter in cassandra.yaml
Shutdown instance, srvctl stop instance	nodetool -h $(hostname) -p 8080 disablegossip nodetool -h $(hostname) -p 8080 dissablethrift nodetool -h $(hostname) -p 8080 drain /etc/init.d/cassandra stop
crsctl status cluster	nodetool status o nodetool ring or nodetool describecluster
rman>backup tablespace tag 'today'	For each node, backup keyspace -t today backup keyspace.table -t today
rman> backup incremental tag 'incr'	Just enable incremental_backups: true in cassandra.yaml file, which creates incremental snapshots when take snapshot using backup keyspace -t
rebuild index or reorg table	rebuild index keyspace tablename idx1, idx2
logical recovery rman> recover block 57; rman> recover datafile	logical recovery at table level scrub [keyspace] [tablename] [-s\|--skip-corrupted]
restricted mode?	nodetool drain
statistics	Nothing like that

Disabling Active Dataguard Feature

As we all know Active Dataguard is a licensed feature and whilst there is no parameter to control, indeed if just in case if you do not have a license and by chance your DBA has opened the standby database in readonly and started MRP then you are in troubles as it activates the Active Dataguard.

In earlier versions when you open the standby in readonly and try to start MRP it will not start. But from 11g Onwards , MRP will still start and that activates your active dataguard feature.

If you want to disable completely that feature by not accidentally enabling it, there is undocumented parameter called

SQL> alter system set “_query_on_physical”=false scope=spfile;

System altered.

And when you open readonly and try to start MRP

ORA-16669: instance cannot be opened because the Active Data Guard option is disabled

-Thanks

Database Migration & Upgrade 11g to 12c, Non ASM to ASM, Non RAC to RAC

Consider a Typical Database Migration which contains all of this below (and if down time is not a problem)

1. Into a new Database Host (NEW SERVER)

2. Convert to Non-ASM to ASM

3. Convert to Single Instance to RAC,

4. Upgrade to Database to 12c

Environment Details

Source	Target
DB Version: 11.2.0.4	DB Version: 12.1.0.2
Instance Type : Standalone	Instance Type : RAC
Storage: Non - ASM	Storage Type: ASM

##################################################################################

Step 1:- In order to upgrade or convert to rac We must install 11g software as non - rac assuming installed in Target side

##################################################################################

Assuming its already installed here, Oracle_Home: /u01/app/oracle/11.2.0.4/db_1

###################################################################################

Step 2:- Create Tnsnames.ora and Listener.ora (separate) in 11.2.0.4 home in target side

###################################################################################

in new 11g home

tnsnames.ora

dbdup =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = target.localdomain)(PORT = 1522))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dbname.localdomain)

)

dbname =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = source.localdomain)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dbname.localdomain)

)

listener.ora

SID_LIST_LISTENER =

(SID_LIST =

(SID_DESC =

(GLOBAL_DBNAME = dbname.localdomain)

(ORACLE_HOME = /ora/app/oracle/product/12.1.0.1/db_1)

(SID_NAME = dbname)

)

LISTENER =

(DESCRIPTION_LIST =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = target.localdomain)(PORT = 1522))

)

Start the listener

cd /u01/app/oracle/11.2.0.4/db_1/network/admin

lsnrctl start

###################################################################################

Step 3:- Copy password file to target machine

###################################################################################

On source

scp $ORACLE_HOME/dbs/orapwdbname target:$ORACLE_HOME/orapwdbname

###################################################################################

Step 4:- Take a pfile copy in source database and copy it

###################################################################################

sqlplus / as sysdba

create pfile='/tmp/initdbname.ora' from spfile;

###################################################################################

Step 5:- Modify the parameter file especially the following

###################################################################################

*.control_files='+DATA','+DATA'

*.db_create_file_dest='+DATA'

*.db_recovery_file_dest='+FRADG'

*.log_archive_dest_1='location=+FRADG'

#######################################################

Sample File

#######################################################

dbname.__data_transfer_cache_size=0

dbname.__db_cache_size=536870912

dbname.__java_pool_size=16777216

dbname.__large_pool_size=150994944

dbname.__oracle_base='/ora/app/oracle'

dbname.__pga_aggregate_target=671088640

dbname.__sga_target=989855744

dbname.__shared_io_pool_size=50331648

dbname.__shared_pool_size=218103808

dbname.__streams_pool_size=0

*.compatible='12.1.0.2.0'

*.control_files='+DATA','+DATA'

*.db_block_size=8192

*.db_domain='localdomain'

*.db_name='dbname'

*.db_create_file_dest='+DATA'

*.db_recovery_file_dest='+FRADG'

*.db_recovery_file_dest_size=4560m

*.diagnostic_dest='/ora/app/oracle'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=dbnameXDB)'

*.log_archive_dest_1='location=+FRADG'

*.log_archive_dest_state_1='enable'

*.open_cursors=300

*.processes=300

*.remote_login_passwordfile='EXCLUSIVE'

*.undo_tablespace='UNDOTBS1'

###################################################################################

Step 6:- Start the instance in non mount mode in target machine

###################################################################################

export ORACLE_HOME=/u01/app/oracle/11.2.0.4/db_1

export PATH=$ORACLE_HOME/bin:$PATH

export ORACLE_SID=dbname

sqlplus / as sysdba

startup nomount pfile='$ORACLE_HOME/dbs/initdbname.ora'

###################################################################################

Step 7:- In target machine, Using RMAN , run duplicate command as below

###################################################################################

rman

connect target sys/password@dbname

connect auxiliary sys/password@dbdup (must to be dbdup as per your tns entry)

run {

allocate channel c1 device type disk;

allocate auxiliary channel t1 device type disk;

duplicate target database to dbname from active database

pfile=$ORACLE_HOME/dbs/initdbname.ora

logfile

group 1 ('+DATA','+FRADG') SIZE 200M;

group 2 ('+DATA','+FRADG') SIZE 200M;

release channel t1;

}

The above will create a database in ASM with single instance now you need to make this as RAC instance before upgrading to 12c.

###################################################################################

Step 8:- Convert Database to RAC, In target database run the following`

###################################################################################

alter database add logfile thread 2 group 3 ('+DATA','+FRADG') size 200m reuse;

alter database add logfile thread 2 group 4 ('+DATA','+FRADG') size 200m reuse;

alter database enable public thread 2;

create undo tablespace UNDOTBS2 datafile '+DATA' size 500M;

###################################################################################

Step 9:- Add the parameters in init file and copy the parameter files to remote nodes as well

###################################################################################

*.cluster_database_instances=2

*.cluster_database=true

dbname1.instance_number=1

dbname2.instance_number=2

dbname1.thread=1

dbname2.thread=2

dbanme1.undo_tablespace='UNDOTBS1'

dbanme2.undo_tablespace='UNDOTBS2'

cd $ORACLE_HOME/dbs

cp initdbname.ora initdbname1.ora

scp initdbname.ora node2:$ORACLE_HOME/dbs/initdbname2.ora

###################################################################################

Step 10: Shut down and compile the binaries with RAC option, this ensures oracle home is enabled with RAC options

###################################################################################

sqlplus / as sysdba

shut immediate

Convert the binaries to RAC

cd $ORACLE_HOME/rdbms/lib

make -f ins_rdbms.mk rac_on

make -f ins_rdbms.mk ioracle

###################################################################################

Step 11: Startup the instance in node 1/db_1

###################################################################################

Change the

export ORACLE_HOME=/u01/app/oracle/11.2.0.4/db_1

export PATH=$ORACLE_HOME/bin:$PATH

export ORACLE_SID=dbname1 --> ensure this is instance name you are starting

sqlplus / as sysdba

startup mount pfile='/u01/app/oracle/11.2.0.4/db_1/initdbname1.ora'

create spfile='+DATA' from pfile;

shut immediate

startup

###################################################################################

Step 12 : Register database with Oracle CRS

###################################################################################

srvctl add database -d dbname -o /u01/app/oracle/11.2.0.4/db_1 -p $ORACLE_HOME/dbs/spfileDBNAME.ora

srvctl add instance -d dbname -i dbname1 -n node1

srvctl add instance -d dbanme -i dbname2 -n node2

###################################################################################

Step 13:- With all above, the database is converted to RAC & ASM, now the remaining part is to upgrade the database to 12c.

###################################################################################

# Stop the database &

srvctl stop database -d dbname

# Copy the password file and spfile to new home

$ cp /u01/app/oracle/11.2.0.4/db_1/dbs/spfiledbname.ora /u01/app/oracle/product/12.1.0.2/db_1/dbs

$ cp /u01/app/oracle/11.2.0.4/db_1/dbs/orapwdbname1 /u01/app/oracle/product/12.1.0.2/db_1/dbs

# Set Oracle HOME and PATH to 12c database home

export ORACLE_HOME=/u01/app/oracle/12.1.0.2/db_1

export PATH=$ORACLE_HOME/bin:$PATH

export ORACLE_SID=dbname1

sqlplus / as sysdba

startup upgrade

cd $ORACLE_HOME/rdbms/admin

$ORACLE_HOME/perl/bin/perl catctl.pl catupgrd.sql

###################################################################################

Step 13: Change the CRS registry for new home

###################################################################################

srvctl modify database -d dbname -o /u01/app/oracle/12.1.0.2/db_1

###################################################################################

Step 14:- Change the environemnt variables and /etc/oratab entries to new home

###################################################################################

Performance Troubleshooting Series : Identifying or Estimating How much PGA needed by Session/Query

Have got a question from one of fellow DBA that how much PGA the session will use or needed.

Well to answer this, first we need to know what PGA looks like and what it used for?

The follow shows the components of a PGA, the little space used for Stack, Env Variables and Private SQL area.

Image Courtesy: Oracle Architecture Diagram

Where the most of the utilisation will be SORT AREA, HASH AREA

SORT AREA : is used when the session performing a sorting of data and use approximately

HASH AREA: is used when execution plan contains hash join and stores the hashed data and once data is matched or processed the final data put into buffer cache (db file scattered read) but stores with in PGA for Direct Path Reads.

Bit Map Merge area:- When using bitmap keys and merging of data uses.

To further dig into , the PGA setting is not hard limit, it can grow and utilize all free memory at OS if needed by oracle sessions. Until 12c you cannot limit the pga memory size.

For example if you have given PGA_AGGREGATE_TARGET =5g , pga can still grow more than 5gb accumulating of all sessions not by single session.

In dedicated architecture where each session will have its own PGA space and put the private data like variables , sorting, hashed data etc. However, although the PGA can grow much more than your setting each individual session can have only 200MB of PGA due to hidden parameter _pga_max_size which default to 200MB or 5% of pga_aggregate_target parameter. So that means if a session needs more than 200MB it has to spill to disk to do sorting.

How to find how much each of this area used by your query ,

select inst_id,sid,sql_id,operation_type,sum(actual_mem_used)/1024/1024,sum(work_area_size)/1024/1024, sum(tempseg_size)/1024/1024 from gv$sql_workarea_active group by inst_id,sid,sql_id,operation_type;

So far we seen the PGA dynamics and allocations of memory, but how to know and estimating how much sorting/hash need of PGA Memory. As of my understanding, this is two fold,

Sort : Once the sort area is full with in pga it will spill to temp space , so ideally with gather plan statistics hint we can find out the temporary space need

Hash: Basically a full table scans, ideally used for hash joins of two tables, find the two tables that join with hash you are looking for find the avg_row_len * num of rows should give you approximate hash utilization.

For example consider a simple query below, which does a full table scan of each table O,T,I.

The plan shows three tables having full tables doing hash join to finalize the data after equal predicates.

Lets look at the calculation tables (here I done some estimations to prove avg_row_len can be used but cases may vary, and its not hard fact rule it will be matched)

Estimation Calculation, to do estimation I have taken the exact row length and rows from dba_tables and the number of rows returned from query. I have added few columns manually to do calculation, the query may not return below.

Lets look at v$sql_workarea_active to see how much expected and actual pga / workarea used for this query.

As you see above, the Expected_MB column shows the expected pga utilization for this query which matches to our Estimation table above.

Happy Reading

Geek DBA

All about Database Administration, Tips & Tricks

New Features for DBA’s

Subscribe to Posts by Email

Subscriber Count

Disclaimer

Recent Posts

Categories

Archives

Pages

Cassandra for Oracle DBA’s Part 5 : Read and Write Consistency

12c Upgrade Gotchas

Cassandra for Oracle DBA’s Part 4 – Data Distribution across Nodes

Cassandra for Oracle DBA’s Part 3 – Architecture

Cassandra for Oracle DBA’s Part 2 – Three things you need to know

Cassandra for Oracle DBA’s Part 6 – Installing Cassandra

Cassandra for Oracle DBA’s – Part 1 : Nomenclature or Commands or Features Comparision

Disabling Active Dataguard Feature

Database Migration & Upgrade 11g to 12c, Non ASM to ASM, Non RAC to RAC

Performance Troubleshooting Series : Identifying or Estimating How much PGA needed by Session/Query

New Features for DBA’s

Follow Me!!!

Subscribe to Posts by Email

Subscriber Count

Disclaimer

Recent Posts

Categories

Archives

Tags

Pages