Oracle 12.2 New Features : Convert Non Partitioned Table to Partitioned Table using ALTER

In earlier versions of Oracle, converting a non-partitioned table to a partitioned table meant recreating it, for example via a move/CTAS or export/import, and then renaming it back.

In the 12.2 release you can use a single ALTER TABLE command to convert the table into a partitioned table, optionally online. Here is a sample command adapted from the documentation excerpt:

ALTER TABLE test_table MODIFY
  PARTITION BY RANGE (amount) INTERVAL (100)
    (PARTITION p1 VALUES LESS THAN (500),
     PARTITION p2 VALUES LESS THAN (1000))
  ONLINE
  UPDATE INDEXES
    (idx01_amount GLOBAL PARTITION BY RANGE (amount)
      (PARTITION idx_p1 VALUES LESS THAN (MAXVALUE)));
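
After the conversion you can confirm the new partitions from the data dictionary (a quick check; the view is the standard one, the table name is from the example above):

SELECT partition_name, high_value
FROM   user_tab_partitions
WHERE  table_name = 'TEST_TABLE';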

-Geek DBA

12.2 New Features : Memory Management at PDB Level

Memory, including a guaranteed buffer cache and shared pool, can now be allocated at the PDB level; the details are covered in the Oracle white paper.

So far I haven't seen a need to keep separate memory settings for each PDB, since we consolidate databases of similar size and resource usage into a single container. Still, it is a good feature to have in case one PDB needs much more SGA than the others.
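
For reference, a minimal sketch of how this looks (the PDB name and sizes are made up, and there are prerequisites to check in the documentation; SGA_TARGET, SGA_MIN_SIZE, DB_CACHE_SIZE and SHARED_POOL_SIZE are among the parameters 12.2 allows per PDB):

SQL> alter session set container = pdb1;

SQL> alter system set sga_target = 2G scope = both;

SQL> alter system set sga_min_size = 1G scope = both;

SQL> alter system set db_cache_size = 512M scope = both;

SQL> alter system set shared_pool_size = 256M scope = both;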

-Thanks

Geek DBA

Oracle 12.2 New Features : Local UNDO and Flashback PDB Database

Hi,

You can now flash back an individual PDB, provided local undo is enabled. In earlier versions, Oracle shared a single undo tablespace across the CDB and all PDBs, giving one common view for transaction and instance recovery at both the CDB and PDB level. With local undo, each PDB maintains its own undo tablespace, so Oracle can perform PDB-level operations such as flashback. I still have to dig further into the internals, but for now the feature is available and flashback of a PDB is possible.

To do this, enable local undo at the CDB level and restart the database. An undo tablespace is created automatically in each PDB, and the PDBs start using it.

Steps

SQL> shutdown immediate

SQL> startup upgrade

SQL> alter database local undo on;

SQL> shutdown immediate

SQL> startup

SQL> alter pluggable database all open;

SQL> select name,con_id from v$tablespace where name like '%UNDO%' ;

NAME         CON_ID

---------- -------------

UNDOTBS1        1  -- >Root

UNDO_1          2  -- >PDBSEED

UNDO_1          3  -- >PDB1
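
You can also confirm that local undo is in effect (a quick check; the property is recorded in database_properties):

SQL> select property_name, property_value from database_properties
     where property_name = 'LOCAL_UNDO_ENABLED';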

Do a flashback (the PDB has to be closed first):

SQL> alter pluggable database pdb1 close immediate;

SQL> flashback pluggable database pdb1 to timestamp systimestamp - interval '1' hour;

SQL> alter pluggable database pdb1 open resetlogs;

-Thanks

GEEK DBA

 

Oracle 12.2 New Features : SQLPLUS Enhancements

In the 12.2 release, one of the coolest things, which every DBA will love, is command history in SQL*Plus :).

You can now switch history on or off in SQL*Plus and list previous commands:

SET HISTORY ON|OFF

SHOW HISTORY
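
A quick sketch of the history feature in use (the HISTORY command lists the buffer and HISTORY n RUN re-executes entry n; the subcommands are in the 12.2 SQL*Plus reference):

SQL> set history on

SQL> select sysdate from dual;

SQL> history

SQL> history 1 run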

There are other additions too, such as the fast option (the -F flag), which sets ARRAYSIZE, PAGESIZE and STATEMENTCACHE to higher values all at once.

The coolest of them, though, is the HISTORY feature.

-Thanks

Geek DBA

Oracle 12.2 New Features : Oracle Sharding on its Way

DBAs, Oracle is coming with sharding: the one killer feature NoSQL databases claim for distributed processing, i.e. a shared-nothing database architecture.

With release 12.2 Oracle is shipping a Sharding feature, with a new "CREATE SHARDED TABLE" command and a catalog schema. Hold on a second, though: it is basically distributed partitioning built on top of the partitioning feature and standby databases, and you need licenses for those :). Oh no, come on...

I really want to get my hands dirty with this feature, having heard about so many new databases over the past few years, so that I can say "Hey, Oracle supports this too" 🙂

From internet sources and OOW presentations, here is what I understood about Oracle Sharding. It looks complex to me, not as simple as in other databases. Oracle Sharding:

  •  uses dbca for creating the shards initially
  •  uses a new catalog managed by GSM (Global Service Manager, a feature introduced in 11.2)
  •  keeps the sharded nodes and key information in that catalog (compare MongoDB's config server)
  •  creates tables with the "CREATE SHARDED TABLE" command (see the sketch after this list)
  •  uses consistent hashing, with either traditional or linear hashing for even distribution
  •  is based on distributed partitioning (licensing applies)
  •  needs standby databases (Active Data Guard and its licences)
  •  uses a set of tablespaces for each shard
  •  uses database links to fetch data from each shard
  •  exposes a GSM service, which clients must use for their connections (like mongos)
  •  uses a listener to redirect the connection to the shard that holds your data (compare MongoDB's router instance)
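
For reference, the sharded table DDL looks roughly like this (a sketch based on the 12.2 documentation; the table, columns and tablespace set name are made up):

CREATE SHARDED TABLE customers
( cust_id  NUMBER NOT NULL,
  name     VARCHAR2(100),
  region   VARCHAR2(20),
  CONSTRAINT customers_pk PRIMARY KEY (cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
TABLESPACE SET ts_set_1;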

What is best about this feature is that, unlike many NoSQL databases which lack full RDBMS capabilities and ACID compliance alongside distributed processing, Oracle offers both; if Oracle can keep sharding simple, it could be a wake-up call for all the newer databases.

-Thanks

Geek DBA

Oracle 12.2 New Features : Long Identifiers – What a relief

Oracle 12.2 is out, cloud first. 🙂

Of all the features, one that everyone must be aware of is the relief from the identifier length limitation: earlier, no table or index name could exceed 30 characters; in 12.2 the limit is 128 bytes.

Not sure how many of you have faced issues with this limit, but I have, and many times it was an obstacle to pushing Oracle as the database for a project. I always had to suggest workarounds to developers, and they would point out that some other database allows longer names, so why not Oracle. 🙂

It is especially painful when you use Salesforce as your CRM, which allows more than 30 characters in an object name (the underlying database for Salesforce is Oracle, but they have their own application layer on top). Building a data warehouse or data lake by pulling data from Salesforce and from your own application, and creating BI reports and dimensions on top, is difficult in such cases.

So, back to the post: Oracle now allows more than 30 characters for the name of a table, index, and so on.

As 12.2 is currently only released in the cloud, the sample code below is borrowed from oracle-base.com.

CREATE TABLE this_is_a_table_to_hold_employees_please_dont_put_customers_in_it (
  this_is_the_primary_key_column_which_uniquely_identifies_the_row  NUMBER,
  this_is_for_the_employee_name_so_dont_put_other_crap_in_it        VARCHAR2(100),
  CONSTRAINT this_is_a_table_to_hold_employees_please_dont_put_customers_in_it_pk
    PRIMARY KEY (this_is_the_primary_key_column_which_uniquely_identifies_the_row)
);

Table created.

SQL>
SQL> desc dba_tables
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                     NOT NULL VARCHAR2(128)
 TABLE_NAME                                NOT NULL VARCHAR2(128)
 TABLESPACE_NAME                                    VARCHAR2(30)
 CLUSTER_NAME                                       VARCHAR2(128)
 IOT_NAME                                           VARCHAR2(128)
 STATUS                                             VARCHAR2(8)
 .
 .
 .
 CONTAINER_MAP_OBJECT                               VARCHAR2(3)

MongoDB for Oracle DBA’s Part 11 : Enabling Preferred Read on Standby (Secondary) Instances

As we know, in ASM we can specify a preferred read failure group so that reads are scattered across failure groups, reducing read contention on a single disk group. Similarly, in MongoDB you can direct reads to the secondary servers, i.e. the standbys. In other words, think of it as the Active Data Guard feature, where you can run queries and backups on the standby.

Here is how to do it.

### Check the Lag of the Standby or secondaries

MongoDB Enterprise rs0:SECONDARY> rs.printSlaveReplicationInfo()

source: localhost:47018

        syncedTo: Mon Dec 12 2016 09:15:35 GMT+1100 (AEDT)

        0 secs (0 hrs) behind the primary

source: localhost:47020

        syncedTo: Mon Dec 12 2016 09:15:35 GMT+1100 (AEDT)

        0 secs (0 hrs) behind the primary

MongoDB Enterprise rs0:SECONDARY>

### Connect to any secondary host and run a query; it throws an error

mongo localhost:47018

MongoDB Enterprise rs0:SECONDARY> use foo

switched to db foo

MongoDB Enterprise rs0:SECONDARY>  db.testData.find()

Error: error: { "$err" : "not master and slaveOk=false", "code" : 13435 }

MongoDB Enterprise rs0:SECONDARY>

It errors out because the node is not the master (primary).

### Enable reads on the slave (remember, you need to do this every time you connect)

MongoDB Enterprise rs0:SECONDARY> rs.slaveOk()

MongoDB Enterprise rs0:SECONDARY> use foo

switched to db foo

MongoDB Enterprise rs0:SECONDARY> db.testData.find()

{ "_id" : ObjectId("5835286da372386306caeee2"), "x" : 1 }

{ "_id" : ObjectId("5835286da372386306caeee3"), "x" : 2 }

{ "_id" : ObjectId("5835286da372386306caeee4"), "x" : 3 }

{ "_id" : ObjectId("5835286da372386306caeee5"), "x" : 4 }

{ "_id" : ObjectId("5835286da372386306caeee6"), "x" : 5 }

{ "_id" : ObjectId("5835286da372386306caeee7"), "x" : 6 }

{ "_id" : ObjectId("5835286da372386306caeee8"), "x" : 7 }

{ "_id" : ObjectId("5835286da372386306caeee9"), "x" : 8 }

{ "_id" : ObjectId("5835286da372386306caeeea"), "x" : 9 }

{ "_id" : ObjectId("5835286da372386306caeeeb"), "x" : 10 }

{ "_id" : ObjectId("5835286da372386306caeeec"), "x" : 11 }

{ "_id" : ObjectId("5835286da372386306caeeed"), "x" : 12 }

{ "_id" : ObjectId("5835286da372386306caeeee"), "x" : 13 }

{ "_id" : ObjectId("5835286da372386306caeeef"), "x" : 14 }

{ "_id" : ObjectId("5835286da372386306caeef0"), "x" : 15 }

{ "_id" : ObjectId("5835286da372386306caeef1"), "x" : 16 }

{ "_id" : ObjectId("5835286da372386306caeef2"), "x" : 17 }

{ "_id" : ObjectId("5835286da372386306caeef3"), "x" : 18 }

{ "_id" : ObjectId("5835286da372386306caeef4"), "x" : 19 }

{ "_id" : ObjectId("5835286da372386306caeef6"), "x" : 21 }

Type "it" for more

MongoDB Enterprise rs0:SECONDARY>
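
Instead of running rs.slaveOk() in every new shell session, you can also set a read preference on the connection (a sketch; secondaryPreferred is one of the standard read preference modes, and the same thing can be done with readPreference=secondaryPreferred in the connection string):

MongoDB Enterprise rs0:SECONDARY> db.getMongo().setReadPref("secondaryPreferred")  // prefer secondaries, fall back to primary

MongoDB Enterprise rs0:SECONDARY> db.testData.find({x : 1})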

-Thanks

Geek DBA

 

 

MongoDB for Oracle DBA’s Part 10 : Switchover & Role Transition between Primary and Secondary

In MongoDB, a switchover similar to Oracle Data Guard's is possible too, just done in a different way.

You can ask the primary to step down and let the nodes of the replica set elect a new primary. This is done with the rs.stepDown() method, sketched below.
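
A minimal sketch of the call (run it while connected to the current primary; the 60-second step-down window is just an example value):

rs0:PRIMARY> rs.stepDown(60)   // the old primary will not seek re-election for 60 seconds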

Before stepping down the primary, one must know the following:

  • The procedure blocks all writes to the primary while it runs
  • It terminates all sessions, running jobs, index rebuilds, etc.
  • It disconnects all connections
  • If a new primary cannot be elected within the wait period, the primary will not step down.

Let's take a look at this in action:

  1. I have opened three sessions; in one of them a query against a table is running (the top-right one in the screenshot)
  2. In another session I tail the log file, so we can see what happens when a step-down is initiated
  3. In the session on the left I initiated the step-down of the primary; it immediately disconnects the current session and also the session running the query (top left)
  4. The log shows the primary transitioning to secondary
  5. As per rs.status(), 47018 (the original primary) becomes a secondary.

Screenshot

 

MongoDB for Oracle DBA’s Part 9 : Node Evictions – Losing a Shard

In the previous post we saw that when a node in a replica set is lost, a secondary node becomes the primary and all data remains available.

In this post we will see what happens when a shard is completely lost.

Since the data is distributed across the shards, if a shard is lost along with all the nodes that belong to it, the data is only partially available. That is perhaps why MongoDB talks about basic availability rather than high availability (this is my understanding; anyone who knows better is welcome to correct me).

Let's lose a shard. We have shard1 backed by a three-node replica set; let's kill its processes.

root@wash-i-03aaefdf-restore ~ $ ps -eaf | grep mongo

root      4462  4372  0 13:02 pts/0    00:00:00 grep mongo

root     21678     1  0 Nov23 ?        01:35:23 mongod -f /backups/data/cluster/shard1/rs1.conf

root     21744     1  0 Nov23 ?        01:32:55 mongod -f /backups/data/cluster/shard1/rs2.conf

root     21827     1  0 Nov23 ?        01:17:21 mongod -f /backups/data/cluster/mongoc/mongoc.conf

root     21844     1  0 Nov23 ?        00:38:52 mongos -f /backups/data/cluster/mongos/mongos.conf

root     22075     1  0 Nov23 ?        01:32:45 mongod -f /backups/data/cluster/shard2/rs0.conf

root     22096     1  0 Nov23 ?        01:21:05 mongod -f /backups/data/cluster/shard2/rs1.conf

root     22117     1  0 Nov23 ?        01:21:10 mongod -f /backups/data/cluster/shard2/rs2.conf

root     30107     1  0 Nov24 ?        01:11:14 mongod -f /backups/data/cluster/shard1/rs0.conf

### Kill the shard1 processes (PIDs below); this brings down the whole shard

root@wash-i-03aaefdf-restore ~ $ kill -9 30107 21678 21744

### Log in to the mongos instance and check the sharding status

Let me explain the output:

  1. There are two shards, each backed by a three-node replica set (rs0 and rs1).
  2. The last reported error says that none of the nodes of replica set rs0 (shard1) is reachable.
  3. The database foo has sharding enabled.
  4. In the foo database, the testData collection is partitioned across the shards, with three chunks on each.

MongoDB Enterprise mongos> sh.status()

--- Sharding Status ---

  sharding version: {

        "_id" : 1,

 ....

 }

  shards:

        {  "_id" : "rs0",  "host" : "rs0/localhost:47018,localhost:47019,localhost:47020" }

        {  "_id" : "rs1",  "host" : "rs1/localhost:57018,localhost:57019,localhost:57020" }

 ..

 balancer:

...

       Last reported error:  None of the hosts for replica set rs0 could be contacted.

        Time of Reported error:  Fri Dec 09 2016 13:03:35 GMT+1100 (AEDT)

..

 databases:

        {  "_id" : "foo",  "primary" : "rs0",  "partitioned" : true }

                foo.testData

                        shard key: { "x" : "hashed" }

                        unique: false

                        balancing: true

                        chunks:

                                rs0     3

                                rs1     3

                        { "x" : { "$minKey" : 1 } } -->> { "x" : NumberLong("-6932371426663274793") } on : rs1 Timestamp(3, 0)

                        { "x" : NumberLong("-6932371426663274793") } -->> { "x" : NumberLong("-4611686018427387902") } on : rs0 Timestamp(3, 1)

                        { "x" : NumberLong("-4611686018427387902") } -->> { "x" : NumberLong("-2303618986662011902") } on : rs0 Timestamp(2, 8)

                        { "x" : NumberLong("-2303618986662011902") } -->> { "x" : NumberLong(0) } on : rs0 Timestamp(2, 9)

                        { "x" : NumberLong(0) } -->> { "x" : NumberLong("4611686018427387902") } on : rs1 Timestamp(2, 4)

                        { "x" : NumberLong("4611686018427387902") } -->> { "x" : { "$maxKey" : 1 } } on : rs1 Timestamp(2, 5)

        

### Since we killed the mongod processes of replica set rs0, let's look at the log file of the primary node of replica set rs1 in shard2

It reports that all nodes of replica set rs0 are down and that it keeps polling them.

root@wash-i-03aaefdf-restore /backups/data/cluster/shard2/rs1/0/logs $ tail -20f rs1.log

2016-12-09T13:04:02.553+1100 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 127.0.0.1:47020, reason: errno:111 Connection refused

2016-12-09T13:04:02.553+1100 W NETWORK  [ReplicaSetMonitorWatcher] Failed to connect to 127.0.0.1:47018, reason: errno:111 Connection refused

2016-12-09T13:04:02.553+1100 W NETWORK  [ReplicaSetMonitorWatcher] No primary detected for set rs0

2016-12-09T13:04:02.553+1100 I NETWORK  [ReplicaSetMonitorWatcher] All nodes for set rs0 are down. This has happened for 6 checks in a row. Polling will stop after 24 more failed checks

This means the other shard, i.e. shard2 with replica set rs1, and its data are still available. This is what partial data availability means. Let's query the collection for a few values and see which queries error out.

### If I ask for all documents (aka rows), it throws an error right away saying rs0 cannot be contacted

mongo localhost:27017

MongoDB Enterprise mongos> db.testData.find()

Error: error: {

        "ok" : 0,

        "errmsg" : "None of the hosts for replica set rs0 could be contacted.",

        "code" : 71

}

MongoDB Enterprise mongos>

### Look up a value: the document with x = 750000 throws an error, so this document lives on replica set rs0

MongoDB Enterprise mongos> db.testData.find({x : 750000})

Error: error: {

        "ok" : 0,

        "errmsg" : "None of the hosts for replica set rs0 could be contacted.",

        "code" : 71

}

### The document with x = 1000000 is available and returned, so it lives on replica set rs1

MongoDB Enterprise mongos> db.testData.find({x : 1000000})

{ "_id" : ObjectId("58362c23c5169089bcd6683e"), "x" : 1000000 }

### The document with x = 900000 again returns an error

MongoDB Enterprise mongos> db.testData.find({x : 900000})

Error: error: {

        "ok" : 0,

        "errmsg" : "None of the hosts for replica set rs0 could be contacted.",

        "code" : 71

}

### Now x = 900001, just the next value. Oho, it is available, no error.

MongoDB Enterprise mongos> db.testData.find({x : 900001})

{ "_id" : ObjectId("58362bcbc5169089bcd4e19f"), "x" : 900001 }

MongoDB Enterprise mongos>

### A few more random tests: some documents are returned and some are not

MongoDB Enterprise mongos> db.testData.find({x : 840000 })

{ "_id" : ObjectId("58362b98c5169089bcd3f73e"), "x" : 840000 }

MongoDB Enterprise mongos> db.testData.find({x : 930000 })

{ "_id" : ObjectId("58362be5c5169089bcd556ce"), "x" : 930000 }

MongoDB Enterprise mongos> db.testData.find({x : 250000 })

Error: error: {

        "ok" : 0,

        "errmsg" : "None of the hosts for replica set rs0 could be contacted.",

        "code" : 71

}

So plan your shards and shard key in such a way that you retain as much availability as possible even when a complete shard is lost.
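
For reference, this is roughly how the hashed shard key used in this walkthrough would have been defined (a sketch; foo.testData and the field x match the examples above):

MongoDB Enterprise mongos> sh.enableSharding("foo")

MongoDB Enterprise mongos> sh.shardCollection("foo.testData", { x : "hashed" })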

-Thanks

Geek DBA

MongoDB for Oracle DBA’s Part 9 : Node Evictions

In MongoDB, high availability is better described as basic availability: if the primary of a replica set is lost, a secondary within that replica set can become the primary. But when a whole shard is down (all nodes of its replica set), there is no high availability; the data is only partially available, limited to the shards still running, as you would expect with shared-nothing storage (which is the basis of a distributed processing system).

In this post we will see what happens when the primary node within a replica set goes down (as below) and how the other nodes elect a new primary.

Case 1: Primary node eviction within a replica set

In our configuration, replica set rs0 sits on shard1 with node 0 (port 47018), node 1 (port 47019) and node 2 (port 47020).

So if we kill the process for node 0 in rs0, one of the secondary nodes should become the primary.

Let's check the processes for shard1/rs0 and kill the primary node.

### Find the mongod processes; we will kill the one for replica set rs0, node 0

ps -eaf | grep mongo

root@wash-i-03aaefdf-restore /backups/data/cluster/shard1/rs0/2/logs $ ps -eaf | grep mongo

root     21695     1  1 Nov23 ?        00:18:14 mongod -f /backups/data/cluster/shard1/rs0.conf

root     21678     1  1 Nov23 ?        00:18:14 mongod -f /backups/data/cluster/shard1/rs1.conf

root     21744     1  1 Nov23 ?        00:18:03 mongod -f /backups/data/cluster/shard1/rs2.conf

root     21827     1  0 Nov23 ?        00:05:05 mongod -f /backups/data/cluster/mongoc/mongoc.conf

root     21844     1  0 Nov23 ?        00:05:26 mongos -f /backups/data/cluster/mongos/mongos.conf

root     22075     1  0 Nov23 ?        00:11:31 mongod -f /backups/data/cluster/shard2/rs0.conf

root     22096     1  0 Nov23 ?        00:10:25 mongod -f /backups/data/cluster/shard2/rs1.conf

root     22117     1  0 Nov23 ?        00:10:26 mongod -f /backups/data/cluster/shard2/rs2.conf

root     29699 27585 78 13:24 pts/0    00:06:03 mongo

root     29882 29287  0 13:30 pts/2    00:00:00 mongo localhost:47020

root     29951 29515  0 13:32 pts/3    00:00:00 grep mongo

### Kill the process ID 21695 

kill -9 21695

### Verify the logs of rs0 nodes 1 and 2 (the secondary nodes)

As you can see below, once node 0 is killed, node 1 in the replica set notices the failed heartbeats, an election takes place, and node 1 (previously a secondary) is elected as the new primary.

tail -20f /backups/data/cluster/shard1/rs0/1/logs/rs0.log

2016-11-24T13:23:30.169+1100 I NETWORK  [SyncSourceFeedback] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:47018]

2016-11-24T13:23:30.169+1100 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update: network error while attempting to run command 'replSetUpdatePosition' on host 'localhost:47018'

2016-11-24T13:23:30.169+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

2016-11-24T13:23:30.169+1100 I REPL     [SyncSourceFeedback] updateUpstream failed: HostUnreachable network error while attempting to run command 'replSetUpdatePosition' on host 'localhost:47018' , will retry

...

2016-11-24T13:23:35.171+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

2016-11-24T13:23:40.114+1100 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms

2016-11-24T13:23:40.114+1100 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected

2016-11-24T13:23:40.114+1100 I REPL     [ReplicationExecutor] dry election run succeeded, running for election

2016-11-24T13:23:40.115+1100 I REPL     [ReplicationExecutor] VoteRequester: Got failed response from localhost:47018: HostUnreachable Connection refused

2016-11-24T13:23:40.115+1100 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 3

2016-11-24T13:23:40.115+1100 I REPL     [ReplicationExecutor] transition to PRIMARY

2016-11-24T13:23:40.115+1100 W REPL     [ReplicationExecutor] The liveness timeout does not match callback handle, so not resetting it.

....

2016-11-24T13:23:40.175+1100 I REPL     [rsSync] transition to primary complete; database writes are now permitted

2016-11-24T13:23:40.219+1100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:44171 #7 (4 connections now open)

2016-11-24T13:23:40.219+1100 I SHARDING [conn7] remote client 127.0.0.1:44171 initialized this host as shard rs0

2016-11-24T13:23:40.219+1100 I SHARDING [ShardingState initialization] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers

2016-11-24T13:23:40.219+1100 I SHARDING [ShardingState initialization] Updating config server connection string to: localhost:37019

2016-11-24T13:23:40.221+1100 I NETWORK  [ShardingState initialization] Starting new replica set monitor for rs0/localhost:47018,localhost:47019,localhost:47020

2016-11-24T13:23:40.221+1100 I NETWORK  [ReplicaSetMonitorWatcher] starting

### Check the log of replica set node 2

Initially there is a socket exception for node 0 (port 47018) with connection refused; after a while this secondary recognises that the current primary is node 1 and starts syncing from it, while still trying to reach node 0 (the old primary).

tail -20f /backups/data/cluster/shard1/rs0/2/logs/rs0.log

2016-11-24T13:23:30.168+1100 I NETWORK  [SyncSourceFeedback] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [127.0.0.1:47018]

2016-11-24T13:23:30.168+1100 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update: network error while attempting to run command 'replSetUpdatePosition' on host 'localhost:47018'

2016-11-24T13:23:30.168+1100 I NETWORK  [conn3] end connection 127.0.0.1:22726 (5 connections now open)

2016-11-24T13:23:30.168+1100 I REPL     [SyncSourceFeedback] updateUpstream failed: HostUnreachable network error while attempting to run command 'replSetUpdatePosition' on host 'localhost:47018' , will retry

2016-11-24T13:23:30.169+1100 I NETWORK  [conn6] end connection 127.0.0.1:22754 (3 connections now open)

2016-11-24T13:23:30.171+1100 I REPL     [ReplicationExecutor] could not find member to sync from

2016-11-24T13:23:30.171+1100 W REPL     [ReplicationExecutor] The liveness timeout does not match callback handle, so not resetting it.

2016-11-24T13:23:30.171+1100 I ASIO     [ReplicationExecutor] dropping unhealthy pooled connection to localhost:47018

2016-11-24T13:23:30.171+1100 I ASIO     [ReplicationExecutor] after drop, pool was empty, going to spawn some connections

2016-11-24T13:23:30.171+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

2016-11-24T13:23:30.171+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

2016-11-24T13:23:30.172+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

....

2016-11-24T13:23:40.172+1100 I REPL     [ReplicationExecutor] Member localhost:47019 is now in state PRIMARY

2016-11-24T13:23:40.174+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

....

2016-11-24T13:23:45.173+1100 I REPL     [ReplicationExecutor] syncing from: localhost:47019

2016-11-24T13:23:45.174+1100 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to localhost:47019

2016-11-24T13:23:45.174+1100 I ASIO     [NetworkInterfaceASIO-BGSync-0] Successfully connected to localhost:47019

...

2016-11-24T13:23:45.175+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

### Check the status: connect to one of the surviving nodes directly and check the replica set status (output edited for brevity)

As you can see below, node 0 is reported as not reachable, node 1 has become the current primary, and node 2 is still a secondary.

mongo localhost:47019

MongoDB Enterprise rs0:SECONDARY> rs.status()

{

        "set" : "rs0",

         ...

        "syncingTo" : "localhost:47019",

        "heartbeatIntervalMillis" : NumberLong(2000),

        "members" : [

                {

                        "_id" : 1,

                        "name" : "localhost:47018",

                        "health" : 0,

                        "state" : 8,

                        "stateStr" : "(not reachable/healthy)",

                        "uptime" : 0,

                       .....

                        "lastHeartbeatMessage" : "Connection refused",

                        "configVersion" : -1

                },

                {

                        "_id" : 2,

                        "name" : "localhost:47019",

                     ...

                        "stateStr" : "PRIMARY",

                        "uptime" : 92823,

                        ....                    

                        "electionTime" : Timestamp(0, 0),

                        "electionDate" : ISODate("1970-01-01T00:00:00Z"),

                        "configVersion" : 1

                },

                {

                        "_id" : 3,

                        "name" : "localhost:47020",

                        ...

                        "stateStr" : "SECONDARY",

                        "uptime" : 92824,

                        ...,

                        "syncingTo" : "localhost:47019",

                        "configVersion" : 1,

                        "self" : true

                }

        ],

        "ok" : 1

}

MongoDB Enterprise rs0:SECONDARY>

### Let's bring node 0 back up

mongod -f //backups/data/cluster/shard1/rs0/rs0.conf

about to fork child process, waiting until server is ready for connections.

forked process: 30107

child process started successfully, parent exiting

### Check the logs of nodes 1 and 2 of replica set rs0

As per the logs, the connection is accepted, node 0 (47018) comes back online and joins as a secondary, and node 1 remains the primary.

tail -20f /backups/data/cluster/shard1/rs0/1/logs/rs0.log

2016-11-24T13:40:33.628+1100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50596 #15 (8 connections now open)

2016-11-24T13:40:33.628+1100 I NETWORK  [conn15] end connection 127.0.0.1:50596 (7 connections now open)

2016-11-24T13:40:33.630+1100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50600 #16 (8 connections now open)

2016-11-24T13:40:34.667+1100 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to localhost:47018

2016-11-24T13:40:34.668+1100 I REPL     [ReplicationExecutor] Member localhost:47018 is now in state SECONDARY

 

tail -20f /backups/data/cluster/shard1/rs0/2/logs/rs0.log

2016-11-24T13:40:31.730+1100 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:47018; HostUnreachable Connection refused

2016-11-24T13:40:33.628+1100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:32435 #10 (5 connections now open)

2016-11-24T13:40:33.628+1100 I NETWORK  [conn10] end connection 127.0.0.1:32435 (4 connections now open)

2016-11-24T13:40:33.731+1100 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to localhost:47018

2016-11-24T13:40:33.731+1100 I REPL     [ReplicationExecutor] Member localhost:47018 is now in state SECONDARY

2016-11-24T13:40:39.631+1100 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:32445 #12 (6 connections now open)

 

### Check rs.status() by connecting directly to any replica set node; 47018, the former primary, now shows as SECONDARY

root@wash-i-03aaefdf-restore /backups/data/cluster/shard1/rs0/1/logs $ mongo localhost:47018

MongoDB shell version: 3.2.1

connecting to: localhost:47018/test

MongoDB Enterprise rs0:SECONDARY> rs.status()

{

        "set" : "rs0",

        "date" : ISODate("2016-11-24T02:43:18.815Z"),

        "myState" : 2,

        "term" : NumberLong(3),

        "syncingTo" : "localhost:47020",

        "heartbeatIntervalMillis" : NumberLong(2000),

        "members" : [

                {

                        "_id" : 1,

                        "name" : "localhost:47018",

                        "health" : 1,

                        "state" : 2,

                        "stateStr" : "SECONDARY",

                        "uptime" : 195,

                      ......

                {

                        "_id" : 2,

                        "name" : "localhost:47019",

                        "health" : 1,

                        "state" : 1,

                        "stateStr" : "PRIMARY",

                    .......

                },

                {

                        "_id" : 3,

                        "name" : "localhost:47020",

                        "health" : 1,

                        "state" : 2,

                        "stateStr" : "SECONDARY",

                        "uptime" : 165,

                       .....

        "ok" : 1

}

MongoDB Enterprise rs0:SECONDARY>

Next post: Case 2, what if an entire shard is down, with all nodes of its replica set.

-Thanks

Geek DBA