Subscribe to Posts by Email

Subscriber Count

    696

Disclaimer

All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove. Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only. This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only. If you see any issues with Content and copy write issues, I am happy to remove if you notify me. Contact Geek DBA Team, via geeksinsights@gmail.com

Pages

MongoDB for Oracle DBA’s Part 8 : Understanding Data Distribution between Shards

With previous post, we have configured the two shards with each having replicaset rs0 and rs1.

In this post we will insert some data and observe how the data distribution is happening between shards.

To do the data distributed,

  • firstly the database must have the sharding enabled
  • secondly, the collection which data need to be distributed should have the shard key (shard key can be range, hash, tag based)

High Level Steps are,

  • connect to mongos
  • enable sharding on foo database
  • enable sharding on foo.testdata collection with hashed shard key
  • insert data into foo.testdata collection
  • check data distribution and replication

### Connect to Mongos

root@wash-i-03aaefdf-restore $ mongo localhost:27017

MongoDB shell version: 3.2.1

connecting to: localhost:27017/test

Server has startup warnings:

2016-11-23T11:43:50.348+1100 I CONTROL  [main] ** WARNING: You are running this process as the root user, which is not recommended.

2016-11-23T11:43:50.348+1100 I CONTROL  [main]

### Enable Sharding on foo database

MongoDB Enterprise mongos> sh.enableSharding("foo")

{ "ok" : 1 }

### Insert some sample data

MongoDB Enterprise mongos> for (var i = 1; i <= 100; i++) db.testData.insert( { x : i } )

### Enable sharding on collection (table) testData for column "x" with hashed mechanism

MongoDB Enterprise mongos> sh.shardCollection("foo.testData", { "x": "hashed" })

{ "collectionsharded" : "foo.testData", "ok" : 1 }

### Insert more data after enable sharding on collection

MongoDB Enterprise mongos> for (var i = 101; i <= 1000000; i++) db.testData.insert( { x : i } )

WriteResult({ "nInserted" : 1 })

### Check the status of sharding 

MongoDB Enterprise mongos> sh.status()

--- Sharding Status ---

  sharding version: {

        "_id" : 1,

        "minCompatibleVersion" : 5,

        "currentVersion" : 6,

        "clusterId" : ObjectId("5834e646a2c25f3b06d65226")

}

  shards:

        {  "_id" : "rs0",  "host" : "rs0/localhost:47018,localhost:47019,localhost:47020" }

        {  "_id" : "rs1",  "host" : "rs1/localhost:57018,localhost:57019,localhost:57020" }

  active mongoses:

        "3.2.1" : 1

  balancer:

        Currently enabled:  yes

        Currently running:  no

        Failed balancer rounds in last 5 attempts:  0

        Migration Results for the last 24 hours:

                1 : Success

  databases:

        {  "_id" : "foo",  "primary" : "rs0",  "partitioned" : true }

                foo.testData

                        shard key: { "x" : "hashed" }    --> collection shard key

                        unique: false

                        balancing: true

                        chunks:

                                rs0     2           --> Two chunk distributed to rs0 replicaset

                                rs1     2           --> two chunk distributed to rs1 replicaset

                        { "x" : { "$minKey" : 1 } } -->> { "x" : NumberLong("-4611686018427387902") } on : rs0 Timestamp(2, 2)

                        { "x" : NumberLong("-4611686018427387902") } -->> { "x" : NumberLong(0) } on : rs0 Timestamp(2, 3)

                        { "x" : NumberLong(0) } -->> { "x" : NumberLong("4611686018427387902") } on : rs1 Timestamp(2, 4)

                        { "x" : NumberLong("4611686018427387902") } -->> { "x" : { "$maxKey" : 1 } } on : rs1 Timestamp(2, 5)

        {  "_id" : "test",  "primary" : "rs0",  "partitioned" : false }

### Check the data distribution for collection 

MongoDB Enterprise mongos> db.testData.getShardDistribution()

Shard rs0 at rs0/localhost:47018,localhost:47019,localhost:47020

 data : 22.32MiB docs : 487794 chunks : 2

 estimated data per chunk : 7.44MiB

 estimated docs per chunk : 162598

Shard rs1 at rs1/localhost:57018,localhost:57019,localhost:57020

 data : 2.86MiB docs : 62648 chunks : 2

 estimated data per chunk : 978KiB

 estimated docs per chunk : 20882

Totals

 data : 25.19MiB docs : 550442 chunks : 4

 Shard rs0 contains 88.61% data, 88.61% docs in cluster, avg obj size on shard : 48B

 

 Shard rs1 contains 11.38% data, 11.38% docs in cluster, avg obj size on shard : 48B

Note: the uneven distribution of data is possible until the chunks reach according to specification.

Comments are closed.