Google Cloud NoSQL Database Services - Datastore, Firestore and Bigtable

20 Jun 2023  Amiya Pattanaik  6 mins read.

What is Google Cloud NoSQL?

NoSQL databases are often better suited than relational databases to large amounts of data that do not fit a fixed, tabular structure. They can also be a good choice for applications that need to scale quickly, because they scale horizontally much more easily than relational databases. Google Cloud Platform (GCP) offers a wide variety of database services. Among these, its NoSQL database services stand out for their ability to rapidly process very large, dynamic datasets with no fixed schema.

This post describes GCP’s main managed NoSQL database services (Datastore, Firestore, and Bigtable), their key features, and important best practices for each.

Cloud Datastore

Datastore is a highly scalable NoSQL database for your applications. It automatically handles sharding and replication, giving you a highly available and durable database that scales automatically with your applications’ load. Datastore provides capabilities such as ACID transactions, SQL-like queries, and indexes. It is optimized for storing many small entities, so it may not be suitable for very large individual records or for complex analytical queries.

  • Best Practices
    1. Use batch operations for your reads, writes, and deletes instead of single operations. Batch operations are more efficient because they perform multiple operations with the same overhead as a single operation.

    2. Roll back failed transactions: If a transaction fails, try to roll it back. The rollback minimizes retry latency for other requests contending for the same resources in a transaction. Note that the rollback itself might fail, so treat it as a best-effort attempt only.

    3. Use asynchronous calls where available instead of synchronous calls. Asynchronous calls minimize latency impact.

    4. Entities: Avoid writing to an entity group more than once per second. A higher sustained rate can cause timeouts on strongly consistent reads, which hurts your application's performance. A batch write or a transaction counts as a single write operation.

    5. Properties: Always use UTF-8 characters for properties of type string. A non-UTF-8 character in a string property could interfere with queries. If you need to save data with non-UTF-8 characters, use a byte string.
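
As a rough illustration of practices 1 and 2, here is a minimal Python sketch. It makes two assumptions. First, it assumes a limit of 500 entities per commit, so check the current Datastore quotas before relying on that number. Second, it assumes a transaction object that exposes begin(), commit(), and rollback() methods. The helper names chunked and run_in_transaction are hypothetical and not part of any Google library.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Assumption: 500 entities per commit; verify against the current quota docs.
MAX_ENTITIES_PER_COMMIT = 500


def chunked(items: Iterable[T], size: int = MAX_ENTITIES_PER_COMMIT) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items, one per batch call."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_in_transaction(txn, work: Callable[[object], T]) -> T:
    """Run `work(txn)` and commit; if anything fails, try a best-effort rollback."""
    txn.begin()
    try:
        result = work(txn)
        txn.commit()
        return result
    except Exception:
        try:
            txn.rollback()
        except Exception:
            # The rollback itself may fail. Ignore that failure so the caller
            # still sees the original error.
            pass
        raise
```

With a real client, each batch from chunked(entities) would typically be passed to a single batch write call, for example put_multi in the google-cloud-datastore package, instead of writing entities one at a time.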

Cloud Firestore

Firestore is a fully managed NoSQL document database designed to store and manage structured and semi-structured data. It provides a highly available and scalable storage system with automatic sharding and load balancing, making it a good choice for storing large amounts of data. Firestore also provides rich, index-backed querying, such as compound filters and ordering, and a flexible data model that makes it easy to store and retrieve data in a variety of formats. In addition, Firestore integrates easily with other Google Cloud services such as App Engine, Cloud Functions, and BigQuery.

  • Best Practices
    1. Database location: When you create your database instance, select the location closest to your users and compute resources. Use a multi-region location to maximize availability and durability; it deploys the database in at least two Google Cloud regions. Use a regional location for lower costs, for lower write latency if your application is latency-sensitive, or for co-location with other GCP resources.

    2. Indexes: Avoid using too many indexes. An excessive number of indexes can increase write latency and raise storage costs for index entries.

    3. Read and write operations: Use asynchronous calls where available instead of synchronous calls. Asynchronous calls minimize latency impact.
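
To show why asynchronous calls help, here is a small Python sketch that issues several reads concurrently with asyncio. The fetch_document coroutine is a hypothetical stand-in that only sleeps to simulate network latency. In a real application it would wrap an asynchronous client call, such as one made through the AsyncClient in the google-cloud-firestore package.

```python
import asyncio
from typing import Dict, List


async def fetch_document(doc_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for an asynchronous document read."""
    await asyncio.sleep(0.01)  # simulates one network round trip
    return {"id": doc_id}


async def fetch_all(doc_ids: List[str]) -> List[Dict[str, str]]:
    """Start all reads at once; results come back in the order of `doc_ids`."""
    return list(await asyncio.gather(*(fetch_document(d) for d in doc_ids)))
```

Calling asyncio.run(fetch_all(["a", "b", "c"])) waits for roughly one round trip instead of three, because the reads overlap instead of running back to back.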

Cloud Bigtable

Bigtable is a fully managed, scalable NoSQL database service for large analytical and operational workloads. It lets you store large amounts of single-keyed data with very low latency, and it partitions data automatically to distribute the load evenly. It is designed to handle large volumes of structured data with low latency, supporting fast queries and real-time analysis. Cloud Bigtable also has built-in redundancy and backup features so that data is recoverable in case of failures.

Bigtable is well suited to ingesting and analyzing large volumes of time-series data from sensors in real time, keeping up with the high speed of IoT data to track normal and abnormal behavior. It can store very large amounts of data and provide low-latency access to it.

Bigtable may not be the best fit for unstructured data or complex queries, and it does not support SQL-like syntax. For large workloads that match its data model, it can also be cost-effective compared to other NoSQL database services.
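
To make the "single-keyed" model concrete for sensor time series, here is a minimal Python sketch of one possible row key layout. Bigtable stores rows sorted by row key, so this sketch puts the sensor ID first and a UTC timestamp second. That keeps one sensor's readings together and in time order. The key format and the function names make_row_key and key_range_for_sensor are illustrative assumptions, not a prescribed design.

```python
from datetime import datetime, timezone
from typing import Tuple


def make_row_key(sensor_id: str, reading_time: datetime) -> bytes:
    """Build a key of the form b'<sensor_id>#<UTC timestamp>'."""
    if reading_time.tzinfo is None:
        raise ValueError("reading_time must be timezone-aware")
    stamp = reading_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{sensor_id}#{stamp}".encode("utf-8")


def key_range_for_sensor(sensor_id: str) -> Tuple[bytes, bytes]:
    """Return (start, end) keys covering every reading of one sensor.

    '$' is the byte right after '#', so the end key is an exclusive bound.
    """
    return f"{sensor_id}#".encode("utf-8"), f"{sensor_id}$".encode("utf-8")
```

Because the timestamp is zero-padded and fixed-width, sorting the keys as bytes sorts one sensor's readings chronologically. A single key-range scan can then return that sensor's history.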

  • Best Practices
    1. Trade-off between high throughput and low latency: When planning Cloud Bigtable capacity, consider your goals. You can optimize for throughput at the cost of latency, or vice versa. Cloud Bigtable offers optimal latency when CPU load is under 70%, and preferably under 50% for latency-sensitive applications. If latency is less important, you can load CPUs above 70% to get higher throughput from the same number of cluster nodes.

    2. Schema design: Bigtable is a key/value store, not a relational store. It does not support joins, and transactions are supported only within a single row.

    3. Column families: Put related columns in the same column family. When a row contains multiple values that are related to one another, it’s good practice to group the columns that contain those values in the same column family. Group data as closely as you can so that your most frequent read requests return just the information you need, and no more, without complex filters. Create up to about 100 column families per table; creating more may cause performance degradation.
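
The CPU thresholds in practice 1 can be turned into a rough capacity estimate. The Python sketch below assumes that load spreads evenly across nodes and scales linearly with node count, which is a simplification. The function name nodes_for_target_cpu is hypothetical.

```python
def nodes_for_target_cpu(current_nodes: int, peak_cpu_percent: int,
                         target_cpu_percent: int) -> int:
    """Estimate how many nodes bring peak CPU load down to the target.

    Assumes load is evenly spread and scales linearly with node count.
    """
    if current_nodes < 1:
        raise ValueError("current_nodes must be at least 1")
    if peak_cpu_percent < 0:
        raise ValueError("peak_cpu_percent must not be negative")
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    total_load = current_nodes * peak_cpu_percent
    needed = -(-total_load // target_cpu_percent)  # ceiling division on ints
    return max(needed, 1)
```

For example, a 10-node cluster peaking at 85% CPU would need about 13 nodes to stay under 70%, or 17 nodes to stay under 50% for latency-sensitive traffic.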

Conclusion

If you’re interested in using a NoSQL database on Google Cloud Platform, I recommend that you start by familiarizing yourself with the different options that are available. Once you’ve done that, you can start exploring the GCP documentation and tutorials to see which option is best for your needs.

All done! I hope this gives you a better understanding of Google’s NoSQL services and their potential use cases. Please visit my other cloud computing writings on this website. Enjoy your reading!

Amiya Pattanaik

Amiya is a Product Engineering Director focused on Product Development, Quality Engineering & User Experience. He writes about his experiences here.