Postgres community bonding

The first phase of GSoC is community bonding, where the mentee gets to know the community of his organization.

For Postgres, I started searching for the communication channels used by contributors, I found out about Postgres mailing lists which unfortuanetly i hadn’t involved into it that much till now, but i also discovered the PostgreSQL Hacking Discord channel and the Hacking Workshops they host every month to study and discuss PostgreSQL topics, which has been a great learning resource for me since then.

Also I started looking for PostgreSQL-related projects to contribute to and found out that there are not that many starter issues in the main PostgreSQL engine itself, but the good part is that there were a lot of other projects that integrate with Postgres and build on top of it, which had a good amount of marked starter issues to help aspiring contributors get started. One of them was pg_duckdb the official PostgreSQL extension for DuckDB, where I opened a PR adding SQLSmith CI tests.

Coding Period

At the start of the coding period, I dived more into pgwatch and pgwatch_rpc_server, reading as much from their code and testing them with various options and workloads to enhance my understanding of their workflow.

That led me to discover some issues and suggest optimizations.

Issues

Pull Requests

Then I started working on my first milestone, adding TLS encryption support to pgwatch RPC sink.

I spent most of the time searching (TLS, Golang TLS support, RPC & TLS integration, etc.), Also the mentor pointed out to me that they had worked on a similar idea in another project, which helped me a lot while reading its code.

Pull Requests

  1. Add TLS encryption to the Pgwatch RPC channel
  2. Update the Pgwatch RPC Server to use a TLS listener.

And voila, pgwatch v3.6 supports optional TLS encryption for data sent over the RPC channel

Example Usage

# pgwatch rpc server
export RPC_SERVER_CERT=server.crt
export RPC_SERVER_KEY=server.key
go run ./cmd/[receiver] [flags]

# pgwatch
go run ./cmd/pgwatch [flags] --sink=rpc://[host:ip]?sslrootca=ca.crt 

I continued reading pgwatch codebase, and I was able to dicover and resolve more issues.

Pull Requests

After that, I was supposed to start working on adding authentication support to the current RPC sink implementation, and that was when I discovered that the currently used net/rpc golang package is very basic and limited, although there were a couple of ways to implement our desired Basic Auth functionality with it, I saw that it will be better to migrate to a more robust RPC framework that is better maintained and has more functionalities.

gRPC, being the most famous RPC framework, was the first solution that came to my mind, so I started learning about it and thinking about its suitability for our use case, and its high performance, language-neutral nature, rich feature set, and large ecosystem made me decide to go ahead with it.

I started by presenting this idea to my mentors, pointing out its benefits and optimizations, and I was asked to implement a local draft PR so they can see it in action and estimate the amount of the change it would cause.

Pull Requests

They liked it, and the gRPC migration got approved, and our deliverables timeline got updated accordingly.

Now I rewrote the RPC sink for pgwatch to be a gRPC client, and all sinks in pgwatch_rpc_server got migrated to gRPC, and finally, I re-added TLS support on top of gRPC and used interceptors to add Basic Auth support.

Pull Requests

These features should be available soon under the pgwatch v4 release.

Example Usage

# pgwatch rpc server
export PGWATCH_RPC_SERVER_USERNAME="user"
export PGWATCH_RPC_SERVER_PASSWORD="pwd"

export PGWATCH_RPC_SERVER_CERT="/path/to/server.crt"
export PGWATCH_RPC_SERVER_KEY="/path/to/server.key"

go run ./cmd/[receiver] [flags]

# pgwatch v4
go run ./cmd/pgwatch [flags] --sink=grpc://user:pwd@[host:ip]?sslrootca=ca.crt

At the same time, new issues irrelevant to my pgwatch RPC work got opened, and my experience with reading the codebase and fixing issues in it made me able to reproduce and fix some of them :).

Pull Requests

Also, I started reading pgwatch docs, and figured out some issues and submitted updated suggestions for them That got accepted and merged.

Pull Requests

Then it was time to start working on improving developer experience for building custom sinks, I researched and tested a lot of ideas for this, and finally found out that the simplest and most effective approach is to provide a mini sinks development tutorial that explains the different functions from the pgwatch gRPC API, their parameters, return values, and how users can make use of the predefined server logic to easily develop their own custom sinks.

Pull Requests

Now with pgwatch RPC client & servers optimized and migrated to gRPC, support for authentication and TLS encryption, and docs for developing custom RPC sinks were added; the remaining milestone was to provide additional sink implementations.

I discussed the various available options with my mentors, and they wanted new sinks for a Full-text search engine and Apache Iceberg.

Also, I proposed adding a GCP pub/sub sink that publishes data to Google Cloud, then users can register subscribers (subs) to pull data from it, and hence be able to easily route the data to different data storage and analytics solutions, and its addition got approved.

Pull Requests

While working on the final apache iceberg sink, I faced a lot of difficulties due to the iceberg-go package being not mature enough, with relatively poor documentation (at that time), I discussed this with the mentors, and we saw this a good opportunity to show the extensibility of gRPC by developing the sink in another language that has a more powerful iceberg library, e.g., Python or Java.

Pull Requests

Usage Example

# ElasticSearch Sink
export ELASTIC_PASSWORD="es_password"
go run ./cmd/elasticsearch_receiver -port 1234 -addrs=<comma-sep-list-of-es-addrs> -user=<es-username> -ca-file=<es-ca-fle>

# Google Cloud pub/sub sink
go run ./cmd/gcp_pubsub_receiver --port 1235 --project-id <gcp-project-id>

# Apache Iceberg sink
export PYICEBERG_HOME="cmd/pyiceberg_receiver"
python3 cmd/pyiceberg_receiver -p 1236 -d <dir-path>

# pgwatch v4
go run ./cmd/pgwatch [FLAGS] --sink=grpc://localhost:1234 --sink=grpc://localhost:1235 --sink=grpc://localhost:1236 

End of Journey

This wraps up my GSoC journey at PostgreSQL, contributing to the pgwatch project.

It’s been a wonderful experience that has hugely improved me both personally and technically, and filled me with love for the community and interest in contributing to PostgreSQL projects.

Also, huge shout-out to my very supportive, responsive, and highly talented mentors Pavlo Golub and Akshat Jaimini for their help and guidance.