Hello, I recently graduated from college as a computer science major, and I would really like to contribute to this project. I have checked out the GitHub repo and the contribute.md, downloaded all the dependencies, cloned the repo with go, built everything, signed the agreement, and read the documentation. Unfortunately, I have never worked on an open source project before, and I am lost on what to do next. Could someone who started from a similar position point me in the right direction so I can be of use? Any documentation, reading, or advice that would help with development on this project would be welcome. I would really like to be a productive member of this team.
First of all, welcome! Really excited to have you contribute.
The first step is to find an issue you want to tackle. Issues tagged with help wanted are good candidates, though not everything we want help with is tagged as “help wanted.” Find something that interests you and doesn’t look too challenging. On your first PR, just navigating our codebase and build system can be challenging enough.
If you’re having trouble finding a good first issue, come join our Gitter chat room during business hours (roughly 10am-6pm EST) and one of us can point you in the right direction!
We also have a guide to submitting your first PR to CockroachDB that you’ll probably find helpful! I’m afraid that won’t help you find a starter project, though; that guide is written for new engineering hires at Cockroach Labs who have an assigned starter project.
I appreciate the quick reply. I’ll read the guide you sent me and be sure to pop into the Gitter chat room tomorrow. Thank you very much.
Definitely let us know if that doesn’t appeal to you—or if it does, let us know if you need some more advice on how to get started. Cheers!
It does appeal to me. I’m reading the first-pr.md file right now. Let me see if I have the architecture right: it looks like CockroachDB is essentially a 5-node distributed cluster where each node provides redundancy for the others in case one of them goes down. Does it also allow for processing on each node, similar to a Hadoop cluster?
Not just five nodes, but as many as you want! The default replication factor is three (every piece of data is written to three separate nodes), but that’s configurable. Have you taken a look at the design summary in the README and the full design document?
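To make the difference between cluster size and replication factor concrete, here is a toy sketch in Go. It assumes a made-up round-robin placement rule, and `placeReplicas` is a hypothetical helper written for this post; it is not how CockroachDB actually picks replica locations. The only point is that the number of copies (three here) is independent of how many nodes the cluster has (seven here).

```go
package main

import "fmt"

// placeReplicas returns the indices of the nodes that hold a copy of one
// range of data. Hypothetical round-robin rule, for illustration only;
// CockroachDB's real allocator is far more involved.
func placeReplicas(rangeID, numNodes, replicationFactor int) []int {
	if replicationFactor > numNodes {
		replicationFactor = numNodes // cannot place more copies than there are nodes
	}
	replicas := make([]int, 0, replicationFactor)
	for i := 0; i < replicationFactor; i++ {
		// Start at the node matching the range ID and wrap around the cluster.
		replicas = append(replicas, (rangeID+i)%numNodes)
	}
	return replicas
}

func main() {
	const numNodes = 7          // cluster size: as many nodes as you want
	const replicationFactor = 3 // a replication factor of three, as in the post above
	for rangeID := 0; rangeID < 4; rangeID++ {
		fmt.Printf("range %d -> nodes %v\n", rangeID,
			placeReplicas(rangeID, numNodes, replicationFactor))
	}
}
```

Each range lands on exactly three of the seven nodes, so losing any single node still leaves two copies of every range.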
Processing is automatically distributed between nodes, at least in theory. The project to distribute this processing is called DistSQL (“distributed SQL”). DistSQL doesn’t support all SQL queries yet, so some queries can only harness the processing power of one node. Improving DistSQL’s scope and performance is one of our top priorities, though.
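If it helps to picture what distributing a query means, here is a rough sketch in Go, in the same spirit as a MapReduce job. It is only an analogy: goroutines stand in for nodes, `distributedSum` is a hypothetical function invented for this example, and none of it reflects DistSQL's real design or API. Each "node" computes a partial sum over its share of the rows, and the partial results are merged at the end.

```go
package main

import (
	"fmt"
	"sync"
)

// distributedSum splits rows across numWorkers goroutines (stand-ins for
// nodes), lets each compute a partial sum, then merges the partials.
// Hypothetical illustration only; this is not DistSQL code.
func distributedSum(rows []int, numWorkers int) int {
	if numWorkers < 1 {
		numWorkers = 1
	}
	partials := make([]int, numWorkers) // one slot per worker, so no locking is needed
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Worker w handles every numWorkers-th row, starting at index w.
			for i := w; i < len(rows); i += numWorkers {
				partials[w] += rows[i]
			}
		}(w)
	}
	wg.Wait() // all partial sums are complete past this point

	total := 0
	for _, p := range partials {
		total += p
	}
	return total
}

func main() {
	rows := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(distributedSum(rows, 3)) // prints 55
}
```

A query that cannot be split this way is the single-node case: with `numWorkers` set to 1, one worker does all the work.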
I was reading the design.md file; I’ll check out the summary in a little bit. DistSQL sounds awesome. I have programmed using Open MPI, Hadoop, and Spark (with Python and Scala), which is why I asked about Hadoop (I noticed the design.md was discussing MapReduce).
So I’m looking into the Drop User issue. I am noticing that everything seems to be written in Go. I am not familiar with Go, so I’m not sure how useful I am going to be right off the bat.
That’s alright! I think you’ll find Go is pretty easy to pick up.
In my opinion, Go is like C or C++ with all the footguns removed, or Java with the verbosity removed. So if you’ve ever programmed in C/C++/Java, or a C/C++/Java-like language, you’ll likely feel comfortable in Go quite quickly.
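For a quick taste, here is a small self-contained Go program. It is a generic illustration with a hypothetical `average` function, not code from the CockroachDB repository. If you know C or Java, most of it should read naturally: the main differences are that errors are returned as ordinary values instead of thrown, and `range` replaces manual index arithmetic.

```go
package main

import (
	"errors"
	"fmt"
)

// average returns the mean of xs, or an error if xs is empty.
// Returning (value, error) pairs is the usual Go way to report failures.
func average(xs []float64) (float64, error) {
	if len(xs) == 0 {
		return 0, errors.New("average of empty slice")
	}
	sum := 0.0
	for _, x := range xs { // range walks the slice without manual indexing
		sum += x
	}
	return sum / float64(len(xs)), nil
}

func main() {
	mean, err := average([]float64{2, 4, 6})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mean) // prints 4
}
```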