Running out of space on DW6 and DW7
Instead of a master we will have micro clusters; as long as we have 2 copies of each table, every cluster can be a micro cluster. This is still open, as the DBA team has to validate whether all queries can be supported. If proven, what SW do we need to build?
The goal is to satisfy all queries with limited impact.
A brand new dataset comes to the warehouse: where does it go?
The customer then needs to know where that table is located in order to perform JOINs. JOINs outside VIRT will be a concern.
SW needs to decide automatically where the table goes.
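A minimal sketch of what an automatic placement decision could look like. The notes do not say how the decision should be made, so the policy here (put each copy on the clusters with the most free space) is an assumption, as are the function name, cluster names and sizes.

```python
from typing import Dict, List


def place_table(free_gb: Dict[str, float], size_gb: float, copies: int = 2) -> List[str]:
    """Pick `copies` distinct clusters for a new table.

    Assumed policy: only clusters with enough free space qualify, and the
    ones with the most free space are chosen first (ties broken by name).
    """
    candidates = [c for c, free in free_gb.items() if free >= size_gb]
    if len(candidates) < copies:
        raise ValueError(
            f"need {copies} clusters with {size_gb} GB free, found {len(candidates)}"
        )
    candidates.sort(key=lambda c: (-free_gb[c], c))
    return candidates[:copies]


# Hypothetical free space per micro cluster, in GB.
free_space = {"micro-1": 100.0, "micro-2": 50.0, "micro-3": 80.0}
chosen = place_table(free_space, size_gb=60.0)
print(chosen)
```

Failing loudly when fewer than two clusters qualify keeps the placement consistent with the 2-copy commitment instead of silently creating a single copy.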
A new query is created, say a VIRT wants to join to a BOOKER table. SW needs to either do that automatically or tell the user whether it can be done. SW will be an extension of Schema Authority; make sure an API is available to the VIRT Dashboard.
Schema Authority is what ETLM uses to route jobs. It is a catalog of which tables are in each cluster. SA works at the table and schema level, not at the column level.
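The join question above can be sketched as a lookup against a table-level catalog: a join can run locally only on a cluster that holds a copy of every table involved. This is an illustration only; the catalog layout, function name, table names and cluster names are hypothetical and are not the actual Schema Authority API.

```python
from typing import Dict, Iterable, Set


def clusters_for_join(catalog: Dict[str, Set[str]], tables: Iterable[str]) -> Set[str]:
    """Return the clusters that hold a copy of every table in `tables`.

    An empty result means the join cannot run on a single cluster and the
    user has to be told (or a table has to be copied first).
    """
    cluster_sets = []
    for table in tables:
        if table not in catalog:
            raise KeyError(f"table not in catalog: {table}")
        cluster_sets.append(catalog[table])
    if not cluster_sets:
        return set()
    return set.intersection(*cluster_sets)


# Hypothetical table-level catalog: table name -> clusters holding a copy.
catalog = {
    "VIRT.EXAMPLE_TABLE": {"micro-1", "micro-2"},
    "BOOKER.EXAMPLE_TABLE": {"micro-2", "micro-3"},
    "BOOKER.OTHER_TABLE": {"micro-3", "micro-4"},
}

joinable = clusters_for_join(catalog, ["VIRT.EXAMPLE_TABLE", "BOOKER.EXAMPLE_TABLE"])
not_joinable = clusters_for_join(catalog, ["VIRT.EXAMPLE_TABLE", "BOOKER.OTHER_TABLE"])
print(joinable, not_joinable)
```

Because the catalog is kept at table level, a check like this needs no column information, which matches the granularity noted for SA.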
20% of our tables are non-query tables (O-tables).
Our commitment to the customer is that all tables will have 2 copies, and that's it.
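The 2-copy commitment can be checked mechanically against a table-to-cluster catalog. A minimal sketch, assuming a hypothetical catalog mapping; the cluster assignments shown are made up for illustration.

```python
from typing import Dict, List, Set


def under_replicated(catalog: Dict[str, Set[str]], required: int = 2) -> List[str]:
    """Return, sorted by name, the tables stored on fewer than `required` clusters."""
    return sorted(t for t, clusters in catalog.items() if len(clusters) < required)


# Hypothetical placements; only the table names come from the notes.
example_catalog = {
    "O-Transits": {"micro-1", "micro-2"},
    "O-Sessions": {"micro-1"},
    "WEBLAB-SESSIONS": {"micro-2", "micro-3"},
}

violations = under_replicated(example_catalog)
print(violations)
```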
Idea: split O, W and D tables based on what the customer can or cannot query.
Do we need a master copy? Data analysis is needed. Can SW be made to handle this?
Idea: create a micro cluster on demand utilizing EC2/S3. What are the top 6 tables without which Schema Authority can work?
* D-Daily_SELECTION_EFFICIENCY
* D-PRODUCT_ADS_HITS
* WEBLAB-SESSIONS
* O-Transits
* O-Sessions
* O-WMA-HITS
We can split WebLab (1-100 in one cluster...) since it's growing pretty fast. We can split any…
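The WebLab split by ID range (1-100 in one cluster, and so on) could be sketched as below. The bucket size of 100 follows the example in the notes; the cluster names and the wrap-around to the first cluster once the list is exhausted are assumptions.

```python
from typing import Sequence


def weblab_cluster(
    weblab_id: int,
    clusters: Sequence[str] = ("micro-1", "micro-2", "micro-3"),
    bucket_size: int = 100,
) -> str:
    """Map a WebLab ID to a cluster by ID range.

    IDs 1-100 fall in bucket 0, 101-200 in bucket 1, and so on. Buckets
    beyond the number of clusters wrap around (an assumed policy).
    """
    if weblab_id < 1:
        raise ValueError("WebLab IDs are assumed to start at 1")
    bucket = (weblab_id - 1) // bucket_size
    return clusters[bucket % len(clusters)]


print(weblab_cluster(1), weblab_cluster(100), weblab_cluster(101), weblab_cluster(301))
```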