diff --git a/02_activities/assignments/Assignment2.md b/02_activities/assignments/Assignment2.md
index a95a027fd..f4861603b 100644
--- a/02_activities/assignments/Assignment2.md
+++ b/02_activities/assignments/Assignment2.md
@@ -54,7 +54,49 @@ The store wants to keep customer addresses. Propose two architectures for the CU
 **HINT:** search type 1 vs type 2 slowly changing dimensions.
 
 ```
-Your answer...
+To store customer addresses, we propose two different architectures:
+
+Type 1: Overwriting Changes
+
+A simple structure where the latest address replaces the previous one:
+
+Table: CUSTOMER_ADDRESS
+
+customer_id (Primary Key, Foreign Key from Customer)
+
+address
+
+city
+
+state
+
+zip
+
+In this model, whenever a customer updates their address, the old data is overwritten.
+
+Type 2: Retaining History
+
+A more complex structure that maintains historical address changes:
+
+Table: CUSTOMER_ADDRESS_HISTORY
+
+customer_id (Foreign Key from Customer)
+
+address
+
+city
+
+state
+
+zip
+
+start_date
+
+end_date
+
+With this approach, whenever a customer changes their address, a new record is created with the start_date, and the previous record is updated with an end_date, preserving history.
+
+Type 1 is best when historical data is not needed, whereas Type 2 is essential when tracking address changes over time.
 ```
 
 ***
@@ -182,5 +224,36 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c
 
 
 ```
-Your thoughts...
+Section 4: Ethics in AI and Data Processing
+
+Ethical Issues in "Neural Nets are Just People All the Way Down"
+
+The article by Vicki Boykis explores the ethical complexities surrounding AI, specifically Large Language Models (LLMs). Key ethical concerns include:
+
+Bias in AI Models
+
+AI systems inherit biases from their training data, which reflects societal prejudices.
+
+This perpetuates discrimination in automated decision-making.
+
+Labor and Automation
+
+LLMs rely on vast amounts of data labeled by underpaid human workers.
+
+The ethical issue of exploiting global labor for AI development raises concerns.
+
+Challenges in Moderating AI-Generated Content
+
+AI-generated content can be harmful or misleading.
+
+There is no perfect moderation system, as AI models lack human context and ethics.
+
+AI in Society & Ethical Dilemmas
+
+The rapid growth of LLMs creates a monopoly where only a few corporations control AI development.
+
+Ethical concerns arise about transparency, accessibility, and misinformation.
+
+Conclusion
+While AI provides immense benefits, its ethical implications cannot be ignored. To mitigate bias, labor exploitation, and misinformation, there must be continuous oversight, regulation, and a commitment to ethical AI development. AI is ultimately shaped by human values, and ensuring fairness and accountability remains a shared responsibility
 ```
diff --git a/02_activities/assignments/BOOKSTORE ERD.jpeg b/02_activities/assignments/BOOKSTORE ERD.jpeg
new file mode 100644
index 000000000..1c6dce9b0
Binary files /dev/null and b/02_activities/assignments/BOOKSTORE ERD.jpeg differ
diff --git a/02_activities/assignments/assignment2.sql b/02_activities/assignments/assignment2.sql
index 5ad40748a..afda2e904 100644
--- a/02_activities/assignments/assignment2.sql
+++ b/02_activities/assignments/assignment2.sql
@@ -1,70 +1,87 @@
-/* ASSIGNMENT 2 */
+/* ASSIGNMENT 2 */ --- FESOBI OLUWAMUYIWA
 /* SECTION 2 */
 
--- COALESCE
-/* 1. Our favourite manager wants a detailed long list of products, but is afraid of tables!
-We tell them, no problem! We can produce a list with all of the appropriate details.
-
-Using the following syntax you create our super cool and not at all needy manager a list:
-
+-- COALESCE - Handle NULL values
 SELECT
-product_name || ', ' || product_size|| ' (' || product_qty_type || ')'
-FROM product
-
-But wait! The product table has some bad data (a few NULL values).
-Find the NULLs and then using COALESCE, replace the NULL with a
-blank for the first problem, and 'unit' for the second problem.
-
-HINT: keep the syntax the same, but edited the correct components with the string.
-The `||` values concatenate the columns into strings.
-Edit the appropriate columns -- you're making two edits -- and the NULL rows will be fixed.
-All the other rows will remain the same.) */
-
-
-
---Windowed Functions
-/* 1. Write a query that selects from the customer_purchases table and numbers each customer’s
-visits to the farmer’s market (labeling each market date with a different number).
-Each customer’s first visit is labeled 1, second visit is labeled 2, etc.
-
-You can either display all rows in the customer_purchases table, with the counter changing on
-each new market date for each customer, or select only the unique market dates per customer
-(without purchase details) and number those visits.
-HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */
-
-
-
-/* 2. Reverse the numbering of the query from a part so each customer’s most recent visit is labeled 1,
-then write another query that uses this one as a subquery (or temp table) and filters the results to
-only the customer’s most recent visit. */
-
-
-
-/* 3. Using a COUNT() window function, include a value along with each row of the
-customer_purchases table that indicates how many different times that customer has purchased that product_id. */
-
-
-
--- String manipulations
-/* 1. Some product names in the product table have descriptions like "Jar" or "Organic".
-These are separated from the product name with a hyphen.
-Create a column using SUBSTR (and a couple of other commands) that captures these, but is otherwise NULL.
-Remove any trailing or leading whitespaces. Don't just use a case statement for each product!
-
-| product_name               | description |
-|----------------------------|-------------|
-| Habanero Peppers - Organic | Organic     |
-
-Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */
-
-
-
-/* 2. Filter the query to show any product_size value that contain a number with REGEXP. */
+    product_name || ', ' || COALESCE(product_size, '') || ' (' || COALESCE(product_qty_type, 'unit') || ')'
+FROM product;
+--Window function
+SELECT
+    customer_id,
+    market_date,
+    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date ASC) AS visit_number
+FROM customer_purchases;
+--Reversing the numbering
+SELECT
+    customer_id,
+    market_date,
+    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+FROM customer_purchases;
+
+---Filtering only the most recent visit for each customer:
+
+WITH RankedVisits AS (
+    SELECT
+        customer_id,
+        market_date,
+        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+    FROM customer_purchases
+)
+SELECT customer_id, market_date
+FROM RankedVisits
+WHERE visit_number = 1;
+
+
+--Count window function
+WITH RankedVisits AS (
+    SELECT
+        customer_id,
+        market_date,
+        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY market_date DESC) AS visit_number
+    FROM customer_purchases
+)
+SELECT customer_id, market_date
+FROM RankedVisits
+WHERE visit_number = 1;
+
+-- Count Window Function - Number of times a customer has purchased a product
+SELECT
+    customer_id,
+    product_id,
+    COUNT(*) OVER (PARTITION BY customer_id, product_id) AS purchase_count
+FROM customer_purchases;
 
--- UNION
-/* 1. Using a UNION, write a query that displays the market dates with the highest and lowest total sales.
+--String Manipulation
+SELECT
+    product_name,
+    TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) AS description
+FROM product
+WHERE INSTR(product_name, '-') > 0;
+
+--UNION - Market dates with highest and lowest total sales
+
+WITH SalesData AS (
+    SELECT
+        market_date,
+        SUM(quantity * cost_to_customer_per_qty) AS total_sales
+    FROM customer_purchases
+    GROUP BY market_date
+),
+RankedSales AS (
+    SELECT
+        market_date,
+        total_sales,
+        RANK() OVER (ORDER BY total_sales DESC) AS highest_rank,
+        RANK() OVER (ORDER BY total_sales ASC) AS lowest_rank
+    FROM SalesData
+)
+SELECT market_date, total_sales, 'Highest Sales' AS category
+FROM RankedSales WHERE highest_rank = 1
+UNION
+SELECT market_date, total_sales, 'Lowest Sales' AS category
+FROM RankedSales WHERE lowest_rank = 1;
 
 HINT: There are a possibly a few ways to do this query, but if you're struggling, try the following:
 1) Create a CTE/Temp Table to find sales values grouped dates;
@@ -78,56 +95,44 @@ with a UNION binding them. */
 
 /* SECTION 3 */
 
--- Cross Join
-/*1. Suppose every vendor in the `vendor_inventory` table had 5 of each of their products to sell to **every**
-customer on record. How much money would each vendor make per product?
-Show this by vendor_name and product name, rather than using the IDs.
-
-HINT: Be sure you select only relevant columns and rows.
-Remember, CROSS JOIN will explode your table rows, so CROSS JOIN should likely be a subquery.
-Think a bit about the row counts: how many distinct vendors, product names are there (x)?
-How many customers are there (y).
-Before your final group by you should have the product of those two queries (x*y). */
-
-
-
--- INSERT
-/*1. Create a new table "product_units".
-This table will contain only products where the `product_qty_type = 'unit'`.
-It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`.
-Name the timestamp column `snapshot_timestamp`. */
-
-
-
-/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp).
-This can be any product you desire (e.g. add another record for Apple Pie). */
-
-
+--CROSS JOIN - Vendor revenue per product
+SELECT
+    v.vendor_name,
+    p.product_name,
+    5 * vi.original_price AS revenue_per_product
+FROM vendor v
+CROSS JOIN (
+    SELECT DISTINCT product_id, original_price FROM vendor_inventory
+) vi
+JOIN product p ON vi.product_id = p.product_id;
 
--- DELETE
-/* 1. Delete the older record for the whatever product you added.
-HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/
+---INSERT - Create a product_units table
+--CREATE TABLE product_units AS
+--SELECT *, CURRENT_TIMESTAMP AS snapshot_timestamp
+--FROM product
+--WHERE product_qty_type = 'unit';
+---Insert a new record into product_units:
 
--- UPDATE
-/* 1.We want to add the current_quantity to the product_units table.
-First, add a new column, current_quantity to the table using the following syntax.
+INSERT INTO product_units (product_id, product_name, product_size, product_category_id, product_qty_type, snapshot_timestamp)
+VALUES (999, 'Apple Pie', 'Medium', 3, 'unit', CURRENT_TIMESTAMP);
 
-ALTER TABLE product_units
-ADD current_quantity INT;
+--DELETE - Remove older record
+DELETE FROM product_units
+WHERE product_id = 999
+AND snapshot_timestamp = (SELECT MIN(snapshot_timestamp) FROM product_units WHERE product_id = 999);
 
-Then, using UPDATE, change the current_quantity equal to the last quantity value from the vendor_inventory details.
-HINT: This one is pretty hard.
-First, determine how to get the "last" quantity per product.
-Second, coalesce null values to 0 (if you don't have null values, figure out how to rearrange your query so you do.)
-Third, SET current_quantity = (...your select statement...), remembering that WHERE can only accommodate one column.
-Finally, make sure you have a WHERE statement to update the right row,
-    you'll need to use product_units.product_id to refer to the correct row within the product_units table.
-When you have all of these components, you can run the update statement. */
+--UPDATE - Add current_quantity column and update it
+ALTER TABLE product_units ADD COLUMN current_quantity INT;
+UPDATE product_units
+SET current_quantity = COALESCE(
+    (SELECT quantity FROM vendor_inventory vi WHERE vi.product_id = product_units.product_id ORDER BY market_date DESC LIMIT 1),
+    0
+);
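Review note: the final UPDATE relies on a correlated subquery (latest `quantity` per `product_id`, ordered by `market_date`) wrapped in COALESCE so that products with no inventory rows get 0. A minimal sketch for checking that behaviour in isolation is below. It assumes the SQL targets SQLite and uses Python's standard `sqlite3` module as a harness. The table and column names mirror the diff, but the trimmed-down schemas and all rows are invented for illustration and are not the course database.

```python
import sqlite3

# Toy in-memory database. Schemas are simplified assumptions, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_units (
    product_id INTEGER,
    product_name TEXT,
    current_quantity INT
);
CREATE TABLE vendor_inventory (
    market_date TEXT,
    product_id INTEGER,
    quantity INTEGER
);
INSERT INTO product_units (product_id, product_name)
VALUES (1, 'Apple Pie'), (2, 'Cherry Pie');
-- Product 1 has two inventory rows; product 2 has none.
INSERT INTO vendor_inventory (market_date, product_id, quantity)
VALUES ('2023-01-01', 1, 10), ('2023-01-08', 1, 4);
""")

# Same shape as the UPDATE in the diff: take the most recent quantity
# per product, falling back to 0 when the subquery returns no row.
conn.execute("""
UPDATE product_units
SET current_quantity = COALESCE(
    (SELECT quantity
     FROM vendor_inventory vi
     WHERE vi.product_id = product_units.product_id
     ORDER BY market_date DESC
     LIMIT 1),
    0
)
""")

rows = conn.execute(
    "SELECT product_id, current_quantity FROM product_units ORDER BY product_id"
).fetchall()
print(rows)
```

With these toy rows, product 1 should pick up the quantity from its latest market date and product 2 should fall back to 0, which is the case the COALESCE is there to cover.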