I applied through a recruiter. The process took 2 weeks. I interviewed at Indium Software (Bengaluru) in Mar 2024.
Interview
I went through 4 rounds, including a client round.
1st round: basic technical questions on SQL, Python and PySpark.
2nd round: intermediate-to-hard SQL, PySpark and data engineering questions.
3rd round: managerial round; they asked about the processes I had mentioned in my resume.
4th round: this was the client round, though it didn't feel like one; they gave me a hard SQL problem to solve.
Interview questions
Question 1
1. Suppose a table named Team contains 4 records. Schedule a match between each team and every opposing team.
2. You have a table named "Students" with columns "StudentID", "Name", and "Score".
Write a SQL query to retrieve the top student (by score) in each subject.
Include their names, scores and subject in the result.
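For the Team question (item 1), a self-join is one way to generate the fixtures. The sketch below uses Python's built-in sqlite3 module so it runs without a database server. The column name team_name and the team names A to D are made up for illustration. The condition t1.team_name < t2.team_name keeps each pair once and stops a team from playing itself, so 4 teams give 4 * 3 / 2 = 6 matches.

```python
import sqlite3

# In-memory stand-in for the Team table; "team_name" and the values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Team (team_name TEXT)")
conn.executemany("INSERT INTO Team VALUES (?)", [("A",), ("B",), ("C",), ("D",)])

# Self-join: pair every team with every other team.
# "<" keeps A vs B but drops B vs A, and excludes A vs A.
query = """
SELECT t1.team_name AS home, t2.team_name AS away
FROM Team t1
JOIN Team t2 ON t1.team_name < t2.team_name
ORDER BY home, away
"""
matches = conn.execute(query).fetchall()
for home, away in matches:
    print(f"{home} vs {away}")
```

If the schedule needs home and away legs, replacing `<` with `<>` returns all 12 ordered pairs.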
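For the Students question (item 2), a window function can pick the top scorer in each subject. The question lists only StudentID, Name and Score, so this sketch assumes an extra Subject column (hypothetical). The sample rows are invented. It runs on sqlite3 and needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subject is an assumed column: the result has to be computed per subject.
conn.execute("CREATE TABLE Students (StudentID INTEGER, Name TEXT, Subject TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "Math", 95),
        (2, "Ravi", "Math", 88),
        (3, "Meena", "Science", 91),
        (4, "Kiran", "Science", 91),
        (5, "Vikram", "Science", 70),
    ],
)

# RANK() numbers students within each subject by descending score;
# keeping rank 1 returns the top scorer(s) of every subject.
query = """
SELECT Name, Score, Subject
FROM (
    SELECT Name, Score, Subject,
           RANK() OVER (PARTITION BY Subject ORDER BY Score DESC) AS rnk
    FROM Students
)
WHERE rnk = 1
ORDER BY Subject, Name
"""
top = conn.execute(query).fetchall()
print(top)  # [('Asha', 95, 'Math'), ('Kiran', 91, 'Science'), ('Meena', 91, 'Science')]
```

RANK() keeps ties, which is why both Kiran and Meena are returned for Science. ROW_NUMBER() would return exactly one student per subject.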
----> PySpark <-----
1. Question: Given a list of sales transactions with product names and their corresponding sales amounts,
calculate the total sales for each product and store the results in a dictionary.
sales = [("Widget", 500), ("Gadget", 750), ("Widget", 1000), ("Doodad", 1200), ("Gadget", 600)]
product_sales_dict = {}
2. Steps involved in executor memory allocation
3. Narrow vs. wide (broad) transformations
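For the sales question (item 1), plain Python is enough. A minimal sketch using collections.defaultdict follows. If the interviewer wants it done in Spark, the RDD equivalent would be along the lines of `sc.parallelize(sales).reduceByKey(lambda a, b: a + b).collectAsMap()`, assuming an existing SparkContext named `sc`.

```python
from collections import defaultdict

sales = [("Widget", 500), ("Gadget", 750), ("Widget", 1000), ("Doodad", 1200), ("Gadget", 600)]

# defaultdict(int) starts every unseen product at 0, so amounts can be added directly.
totals = defaultdict(int)
for product, amount in sales:
    totals[product] += amount

# Convert back to a plain dict to match the product_sales_dict name in the question.
product_sales_dict = dict(totals)
print(product_sales_dict)  # {'Widget': 1500, 'Gadget': 1350, 'Doodad': 1200}
```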
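The executor memory question (item 2) is usually answered with Spark's unified memory model. The sketch below works through the arithmetic for a hypothetical 4 GiB executor. It uses the defaults as I understand them for Spark 2.x/3.x:

- 300 MiB reserved memory.
- spark.memory.fraction = 0.6.
- spark.memory.storageFraction = 0.5.
- Overhead of max(384 MiB, 10% of executor memory).

Check the configuration docs for your version. The sketch ignores off-heap memory (spark.memory.offHeap.size) and PySpark worker memory. The function name is hypothetical.

```python
# Assumed defaults (Spark 2.x/3.x): spark.memory.fraction=0.6,
# spark.memory.storageFraction=0.5, overhead = max(384 MiB, 0.10 * executor memory).
RESERVED_MB = 300  # fixed reserved memory set aside by the unified memory manager


def executor_memory_breakdown(executor_memory_mb, memory_fraction=0.6,
                              storage_fraction=0.5, overhead_factor=0.10,
                              min_overhead_mb=384):
    """Split one executor's memory request into Spark's main regions (MiB)."""
    usable = executor_memory_mb - RESERVED_MB        # heap left after reserved memory
    unified = usable * memory_fraction               # pool shared by execution and storage
    storage = unified * storage_fraction             # storage share immune to eviction
    execution = unified - storage                    # initial execution share (soft boundary)
    user = usable - unified                          # user data structures, UDF objects
    overhead = max(min_overhead_mb, executor_memory_mb * overhead_factor)  # off-heap overhead
    return {
        "unified": unified,
        "storage": storage,
        "execution": execution,
        "user": user,
        "overhead": overhead,
        "container_total": executor_memory_mb + overhead,  # requested from the cluster manager
    }


for region, mb in executor_memory_breakdown(4096).items():
    print(f"{region:>15}: {mb:8.1f} MiB")
```

For 4096 MiB this gives about 2277.6 MiB of unified memory, split initially into about 1138.8 MiB for storage and 1138.8 MiB for execution. It also leaves about 1518.4 MiB of user memory and adds 409.6 MiB of overhead. On YARN or Kubernetes, roughly 4505.6 MiB is therefore requested per executor.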
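Narrow vs. wide transformations (item 3) are easier to see with a toy model. The sketch below is plain Python, not Spark; lists stand in for partitions.

- narrow_map behaves like map: each output partition depends on exactly one input partition.
- wide_reduce_by_key behaves like a shuffle followed by reduceByKey: records with the same key from every input partition end up in one place.

The function names are hypothetical, and zlib.crc32 stands in for Spark's hash partitioner. Real reduceByKey also pre-aggregates on the map side before the shuffle, which this toy skips.

```python
import zlib

# Two input "partitions" of (key, value) records: a toy stand-in for an RDD.
partitions = [
    [("tea", 1), ("coffee", 2)],
    [("tea", 3), ("jam", 4)],
]


def narrow_map(parts, fn):
    # Output partition i is built only from input partition i: no data movement.
    return [[fn(rec) for rec in part] for part in parts]


def wide_reduce_by_key(parts, num_out):
    # Shuffle: each record is routed by its key, so one output partition
    # may need records from every input partition.
    shuffled = [{} for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            target = zlib.crc32(key.encode()) % num_out
            shuffled[target][key] = shuffled[target].get(key, 0) + value
    return [sorted(d.items()) for d in shuffled]


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
reduced = wide_reduce_by_key(partitions, num_out=2)
print(doubled)  # [[('tea', 2), ('coffee', 4)], [('tea', 6), ('jam', 8)]]
print(reduced)
```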
Question 1:
We have a stores table with columns (Store_id, store_nm, Product).
Write a query to find the stores that sell either both tea and coffee, or both coffee and jam.
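Coffee appears in both conditions, so the filter reduces to "sells coffee, and sells tea or jam". Below is a sketch using sqlite3 with conditional aggregation in HAVING. The sample stores and the capitalised product values ('Tea', 'Coffee', 'Jam') are assumptions. It reads "either ... or" inclusively, so a store selling all three products also qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (Store_id INTEGER, store_nm TEXT, Product TEXT)")
conn.executemany(
    "INSERT INTO stores VALUES (?, ?, ?)",
    [
        (1, "North", "Tea"), (1, "North", "Coffee"),   # tea + coffee -> qualifies
        (2, "South", "Coffee"), (2, "South", "Jam"),   # coffee + jam -> qualifies
        (3, "East", "Tea"), (3, "East", "Jam"),        # no coffee -> excluded
        (4, "West", "Coffee"),                         # coffee only -> excluded
    ],
)

# Count matching products per store: coffee must be present,
# plus at least one of tea or jam.
query = """
SELECT Store_id, store_nm
FROM stores
GROUP BY Store_id, store_nm
HAVING SUM(CASE WHEN Product = 'Coffee' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN Product IN ('Tea', 'Jam') THEN 1 ELSE 0 END) > 0
ORDER BY Store_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 'North'), (2, 'South')]
```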
Question 2:
We have an orders table with columns (Orderid, Orderdt, custid, Endloc).
Write a SQL query to return the customers who placed orders within 12 days.
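The orders question is ambiguous as written. One common reading is "customers who placed two consecutive orders within 12 days of each other", and the sketch below assumes that reading. It uses LAG to compare each order with the same customer's previous one.

Further assumptions:

- The sample rows are invented.
- Orderdt is stored as ISO 'YYYY-MM-DD' text.
- SQLite 3.25+ is available, which LAG requires.

If the intended meaning was "ordered in the last 12 days", a plain WHERE filter on Orderdt against the current date would do instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (Orderid INTEGER, Orderdt TEXT, custid INTEGER, Endloc TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", 101, "LocA"), (2, "2024-01-10", 101, "LocA"),  # 9-day gap
        (3, "2024-01-01", 102, "LocB"), (4, "2024-02-01", 102, "LocB"),  # 31-day gap
        (5, "2024-03-01", 103, "LocC"), (6, "2024-03-13", 103, "LocC"),  # 12-day gap
        (7, "2024-03-05", 104, "LocD"),                                  # single order
    ],
)

# LAG fetches the customer's previous order date; julianday() converts the
# ISO strings to day numbers so the gap can be compared with 12.
query = """
SELECT DISTINCT custid
FROM (
    SELECT custid, Orderdt,
           LAG(Orderdt) OVER (PARTITION BY custid ORDER BY Orderdt) AS prev_dt
    FROM orders
)
WHERE prev_dt IS NOT NULL
  AND julianday(Orderdt) - julianday(prev_dt) <= 12
ORDER BY custid
"""
customers = conn.execute(query).fetchall()
print(customers)  # [(101,), (103,)]
```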
PySpark Question 3:
We have 2 CSV files. One contains department data with dept_name and dept_id columns; the second contains student data with studentname, stud_id, deptid, total_marks_secured and year columns.
We need to "return the top 5 students for each dept for each year" in the output format deptname, studid, stud_name, year, total marks.
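The core of the top-5 question is a ranking window partitioned by department and year. To keep the sketch runnable without Spark, it expresses that logic in SQL on sqlite3, with two in-memory tables standing in for the CSV files (sample rows invented).

In PySpark the equivalent would read both files with `spark.read.csv(..., header=True, inferSchema=True)`. It would then define `w = Window.partitionBy("deptid", "year").orderBy(F.col("total_marks_secured").desc())`, add `F.row_number().over(w)`, filter on rank <= 5, and join to the department data on dept_id == deptid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE department (dept_name TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE students (studentname TEXT, stud_id INTEGER, deptid INTEGER, "
             "total_marks_secured INTEGER, year INTEGER)")
conn.executemany("INSERT INTO department VALUES (?, ?)", [("CSE", 1), ("ECE", 2)])
rows = [(f"S{i}", i, 1, 100 - i, 2023) for i in range(1, 7)]  # six CSE students in 2023
rows += [("S7", 7, 2, 80, 2023), ("S8", 8, 2, 85, 2023), ("S9", 9, 1, 77, 2024)]
conn.executemany("INSERT INTO students VALUES (?, ?, ?, ?, ?)", rows)

# ROW_NUMBER restarts for every (deptid, year) group; keep the first 5 per group,
# then join to department to get dept_name in the requested column order.
query = """
SELECT d.dept_name, r.stud_id, r.studentname, r.year, r.total_marks_secured
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY s.deptid, s.year
               ORDER BY s.total_marks_secured DESC
           ) AS rn
    FROM students s
) r
JOIN department d ON d.dept_id = r.deptid
WHERE r.rn <= 5
ORDER BY d.dept_name, r.year, r.total_marks_secured DESC
"""
top5 = conn.execute(query).fetchall()
for row in top5:
    print(row)
```

ROW_NUMBER breaks ties arbitrarily. If students tied on marks should all be kept, DENSE_RANK is the usual alternative.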
4. What are facts and dimensions in a data warehouse?
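A small star schema illustrates facts vs. dimensions. The fact table holds measurable events (quantity, amount) plus foreign keys. The dimension tables hold the descriptive attributes used to filter and group. The table and column names below are illustrative only, and the sketch uses sqlite3 so it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension tables: descriptive attributes.
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT)")
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT)")
# Fact table: numeric measures plus keys pointing at the dimensions.
conn.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL)""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Tea", "Beverage"), (2, "Jam", "Spread")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240301, "2024-03-01", "2024-03"), (20240302, "2024-03-02", "2024-03")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20240301, 1, 3, 30.0), (20240302, 1, 2, 20.0), (20240302, 2, 1, 15.0)])

# Typical star-schema query: aggregate fact measures by dimension attributes.
query = """
SELECT p.category, d.month, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_date d ON d.date_key = f.date_key
GROUP BY p.category, d.month
ORDER BY p.category
"""
summary = conn.execute(query).fetchall()
print(summary)  # [('Beverage', '2024-03', 50.0), ('Spread', '2024-03', 15.0)]
```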
1. Difference between Delta Lake and a data lake.
2. How was the CDC (change data capture) process implemented in your project?
3. Key differences between Parquet and ORC, and which one to choose over the other.
4. Have you worked with any reporting tools?
5. Have you ever consumed files?
6. Facts and dimensions.
7. Have you worked on SCDs? Explain how many types of SCD you have worked on.
8. What is normalization, and why is it required?
9. What is structured and unstructured data? Have you worked with them?
10. Issues you have faced and how you resolved them.
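The CDC question (item 2) was about the candidate's own project, which the review does not describe. As a generic illustration only, one simple CDC technique is snapshot comparison: diff the previous and current extracts by primary key to classify inserts, updates and deletes. The function name snapshot_diff and the sample data are hypothetical. Log-based CDC, which reads the source database's transaction log, is the other common approach and avoids full-table comparisons.

```python
def snapshot_diff(old, new):
    """Compare two snapshots keyed by primary key and classify the changes."""
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    return inserts, updates, deletes


yesterday = {1: ("Asha", "Pune"), 2: ("Ravi", "Delhi"), 3: ("Meena", "Chennai")}
today = {1: ("Asha", "Mumbai"), 3: ("Meena", "Chennai"), 4: ("Kiran", "Pune")}
inserts, updates, deletes = snapshot_diff(yesterday, today)
print(inserts)  # {4: ('Kiran', 'Pune')}
print(updates)  # {1: ('Asha', 'Mumbai')}
print(deletes)  # {2: ('Ravi', 'Delhi')}
```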
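For the SCD question (item 7), the commonly discussed types are:

- Type 1 overwrites the old value.
- Type 2 keeps history by expiring the current row and inserting a new version.
- Type 3 keeps limited history in an extra column.

A minimal Type 2 sketch in plain Python follows. The column names (id, city, valid_from, valid_to, is_current) and the function apply_scd2 are hypothetical; in a warehouse the same logic is often written as a MERGE statement.

```python
from datetime import date


def apply_scd2(dimension, incoming, load_date):
    """Apply SCD Type 2: expire changed current rows and append new versions.

    dimension: list of dicts with keys id, city, valid_from, valid_to, is_current
    incoming:  dict of id -> city from the latest source extract
    """
    current = {row["id"]: row for row in dimension if row["is_current"]}
    for key, city in incoming.items():
        row = current.get(key)
        if row is not None and row["city"] == city:
            continue  # unchanged: keep the existing current version
        if row is not None:
            # Changed: close the old version instead of overwriting it.
            row["valid_to"] = load_date
            row["is_current"] = False
        # New key or changed value: append a fresh current version.
        dimension.append({"id": key, "city": city, "valid_from": load_date,
                          "valid_to": None, "is_current": True})
    return dimension


dim = [{"id": 1, "city": "Pune", "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True}]
apply_scd2(dim, {1: "Mumbai", 2: "Delhi"}, date(2024, 3, 1))
for row in dim:
    print(row)
```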