AI as a learning resource

Understanding predictions and verifying responses

John Little

Duke University Libraries

Center for Data & Visualization Sciences

2024-12-03

AI and new challenges

Introduction of Generative LLMs (e.g., ChatGPT)
- Translation
- Synthesis
New challenges:
1. How to ask questions of a generative AI
2. How to frame questions to reflect goals

The Confidence v Competence Paradox

LLMs give confident responses
Responses are predictions, not necessarily correct answers
Incorrect predictions = “hallucinations”
Verification is crucial
Paradox: More knowledge leads to better evaluation of AI responses

Use case - Code generation

Data transformation
Data analysis
Iteration
Big Data
AI assistance / AI-paired coding

Goal

Create scatter plots, one for each home world

Case Study - Star Wars Dataset

Homeworld	Heights	Masses	Characters
Tatooine	172, 188, 178	77, 84, 120	Luke Skywalker, Anakin Skywalker, Owen Lars
Alderaan	150, 191	49, 85	Leia Organa, Bail Prestor Organa
Naboo	165, 196, 170	45, 66, 75	Padmé Amidala, Isadore, Palpatine
Coruscant	66, 188	17, 84	Yoda, Mace Windu
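
The table above packs several values into each cell, so the data has to be reshaped before anything can be plotted per home world. The following is a minimal sketch of that step, not the code used in the talk: the function names (`to_long`, `group_by_homeworld`, `plot_panels`) and the output file name are illustrative assumptions, and the plotting helper assumes matplotlib is installed.

```python
from collections import defaultdict

# Rows copied from the case-study table: one row per homeworld,
# with comma-separated values packed into each cell.
WIDE_ROWS = [
    ("Tatooine", "172, 188, 178", "77, 84, 120",
     "Luke Skywalker, Anakin Skywalker, Owen Lars"),
    ("Alderaan", "150, 191", "49, 85", "Leia Organa, Bail Prestor Organa"),
    ("Naboo", "165, 196, 170", "45, 66, 75", "Padmé Amidala, Isadore, Palpatine"),
    ("Coruscant", "66, 188", "17, 84", "Yoda, Mace Windu"),
]


def to_long(rows):
    """Unpack each packed cell so that every character gets its own record."""
    records = []
    for homeworld, heights, masses, names in rows:
        h = [int(x) for x in heights.split(",")]
        m = [int(x) for x in masses.split(",")]
        n = [x.strip() for x in names.split(",")]
        if not (len(h) == len(m) == len(n)):
            raise ValueError(f"Mismatched cell lengths for {homeworld}")
        for height, mass, name in zip(h, m, n):
            records.append(
                {"homeworld": homeworld, "name": name, "height": height, "mass": mass}
            )
    return records


def group_by_homeworld(records):
    """Collect the long-format records into one list per homeworld."""
    groups = defaultdict(list)
    for record in records:
        groups[record["homeworld"]].append(record)
    return dict(groups)


def plot_panels(groups, path="homeworlds.png"):
    """Draw one scatter panel per homeworld (assumes matplotlib is installed)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, len(groups), figsize=(4 * len(groups), 4), squeeze=False)
    for ax, (homeworld, rows) in zip(axes[0], groups.items()):
        ax.scatter([r["height"] for r in rows], [r["mass"] for r in rows])
        ax.set_title(homeworld)
        ax.set_xlabel("Height")
        ax.set_ylabel("Mass")
    fig.tight_layout()
    fig.savefig(path)


records = to_long(WIDE_ROWS)
groups = group_by_homeworld(records)
```

Calling `plot_panels(groups)` then writes one panel per home world. Knowing that the reshape has to happen first is the kind of background that makes an AI-generated answer easier to judge.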

Example

Challenges in AI Assistance

AI handles some basic visualization and coding well
Struggles with complex data shaping and iteration
This problem is easier when the user has knowledge in:
- Coding concepts
- Data shaping
- Visualization
- Iteration for large datasets

When it goes wrong

Word problems

Prompt: “How long does it take to walk 10,000 steps on a treadmill at 1.2 MPH?” (AI responses were inconsistent)

Lesson 1: Importance of cross-verification
Lesson 2: Prediction is not the same as mathematical truth
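
One way to cross-verify the treadmill answer is to do the arithmetic directly. The prompt does not give a stride length, so any answer depends on an assumed one; the sketch below makes that assumption explicit. The stride values (2.0, 2.5, and 3.0 feet) are illustrative assumptions, not figures from the talk.

```python
FEET_PER_MILE = 5280


def walking_time_hours(steps, speed_mph, stride_feet):
    """Time to walk `steps` at `speed_mph`, given an assumed stride length."""
    distance_miles = steps * stride_feet / FEET_PER_MILE
    return distance_miles / speed_mph


def as_hours_minutes(hours):
    """Convert fractional hours to (whole hours, minutes), rounded to the minute."""
    total_minutes = round(hours * 60)
    return divmod(total_minutes, 60)


# Assumed stride lengths in feet; the prompt itself leaves this unspecified.
for stride in (2.0, 2.5, 3.0):
    h, m = as_hours_minutes(walking_time_hours(10_000, 1.2, stride))
    print(f"stride {stride} ft: about {h} h {m} min")
```

The result shifts by more than an hour and a half across these strides, which may be one reason predicted answers to the same prompt disagree.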

EEBO

No ground truth

Code

Translation done poorly

Due to insufficient background and/or prompting

AI-paired code generation

There are some clear winners and losers among the big names; that is, each LLM has its own evolving strengths, weaknesses, and tendencies.

These problems highlight the Confidence v Competence Paradox but are easily verifiable

When it goes right

and how right does it go?

Synthetic questions

Prompt: Compare student body and faculty diversity at Duke University with UNCG. Compare today with 1985.

Lesson 1: Different LLMs give different amounts of evidence for verification
Lesson 2: Differing amounts of ground truth will affect the prediction

Code translation

I have Python code, give it to me in R

Variations in code translations

R to Python
Python to R
SQL from natural language
JavaScript
HTML
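
Translated or generated code is easiest to trust when it can be run against a small dataset whose answer is already known. The sketch below illustrates this for the "SQL from natural language" case, using Python's built-in `sqlite3` module and the Star Wars table from the case study. The query in `CANDIDATE_SQL` is a hypothetical stand-in for AI output, and the table and column names are assumptions made for this example.

```python
import sqlite3

# (name, homeworld, height, mass) taken from the case-study table.
ROWS = [
    ("Luke Skywalker", "Tatooine", 172, 77),
    ("Anakin Skywalker", "Tatooine", 188, 84),
    ("Owen Lars", "Tatooine", 178, 120),
    ("Leia Organa", "Alderaan", 150, 49),
    ("Bail Prestor Organa", "Alderaan", 191, 85),
    ("Padmé Amidala", "Naboo", 165, 45),
    ("Isadore", "Naboo", 196, 66),
    ("Palpatine", "Naboo", 170, 75),
    ("Yoda", "Coruscant", 66, 17),
    ("Mace Windu", "Coruscant", 188, 84),
]

# Hypothetical AI-generated query for "average height per homeworld".
CANDIDATE_SQL = """
SELECT homeworld, AVG(height) AS mean_height
FROM characters
GROUP BY homeworld
"""

# Ground truth worked out by hand from the table.
EXPECTED = {
    "Tatooine": (172 + 188 + 178) / 3,
    "Alderaan": (150 + 191) / 2,
    "Naboo": (165 + 196 + 170) / 3,
    "Coruscant": (66 + 188) / 2,
}


def run_query(sql, rows):
    """Load the rows into an in-memory database and run the candidate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE characters "
            "(name TEXT, homeworld TEXT, height INTEGER, mass INTEGER)"
        )
        conn.executemany("INSERT INTO characters VALUES (?, ?, ?, ?)", rows)
        return dict(conn.execute(sql).fetchall())
    finally:
        conn.close()


def matches(result, expected, tol=1e-9):
    """True when both mappings have the same keys and near-equal values."""
    return result.keys() == expected.keys() and all(
        abs(result[key] - expected[key]) <= tol for key in expected
    )


result = run_query(CANDIDATE_SQL, ROWS)
verified = matches(result, EXPECTED)
print("query verified" if verified else "query disagrees with the hand-checked answer")
```

The same pattern applies to R-to-Python or Python-to-R translations: run the original and the translation on identical input and compare the outputs.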

Natural language

How can I use the phrase “Sticky Wicket” in German?

Translate Sticky Wicket to German
But how do you verify the result? (Same problem as with code.)

Value in Reproducibility

Coding
- Do everything with code
- Including report generation
No Code
- Getting better all the time

Increasingly, we are seeing computational environments with built-in AI pairing

Solutions

and best practices

Problems and Solutions

GIGO (Garbage In, Garbage Out) still applies
Prompt engineering is a crucial skill
AI excels in translation tasks
Good for synthetic questions with possible validation
Less reliable for tasks without established ground truth

Best Practices

Using broad-based LLMs:

ChatGPT
Microsoft Copilot
Claude.ai
Gemini.google.com
GitHub Copilot (for AI-paired coding)

Prompt Engineering

Identify role
Identify audience
Identify voice
Identify goals and problem
Use multiple steps
Verify
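
The checklist above can be captured as a small template, so that every prompt states its role, audience, voice, and goal before any steps are listed. This is only one possible sketch: the `PromptSpec` class, its field names, and the example wording are assumptions for illustration, and the rendered text still has to be sent to an LLM and the response verified separately.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptSpec:
    """Hypothetical container for the prompt elements in the checklist."""
    role: str
    audience: str
    voice: str
    goal: str
    steps: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Assemble the elements into a single prompt string."""
        lines = [
            f"Role: {self.role}",
            f"Audience: {self.audience}",
            f"Voice: {self.voice}",
            f"Goal: {self.goal}",
        ]
        if self.steps:
            lines.append("Work through these steps in order:")
            lines.extend(f"{i}. {step}" for i, step in enumerate(self.steps, start=1))
        # Ask for material that makes the later verification step possible.
        lines.append("Explain your reasoning so that the answer can be verified.")
        return "\n".join(lines)


spec = PromptSpec(
    role="You are an experienced data analyst.",
    audience="A researcher who is new to coding.",
    voice="Plain and concise.",
    goal="Create scatter plots of height against mass, one for each home world.",
    steps=[
        "Reshape the table so each character has its own row.",
        "Group the rows by home world.",
        "Draw one scatter plot per group.",
    ],
)
prompt_text = spec.render()
print(prompt_text)
```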

Conclusion

Embracing AI in data analysis

AI is a powerful tool, but requires careful use
The library offers crucial guidance
Continuous learning and adaptation are essential

Questions

How do you see these tools or techniques impacting research and research investment?
Do you have data transformation, reshaping, or analysis tasks that could benefit from AI assistance?
In what ways do you think we can improve training and assistance for next generation LLMs?
What are some of the biggest challenges you see in the future of AI-paired coding?