Companies are increasingly looking to use artificial intelligence-powered chatbots like OpenAI’s ChatGPT in their operations despite concerns about bias, confidentiality breaches, and a murky regulatory landscape.
Human resources departments in particular are exploring chatbots for drafting job descriptions and performance reviews, or as employee self-service tools. At the same time, some companies are limiting or outright banning their employees from using such tools.
Interest in large language models (LLMs), like ChatGPT, has accelerated since OpenAI made the tool publicly accessible in November. The company released GPT-4 on March 14, which markedly improves the chatbot’s accuracy and addresses other issues.
Other tech companies, like Google’s parent company Alphabet Inc., have rolled out their own chatbots as investments pour into the space.
The nascent technology presents a high-risk, high-reward dilemma for HR managers and company lawyers. Being too cautious could help avoid legal headaches, but waiting for regulatory clarity may leave them behind competitors.
“Get ahead of the generative AI adoption by HR, rather than dealing with the horse after it has already gotten out of the barn,” said Lance Elliot, who researches AI at Stanford University.
Chatbots could help HR departments more efficiently complete tasks like drafting job descriptions or responding to employee inquiries about benefits or company policies.
Jesse Meschuk, a senior adviser at consulting firm Exequity, said employers looking to incorporate this technology should identify “busy work” it could complete while freeing up HR staff for other tasks.
HR technology company Confirm recently rolled out a ChatGPT-powered product that uses the tool to draft performance reviews for managers based on input from peers and managers.
Joshua Merrill, CEO of Confirm, said there’s “a bit of a gold rush starting” where companies are trying to incorporate LLMs into their operations. But its purpose should remain assistive, with humans still producing the end product, Merrill said.
“Every customer we’ve talked to has felt strongly that that’s the role of the manager, the role of a human being and not a computer,” he said.
While testing an LLM’s ability to categorize employee wellness surveys, for example, Merrill found that the chatbot needed to be specifically directed to only consider the data from that survey. Without that instruction, which would’ve likely been intuitive for a human, the chatbot falsely reported that some employees felt they faced discrimination and harassment at work.
Mistakes like that can happen when the chatbot pulls data it was trained on that isn’t relevant to the task it’s given. Testing for these “hallucinations” and other issues before rolling out such products is important, HR tech experts say.
“These AI models are very impressive,” Merrill said. “They’re so impressive that sometimes they fool us into thinking that they’re better than they are. They’re not perfect.”
Employers and their HR departments have increasingly relied on AI to help make employment decisions, usually in the form of resume screeners or assessment tools.
New York City and the European Union have begun to crack down on the use of those tools, generally in the form of audit requirements.
Auditing standards for AI tools in general are still developing. But LLMs are even more difficult to audit because there are unbounded possibilities for inputs, which means there is a wide array of cases to test for, according to Philip Dawson, head of AI policy at Armilla AI.
LLMs are often referred to as a “black box” because it is often too complex to determine how the algorithm reached its conclusion.
Right now companies aren’t required or incentivized to adopt certain best practices like being transparent about the data the models were trained on “other than for fear of some liability for discrimination as a downstream consequence of the use of the technology,” Dawson said.
Checking for bias in particular can be difficult because the universe of data chatbots are trained on is so immense that they can easily regurgitate stereotypes present in society.
An analysis by HR technology company Textio found that, when asked to draft performance feedback for hypothetical workers, ChatGPT often leaned on gender stereotypes to determine the pronouns of certain positions, referring to kindergarten teachers as “she” and mechanics as “he.”
When the input included the employee’s pronouns, the chatbot typically added more criticism for women as well.
This scenario presents litigation risk, as performance reviews can often become fodder for discrimination suits.
“Companies like OpenAI are making a lot of effort to check that their systems are not biased, but ultimately these systems are large, black boxes that are by their fundamental nature opaque,” said Anton Korinek, who researches AI at the Brookings Institution.
But even with the liability risk uncertain, waiting too long could put a company’s HR department behind those of its competitors.
OpenAI predicts that approximately 80% of the US workforce could have at least 10% of their work tasks affected by the introduction of chatbots, according to a research paper it published March 17.
“If companies that focus on any type of cognitive work don’t think about how to incorporate these language models into their workflows, then they’re really going to be at a disadvantage,” Korinek said.
Meanwhile, nearly half of human resource leaders polled by consulting firm Gartner said they’re in the process of formulating guidance on employees’ use of AI-powered chatbots.
Some companies, including major US financial institutions like JPMorgan Chase & Co., have opted to ban the tool.
One of the biggest concerns employers have is that employees will input confidential or proprietary information into a chatbot, such as client data or source code. Data security service Cyberhaven released a report in February that found 3.1% of its clients’ employees have put confidential company data into ChatGPT.
Chatbot makers generally state that there is no promise of privacy for any entered data, given that data entered into the tools may be used to further train them.
And bugs do happen: OpenAI CEO Sam Altman said March 22 that the company identified a bug that resulted in “a small percentage” of users being able to see the titles of other users’ conversation history.
Employees may have to be reminded through training that entering commercial information into tools like ChatGPT amounts to a violation of their confidentiality agreements, according to Karla Grossenbacher, a partner at Seyfarth Shaw LLP.
“I don’t think you specifically need to call out ChatGPT, but I think it would be worth reminding employees that putting it in ChatGPT is in fact a disclosure,” she said.
To contact the editors responsible for this story: Rebekah Mintzer at email@example.com;